ReleaseEngineering/Applications/Tooltool

Revision as of 01:02, 21 October 2014 by Janx (talk | contribs) (→‎How to upload files to tooltool: Clarify tooltool user ID for non-@mozilla.com LDAPs)

Tooltool basics

Tooltool is a client-side program written in Python that uses a file manifest, in concert with HTTP servers, to retrieve sets of files and check their integrity by verifying their hash digests. The manifests are JSON files which list details of individual files (see below). Each file is represented in the JSON by a dictionary with the keys “filename”, “digest”, “size” and “algorithm”.

A set of files to be fetched by tooltool is specified via a tooltool manifest, usually a file with a .tt extension.

The following is an example of a valid tooltool manifest:

 [
 {
 "size": 139308,
 "digest": "b2a463249bb3a9e7f2a3604697b000d2393db4f37b623fc099beb8456fbfdb332567013a3131ad138d8633cb19c50a8b77df3990d67500af896cada8b6f698b4",
 "algorithm": "sha512",
 "filename": "file2.pdf"
 },
 {
 "size": 3017536,
 "digest": "630d01a329c70aedb66ae7118d12ff7dc6fe06223d1c27b793e1bacc0ca84dd469ec1a6050184f8d9c35a0636546b0e2e5be08d9b51285e53eb1c9f959fef59d",
 "algorithm": "sha512",
 "filename": "file1.pdf"
 },
 {
 "size": 3420686,
 "digest": "931eb84f798dc9add1a10c7bbd4cc85fe08efda26cac473411638d1f856865524a517209d4c7184d838ee542c8ebc9909dc64ef60f8653a681270ce23524e8e4",
 "algorithm": "sha512",
 "filename": "file3.pdf"
 }
 ]
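
As a rough illustration of how a record like the ones above could be produced for a local file, here is a minimal sketch using only the Python standard library. The helper names manifest_entry and manifest_json are hypothetical and are not part of tooltool.py; they only mirror the record layout shown in the example manifest.

```python
import hashlib
import json
import os


def manifest_entry(path, algorithm="sha512", chunk_size=1024 * 1024):
    """Return a manifest-style record for the file at `path` (hypothetical helper)."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large artifacts do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return {
        "size": os.path.getsize(path),
        "digest": hasher.hexdigest(),
        "algorithm": algorithm,
        "filename": os.path.basename(path),
    }


def manifest_json(paths):
    """Serialize the records for several files as a JSON list."""
    return json.dumps([manifest_entry(p) for p in paths], indent=1)
```

Calling manifest_json(["file1.pdf", "file2.pdf"]) would return a JSON list with one dictionary per file, in the order given.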

The simplest use case for tooltool is to provide a manifest and the URL of a tooltool server, and run a "fetch" command to download the files mentioned in the manifest:

 python  tooltool.py fetch --url http://tooltool.pub.build.mozilla.org/temp-sm-stuff -m my-manifest.tt

If a manifest name is not provided, tooltool will default to manifest.tt.

Tooltool will build a URL for each of the mentioned files by concatenating the server URL, the hashing algorithm and the file digest, and will then try to download it.

In this example, the URLs will be (one per file in the manifest, in manifest order):

 http://tooltool.pub.build.mozilla.org/temp-sm-stuff/sha512/b2a463249bb3a9e7f2a3604697b000d2393db4f37b623fc099beb8456fbfdb332567013a3131ad138d8633cb19c50a8b77df3990d67500af896cada8b6f698b4
 http://tooltool.pub.build.mozilla.org/temp-sm-stuff/sha512/630d01a329c70aedb66ae7118d12ff7dc6fe06223d1c27b793e1bacc0ca84dd469ec1a6050184f8d9c35a0636546b0e2e5be08d9b51285e53eb1c9f959fef59d
 http://tooltool.pub.build.mozilla.org/temp-sm-stuff/sha512/931eb84f798dc9add1a10c7bbd4cc85fe08efda26cac473411638d1f856865524a517209d4c7184d838ee542c8ebc9909dc64ef60f8653a681270ce23524e8e4
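
The URL scheme described above (server URL, then algorithm, then digest) can be sketched in a few lines. The function name build_urls is hypothetical; the sketch only assumes that the manifest has already been parsed into a list of dictionaries.

```python
def build_urls(base_url, manifest):
    """Return one download URL per manifest record: <base>/<algorithm>/<digest>."""
    base = base_url.rstrip("/")  # tolerate a trailing slash on the server URL
    return ["%s/%s/%s" % (base, rec["algorithm"], rec["digest"]) for rec in manifest]
```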

After downloading the files, their digest is verified according to the algorithm specified in the manifest, and they are finally renamed according to the filename specified in the manifest.
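
The verify-then-rename step can be pictured with the following sketch. It is not the code in tooltool.py: the name verify_and_rename, the ValueError on mismatch and the temporary download path are all assumptions made for illustration.

```python
import hashlib
import os


def verify_and_rename(downloaded_path, record, target_dir="."):
    """Check the digest of a downloaded file, then move it to record["filename"].

    Returns the final path, or raises ValueError if the digest does not match.
    """
    hasher = hashlib.new(record["algorithm"])
    with open(downloaded_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            hasher.update(chunk)
    if hasher.hexdigest() != record["digest"]:
        raise ValueError("digest mismatch for %s" % record["filename"])
    final_path = os.path.join(target_dir, record["filename"])
    os.replace(downloaded_path, final_path)  # rename only after a successful check
    return final_path
```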

The tooltool download servers are simply Apache server folders with a flat structure (all files are stored at root level with no subfolders).

There are global options and command arguments. All terminal interactions after the option parser finishes are done through the Python logging API. The default is to print logging.INFO and higher messages. Currently, the following global options exist:

   -q/--quiet tells Tooltool to print only logging.ERROR and higher messages
   -v/--verbose specifies to print logging.INFO and higher
   -m/--manifest <file> instructs Tooltool to reference a manifest file located at the specified path
   -d/--algorithm <algorithm> instructs Tooltool to use the specified algorithm
   -o/--overwrite tells Tooltool to overwrite a local file if the filename matches the manifest but the hash value is different to the manifest
   --url specifies the base url to be used for remote operations

Where's tooltool code?

Tooltool's main dev repo is https://github.com/mozilla/build-tooltool.

The version of tooltool which is actually deployed in our infrastructure is in https://hg.mozilla.org/build/puppet/modules/packages/templates/. The tooltool.py file in the puppet repo should be in sync with the main dev repo (they should be identical, apart from the shebang line).

Listing and Validating

The two most basic commands list a manifest and validate the local files against the manifest. The list command lists out all of the files in the manifest as well as whether they are present and/or valid. The return code from listing is zero unless there was an error in listing the files. Absent or invalid files will still result in an exit code of zero if there was no error in the listing process. The validate command is used to check if all the files in the manifest are present and valid. The exit code for validating is zero if all files in the manifest are present and their hash matches the manifest. It is non-zero if any file is missing locally or the file does not have the same hash as the manifest.
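
The exit-code rule for validation can be sketched as follows. The function names, the three status strings and the choice of 1 as the non-zero code are assumptions for illustration; only the zero/non-zero behaviour comes from the description above.

```python
import hashlib
import json
import os


def file_status(record, directory="."):
    """Return "absent", "invalid" or "valid" for one manifest record."""
    path = os.path.join(directory, record["filename"])
    if not os.path.isfile(path):
        return "absent"
    hasher = hashlib.new(record["algorithm"])
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            hasher.update(chunk)
    return "valid" if hasher.hexdigest() == record["digest"] else "invalid"


def validate_exit_code(manifest_path, directory="."):
    """Return 0 only if every file in the manifest is present and valid, else 1."""
    with open(manifest_path) as handle:
        manifest = json.load(handle)
    statuses = [file_status(rec, directory) for rec in manifest]
    return 0 if all(status == "valid" for status in statuses) else 1
```

A listing command built on file_status would print the status of each file and still return zero when files are absent or invalid, which is the difference between the two commands.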

Other tooltool features

The tooltool cache

This has been implemented in bug https://bugzilla.mozilla.org/show_bug.cgi?id=858635

When connecting to a tooltool server to fetch a file, it is now possible to use a local cache by passing the cache folder with the -c option, e.g.:

 python  tooltool.py fetch --url http://tooltool.pub.build.mozilla.org/temp-sm-stuff -c ~/cache

In the previous example, folder ~/cache will be inspected for artifacts before connecting to the specified tooltool server. If the specified cache folder does not exist, tooltool will create it (if the user running tooltool has permissions to do so!).

Purging the cache with the purge command

A mechanism to purge the tooltool cache has been implemented as a separate command. Let's see some examples of purge commands:

 python  tooltool.py purge -c ~/cache
 python  tooltool.py purge -c ~/cache -s 34
 python  tooltool.py purge -c ~/cache --size 34

In the first example, we are cleaning up the tooltool cache completely, i.e. all the content will be wiped out.

In the second and third examples the extra parameter -s (or, in extended form, --size) is provided, which specifies the number of gigabytes we want to be free at the end of the purging process. Tooltool will delete files in the cache folder, starting with the oldest ones, until the specified number of gigabytes is free on disk. If there is already more free space than the specified amount when the command is invoked, no files will be deleted at all.
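
The purge policy can be sketched as below. This is not the implementation in tooltool.py: the function name purge, the use of modification time to decide which files are "oldest", the 1024**3 definition of a gigabyte and the free_bytes hook (added so the logic can be exercised without filling a disk) are all assumptions.

```python
import os
import shutil

GIGABYTE = 1024 ** 3


def purge(cache_dir, gigs=None, free_bytes=None):
    """Delete cached files, oldest first, until `gigs` gigabytes are free.

    With gigs=None the cache is emptied completely. `free_bytes` is a
    hypothetical hook returning the free disk space in bytes; by default it
    asks the filesystem holding the cache folder. Returns the deleted paths.
    """
    if free_bytes is None:
        def free_bytes():
            return shutil.disk_usage(cache_dir).free
    files = [os.path.join(cache_dir, name) for name in os.listdir(cache_dir)]
    files = [path for path in files if os.path.isfile(path)]
    files.sort(key=os.path.getmtime)  # oldest modification time first
    deleted = []
    for path in files:
        if gigs is not None and free_bytes() >= gigs * GIGABYTE:
            break  # enough space is free already; stop deleting
        os.remove(path)
        deleted.append(path)
    return deleted
```

Because the free-space check happens before each deletion, a call made when enough space is already available returns an empty list and leaves the cache untouched.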

Enabling automated tooltool cache purge within purge_builds.py

Since the purge_builds.py script (within the build-tools repo) is already responsible for clean-up operations, the option of cleaning the tooltool cache has been added to that script as well.

In order for the automated cleanup to take place, two environment variables need to be set when purge_builds.py is invoked:

 TOOLTOOL_HOME: the folder containing tooltool.py
 TOOLTOOL_CACHE: the local folder being used as tooltool cache (which will be cleaned up)
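
To make the role of the two variables concrete, here is a small sketch of how a cleanup script could turn them into a purge command line. It is not taken from purge_builds.py; the function name tooltool_purge_command and the decision to skip the cleanup when a variable is missing are assumptions.

```python
import os


def tooltool_purge_command(environ=None, size_gb=None):
    """Build the purge command line from TOOLTOOL_HOME and TOOLTOOL_CACHE.

    Returns None when either variable is unset, meaning no cleanup is run.
    """
    environ = os.environ if environ is None else environ
    home = environ.get("TOOLTOOL_HOME")
    cache = environ.get("TOOLTOOL_CACHE")
    if not home or not cache:
        return None
    command = ["python", os.path.join(home, "tooltool.py"), "purge", "-c", cache]
    if size_gb is not None:
        command += ["-s", str(size_gb)]  # same meaning as the -s option of purge
    return command
```

The returned list could be passed to subprocess.call by the calling script.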

Using multiple servers with tooltool

The original implementation of tooltool supported a single download server: whenever tooltool was used to fetch artifacts, one server was specified and tooltool tried to download the files from that location exclusively.

The need to support multiple servers is justified by at least two reasons:

  • improve tooltool resiliency in case one server is not available, so that requests are made to a backup server
  • allow the setup of tooltool repositories with different levels of visibility

The following example illustrates how to use tooltool with multiple servers:

 python  tooltool.py fetch --url http://server1.example.com --url http://server2.example.com

In this case, tooltool will try to fetch the desired artifacts from server1 and, if that server is down or does not serve the desired resource, will try server2. An arbitrary number of servers is supported by the tooltool client. Of course, in order to use this feature the servers need to be set up.
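
The fallback behaviour can be sketched like this. The names http_get and fetch_from_any, the injectable download parameter and the error type raised when every server fails are assumptions; the sketch only illustrates trying the servers in the order given.

```python
import urllib.error
import urllib.request


def http_get(url):
    """Default downloader: return the response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_from_any(base_urls, algorithm, digest, download=http_get):
    """Try each server in order and return the first successful download."""
    errors = []
    for base in base_urls:
        url = "%s/%s/%s" % (base.rstrip("/"), algorithm, digest)
        try:
            return download(url)
        except (urllib.error.URLError, OSError) as exc:
            # Server down or resource missing: remember why and move on.
            errors.append("%s: %s" % (url, exc))
    raise IOError("no server could provide %s (%s)" % (digest, "; ".join(errors)))
```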

See also https://bugzilla.mozilla.org/show_bug.cgi?id=768123.

How to upload to tooltool

Tooltool uploads have been traditionally managed by simply scp'ing files to the folder served by the apache instance working as tooltool download server. A new upload mechanism has been implemented to allow better tracking of what is uploaded to tooltool (see "Bug 772190 - tooltool upload mechanism").

The new upload procedure allows users to upload artifacts via rsync to a dedicated upload server, providing some metadata about the upload at the same time. The artifacts uploaded to the upload server are then distributed to the relevant tooltool download servers (and their mirrors, if any) by a separate sync script which runs from cron on the upload server itself.

Preliminary notes:

  • The server used for uploads (AKA tooltool upload server) is tooltool-uploads.pub.build.mozilla.org (no VPN required)
  • Currently, artifacts uploaded to the tooltool upload server can be made visible on only one tooltool download server:
      • all uploads to /tooltool/uploads/user/pvt on the upload server (more details below) will be visible on the private download server

Pre-requisites to run an upload

  • You need to be added to the "vpn_tooltooleditors" LDAP group (see below)
  • You need ssh properly configured to access the tooltool upload server; if the key you normally use to access mozilla servers is ~/.ssh/id_rsa, no further setup is needed. If it's a different one, you will need to add the following entry to your ~/.ssh/config file:
   Host tooltool-uploads.pub.build.mozilla.org
     IdentityFile /path/to/my_private_key
     User my_user

Note that "my_user" above is your LDAP username without the "@mozilla.com" part.

How to enable a user to run tooltool uploads by adding him/her to the vpn_tooltooleditors LDAP group

First, request addition to the "vpn_tooltooleditors" group via a bug in "Infrastructure & Operations::Infrastructure:LDAP". The requester should ask for an ACK on the bug from someone in release engineering. Once the change is made, it takes an hour or two to propagate from LDAP to Puppet to the upload servers.

How to upload files to tooltool

  • Create a local folder (name it as you like) and put in it all the files you want to upload
  • Run the tooltool distribute command (see example below)

That's it!

Example:

 # Folder /Users/$user/upload contains file1.tar.gz, file2.tar.gz and file3.tar.gz
 tooltool.py distribute --folder /Users/$user/upload --message "Bug 123456 - artifacts needed for this and that" \
   --user $user --host tooltool-uploads.pub.build.mozilla.org --path "/tooltool/uploads/$user/pvt"

Note that "$user" above is your posix UID, i.e. your LDAP username without the "@mozilla.com" part (or different domain).

The parameters in the previous command are:

 --folder: the folder containing the files you want to upload
 --message: any comment you may want to write about this upload
 --user: the user to access the tooltool upload server
 --host: the upload server host or IP
 --path: the target folder on the upload server

Use /tooltool/uploads/<your_username>/pvt to upload to the private download server.

Troubleshooting

You may run into rsync errors like:

   rsync: connection unexpectedly closed (36 bytes received so far) [sender]

In this case, it may help to add the following entries to your ~/.ssh/config file:

   Host tooltool-uploads.pub.build.mozilla.org
     ServerAliveInterval 5
     ServerAliveCountMax 120

What now?

Within a few minutes (5 at most, since the sync script that processes uploaded files currently runs every 5 minutes) you should receive an email confirming that your upload has been correctly processed on the upload server. The files should then appear on the relevant download server under their correct digest (and on all mirrors, if any, set up for the same visibility level).

The default email address used for these notifications is user@mozilla.com, where user is the user you use to access the upload server. If you want a different email address to be used for tooltool upload notifications, that will need to be added to the sync script configuration by somebody in releng.

All metadata about the upload is stored in the upload server.

Tooltool uploads: the old school

This is the old-school, deprecated, manual way to upload artifacts to tooltool. Please use the official procedure described above whenever possible, in order to keep information about what is uploaded, by whom, and why (the manual procedure does not allow storing any metadata about uploads).

If you don't want to upload from your own laptop (because, e.g., you have a slow uplink) you can do this from cruncher.

Access cruncher with your credentials:

 ME="your_short_ldap_username" # or `whoami`
 ssh -A $ME@cruncher.build.mozilla.org

Download the file and then:

 scp filename.tar.xz $ME@relengwebadm.private.scl3.mozilla.com:

Log in to relengwebadm:

 ssh $ME@relengwebadm.private.scl3.mozilla.com

And deploy the file to tooltool:

 FILE=~/emulator.zip # or whatever you're uploading
 export SHA512=`openssl sha512 $FILE | cut -d' ' -f2`
 sudo mv -i $FILE /mnt/netapp/relengweb/tooltool/pvt/build/sha512/${SHA512}
 sudo chmod 644 /mnt/netapp/relengweb/tooltool/pvt/build/sha512/${SHA512}
 ls -l /mnt/netapp/relengweb/tooltool/pvt/build/sha512/${SHA512}

  • Add the filename, filesize, and sha512 digest to the bug you are working on. These can be added to the tooltool manifests later.

The tooltool sync script

For the curious, here is a description of what happens on the upload server once an upload is completed.

As described above, tooltool uploaders copy hashed files and manifests to their own upload folders. They have one for each "distribution type", typically a pub one (to distribute files publicly) and a pvt one (for files to be available to Mozilla employees only).

The tooltool sync script collects all uploads from individual uploaders and aggregates them per distribution type (public, private), storing some metadata about the uploads themselves. It is meant to run periodically via crontab to pick up new uploads and distribute their content.

Where is the sync script code?

In the main tooltool repo: https://github.com/mozilla/build-tooltool/blob/master/sync.py

Where is the sync script running?

The sync script is located in /data/releng/src/tooltool/tooltool on relengwebadm.private.scl3.mozilla.com, where configuration (see below) and logs also live.

How do I deploy a new version of the sync script?

Just grab the desired version from the GitHub repo and copy it to the location where sync.py lives (see previous paragraph).

The sync script configuration

The sync script needs a configuration file in JSON format named config.json. Here is an example:

 {
   "upload_root": "/mnt/netapp/tooltool_uploads",
   "target_folders": {"pvt": "/mnt/netapp/relengweb/tooltool/pvt/build/sha512",
                      "pub": "/mnt/netapp/relengweb/tooltool/pub/build/sha512"},
   "smtp_server": "localhost",
   "smtp_port": 25,
   "user_email_mapping": {"dmitchell": "dustin@mozilla.com"},
   "default_domain": "mozilla.com",
   "smtp_from": "no-reply@mozilla.org"
 }

  • upload_root is the base directory in which all the uploaders' personal directories are located
  • target_folders specifies, for each supported "distribution type" (in this case "pvt" and "pub"), the corresponding local destination folder where the files from all uploaders will be collected
  • smtp* settings are used for notification emails to users after uploads
  • default_domain is used to determine the email address of a user who is not explicitly mentioned in "user_email_mapping": the address will simply be username@default_domain
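
The address lookup described by the last bullet can be sketched as follows. The function name notification_address is hypothetical and this is not the code in sync.py; it only assumes that the configuration has been loaded into a dictionary with the keys shown above.

```python
def notification_address(user, config):
    """Return the address used for upload notifications to `user`."""
    mapping = config.get("user_email_mapping", {})
    if user in mapping:
        return mapping[user]  # explicit override from the configuration
    return "%s@%s" % (user, config["default_domain"])
```

With the example configuration, "dmitchell" would resolve to the mapped address, while any other user would get an address in the default domain.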

With two uploaders, Alice and Bob, the upload root folder will look like this:

 root
    alice
         pub
         pvt
    bob
         pub
           a-package.TOOLTOOL-PACKAGE
           a-package.tt
           a-package.txt
         pvt

Each user has two upload folders, one for each supported distribution type. The sync script supports an arbitrary number of distribution types, as long as they are defined in the matching section of the configuration file.

In its next execution, the sync script will detect the manifest, check that all referenced hash files are present, verify their integrity, and then copy the hashed files to the destination given in the config file. Some upload metadata will be stored in a text file.

The sync script in action

 

In more detail, the sync script will perform the following actions:

Step 1: Copy of hash files to the appropriate destination

All files mentioned in the a-package.tt manifest, in Bob's pub folder, will be hash checked and copied to the destination configured for that distribution type in the configuration file (in this case, the "pub" entry of target_folders). This will make the files available for tooltool downloads.

Please note that in the following circumstances the copy will not occur:

  • if at least one of the files mentioned in the manifest is missing (for example in case of a partial upload)
  • if the hash of at least one of the hashed files differs from its filename (hash check failure)

If either of these circumstances occurs, the sync script will just log that it is impossible to process the given manifest and send a notification message to the uploader.
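
The two checks can be sketched as below. This is not the code in sync.py: the function name package_problems and the returned message strings are hypothetical, and the sketch assumes, as the second condition implies, that each uploaded file is stored under its digest as file name.

```python
import hashlib
import os


def package_problems(manifest, package_dir):
    """Return the reasons why a package cannot be processed (empty if none)."""
    problems = []
    for record in manifest:
        # Assumption: the uploaded file is named after its digest.
        path = os.path.join(package_dir, record["digest"])
        if not os.path.isfile(path):
            problems.append("missing: %s" % record["filename"])
            continue
        hasher = hashlib.new(record["algorithm"])
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                hasher.update(chunk)
        if hasher.hexdigest() != record["digest"]:
            problems.append("hash check failed: %s" % record["filename"])
    return problems
```

An empty list means the copy can go ahead; otherwise the list could be used as the body of the log entry and of the notification sent to the uploader.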

Step 2: Store locally a copy of the processed manifest

After processing a manifest, the sync script will store locally a copy of the processed manifest, prepending the user, the distribution type, and the processing timestamp to its name. The same will be done with the txt file containing comments (if any).

In this example we will have a manifest stored locally (and a corresponding notes file) as

 bob.pub.2013_10_16-10.20.29.a-package.tt
 bob.pub.2013_10_16-10.20.29.a-package.txt
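
The naming scheme can be sketched as follows. The function name processed_name is hypothetical, and the strftime pattern is inferred from the timestamps shown in the example names rather than taken from sync.py.

```python
from datetime import datetime


def processed_name(user, distribution_type, processed_at, original_name):
    """Build "<user>.<type>.<timestamp>.<original name>" for the local copy."""
    stamp = processed_at.strftime("%Y_%m_%d-%H.%M.%S")
    return ".".join([user, distribution_type, stamp, original_name])
```

For instance, processed_name("alice", "pvt", datetime.now(), "tools.tt") yields a name starting with "alice.pvt." followed by the current timestamp and the original manifest name.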

Step 3: Rename the processed manifests in the uploader folder as well

The content of Bob's public folder after the sync script has processed the manifest will be the following:

 /tooltool/uploads/bob/pub
     2013_10_16-10.20.29.a-package.tt
     2013_10_16-10.20.29.a-package.txt

The processed package has been deleted, and only the manifest with the corresponding note is present, with the processing timestamp; this means that the manifest has been successfully processed by the sync script.

Step 4: Notify the uploader of success/failure

An email will be sent to the uploader, stating whether the sync process completed successfully or failed. If the sync process fails, the corresponding files will not be available for tooltool downloads.

Need to download files from tooltool private server?

If you want to download a file from the tooltool private server and you are not part of ReleaseEngineering, add your name to bug 1019011.

That should grant you access to https://secure.pub.build.mozilla.org/tooltool/pvt/build