Agent Binary Storage

jameinel · 25 September 2019 10:23

How Juju Tracks and Stores Agent Binaries

As I’m working up a tool to introspect what agent binaries are stored on a controller, I figured I would start documenting the layering that is used when adding or retrieving agent binaries. This document is a work in progress, but hopefully it will be easier to understand the layering than working it out from scratch in the code base.

Background

There are a couple of historic references/reasons why the code is structured as it is. I want to mention them here. At one point agent binaries were just called Juju “tools”, and while we’ve updated the external references in documentation, a fair amount of the internal code still uses the word “tools”.

Also, the code used to store agent binaries is layered on top of other code that was written for storing charm resources. One of the goals with charm resources was that, while each charm would bring along its own resource, identical content would only be stored once in the database. This deduplication is also used for agent binaries. So at the highest level it appears the binary is stored multiple times, but the entries all end up as just a reference to the same stored bytes.

Layers

Layer 1

At the topmost layer you have State.ToolsStorage(), which returns a binarystorage.layeredStorage. This is because each model thinks about its charms and agents independently, though they are all stored internally in the same deduped resource storage. Thus lookups try to find the agent binary in a per-model storage, and then fall back to the controller model’s storage. When adding, a model stores an agent binary that is referenced only by its own storage. In normal operation you upgrade a model to the same version as the controller, though, so it is unlikely that many binaries are referenced only by a model without also being stored as controller entries.

For every series that we support, we Add an entry to our ToolsStorage. (Note the actual bytes will be deduped at the ‘resource’ layer.) LayeredStorage.Add just calls Add on the first BinaryStorage. This records in the database collection juju.toolsmetadata a Version string for the agent binary (for this model, which is likely to be the controller model) that maps to its sha256 and its “path”. Note that this path is not the ‘complete’ path in the storage layer.
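
The lookup and add behaviour described in the two paragraphs above can be sketched roughly as follows. This is a minimal illustration and not Juju’s actual code: the `store` interface, `mapStore`, `layered`, and `errNotFound` are hypothetical names, and the metadata is reduced to a bare sha256 string.

```go
package main

import "errors"

// errNotFound is a hypothetical sentinel error used only in this sketch.
var errNotFound = errors.New("agent binary not found")

// store is a simplified, hypothetical stand-in for one model's binary storage.
type store interface {
	Add(version, sha256 string)
	Lookup(version string) (string, error)
}

// mapStore keeps version -> sha256 in memory.
type mapStore map[string]string

func (m mapStore) Add(version, sha256 string) { m[version] = sha256 }

func (m mapStore) Lookup(version string) (string, error) {
	if sum, ok := m[version]; ok {
		return sum, nil
	}
	return "", errNotFound
}

// layered consults each store in order: the first entry is the model's own
// storage and the last is the controller model's storage.
type layered []store

// Add writes only to the first (per-model) store.
func (l layered) Add(version, sha256 string) { l[0].Add(version, sha256) }

// Lookup returns the first match, falling back through the layers.
func (l layered) Lookup(version string) (string, error) {
	for _, s := range l {
		sum, err := s.Lookup(version)
		if err == nil {
			return sum, nil
		}
		if !errors.Is(err, errNotFound) {
			return "", err // a real failure, not just a miss
		}
	}
	return "", errNotFound
}
```

With `layered{model, controller}`, a version that only the controller knows about is still found through the model’s view, while anything added through that view is visible only in the model’s own store.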

eg db.toolsmetadata.find().pretty():

{
        "_id" : "9b270847-cc92-4d0b-8b03-f3bbf8c1c973:2.7-beta1.2-precise-amd64",
        "version" : "2.7-beta1.2-precise-amd64",
        "size" : NumberLong(49155403),
        "sha256" : "687e44ecc9840584bb45439835bb806d3fe89017b37880935f6070abd8597648",
        "path" : "tools/2.7-beta1.2-precise-amd64-687e44ecc9840584bb45439835bb806d3fe89017b37880935f6070abd8597648",
        "model-uuid" : "9b270847-cc92-4d0b-8b03-f3bbf8c1c973",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [ ]
}
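
To make the shape of that document concrete, here is a small sketch that derives the `_id` and `path` values from a model UUID, a version string, and the content. The type and function names (`toolsMetadata`, `newToolsMetadata`) are hypothetical; only the field layout is taken from the example above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// toolsMetadata mirrors the fields of the example document above.
// The type itself is hypothetical and exists only for this sketch.
type toolsMetadata struct {
	ID        string
	Version   string
	Size      int64
	SHA256    string
	Path      string
	ModelUUID string
}

// newToolsMetadata builds the per-model record for one agent binary.
func newToolsMetadata(modelUUID, version string, content []byte) toolsMetadata {
	sum := sha256.Sum256(content)
	hexSum := hex.EncodeToString(sum[:])
	return toolsMetadata{
		// The _id is the model UUID and the version joined by a colon.
		ID:      modelUUID + ":" + version,
		Version: version,
		Size:    int64(len(content)),
		SHA256:  hexSum,
		// The path is relative; the bucket prefix is added one layer down.
		Path:      fmt.Sprintf("tools/%s-%s", version, hexSum),
		ModelUUID: modelUUID,
	}
}
```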

Layer 2

The content of the binary is then passed to managedStorage.PutForBucket(modelUUID, path, content). The path passed in here is the path mentioned in layer 1 (tools/${agent-version}-${sha256}). This path is then expanded to buckets/${modelUUID}/tools/${agent-version}-${sha256} and saved into juju.managedStoredResources, along with the "resourceid" for the blob. This resourceid is just the sha384 of the content.

eg db.managedStoredResources.find().pretty():

{
        "_id" : "buckets/9b270847-cc92-4d0b-8b03-f3bbf8c1c973/tools/2.7-beta1.2-precise-amd64-687e44ecc9840584bb45439835bb806d3fe89017b37880935f6070abd8597648",
        "bucketuuid" : "9b270847-cc92-4d0b-8b03-f3bbf8c1c973",
        "user" : "",
        "path" : "buckets/9b270847-cc92-4d0b-8b03-f3bbf8c1c973/tools/2.7-beta1.2-precise-amd64-687e44ecc9840584bb45439835bb806d3fe89017b37880935f6070abd8597648",
        "resourceid" : "5edeea8657fb48b2befdc4eacfd26c5fee5c92fba88f92fb85ec6a197703d60ce0ca6faa564b6b112eae2da9f5c46ef9",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [ ]
}
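
The path expansion and the resourceid can be sketched as two small helpers. `bucketPath` and `resourceID` are hypothetical names chosen for this illustration, not functions from the Juju code base.

```go
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"path"
)

// bucketPath expands a layer 1 path into the per-model bucket path
// that is used as the _id in managedStoredResources.
func bucketPath(modelUUID, p string) string {
	return path.Join("buckets", modelUUID, p)
}

// resourceID returns the hex-encoded SHA-384 of the content
// (96 hex characters, as in the example document above).
func resourceID(content []byte) string {
	sum := sha512.Sum384(content)
	return hex.EncodeToString(sum[:])
}
```

Because the resourceid depends only on the bytes, every model and every series that stores the same binary ends up pointing at the same resourceid, which is what makes the deduplication in the next layer possible.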

Layer 3

The next layer is where the reference counting and deduplication are done. In juju.storedResources we index by sha384 as the _id and map to a new "path", which is a randomly generated UUID naming the file in the mongo GridFS storage. (When adding the second and subsequent agent binaries, the code notes it already has that sha384 and doesn’t save the content to gridfs again, instead just incrementing the refcount.)

eg db.storedResources.find().pretty():

{
        "_id" : "5edeea8657fb48b2befdc4eacfd26c5fee5c92fba88f92fb85ec6a197703d60ce0ca6faa564b6b112eae2da9f5c46ef9",
        "path" : "fe3f749c-f050-4807-8e1c-221143b29247",
        "sha384hash" : "5edeea8657fb48b2befdc4eacfd26c5fee5c92fba88f92fb85ec6a197703d60ce0ca6faa564b6b112eae2da9f5c46ef9",
        "length" : NumberLong(49155403),
        "refcount" : NumberLong(16),
        "txn-revno" : NumberLong(18),
        "txn-queue" : [ ]
}
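
A minimal in-memory sketch of that behaviour, assuming a hypothetical `catalog` type where one map stands in for juju.storedResources and another stands in for GridFS. The random name here is plain hex rather than a real UUID, to keep the sketch to the standard library.

```go
package main

import (
	"crypto/rand"
	"crypto/sha512"
	"encoding/hex"
)

// storedResource mirrors the fields of the example document above.
type storedResource struct {
	Path     string // name of the blob in the backing store
	Length   int64
	RefCount int64
}

// catalog is a hypothetical stand-in: entries plays the role of
// juju.storedResources and blobs plays the role of GridFS.
type catalog struct {
	entries map[string]*storedResource // keyed by hex sha384
	blobs   map[string][]byte          // keyed by storedResource.Path
}

func newCatalog() *catalog {
	return &catalog{
		entries: map[string]*storedResource{},
		blobs:   map[string][]byte{},
	}
}

// randomName returns 16 random bytes as hex (the real code uses a UUID).
func randomName() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// Put stores content once per distinct sha384 and counts references.
func (c *catalog) Put(content []byte) (string, error) {
	sum := sha512.Sum384(content)
	id := hex.EncodeToString(sum[:])
	if res, ok := c.entries[id]; ok {
		res.RefCount++ // already stored: just take another reference
		return id, nil
	}
	name, err := randomName()
	if err != nil {
		return "", err
	}
	c.blobs[name] = append([]byte(nil), content...)
	c.entries[id] = &storedResource{Path: name, Length: int64(len(content)), RefCount: 1}
	return id, nil
}
```

Calling `Put` sixteen times with the same bytes leaves a single blob with a refcount of 16, which matches the shape of the example document.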

Layer 4

Mongo’s GridFS is used to store blob content larger than 16MB in size (the maximum size of a single mongo document). This content is stored in the blobstore database. The metadata of what files exist is stored in blobstore.blobstore.files and the actual content blobs are stored in blobstore.blobstore.chunks. GridFS stores the content in 255k chunks (255*1024 = 261120 bytes, matching the chunkSize field below).

eg use blobstore; db.blobstore.files.find():

{
        "_id" : ObjectId("5d8b167f9662ad0be5b6729f"),
        "chunkSize" : 261120,
        "uploadDate" : ISODate("2019-09-25T07:25:54.080Z"),
        "length" : 49155403,
        "md5" : "63df772669a14d98ebbb7092e1397829",
        "filename" : "fe3f749c-f050-4807-8e1c-221143b29247"
}

Then the actual content is stored in the ‘data’ field of blobstore.blobstore.chunks.

eg db.blobstore.chunks.find({}, {"data": 0}).limit(1).pretty():

{
        "_id" : ObjectId("5d8b15ad9662ad0927de4190"),
        "files_id" : ObjectId("5d8b15ad9662ad0927de418e"),
        "n" : 1,
        "data" : BinData(0, "<a base64 encoded 255kbyte string here>")
}
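
As a quick worked example of the chunking arithmetic, the sketch below computes how many chunk documents a file of a given length needs and how large the last one is. `chunkLayout` is a hypothetical helper written for this post, not part of any GridFS driver.

```go
package main

// chunkSize matches the "chunkSize" field in the files document: 255 * 1024.
const chunkSize = 255 * 1024

// chunkLayout returns how many chunks a file of the given length occupies
// and the size in bytes of the final chunk.
func chunkLayout(length int64) (count, last int64) {
	if length <= 0 {
		return 0, 0
	}
	count = (length + chunkSize - 1) / chunkSize // round up
	last = length - (count-1)*chunkSize
	return count, last
}
```

For the 49155403 byte agent binary above, this gives 189 chunks (n = 0 through 188), with a final chunk of 64843 bytes.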

Future

Lots of these steps could be cleaned up. My goal with this section is to touch on small inefficiencies that we could hopefully make quite a bit better.

SHA256 vs SHA384

At the lowest level, content is deduped by computing a SHA384 of the content, and then storing it in gridfs keyed by that hash. However, the switch to SHA384 was done after our API was already exposing SHA256, and when that change was made, the API was not updated to report the SHA384 instead. This means we compute multiple SHA hashes, and it also makes it harder to find the objects you care about. (The top level object saves a path and a SHA256, the next level maps the path to the SHA384, and then the lowest level maps the SHA384 to the gridfs UUID path.)
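
To show how many hops that lookup takes, here is a sketch that walks the three mappings using in-memory maps in place of the mongo collections. The `db` type and `resolve` method are hypothetical; the key formats follow the example documents earlier in this post.

```go
package main

import "errors"

// db is a hypothetical in-memory stand-in for the three collections.
type db struct {
	toolsPath  map[string]string // toolsmetadata: "<modelUUID>:<version>" -> path
	resourceID map[string]string // managedStoredResources: bucket path -> sha384
	gridfsName map[string]string // storedResources: sha384 -> GridFS filename
}

// resolve follows version -> path -> sha384 -> GridFS filename.
func (d db) resolve(modelUUID, version string) (string, error) {
	p, ok := d.toolsPath[modelUUID+":"+version]
	if !ok {
		return "", errors.New("no toolsmetadata entry")
	}
	id, ok := d.resourceID["buckets/"+modelUUID+"/"+p]
	if !ok {
		return "", errors.New("no managedStoredResources entry")
	}
	name, ok := d.gridfsName[id]
	if !ok {
		return "", errors.New("no storedResources entry")
	}
	return name, nil
}
```

Note that the SHA256 recorded at the top level never takes part in the lookup except as a component of the path string.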

Repeated Hashing

Currently, when we upload agent binaries, we create a number of “pseudo” references. Namely, for all supported Ubuntu versions, we create a reference using that Ubuntu series. In testing I ended up with:

{ "version" : "2.7-beta1.2-artful-amd64" }
{ "version" : "2.7-beta1.2-bionic-amd64" }
{ "version" : "2.7-beta1.2-cosmic-amd64" }
{ "version" : "2.7-beta1.2-disco-amd64" }
{ "version" : "2.7-beta1.2-eoan-amd64" }
{ "version" : "2.7-beta1.2-precise-amd64" }
{ "version" : "2.7-beta1.2-quantal-amd64" }
{ "version" : "2.7-beta1.2-raring-amd64" }
{ "version" : "2.7-beta1.2-saucy-amd64" }
{ "version" : "2.7-beta1.2-trusty-amd64" }
{ "version" : "2.7-beta1.2-utopic-amd64" }
{ "version" : "2.7-beta1.2-vivid-amd64" }
{ "version" : "2.7-beta1.2-wily-amd64" }
{ "version" : "2.7-beta1.2-xenial-amd64" }
{ "version" : "2.7-beta1.2-yakkety-amd64" }
{ "version" : "2.7-beta1.2-zesty-amd64" }

Ultimately, these are all just references down to the same bytes. However, they are recorded in the database by caching a string containing the data, iterating over all supported versions, and calling storage.Add(bytes.NewReader(data), metadata).

We do the work to compute the sha256 up front, but as mentioned the data is stored by sha384, so the sha384 is recomputed for each series. (Currently 16 times, and it takes approx 1s on my machine.) It is even slightly worse than that, because managedStorage.preprocessUpload actually reads the content into a temporary file and computes the sha384 while it is populating the temp file. So while it caches the raw string in memory, it then writes it to disk 16 times, and reads it back again.
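
One possible direction, shown purely as a sketch and not as what the code does today, is to read the content once, compute both hashes in that single pass, and reuse the result for every series. The names `digests`, `hashOnce`, and `referencesFor` are hypothetical, and the returned map is only an illustration of "many references, one set of hashes".

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"crypto/sha512"
	"encoding/hex"
	"io"
)

// digests holds both hashes plus the size of one agent binary.
type digests struct {
	SHA256 string
	SHA384 string
	Size   int64
}

// hashOnce streams r through both hash functions in a single pass.
func hashOnce(r io.Reader) (digests, error) {
	h256 := sha256.New()
	h384 := sha512.New384()
	n, err := io.Copy(io.MultiWriter(h256, h384), r)
	if err != nil {
		return digests{}, err
	}
	return digests{
		SHA256: hex.EncodeToString(h256.Sum(nil)),
		SHA384: hex.EncodeToString(h384.Sum(nil)),
		Size:   n,
	}, nil
}

// referencesFor builds one reference per series, all sharing the digests
// computed once up front instead of rehashing the data per series.
func referencesFor(number, arch string, series []string, data []byte) (map[string]digests, error) {
	d, err := hashOnce(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	refs := make(map[string]digests, len(series))
	for _, s := range series {
		refs[number+"-"+s+"-"+arch] = d
	}
	return refs, nil
}
```

With this shape the cost of hashing no longer scales with the number of series, only the number of small metadata records does.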

We’d like to just store an OS for each agent binary, rather than record each series separately. But so far that cleanup has not bubbled up in priority.