Pre juju 2.6.5 Upgrade Steps for large log collections

If the controller’s log collection is large, converting it to a capped collection can take a long time, during which the juju db is locked and the controller appears unresponsive.
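
For background, the standard MongoDB mechanism for this kind of conversion is the convertToCapped command, which copies the entire collection into a new capped one while holding a lock; the copy is what takes so long on a large collection. The sketch below is illustrative only, not a step in this guide: the collection name and size are placeholders, and whether juju uses exactly this command isn't confirmed here.

// Illustrative sketch only, not part of the upgrade steps: convertToCapped
// rewrites the whole collection, which is why large collections take so long.
// "logs.<model-uuid>" and the size value are placeholders.
db.getSiblingDB("logs").runCommand({
    convertToCapped: "logs.<model-uuid>",
    size: 1073741824   // target cap in bytes
})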

Check the size of the log collection

Before upgrading to juju 2.6.5, especially for long-running controllers, it’s recommended to check the size of the logs collection.

Copy this script to the machine you run the juju client from, make it executable with chmod +x, and run it:

#!/bin/bash

read -d '' -r cmds <<'EOF'
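# Extract the mongo credentials from the controller machine agent's config.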
conf=/var/lib/juju/agents/machine-*/agent.conf
user=`sudo grep tag $conf | cut -d' ' -f2`
password=`sudo grep statepassword $conf | cut -d' ' -f2`
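# Prefer the mongo client shipped with juju, falling back to the system client.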
if [ -f /usr/lib/juju/mongo*/bin/mongo ]; then
  client=/usr/lib/juju/mongo*/bin/mongo
else
  client=/usr/bin/mongo
fi
$client 127.0.0.1:37017/logs --authenticationDatabase admin --ssl --sslAllowInvalidCertificates --username "$user" --password "$password" --eval "print(db); \
if (db.getName() !== \"logs\") { \
    throw Error(\"not using logs\"); \
} else {\
    function bytesToSize(bytes) {\
       var sizes = [\"Bytes\", \"KB\", \"MB\", \"GB\"];\
       if (bytes == 0) return \"0 Byte\";\
       var i = parseInt(Math.floor(Math.log(bytes) / Math.log(1024)));\
       return Math.round(bytes / Math.pow(1024, i)) + \" \" + sizes[i];\
    };\
    db.getCollectionNames().forEach(function(name) { \
        var storageSize = db[name].stats()[\"storageSize\"];\
        print(name, bytesToSize(storageSize));\
        if (storageSize >= 1073741824) {\
           print(\"You need to run the prune logs script\");\
        }\
    });\
}"
EOF

juju ssh -m controller 0 "$cmds"

You’ll see output like:

logs
logs.c92cb8ea-c77a-4739-8298-c3cd6a79196b 48 KB
logs.d28edd72-1b1d-4537-80b5-6515902b4793 112 KB

The script prints the size of each per-model log collection, and also prints "You need to run the prune logs script" for any collection over 1GB.

Prune the juju log collection

Note: this operation may take a few hours, depending on the size of the collection.

:stop_sign: Stop the controllers before running this script. Dropping a collection also drops its indexes, and the indexes are only checked and created when the controller starts. If the collections are dropped while the controller is running, the new log collections are recreated without indexes, which causes significant mongo load.
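
As a sketch, on a systemd-based controller machine the agent can be stopped and restarted with something like the following; this assumes the usual jujud-machine-<N> unit naming, so verify the unit name on your controller first. Only the machine agent should be stopped: the mongo database must stay up for the prune script to work.

# Sketch, assuming systemd and the usual jujud-machine-<N> unit name;
# run on each controller machine. Leave the mongo database running.
sudo systemctl stop jujud-machine-0
# ... run the prune script below ...
sudo systemctl start jujud-machine-0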

Copy this script to the machine you run the juju client from, make it executable with chmod +x, and run it:

#!/bin/bash

read -d '' -r cmds <<'EOF'
conf=/var/lib/juju/agents/machine-*/agent.conf
user=`sudo grep tag $conf | cut -d' ' -f2`
password=`sudo grep statepassword $conf | cut -d' ' -f2`
if [ -f /usr/lib/juju/mongo*/bin/mongo ]; then
  client=/usr/lib/juju/mongo*/bin/mongo
else
  client=/usr/bin/mongo
fi
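# Connect to the logs database and drop every per-model log collection.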
$client 127.0.0.1:37017/logs --authenticationDatabase admin --ssl --sslAllowInvalidCertificates --username "$user" --password "$password" --eval "print(db); \
if (db.getName() !== \"logs\") { \
    throw Error(\"not using logs\"); \
} else {\
    db.getCollectionNames().forEach(function(name) { \
        printjson(db[name].drop()); \
    }); \
}"
EOF

juju ssh -m controller 0 "$cmds"

Dropping the oversized collections outright instead of deleting every record will be a lot faster, i.e. something like:
printjson(db[name].drop());
We’ve done this in the past and Juju happily recreates the collection without complaint.

@pjdc that’s great to know; I just wasn’t sure whether the indexes would be recreated or not.

I checked the code to understand how that happens. When the machine agent starts, it runs the upgradesteps worker, which (indirectly) calls MachineAgent.openStateForUpgrade; that creates a state pool with state.InitDatabase, which creates the log collection indexes for every model.
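
As an illustrative sanity check (not from the original post), you can confirm the indexes came back after the restart by listing them from the mongo shell, using the same connection parameters as the scripts above; logs.<model-uuid> is a placeholder for one of the collection names printed by the size-check script.

// Illustrative check: after the controller restarts, each per-model log
// collection should have its indexes again. Placeholder collection name.
db.getSiblingDB("logs")["logs.<model-uuid>"].getIndexes()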

I’ll update the script to drop the collection instead.
