Juju upgrade-controller fails | space not found

Over the weekend I upgraded a bunch of controllers in various clouds, but my MAAS controller seems to have gone off the rails.

In the controller log I repeatedly see that it fails to connect to its own API.

If I restart jujud-machine-0.service, it tries to step through the upgrade but then blows up, complaining that it is unable to find one of my MAAS spaces. After that it goes back to complaining that it is unable to connect to the API.

2020-01-28 22:15:48 INFO juju.cmd supercommand.go:83 running jujud [2.7.1 ec91b3228a8e53bc50d37d23265851328ea354c5 gc go1.12.14]
2020-01-28 22:15:48 DEBUG juju.cmd supercommand.go:84   args: []string{"/var/lib/juju/tools/machine-0/jujud", "machine", "--data-dir", "/var/lib/juju", "--machine-id", "0", "--debug"}
2020-01-28 22:15:48 DEBUG juju.utils gomaxprocs.go:24 setting GOMAXPROCS to 4
2020-01-28 22:15:48 DEBUG juju.agent agent.go:545 read agent config, format "2.0"
2020-01-28 22:15:48 INFO juju.cmd.jujud agent.go:133 setting logging config to "<root>=WARNING;unit=DEBUG"
2020-01-28 22:15:49 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [949726] "machine-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:443: connect: connection refused
2020-01-28 22:15:53 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [949726] "machine-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:443: connect: connection refused
2020-01-28 22:15:56 ERROR juju.upgrade upgrade.go:138 upgrade step "ensure stored addresses refer to space by ID, and remove old space name/provider ID" failed: model UUID "9497265d-6bb5-49c0-83ca-ca4248c86a8e": getting machine upgrade ops: space with name: "PLatformInfrastructureSpace" not found
2020-01-28 22:15:56 ERROR juju.worker.upgradedatabase worker.go:299 database upgrade from 2.6.6 to 2.7.1 for "machine-0" failed (will retry): ensure stored addresses refer to space by ID, and remove old space name/provider ID: model UUID "9497265d-6bb5-49c0-83ca-ca4248c86a8e": getting machine upgrade ops: space with name: "PLatformInfrastructureSpace" not found
2020-01-28 22:17:56 ERROR juju.upgrade upgrade.go:138 upgrade step "ensure stored addresses refer to space by ID, and remove old space name/provider ID" failed: model UUID "9497265d-6bb5-49c0-83ca-ca4248c86a8e": getting machine upgrade ops: space with name: "PLatformInfrastructureSpace" not found
2020-01-28 22:17:56 ERROR juju.worker.upgradedatabase worker.go:299 database upgrade from 2.6.6 to 2.7.1 for "machine-0" failed (will retry): ensure stored addresses refer to space by ID, and remove old space name/provider ID: model UUID "9497265d-6bb5-49c0-83ca-ca4248c86a8e": getting machine upgrade ops: space with name: "PLatformInfrastructureSpace" not found
2020-01-28 22:19:56 ERROR juju.upgrade upgrade.go:138 upgrade step "ensure stored addresses refer to space by ID, and remove old space name/provider ID" failed: model UUID "9497265d-6bb5-49c0-83ca-ca4248c86a8e": getting machine upgrade ops: space with name: "PLatformInfrastructureSpace" not found
2020-01-28 22:19:56 ERROR juju.worker.upgradedatabase worker.go:299 database upgrade from 2.6.6 to 2.7.1 for "machine-0" failed (will retry): ensure stored addresses refer to space by ID, and remove old space name/provider ID: model UUID "9497265d-6bb5-49c0-83ca-ca4248c86a8e": getting machine upgrade ops: space with name: "PLatformInfrastructureSpace" not found
2020-01-28 22:21:56 ERROR juju.upgrade upgrade.go:138 upgrade step "ensure stored addresses refer to space by ID, and remove old space name/provider ID" failed: model UUID "9497265d-6bb5-49c0-83ca-ca4248c86a8e": getting machine upgrade ops: space with name: "PLatformInfrastructureSpace" not found
2020-01-28 22:21:56 ERROR juju.worker.upgradedatabase worker.go:299 database upgrade from 2.6.6 to 2.7.1 for "machine-0" failed (will retry): ensure stored addresses refer to space by ID, and remove old space name/provider ID: model UUID "9497265d-6bb5-49c0-83ca-ca4248c86a8e": getting machine upgrade ops: space with name: "PLatformInfrastructureSpace" not found
2020-01-28 22:23:56 ERROR juju.upgrade upgrade.go:138 upgrade step "ensure stored addresses refer to space by ID, and remove old space name/provider ID" failed: model UUID "9497265d-6bb5-49c0-83ca-ca4248c86a8e": getting machine upgrade ops: space with name: "PLatformInfrastructureSpace" not found
2020-01-28 22:23:56 ERROR juju.worker.upgradedatabase worker.go:299 database upgrade from 2.6.6 to 2.7.1 for "machine-0" failed (giving up): ensure stored addresses refer to space by ID, and remove old space name/provider ID: model UUID "9497265d-6bb5-49c0-83ca-ca4248c86a8e": getting machine upgrade ops: space with name: "PLatformInfrastructureSpace" not found
2020-01-28 22:23:57 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [949726] "machine-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:443: connect: connection refused
2020-01-28 22:24:03 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [949726] "machine-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:443: connect: connection refused
2020-01-28 22:24:11 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [949726] "machine-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:443: connect: connection refused
2020-01-28 22:24:20 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [949726] "machine-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:443: connect: connection refused
2020-01-28 22:24:30 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [949726] "machine-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:443: connect: connection refused

@manadart is this something that you might be able to help with?

@dvnt I think what might be happening here is that the space topology has changed in MAAS: provider-sourced addresses have been decorated with space names, but those spaces have not been loaded into Juju’s spaces collection.

If this is the case, you need to run juju reload-spaces in order to sync up with MAAS.

If the controller stays offline and prevents this, try the following:

  • SSH to the controller machine and edit /var/lib/juju/agents/machine-0/agent.conf, changing upgradedToVersion to the version of the binary (looks like 2.7.1 in your case).
  • Restart the agent at which time no upgrade steps should be attempted.
  • Run juju reload-spaces.
  • Set the value changed in the first step back to 2.6.6 and restart the agent again. This should run the upgrade steps successfully. They are idempotent, so this is safe.
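The first and fourth steps amount to editing one line of agent.conf. Here is a minimal Node.js sketch. It assumes the file carries a single top-level `upgradedToVersion: <version>` line, as the steps imply. `setUpgradedToVersion` and `applyToFile` are hypothetical helper names. Keeping a `.bak` copy is an added precaution and was not part of the advice above.

```javascript
// Hypothetical helpers for steps 1 and 4: rewrite the upgradedToVersion line
// in a Juju agent.conf. Assumes one top-level "upgradedToVersion: X.Y.Z" line.
function setUpgradedToVersion(confText, version) {
  const pattern = /^upgradedToVersion:.*$/m;
  if (!pattern.test(confText)) {
    throw new Error("upgradedToVersion not found in agent.conf");
  }
  // Replace only the value on that line; everything else is left untouched.
  return confText.replace(pattern, "upgradedToVersion: " + version);
}

// Applies the change in place, first copying the original to <path>.bak.
// Needs write access to the file (agent.conf is normally root-owned).
function applyToFile(path, version) {
  const fs = require("fs");
  fs.copyFileSync(path, path + ".bak");
  const updated = setUpgradedToVersion(fs.readFileSync(path, "utf8"), version);
  fs.writeFileSync(path, updated);
}
```

For example, run `applyToFile("/var/lib/juju/agents/machine-0/agent.conf", "2.7.1")` before restarting the agent. For the last step, run it again with `"2.6.6"`. The second call overwrites the `.bak` copy, so move the first backup aside if you want to keep it.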

It is also worth filing a bug on Launchpad for this. If my diagnosis is correct, we should see if we can perform this step before attempting space-based upgrade steps.

1 Like

So I tried updating agent.conf and then starting the Juju agent, but it immediately starts complaining.
It looks like the API / Juju GUI web server is not starting at all, so I can’t run juju reload-spaces from the Juju CLI.

Interestingly, it now says space id "-1" not found:

2020-01-29 20:20:41 INFO juju.cmd supercommand.go:83 running jujud [2.7.1 ec91b3228a8e53bc50d37d23265851328ea354c5 gc go1.12.14]
2020-01-29 20:20:41 DEBUG juju.cmd supercommand.go:84   args: []string{"/var/lib/juju/tools/machine-0/jujud", "machine", "--data-dir", "/var/lib/juju", "--machine-id", "0", "--debug"}
2020-01-29 20:20:41 DEBUG juju.utils gomaxprocs.go:24 setting GOMAXPROCS to 4
2020-01-29 20:20:41 DEBUG juju.agent agent.go:545 read agent config, format "2.0"
2020-01-29 20:20:41 INFO juju.cmd.jujud agent.go:133 setting logging config to "<root>=WARNING;unit=DEBUG"
2020-01-29 20:20:41 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [949726] "machine-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:443: connect: connection refused
2020-01-29 20:20:46 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [949726] "machine-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:443: connect: connection refused
2020-01-29 20:20:48 ERROR juju.worker.modelcache worker.go:309 watcher error: space id "-1" not found, getting new watcher
2020-01-29 20:20:48 ERROR juju.worker.modelcache worker.go:309 watcher error: space id "-1" not found, getting new watcher
2020-01-29 20:20:48 ERROR juju.worker.modelcache worker.go:309 watcher error: space id "-1" not found, getting new watcher

Yes, OK. We have a chicken/egg problem here. We can’t run the upgrade steps due to missing data, but we can’t run the API because the upgrade steps to transform data into the new expected form have not been run.

So we’ll need to get the data in manually. This means accessing MongoDB directly.

Revert any change to agent.conf for starters.

To see what we are missing compare the MAAS spaces with the Juju ones. If you are logged into MAAS at the CLI, you can read the MAAS spaces like this:

> maas maas spaces read
Success.
Machine-readable output follows:
[
    {
        "resource_uri": "/MAAS/api/2.0/spaces/1/",
        "id": 1,
        "vlans": [
            {
                "vid": 0,
                "mtu": 1500,
                "dhcp_on": true,
                "external_dhcp": null,
                "relay_vlan": null,
                "space": "space-default",
                "secondary_rack": null,
                "fabric_id": 3,
                "id": 5004,
                "fabric": "fabric-3",
                "primary_rack": "mk7pkx",
                "name": "untagged",
                "resource_uri": "/MAAS/api/2.0/vlans/5004/"
            }
        ],
        "subnets": [
            {
                "name": "172.16.99.0/24",
                "vlan": {
                    "vid": 0,
                    "mtu": 1500,
                    "dhcp_on": true,
                    "external_dhcp": null,
                    "relay_vlan": null,
                    "space": "space-default",
                    "secondary_rack": null,
                    "fabric_id": 3,
                    "id": 5004,
                    "fabric": "fabric-3",
                    "primary_rack": "mk7pkx",
                    "name": "untagged",
                    "resource_uri": "/MAAS/api/2.0/vlans/5004/"
                },
                "cidr": "172.16.99.0/24",
                "rdns_mode": 2,
                "gateway_ip": "172.16.99.254",
                "dns_servers": [],
                "allow_dns": true,
                "allow_proxy": true,
                "active_discovery": false,
                "managed": true,
                "space": "space-default",
                "id": 4,
                "resource_uri": "/MAAS/api/2.0/subnets/4/"
            }
        ],
        "name": "space-default"
    },
    ...
]

This is an example from my controller on a MAAS with one space. Notice that the space definitions exist for each model, and that the _id field is the space name prefixed with the model UUID:

juju:PRIMARY> db.spaces.find().pretty()
{
        "_id" : "bcabc6ed-66c7-4a6b-8665-a0a3f54e85ea:space-default",
        "life" : 0,
        "name" : "space-default",
        "is-public" : false,
        "providerid" : "1",
        "model-uuid" : "bcabc6ed-66c7-4a6b-8665-a0a3f54e85ea",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e329d141cd6030c4976d4cd_8e679ec8"
        ]
}
{
        "_id" : "bcabc6ed-66c7-4a6b-8665-a0a3f54e85ea:undefined",
        "life" : 0,
        "name" : "undefined",
        "is-public" : false,
        "providerid" : "-1",
        "model-uuid" : "bcabc6ed-66c7-4a6b-8665-a0a3f54e85ea",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e329d141cd6030c4976d4cf_d27ccd58"
        ]
}
{
        "_id" : "bbbc9dbd-cf97-48d5-88b6-b45340f457c4:space-default",
        "life" : 0,
        "name" : "space-default",
        "is-public" : false,
        "providerid" : "1",
        "model-uuid" : "bbbc9dbd-cf97-48d5-88b6-b45340f457c4",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e329d151cd6030c4976d4d7_b2e10217"
        ]
}
{
        "_id" : "bbbc9dbd-cf97-48d5-88b6-b45340f457c4:undefined",
        "life" : 0,
        "name" : "undefined",
        "is-public" : false,
        "providerid" : "-1",
        "model-uuid" : "bbbc9dbd-cf97-48d5-88b6-b45340f457c4",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e329d151cd6030c4976d4d9_40250fa3"
        ]
}

Add any missing spaces for each model, including the controller model. Ignore the txn* fields.
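Comparing the two listings by hand gets error-prone across several models. Here is a minimal sketch of the comparison. It assumes `maasSpaces` is the parsed JSON array from `maas <profile> spaces read`, with each entry having `id` and `name`. It assumes `jujuSpaces` is the array from `db.spaces.find().toArray()`. Matching is by provider ID: the MAAS space `id`, which Juju stores as the string `providerid`. `findMissingSpaces` is a hypothetical name.

```javascript
// Hypothetical helper: for each model UUID, list the MAAS spaces that have no
// document with a matching provider ID in Juju's spaces collection.
function findMissingSpaces(maasSpaces, jujuSpaces, modelUUIDs) {
  // Index existing Juju spaces by "<model-uuid>/<providerid>".
  // Docs without a provider ID (such as "alpha") are skipped.
  const known = new Set(
    jujuSpaces
      .filter(doc => doc.providerid !== undefined && doc.providerid !== null)
      .map(doc => doc["model-uuid"] + "/" + doc.providerid)
  );
  const missing = [];
  for (const uuid of modelUUIDs) {
    for (const space of maasSpaces) {
      const providerId = String(space.id);
      if (!known.has(uuid + "/" + providerId)) {
        missing.push({ modelUUID: uuid, providerId: providerId, name: space.name });
      }
    }
  }
  return missing;
}
```

Matching on `providerid` rather than `_id` works whether `_id` embeds the space name (as in the sample above) or a numeric space ID. In the mongo shell, `db.spaces.distinct("model-uuid")` could supply `modelUUIDs`. That assumes every model already has at least one space document.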

We also need to ensure that Juju has a record of every provider ID (MAAS’s ID for each space). This is a sample from the providerIDs collection in the same deployment as above:

{
        "_id" : "bbbc9dbd-cf97-48d5-88b6-b45340f457c4:space:1",
        "model-uuid" : "bbbc9dbd-cf97-48d5-88b6-b45340f457c4",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e329d151cd6030c4976d4d7_b2e10217"
        ]
}
{
        "_id" : "bbbc9dbd-cf97-48d5-88b6-b45340f457c4:space:-1",
        "model-uuid" : "bbbc9dbd-cf97-48d5-88b6-b45340f457c4",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e329d151cd6030c4976d4d9_40250fa3"
        ]
}

As with the spaces, add a record for each missing one in every model, and ignore the txn* fields.
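The same check applies to the providerIDs collection. This sketch assumes, based on the sample above, that each record’s `_id` has the form `<model-uuid>:space:<provider ID>`. `findMissingProviderIDs` is a hypothetical name. The returned objects deliberately omit the txn* fields.

```javascript
// Hypothetical helper: given space docs (db.spaces.find().toArray()) and
// providerIDs docs (db.providerIDs.find().toArray()), return the providerIDs
// records that should exist but do not, shaped like the sample (minus txn*).
function findMissingProviderIDs(jujuSpaces, providerIdDocs) {
  const existing = new Set(providerIdDocs.map(doc => doc._id));
  const missing = [];
  for (const space of jujuSpaces) {
    // Spaces without a provider ID (such as "alpha") need no record.
    if (space.providerid === undefined || space.providerid === null) continue;
    const id = space["model-uuid"] + ":space:" + space.providerid;
    if (!existing.has(id)) {
      missing.push({ _id: id, "model-uuid": space["model-uuid"] });
    }
  }
  return missing;
}
```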

After the spaces are congruent between MAAS and Juju, start the controller machine agents again and let them run the upgrade steps.

This also indicates that you have set a hostname for the controller, because it is trying to connect to the standard HTTPS port (443) rather than the normal API controller port (17070).
I think the part about doing enough of the upgrade step to get the API up is good, though I thought we ran most of those migration steps before the API server was started.

First off, the juju community rocks! Thanks for the assistance on this.
Secondly: Holy :poop: I think I’m in way over my head here.

So from a mongo perspective, does the capitalisation matter?

In MAAS I have 3 spaces, named:

PLatformInfrastructureSpace
PlatformVMInstancesSpace
(undefined)

According to the Mongo query, these are all stored in lower case. Could that be why it says the space is not found?

Snippet of the MAAS query output:

{
    "id": 2,
    "resource_uri": "/MAAS/api/2.0/spaces/2/",
    "subnets": [
        {
            "name": "XXX.XXX.XXX.192/26",
            "vlan": {
                "vid": 1052,
                "mtu": 1500,
                "dhcp_on": false,
                "external_dhcp": null,
                "relay_vlan": null,
                "id": 5002,
                "fabric_id": 0,
                "space": "PlatformVMInstancesSpace",
                "primary_rack": null,
                "fabric": "US-EAST-Copper_Fabric",
                "name": "USE-VA-LAB-PUB",
                "secondary_rack": null,
                "resource_uri": "/MAAS/api/2.0/vlans/5002/"
            },
            "cidr": "XXX.XXX.XXX.192/26",
            "rdns_mode": 2,
            "gateway_ip": "XXX.XXX.XXX.193",
            "dns_servers": [],
            "allow_dns": true,
            "allow_proxy": true,
            "active_discovery": true,
            "managed": true,
            "id": 2,
            "space": "PlatformVMInstancesSpace",
            "resource_uri": "/MAAS/api/2.0/subnets/2/"
        },
        {
            "name": "XXX.XXX.XXX.128/28",
            "vlan": {
                "vid": 1053,
                "mtu": 1500,
                "dhcp_on": true,
                "external_dhcp": null,
                "relay_vlan": null,
                "id": 5012,
                "fabric_id": 0,
                "space": "PlatformVMInstancesSpace",
                "primary_rack": "gaw3xb",
                "fabric": "US-EAST-Copper_Fabric",
                "name": "USE-VA-LAB-VIRSH",
                "secondary_rack": null,
                "resource_uri": "/MAAS/api/2.0/vlans/5012/"
            },
            "cidr": "XXX.XXX.XXX.128/28",
            "rdns_mode": 2,
            "gateway_ip": "XXX.XXX.XXX.129",
            "dns_servers": [],
            "allow_dns": true,
            "allow_proxy": true,
            "active_discovery": false,
            "managed": true,
            "id": 12,
            "space": "PlatformVMInstancesSpace",
            "resource_uri": "/MAAS/api/2.0/subnets/12/"
        },
        {
            "name": "XXX.XXX.XXX.112/29",
            "vlan": {
                "vid": 894,
                "mtu": 1500,
                "dhcp_on": true,
                "external_dhcp": null,
                "relay_vlan": null,
                "id": 5020,
                "fabric_id": 0,
                "space": "PlatformVMInstancesSpace",
                "primary_rack": "gaw3xb",
                "fabric": "US-EAST-Copper_Fabric",
                "name": "894-C3-R-PXYDMZ",
                "secondary_rack": null,
                "resource_uri": "/MAAS/api/2.0/vlans/5020/"
            },
            "cidr": "XXX.XXX.XXX.112/29",
            "rdns_mode": 2,
            "gateway_ip": "XXX.XXX.XXX.113",
            "dns_servers": [],
            "allow_dns": true,
            "allow_proxy": true,
            "active_discovery": false,
            "managed": true,
            "id": 18,
            "space": "PlatformVMInstancesSpace",
            "resource_uri": "/MAAS/api/2.0/subnets/18/"
        }
    ],
    "vlans": [
        {
            "vid": 1052,
            "mtu": 1500,
            "dhcp_on": false,
            "external_dhcp": null,
            "relay_vlan": null,
            "id": 5002,
            "fabric_id": 0,
            "space": "PlatformVMInstancesSpace",
            "primary_rack": null,
            "fabric": "US-EAST-Copper_Fabric",
            "name": "USE-VA-LAB-PUB",
            "secondary_rack": null,
            "resource_uri": "/MAAS/api/2.0/vlans/5002/"
        },
        {
            "vid": 894,
            "mtu": 1500,
            "dhcp_on": true,
            "external_dhcp": null,
            "relay_vlan": null,
            "id": 5020,
            "fabric_id": 0,
            "space": "PlatformVMInstancesSpace",
            "primary_rack": "gaw3xb",
            "fabric": "US-EAST-Copper_Fabric",
            "name": "894-C3-R-PXYDMZ",
            "secondary_rack": null,
            "resource_uri": "/MAAS/api/2.0/vlans/5020/"
        },
        {
            "vid": 1053,
            "mtu": 1500,
            "dhcp_on": true,
            "external_dhcp": null,
            "relay_vlan": null,
            "id": 5012,
            "fabric_id": 0,
            "space": "PlatformVMInstancesSpace",
            "primary_rack": "gaw3xb",
            "fabric": "US-EAST-Copper_Fabric",
            "name": "USE-VA-LAB-VIRSH",
            "secondary_rack": null,
            "resource_uri": "/MAAS/api/2.0/vlans/5012/"
        }
    ],
    "name": "PlatformVMInstancesSpace"

Mongo Query

juju:PRIMARY> db.spaces.find().pretty()
{
        "_id" : "9497265d-6bb5-49c0-83ca-ca4248c86a8e:1",
        "spaceid" : "1",
        "life" : 0,
        "name" : "platforminfrastructurespace",
        "is-public" : false,
        "providerid" : "1",
        "model-uuid" : "9497265d-6bb5-49c0-83ca-ca4248c86a8e",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e2b2bc53c71e65a73bd317b_42edfa36"
        ]
}
{
        "_id" : "9497265d-6bb5-49c0-83ca-ca4248c86a8e:2",
        "spaceid" : "2",
        "life" : 0,
        "name" : "platformvminstancesspace",
        "is-public" : false,
        "providerid" : "2",
        "model-uuid" : "9497265d-6bb5-49c0-83ca-ca4248c86a8e",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e2b2bc53c71e65a73bd317b_42edfa36"
        ]
}
{
        "_id" : "9497265d-6bb5-49c0-83ca-ca4248c86a8e:3",
        "spaceid" : "3",
        "life" : 0,
        "name" : "undefined",
        "is-public" : false,
        "providerid" : "-1",
        "model-uuid" : "9497265d-6bb5-49c0-83ca-ca4248c86a8e",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e2b2bc53c71e65a73bd317b_42edfa36"
        ]
}
{
        "_id" : "9497265d-6bb5-49c0-83ca-ca4248c86a8e:0",
        "spaceid" : "0",
        "life" : 0,
        "name" : "alpha",
        "is-public" : true,
        "model-uuid" : "9497265d-6bb5-49c0-83ca-ca4248c86a8e",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e2b2bc53c71e65a73bd317b_42edfa36",
                "5e2b2c3d3c71e65a73bd3193_b2d3a172",
                "5e2b2cb53c71e65a73bd319a_f48dcfe6",
                "5e2b2d2d3c71e65a73bd31a1_32da53ad",
                "5e2b2da53c71e65a73bd31a8_ca482d35",
                "5e2c64c93c71e60448cd7494_1fb809b7",
                "5e2c65423c71e60448cd749c_c9c833e0",
                "5e2c65ba3c71e60448cd74a3_eaa87f2c",
                "5e2c66323c71e60448cd74aa_bd188f5a",
                "5e2c66aa3c71e60448cd74b1_8495da0a",
                "5e30b0913c71e6154f11c7a7_29b68fb4",
                "5e30b1093c71e6154f11c7af_5c71f4e5",
                "5e30b1813c71e6154f11c7b6_15a1fb3c",
                "5e30b1f93c71e6154f11c7bd_b224f936",
                "5e30b2713c71e6154f11c7c4_9822a326",
                "5e30b29c3c71e616b98ee4b6_27adb29b",
                "5e30b3143c71e616b98ee4be_aa5fde1b",
                "5e30b38c3c71e616b98ee4c5_8d5a7442",
                "5e30b4043c71e616b98ee4cc_a18c9d00",
                "5e30b47c3c71e616b98ee4d3_01f86320",
                "5e30cfa73c71e6180d85bff6_5d43ab58"
        ]
}
{
        "_id" : "c64de938-32d1-4528-8604-88c9ef8f5e9e:1",
        "spaceid" : "1",
        "life" : 0,
        "name" : "platforminfrastructurespace",
        "is-public" : false,
        "providerid" : "1",
        "model-uuid" : "c64de938-32d1-4528-8604-88c9ef8f5e9e",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e2b2bc53c71e65a73bd317c_206414be"
        ]
}
{
        "_id" : "c64de938-32d1-4528-8604-88c9ef8f5e9e:2",
        "spaceid" : "2",
        "life" : 0,
        "name" : "platformvminstancesspace",
        "is-public" : false,
        "providerid" : "2",
        "model-uuid" : "c64de938-32d1-4528-8604-88c9ef8f5e9e",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e2b2bc53c71e65a73bd317c_206414be"
        ]
}
{
        "_id" : "c64de938-32d1-4528-8604-88c9ef8f5e9e:3",
        "spaceid" : "3",
        "life" : 0,
        "name" : "undefined",
        "is-public" : false,
        "providerid" : "-1",
        "model-uuid" : "c64de938-32d1-4528-8604-88c9ef8f5e9e",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e2b2bc53c71e65a73bd317c_206414be"
        ]
}
{
        "_id" : "c64de938-32d1-4528-8604-88c9ef8f5e9e:0",
        "spaceid" : "0",
        "life" : 0,
        "name" : "alpha",
        "is-public" : true,
        "model-uuid" : "c64de938-32d1-4528-8604-88c9ef8f5e9e",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e2b2bc53c71e65a73bd317c_206414be",
                "5e2b2c3d3c71e65a73bd3194_e4ccbfb9",
                "5e2b2cb53c71e65a73bd319b_8262416e",
                "5e2b2d2d3c71e65a73bd31a2_2acc5702",
                "5e2b2da53c71e65a73bd31a9_9bd37bf0",
                "5e2c64c93c71e60448cd7495_f9f624c2",
                "5e2c65423c71e60448cd749d_4a56b4ab",
                "5e2c65ba3c71e60448cd74a4_4954642b",
                "5e2c66323c71e60448cd74ab_b0e5182e",
                "5e2c66aa3c71e60448cd74b2_8c629858",
                "5e30b0913c71e6154f11c7a8_761e425c",
                "5e30b1093c71e6154f11c7b0_ba4a1a09",
                "5e30b1813c71e6154f11c7b7_c42999ba",
                "5e30b1f93c71e6154f11c7be_9da91ad6",
                "5e30b2713c71e6154f11c7c5_dfbe3414",
                "5e30b29c3c71e616b98ee4b7_3e123723",
                "5e30b3143c71e616b98ee4bf_86191c11",
                "5e30b38c3c71e616b98ee4c6_1b8088db",
                "5e30b4043c71e616b98ee4cd_312e67bb",
                "5e30b47c3c71e616b98ee4d4_b0d5699e",
                "5e30cfa73c71e6180d85bff7_5dd6ef1b"
        ]
}
{
        "_id" : "2e642325-0eca-4b63-8238-53591a1089b9:1",
        "spaceid" : "1",
        "life" : 0,
        "name" : "platforminfrastructurespace",
        "is-public" : false,
        "providerid" : "1",
        "model-uuid" : "2e642325-0eca-4b63-8238-53591a1089b9",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e2b2bc53c71e65a73bd317d_5395aed2"
        ]
}
{
        "_id" : "2e642325-0eca-4b63-8238-53591a1089b9:2",
        "spaceid" : "2",
        "life" : 0,
        "name" : "platformvminstancesspace",
        "is-public" : false,
        "providerid" : "2",
        "model-uuid" : "2e642325-0eca-4b63-8238-53591a1089b9",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e2b2bc53c71e65a73bd317d_5395aed2"
        ]
}
{
        "_id" : "2e642325-0eca-4b63-8238-53591a1089b9:3",
        "spaceid" : "3",
        "life" : 0,
        "name" : "undefined",
        "is-public" : false,
        "providerid" : "-1",
        "model-uuid" : "2e642325-0eca-4b63-8238-53591a1089b9",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e2b2bc53c71e65a73bd317d_5395aed2"
        ]
}
{
        "_id" : "2e642325-0eca-4b63-8238-53591a1089b9:0",
        "spaceid" : "0",
        "life" : 0,
        "name" : "alpha",
        "is-public" : true,
        "model-uuid" : "2e642325-0eca-4b63-8238-53591a1089b9",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e2b2bc53c71e65a73bd317d_5395aed2",
                "5e2b2c3d3c71e65a73bd3195_50a83d90",
                "5e2b2cb53c71e65a73bd319c_7f22e7f4",
                "5e2b2d2d3c71e65a73bd31a3_bab53b29",
                "5e2b2da53c71e65a73bd31aa_7c33dd07",
                "5e2c64c93c71e60448cd7496_5ad56a6f",
                "5e2c65423c71e60448cd749e_707905a4",
                "5e2c65ba3c71e60448cd74a5_b88358bf",
                "5e2c66323c71e60448cd74ac_2e79d99c",
                "5e2c66aa3c71e60448cd74b3_a7c96c86",
                "5e30b0913c71e6154f11c7a9_4681a9ea",
                "5e30b1093c71e6154f11c7b1_162cc0a3",
                "5e30b1813c71e6154f11c7b8_3ee163b4",
                "5e30b1f93c71e6154f11c7bf_77c61c0a",
                "5e30b2713c71e6154f11c7c6_ca65773a",
                "5e30b29c3c71e616b98ee4b8_ce8648d3",
                "5e30b3143c71e616b98ee4c0_bde26790",
                "5e30b38c3c71e616b98ee4c7_8f05e375",
                "5e30b4043c71e616b98ee4ce_2b30c1fc",
                "5e30b47c3c71e616b98ee4d5_6c8d0a2b",
                "5e30cfa73c71e6180d85bff8_6bfffa87"
        ]
}
{
        "_id" : "8cb3c466-ddfd-4795-8c60-b8103504b9d6:1",
        "spaceid" : "1",
        "life" : 0,
        "name" : "platforminfrastructurespace",
        "is-public" : false,
        "providerid" : "1",
        "model-uuid" : "8cb3c466-ddfd-4795-8c60-b8103504b9d6",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e2b2bc53c71e65a73bd317e_db7e082b"
        ]
}
{
        "_id" : "8cb3c466-ddfd-4795-8c60-b8103504b9d6:2",
        "spaceid" : "2",
        "life" : 0,
        "name" : "platformvminstancesspace",
        "is-public" : false,
        "providerid" : "2",
        "model-uuid" : "8cb3c466-ddfd-4795-8c60-b8103504b9d6",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e2b2bc53c71e65a73bd317e_db7e082b"
        ]
}
{
        "_id" : "8cb3c466-ddfd-4795-8c60-b8103504b9d6:3",
        "spaceid" : "3",
        "life" : 0,
        "name" : "undefined",
        "is-public" : false,
        "providerid" : "-1",
        "model-uuid" : "8cb3c466-ddfd-4795-8c60-b8103504b9d6",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e2b2bc53c71e65a73bd317e_db7e082b"
        ]
}
{
        "_id" : "8cb3c466-ddfd-4795-8c60-b8103504b9d6:0",
        "spaceid" : "0",
        "life" : 0,
        "name" : "alpha",
        "is-public" : true,
        "model-uuid" : "8cb3c466-ddfd-4795-8c60-b8103504b9d6",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e2b2bc53c71e65a73bd317e_db7e082b",
                "5e2b2c3d3c71e65a73bd3196_ebd13e5f",
                "5e2b2cb53c71e65a73bd319d_f121b9f9",
                "5e2b2d2d3c71e65a73bd31a4_e1507b72",
                "5e2b2da53c71e65a73bd31ab_95711515",
                "5e2c64c93c71e60448cd7497_c00768fa",
                "5e2c65423c71e60448cd749f_a6963cf4",
                "5e2c65ba3c71e60448cd74a6_948ba475",
                "5e2c66323c71e60448cd74ad_8fd40c3f",
                "5e2c66aa3c71e60448cd74b4_cbbe6c54",
                "5e30b0913c71e6154f11c7aa_e0c092d9",
                "5e30b1093c71e6154f11c7b2_58b60ecf",
                "5e30b1813c71e6154f11c7b9_c8055cb2",
                "5e30b1f93c71e6154f11c7c0_b426ef2b",
                "5e30b2713c71e6154f11c7c7_4e9c1d36",
                "5e30b29c3c71e616b98ee4b9_e4f53879",
                "5e30b3143c71e616b98ee4c1_183e41de",
                "5e30b38c3c71e616b98ee4c8_7896a1fe",
                "5e30b4043c71e616b98ee4cf_0266e2dd",
                "5e30b47c3c71e616b98ee4d6_d7f6620a",
                "5e30cfa73c71e6180d85bff9_ed8febf5"
        ]
}
{
        "_id" : "90c9f5ac-8a7c-4d22-819c-6e86c0089c1a:1",
        "spaceid" : "1",
        "life" : 0,
        "name" : "platforminfrastructurespace",
        "is-public" : false,
        "providerid" : "1",
        "model-uuid" : "90c9f5ac-8a7c-4d22-819c-6e86c0089c1a",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e2b2bc53c71e65a73bd317f_b3d5e4d8"
        ]
}
{
        "_id" : "90c9f5ac-8a7c-4d22-819c-6e86c0089c1a:2",
        "spaceid" : "2",
        "life" : 0,
        "name" : "platformvminstancesspace",
        "is-public" : false,
        "providerid" : "2",
        "model-uuid" : "90c9f5ac-8a7c-4d22-819c-6e86c0089c1a",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e2b2bc53c71e65a73bd317f_b3d5e4d8"
        ]
}
{
        "_id" : "90c9f5ac-8a7c-4d22-819c-6e86c0089c1a:3",
        "spaceid" : "3",
        "life" : 0,
        "name" : "undefined",
        "is-public" : false,
        "providerid" : "-1",
        "model-uuid" : "90c9f5ac-8a7c-4d22-819c-6e86c0089c1a",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e2b2bc53c71e65a73bd317f_b3d5e4d8"
        ]
}
{
        "_id" : "90c9f5ac-8a7c-4d22-819c-6e86c0089c1a:0",
        "spaceid" : "0",
        "life" : 0,
        "name" : "alpha",
        "is-public" : true,
        "model-uuid" : "90c9f5ac-8a7c-4d22-819c-6e86c0089c1a",
        "txn-revno" : NumberLong(2),
        "txn-queue" : [
                "5e2b2bc53c71e65a73bd317f_b3d5e4d8",
                "5e2b2c3d3c71e65a73bd3197_627fa86a",
                "5e2b2cb53c71e65a73bd319e_208646cc",
                "5e2b2d2d3c71e65a73bd31a5_668b739c",
                "5e2b2da53c71e65a73bd31ac_bc0ef2af",
                "5e2c64c93c71e60448cd7498_566c6f06",
                "5e2c65423c71e60448cd74a0_057bfca8",
                "5e2c65ba3c71e60448cd74a7_e8c4b2c6",
                "5e2c66323c71e60448cd74ae_f89deb58",
                "5e2c66aa3c71e60448cd74b5_4bfb23ec",
                "5e30b0913c71e6154f11c7ab_af7a4de2",
                "5e30b1093c71e6154f11c7b3_f13f651f",
                "5e30b1813c71e6154f11c7ba_2a8e9517",
                "5e30b1f93c71e6154f11c7c1_f7c7fb74",
                "5e30b2713c71e6154f11c7c8_66a1822a",
                "5e30b29c3c71e616b98ee4ba_a4503a58",
                "5e30b3143c71e616b98ee4c2_75ab2ace",
                "5e30b38c3c71e616b98ee4c9_2eaa5171",
                "5e30b4043c71e616b98ee4d0_7679c31b",
                "5e30b47c3c71e616b98ee4d7_7d673304",
                "5e30cfa73c71e6180d85bffa_644c96f8"
        ]
}
Type "it" for more
juju:PRIMARY> ^C
bye

Just to put it out there: I haven’t renamed or added any spaces in MAAS since any of these models were built.

Yes. The issue we are actually experiencing is this one:
https://bugs.launchpad.net/juju/+bug/1856537

1 Like

Ah, crap. Is there any way to tell the controller to abort the upgrade?

I will get a fix in as a priority. At the same time, I will work out how to get your deployment over the hump.

It should be a case of updating the space name on addresses in the machines collection, but I will confirm.

I could try renaming the spaces to all lower case in MAAS?

Or should I just stop poking it with a stick?

That won’t help get past the upgrade step. Because the controller is down, we are not polling or updating the machine addresses, so they will remain mixed case.
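The next example shows the failure mode concretely. It is a simplified model, not Juju’s actual code, of a name-to-ID lookup like the one the upgrade step performs. An exact match of the mixed-case name held on the machine addresses fails against the lower-cased names stored in the spaces collection. A case-insensitive match succeeds. `spaceIdByName` and its `ignoreCase` flag are hypothetical.

```javascript
// Illustration only: look up a space ID by name in space docs shaped like the
// ones in the Mongo output above ({ spaceid, name, ... }).
function spaceIdByName(spaceDocs, name, ignoreCase) {
  const normalise = s => (ignoreCase ? s.toLowerCase() : s);
  const match = spaceDocs.find(doc => normalise(doc.name) === normalise(name));
  if (!match) {
    throw new Error('space with name: "' + name + '" not found');
  }
  return match.spaceid;
}

const storedSpaces = [
  { spaceid: "1", name: "platforminfrastructurespace" },
  { spaceid: "2", name: "platformvminstancesspace" },
];
// spaceIdByName(storedSpaces, "PLatformInfrastructureSpace", false) throws;
// spaceIdByName(storedSpaces, "PLatformInfrastructureSpace", true) returns "1".
```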

Right.

I have replicated your scenario and was able to complete the upgrade after running this against the database:

// Lower-case the space name on every stored address of every machine so it
// matches the lower-cased names in the spaces collection.
// Address fields may be absent on some machines, so each one is checked first.
db.machines.find().forEach(function(e) {
    function lowerSpaceName(addr) {
        if (addr && addr.spacename != null) {
            addr.spacename = addr.spacename.toLowerCase();
        }
    }

    lowerSpaceName(e.preferredprivateaddress);
    lowerSpaceName(e.preferredpublicaddress);
    (e.addresses || []).forEach(lowerSpaceName);

    db.machines.save(e);
});
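You can run a read-only check before and after the script to see which machine documents still carry mixed-case space names. This sketch assumes it is pasted into the mongo shell and passed `db.machines.find().toArray()`. `findMixedCaseSpaceNames` is a hypothetical name. It inspects the same three address fields that the script updates.

```javascript
// Hypothetical read-only check: report every address whose space name is not
// already lower case, together with the owning machine document's _id.
function findMixedCaseSpaceNames(machines) {
  const report = [];
  for (const m of machines) {
    const addrs = [m.preferredprivateaddress, m.preferredpublicaddress]
      .concat(m.addresses || []);
    for (const a of addrs) {
      if (a && a.spacename && a.spacename !== a.spacename.toLowerCase()) {
        report.push({ _id: m._id, spacename: a.spacename });
      }
    }
  }
  return report;
}
```

An empty result after the update suggests every address now refers to a lower-cased space name.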

After the upgrade, you will see recurrent errors in the logs due to the instance poller failing. I am addressing this right now for a 2.7.2 release ASAP.

An interim fix is to rename the MAAS spaces to lower case after the upgrade runs.

2 Likes

Patches that will rectify this in the imminent 2.7.2 release:

1 Like

Is there a way to destroy a controller and ingest existing models into a fresh replacement controller?

I’m reaching the point of desperation where I need to make changes to my environment and deploy some additional ones but I’m currently dead in the water because of a dead controller.

I’m definitely going to look into HA controllers from now on, but I’m concerned that even in that setup, upgrades may not be as smooth sailing as I’d anticipated.