JuJu Backup/restore issues

TLDR; JuJu restore function does not work correctly. Does not pass keys or passwords from the restore or does not back them up, which has ruined a production system that I have to figure out how to recover.

juju restore-backup -b --file juju-backup-20190220-002259.tar.gz --constraints tags="jujuslave1" --debug
08:08:32 INFO  juju.cmd supercommand.go:56 running juju [2.3.7 gc go1.8]
08:08:32 DEBUG juju.cmd supercommand.go:57   args: []string{"juju", "restore-backup", "-b", "--file", "juju-backup-20190220-002259.tar.gz", "--constraints", "tags=jujuslave1", "--debug"}
08:08:36 DEBUG juju.provider.maas environ
provider.go:61 opening model "controller".
08:08:39 DEBUG juju.environs.bootstrap bootstrap.go:199 model "controller" supports service/machine networks: true
08:08:39 DEBUG juju.environs.bootstrap bootstrap.go:201 network management by juju enabled: true
08:08:39 INFO  cmd bootstrap.go:233 Loading image metadata
08:08:39 INFO  cmd bootstrap.go:303 Looking for packaged Juju agent version 2.3.7 for amd64
08:08:39 INFO  juju.environs.bootstrap tools.go:72 looking for bootstrap agent binaries: version=2.3.7
08:08:39 DEBUG juju.environs.tools tools.go:102 finding agent binaries in stream: "released"
08:08:39 DEBUG juju.environs.tools tools.go:104 reading agent binaries with major.minor version 2.3
08:08:39 DEBUG juju.environs.tools tools.go:112 filtering agent binaries by version: 2.3.7
08:08:39 DEBUG juju.environs.tools tools.go:115 filtering agent binaries by series: xenial
08:08:39 DEBUG juju.environs.tools tools.go:118 filtering agent binaries by architecture: amd64
08:08:39 DEBUG juju.environs.tools urls.go:109 trying datasource "keystone catalog"
08:08:40 DEBUG juju.environs.simplestreams simplestreams.go:683 using default candidate for content id "com.ubuntu.juju:released:tools" are {20161007 mirrors:1.0 content-download streams/v1/cpc-mirrors.sjson []}
08:08:43 INFO  juju.environs.bootstrap tools.go:74 found 1 packaged agent binaries
08:08:43 INFO  cmd bootstrap.go:389 Starting new instance for initial controller
08:08:44 INFO  cmd bootstrap.go:151 Launching controller instance(s) on hqjuju...
08:08:44 DEBUG juju.provider.maas environ.go:929 attempting to acquire node in zone "default"
08:08:45 DEBUG juju.cloudconfig.instancecfg instancecfg.go:835 Setting numa ctl preference to false
08:08:45 DEBUG juju.service discovery.go:63 discovered init system "systemd" from series "xenial"
08:08:45 DEBUG juju.provider.maas environ.go:1019 maas user data; 3456 bytes
08:08:48 DEBUG juju.provider.maas environ.go:1051 started instance "sfrswa"
08:08:48 INFO  cmd bootstrap.go:225  - sfrswa (arch=amd64 mem=4G cores=4)  
08:08:48 INFO  juju.environs.bootstrap bootstrap.go:638 newest version: 2.3.7
08:08:48 INFO  juju.environs.bootstrap bootstrap.go:653 picked bootstrap agent binary version: 2.3.7
08:08:48 INFO  cmd bootstrap.go:425 Installing Juju agent on bootstrap instance
08:08:48 INFO  cmd bootstrap.go:517 Juju GUI installation has been disabled
08:12:49 DEBUG juju.cloudconfig.instancecfg instancecfg.go:835 Setting numa ctl preference to false
Waiting for address
Attempting to connect to 10.54.4.17:22
08:12:50 DEBUG juju.provider.common bootstrap.go:564 connection attempt for 10.54.4.17 failed: /var/lib/juju/nonce.txt does not exist
08:12:55 DEBUG juju.provider.common bootstrap.go:564 connection attempt for 10.54.4.17 failed: /var/lib/juju/nonce.txt does not exist
08:13:01 DEBUG juju.provider.common bootstrap.go:564 connection attempt for 10.54.4.17 failed: /var/lib/juju/nonce.txt does not exist
08:13:06 INFO  cmd bootstrap.go:333 Connected to 10.54.4.17
08:13:06 INFO  juju.cloudconfig userdatacfg_unix.go:460 Fetching agent: curl -sSfw 'agent binaries from %{url_effective} downloaded: HTTP %{http_code}; time %{time_total}s; size %{size_download} bytes; speed %{speed_download} bytes/s ' --retry 10 -o $bin/tools.tar.gz <[`https://streams.canonical.com/juju/tools/agent/2.3.7/juju-2.3.7-ubuntu-amd64.tgz`]>
08:13:06 INFO  cmd bootstrap.go:403 Running machine configuration script...
sudo: unable to resolve host hqosjujuslave1
08:14:55 INFO  cmd bootstrap.go:455 Bootstrap agent now started
08:14:56 INFO  juju.juju api.go:302 API endpoints changed from [] to [10.54.4.17:17070]
08:14:56 INFO  cmd controller.go:82 Contacting Juju controller at 10.54.4.17 to verify accessibility...
08:14:56 INFO  juju.juju api.go:67 connecting to API addresses: [10.54.4.17:17070]
08:14:58 DEBUG juju.api apiclient.go:843 successfully dialed "wss://10.54.4.17:17070/model/1d72c1b2-ddc1-4acb-890f-4848f7b43053/api"
08:14:58 INFO  juju.api apiclient.go:597 connection established to "wss://10.54.4.17:17070/model/1d72c1b2-ddc1-4acb-890f-4848f7b43053/api"
08:14:58 DEBUG juju.api monitor.go:35 RPC connection died
08:14:58 INFO  cmd controller.go:87 Bootstrap complete, "hqjuju" controller now available
08:14:58 INFO  cmd controller.go:88 Controller machines are in the "controller" model
08:14:58 INFO  cmd controller.go:89 Initial model "default" added
08:14:58 INFO  juju.juju api.go:67 connecting to API addresses: [10.54.4.17:17070]
08:14:58 DEBUG juju.api apiclient.go:843 successfully dialed "wss://10.54.4.17:17070/model/1d72c1b2-ddc1-4acb-890f-4848f7b43053/api"
08:14:58 INFO  juju.api apiclient.go:597 connection established to "wss://10.54.4.17:17070/model/1d72c1b2-ddc1-4acb-890f-4848f7b43053/api"
08:14:58 DEBUG juju.api.backups restore.go:54 Will attempt to call 'PrepareRestore'
08:14:58 INFO  juju.juju api.go:67 connecting to API addresses: [10.54.4.17:17070]
08:14:58 DEBUG juju.api apiclient.go:843 successfully dialed "wss://10.54.4.17:17070/model/1d72c1b2-ddc1-4acb-890f-4848f7b43053/api"
08:14:58 INFO  juju.api apiclient.go:597 connection established to "wss://10.54.4.17:17070/model/1d72c1b2-ddc1-4acb-890f-4848f7b43053/api"
08:14:58 DEBUG juju.api monitor.go:35 RPC connection died
08:14:58 DEBUG juju.api.backups restore.go:75 Server is now in 'about to restore' mode, proceeding to upload the backup file
08:14:58 DEBUG httpbakery client.go:244 client do PUT https://10.54.4.17:17070/model/1d72c1b2-ddc1-4acb-890f-4848f7b43053/backups {
08:15:03 DEBUG httpbakery client.go:246 } -> error <nil>
08:15:03 DEBUG juju.api.backups restore.go:119 Attempting Restore of "20190220-002259.1d72c1b2-ddc1-4acb-890f-4848f7b43053"
08:15:03 INFO  juju.juju api.go:67 connecting to API addresses: [10.54.4.17:17070]
08:15:03 DEBUG juju.api apiclient.go:843 successfully dialed "wss://10.54.4.17:17070/model/1d72c1b2-ddc1-4acb-890f-4848f7b43053/api"
08:15:03 INFO  juju.api apiclient.go:597 connection established to "wss://10.54.4.17:17070/model/1d72c1b2-ddc1-4acb-890f-4848f7b43053/api"
08:15:12 DEBUG juju.api monitor.go:35 RPC connection died
08:16:03 DEBUG juju.api monitor.go:55 health ping failed: connection is shut down
08:22:26 DEBUG juju.api.backups restore.go:169 Attempting finishRestore
08:22:26 INFO  juju.juju api.go:67 connecting to API addresses: [10.54.4.17:17070]
08:22:28 DEBUG juju.api apiclient.go:843 successfully dialed "wss://10.54.4.17:17070/model/1d72c1b2-ddc1-4acb-890f-4848f7b43053/api"
08:22:28 INFO  juju.api apiclient.go:597 connection established to "wss://10.54.4.17:17070/model/1d72c1b2-ddc1-4acb-890f-4848f7b43053/api"
08:22:28 INFO  juju.juju api.go:302 API endpoints changed from [10.54.4.17:17070] to [10.54.4.17:17070 10.54.4.2:17070 10.54.4.3:17070]
08:22:28 DEBUG juju.api monitor.go:35 RPC connection died
08:22:28 ERROR juju.api.backups restore.go:136 could not clean up after failed restore attempt: cannot complete restore: <nil>: juju restore is in progress - API is disabled to prevent data loss
08:22:28 DEBUG juju.rpc server.go:311 error closing codec: write tcp 10.54.4.15:51268->10.54.4.17:17070: i/o timeout
ERROR cannot perform restore: <nil>: restore failed: cannot open state: cannot read model 1d72c1b2-ddc1-4acb-890f-4848f7b43053: unauthorized mongo access: not authorized on juju to execute command { find: "models", filter: { _id: "1d72c1b2-ddc1-4acb-890f-4848f7b43053" }, skip: 0, limit: 1, batchSize: 1, singleBatch: true } (unauthorized access)
08:22:28 DEBUG cmd supercommand.go:459 error stack: 
restore failed: cannot open state: cannot read model 1d72c1b2-ddc1-4acb-890f-4848f7b43053: unauthorized mongo access: not authorized on juju to execute command { find: "models", filter: { _id: "1d72c1b2-ddc1-4acb-890f-4848f7b43053" }, skip: 0, limit: 1, batchSize: 1, singleBatch: true } (unauthorized access)
github.com/juju/juju/rpc/client.go:149: 
github.com/juju/juju/api/apiclient.go:924: 
github.com/juju/juju/api/backups/restore.go:137: cannot perform restore: <nil>
github.com/juju/juju/cmd/juju/backups/restore.go:436: 

JuJu 2.3.7-xenial-amd64
mongo3.2
MAAS version: 2.3.0 (6434-gd354690-0ubuntu1~16.04.1)

I had a replset/HA available within juju and ran into a whole other list of issues with attempting to follow the instructions found here:
https://discourse.jujucharms.com/t/controller-backups/1106
https://docs.jujucharms.com/2.3/en/controllers-backup

After this error I am left with:

juju controllers --refresh
error updating cached details for "hqjuju": juju restore is in progress - API is disabled to prevent data loss
Controller  Model       User   Access     Cloud/Region  Models  Machines    HA  Version
hqjuju*     controller  admin  superuser  hqjuju             3         -  none  2.3.7
hqjuju2     default     admin  superuser  hqjuju2            2         1  none  2.3.7

If I log into mongo, manually and reset the password on the host to the new hosts password I get 508x cert errors everytime I attempt to utilize any juju commands. Any help would be appreciated. I can find no other forums or issues/bug reports that lead me to any type of solution.

When I do the restore my node gets stuck here, so I cannot complete the removal of the other nodes and then recreate the HA.

MongoDB shell version: 3.2.15
connecting to: localhost:37017/juju
2019-03-13T17:41:21.547+0000 W NETWORK  [thread1] SSL peer certificate validation failed: unable to get local issuer certificate
Server has startup warnings: 
2019-03-13T18:08:53.733+0000 I CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
2019-03-13T18:08:53.733+0000 I CONTROL  [initandlisten] 
2019-03-13T18:08:53.733+0000 I CONTROL  [initandlisten] 
2019-03-13T18:08:53.733+0000 I CONTROL  [initandlisten] ** WARNING: soft rlimits too low. rlimits set to 20000 processes, 65000 files. Number of processes should be at least 32500 : 0.5 times number of files.
juju:PRIMARY> use admin
switched to db admin
juju:PRIMARY> rs.status();
{
        "set" : "juju",
        "date" : ISODate("2019-03-13T17:42:03.911Z"),
        "myState" : 1,
        "term" : NumberLong(4),
        "heartbeatIntervalMillis" : NumberLong(2000),
        "members" : [
                {
                        "_id" : 1,
                        "name" : "10.54.4.17:37017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : -1610,
                        "optime" : {
                                "ts" : Timestamp(1552500534, 1),
                                "t" : NumberLong(4)
                        },
                        "optimeDate" : ISODate("2019-03-13T18:08:54Z"),
                        "infoMessage" : "could not find member to sync from",
                        "electionTime" : Timestamp(1552500533, 1),
                        "electionDate" : ISODate("2019-03-13T18:08:53Z"),
                        "configVersion" : 1,
                        "self" : true
                }
        ],
        "ok" : 1
}
juju:PRIMARY>

@matthew.williams,

This post was written around the time you have described your difficulties - Manual Steps to backup-restore.

I trust you had moved past this point, so am linking it for reference :slight_smile: