Model destroy failure; manual clean-up


#1

Hello!

I am unable to destroy a model and its machines. The model contained a working CDK cluster. After I ran juju destroy-model, the applications were removed but the machines were not, and the containers are still running on them.

I tried remove-machine --force and tried rebooting the controller, but nothing happens. I am on MAAS. If I release or delete the machines manually, how can I clean up the model in the controller?

Thanks

$ juju status
Model                         Controller             Cloud/Region  Version  SLA          Timestamp  Notes
conjure-canonical-kubern-194  conjure-up-dcmaas-d96  dcmaas        2.5.0    unsupported  17:00:01Z  attempt 8 to destroy model failed (will retry):  model not empty, found 8 machines (model not empty)

Machine  State    DNS            Inst id  Series  AZ       Message
0        stopped  10.66.100.130  n3acer   bionic  default  Deployed
1        stopped  10.66.100.131  xewaft   bionic  default  Deployed
4        stopped  10.66.100.134  n478qp   bionic  default  Deployed
5        stopped  10.66.100.136  qms8bh   bionic  default  Deployed
7        stopped  10.66.100.139  8agg4b   bionic  default  Deployed
8        stopped  10.66.100.135  ctwwtq   bionic  default  Deployed
9        stopped  10.66.100.137  tmffx6   bionic  default  Deployed
12       stopped  10.66.100.140  6mp4ss   bionic  default  Deployed

$ juju debug-log

[...]
unit-kubernetes-worker-2: 16:28:23 ERROR juju.worker.dependency "upgrader" manifold worker returned unexpected error: cannot set agent version: unit "kubernetes-worker/2" not found
unit-kubernetes-worker-0: 16:28:23 ERROR juju.worker.dependency "uniter" manifold worker returned unexpected error: failed to initialize uniter for "unit-kubernetes-worker-0": unit "kubernetes-worker/0" not found
unit-kubernetes-worker-5: 16:28:24 ERROR juju.worker.dependency "upgrader" manifold worker returned unexpected error: cannot set agent version: unit "kubernetes-worker/5" not found
unit-kubernetes-worker-0: 16:28:24 ERROR juju.worker.dependency "upgrader" manifold worker returned unexpected error: cannot set agent version: unit "kubernetes-worker/0" not found
unit-kubernetes-worker-2: 16:28:24 ERROR juju.worker.dependency "meter-status" manifold worker returned unexpected error: unit "kubernetes-worker/2" not found
unit-kubernetes-worker-1: 16:28:24 ERROR juju.worker.dependency "uniter" manifold worker returned unexpected error: failed to initialize uniter for "unit-kubernetes-worker-1": unit "kubernetes-worker/1" not found
unit-kubernetes-worker-2: 16:28:24 ERROR juju.worker.dependency "uniter" manifold worker returned unexpected error: failed to initialize uniter for "unit-kubernetes-worker-2": unit "kubernetes-worker/2" not found

#2

It seems that cleaning up models is a common problem.

Several people, myself included, have run into problems cleaning up models in different situations…

@jamesbeedy @anastasia-macmood @rick_h



#3

I agree that there is an issue. It is also very hard for us to figure out how you got into this state. To be absolutely clear - we are not seeing this in our testing… Obviously, there is a gap.
To address the issue globally and to ensure that no one else gets into this position, we need a reproducible scenario: exactly what you did, what commands you ran beforehand, what failures appeared in the logs, etc. It’s very possible that a prior operation failed but you (and other users) were not aware that something did not succeed. In other words, most people who tell us about this issue do not know specifically what led to these failures. They just see that a failure occurred: the model cannot be destroyed.

If you have a repeatable scenario, I’d love to hear it.

Meanwhile, is there any way we could get a sanitized copy of your db? The best way to get it to us would be via an attachment in a bug report. To get the actual dump, run this on the affected model:

JUJU_DEV_FEATURE_FLAGS=developer-mode juju dump-db
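
If it helps, the dump can be written straight to a file so it is ready to attach. This is only a sketch: the function name save_model_dump and the output file name are made up for illustration, and it assumes dump-db accepts the usual -m model option and prints the dump to stdout.

```shell
# Hypothetical helper (not part of juju): save a model's db dump to a file.
# Assumes `juju dump-db` takes the standard -m <model> option and writes
# the dump to stdout.
save_model_dump() {
  model="$1"
  out="${2:-juju-dump-$model.yaml}"   # default file name is an assumption

  # The developer-mode feature flag is set only for this one invocation.
  JUJU_DEV_FEATURE_FLAGS=developer-mode juju dump-db -m "$model" > "$out" || return 1

  echo "wrote $out" >&2
}
```

For the model in this thread that would be save_model_dump conjure-canonical-kubern-194. It is worth looking through the file for anything sensitive before attaching it.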


#4

Potentially, it could be linked to existing bugs if you feel like you are experiencing the same symptoms… For example, problems removing a machine - https://bugs.launchpad.net/juju/+bug/1818045


#5

Unfortunately, it’s not repeatable, since I didn’t keep scripts, nor did I do anything out of the ordinary, AFAIK.

Thank you very much for the link. Since I have exactly the same behavior, I appended my dump to the same ticket.


#6

This will be fixed in 2.6 from what I hear. @rick_h can verify.