Can't remove model, last application "waiting for machine"


#1

Hi all

I have a model that’s been trying to destroy itself for a few days now. It hangs on “model not empty” because one application is in the state “waiting for machine”.

Is there a way to force-destroy this model?
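For reference, this is roughly what I've been running (a plain destroy, no special flags; the model name is taken from the status below):

juju destroy-model cot-dev2

And the status of the stuck model, from juju status --format yaml: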

model:
  name: cot-dev2
  type: iaas
  controller: vmware-main
  cloud: vmware1
  region: ILABT
  version: 2.3.3
  model-status:
    current: destroying
    message: 'attempt 1 to destroy model failed (will retry):  model not empty, found
      1 application (model not empty)'
    since: 19 Sep 2018 14:18:20+02:00
  meter-status:
    color: amber
    message: user verification pending
  sla: unsupported
machines: {}
applications:
  nginx-api-gateway:
    charm: local:xenial/nginx-api-gateway-0
    series: xenial
    os: ubuntu
    charm-origin: local
    charm-name: nginx-api-gateway
    charm-rev: 0
    exposed: false
    life: dying
    application-status:
      current: waiting
      message: waiting for machine
      since: 18 Feb 2018 21:56:24+01:00
    endpoint-bindings:
      upstream: ""
      website: ""
controller:
  timestamp: 10:50:08+02:00

#2

We have a bunch of models that are in this state. This might actually be the reason for the 40 Mb/s of traffic.


#3

It could well be.

For quite some time we have thought about having a destroy-model --force for those situations where things get stuck when they shouldn’t.

One of the underlying bugs here, I think, is that application destruction shouldn’t be blocked by the “waiting for machine” state in this situation (and probably a few others).


#4

This seems a lot like lp:1654928, which is marked as fixed, so maybe this is a different issue.

I believe we’ve talked about having a “remove-application --force” of some sort. That said, I think we can just fix this one. If an application has a unit that has no machine available for it, we should just allow that unit to be removed and the application killed.


#5

The reason we want to remove some of these models is exactly this “stuck” application; we hoped removing the model would fix it. So remove-application --force would actually help us even more, since it would mean we don’t have to remove the model at all.

Is there a workaround I can use right now to remove these applications and models while waiting for the fix? I’d like to see whether that reduces the network traffic.


#6

I think it resembles this bug report more: https://bugs.launchpad.net/juju/+bug/1724673

All of the “stuck” applications either provided or consumed cross-model relations. Below is the YAML status of another model that fails to destroy; you can still see some remnants of the cross-model relations.

juju status --format yaml
model:
  name: providenceplus
  type: iaas
  controller: vmware-main
  cloud: vmware1
  region: ILABT
  version: 2.3.8
  model-status:
    current: destroying
    message: 'attempt 39 to destroy model failed (will retry):  model not empty, found
      2 applications (model not empty)'
    since: 20 Sep 2018 10:31:00+02:00
  meter-status:
    color: amber
    message: user verification pending
  sla: unsupported
machines: {}
applications:
  kafka-rest:
    charm: local:xenial/kafka-rest-confluent-k8s-3
    series: xenial
    os: ubuntu
    charm-origin: local
    charm-name: kafka-rest-confluent-k8s
    charm-rev: 3
    exposed: false
    life: dying
    application-status:
      current: waiting
      message: waiting for machine
      since: 20 Jun 2018 09:34:30+02:00
    relations:
      kubernetes:
      - leggo
    endpoint-bindings:
      kafka: ""
      kubernetes: ""
      upstream: ""
  kafka-rest-k8s:
    charm: local:xenial/kafka-rest-confluent-k8s-0
    series: xenial
    os: ubuntu
    charm-origin: local
    charm-name: kafka-rest-confluent-k8s
    charm-rev: 0
    exposed: false
    life: dying
    application-status:
      current: waiting
      message: waiting for machine
      since: 05 Jun 2018 15:51:19+02:00
    relations:
      kubernetes:
      - deve
    endpoint-bindings:
      kafka: ""
      kubernetes: ""
      upstream: ""
application-endpoints:
  deve:
    url: vmware-main:sborny/sborny-tutorial.deve
    endpoints:
      kubernetes-deployer:
        interface: kubernetes-deployer
        role: provider
    life: dying
    application-status:
      current: error
      message: 'cannot get discharge from "https://10.10.139.74:17070/offeraccess":
        cannot acquire discharge: cannot http POST to "https://10.10.139.74:17070/offeraccess/discharge":
        Post https://10.10.139.74:17070/offeraccess/discharge: net/http: TLS handshake
        timeout'
      since: 10 Aug 2018 07:05:59+02:00
    relations:
      kubernetes-deployer:
      - kafka-rest-k8s
  leggo:
    url: vmware-main:sborny/sborny-tutorial.leggo
    endpoints:
      kubernetes-deployer:
        interface: kubernetes-deployer
        role: provider
    life: dying
    application-status:
      current: active
      message: Ready
      since: 11 Sep 2018 12:44:30+02:00
    relations:
      kubernetes-deployer:
      - kafka-rest
controller:
  timestamp: 10:54:24+02:00

#7

I have a slew of models in this state across my 3 controllers, and I’m experiencing this in 20+ of my JAAS models. @uros-jovanovic is looking into my JAAS models; perhaps he has some input here.


#8

Did anybody find a resolution to this? I’m getting the same thing for models with cross-model relations: exactly the same status output as @merlijn-sebrechts.


#9

Same here, also with a stale cross-model relation.


#10

Whether due to a stale cross-model relation, a unit currently in a hook error state, a cloud API error, or a number of other reasons, removing applications can become stuck whenever the “do everything properly” workflow is not possible. This cycle we are going to address the issue by adding:

  • remove-application --force
  • remove-unit --force
  • destroy-model --continue-on-error

Unfortunately, the fix is still in progress, so there’s nothing easy that can be done right now to solve the issue.
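To make the intent concrete, usage should look roughly like this once the work lands (the exact syntax may change before release, and the application, unit, and model names below are just illustrative examples taken from the statuses above):

juju remove-application --force kafka-rest
juju remove-unit --force kafka-rest/0
juju destroy-model --continue-on-error providenceplus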


#12

Awesome!

How will this work with existing models that can’t upgrade to a newer version (because they’re destroying or in an error state)? Will we still be able to use these commands to remove those models?


#13

Once the controller and models are upgraded, the commands will work with existing models to get them cleaned up.
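For a 2.x-era client, the upgrade itself would be something along these lines, upgrading the controller model first and then each stuck model (upgrade-juju is the 2.x upgrade command; the model name is an example from the statuses above):

juju upgrade-juju -m controller
juju upgrade-juju -m providenceplus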


#14

Just checked and I’m still able to upgrade the broken models, so this should be fine. Thanks!