What to do with broken k8s applications?


#1

I’ve been experimenting with k8s charms for two days now and I have gathered a sizeable collection of broken applications that won’t destroy correctly. Is there anything I can do to force remove these applications?

Model      Controller     Cloud/Region  Version  SLA          Timestamp
k8s-test2  vmware-main-2  testcluster   2.5.4    unsupported  14:41:32+02:00

App          Version  Status   Scale  Charm              Store       Rev  OS          Address         Notes
con3                  blocked    0/1  sse-consumer       local         1  kubernetes                  Please add relationship to sse endpoint.
con4                  active     0/1  sse-consumer       local         2  kubernetes  10.152.183.118  
ep2                   waiting    0/1  sse-endpoint       local         2  kubernetes                  waiting for container
ep4                   waiting    0/1  sse-endpoint       local         4  kubernetes                  waiting for container
ep5                   blocked    0/1  sse-endpoint-mock  local         0  kubernetes                  active (ep4.example.com)
mariadb-k8s           active     0/1  mariadb-k8s        jujucharms    0  kubernetes  10.152.183.166  
mdb2                  active     0/1  mariadb-k8s        jujucharms    0  kubernetes  10.152.183.126  
mdb3                  active     0/1  mariadb-k8s        jujucharms    0  kubernetes  10.152.183.172  

Unit            Workload  Agent   Address      Ports     Message
con3/0*         blocked   failed                         Please add relationship to sse endpoint.
con4/0*         active    failed  10.1.38.212  8080/TCP  
ep2/1*          waiting   failed                         waiting for container
ep4/0*          waiting   failed                         waiting for container
ep5/0*          blocked   failed                         active (ep4.example.com)
mariadb-k8s/0*  active    failed  10.1.38.195  3306/TCP  
mdb2/0*         active    failed  10.1.38.197  3306/TCP  
mdb3/0*         active    failed  10.1.38.199  3306/TCP  

The full status in YAML format:

model:
  name: k8s-test2
  type: caas
  controller: vmware-main-2
  cloud: testcluster
  version: 2.5.4
  model-status:
    current: available
    since: 17 Apr 2019 17:30:54+02:00
  sla: unsupported
machines: {}
applications:
  con3:
    charm: local:kubernetes/sse-consumer-1
    series: kubernetes
    os: kubernetes
    charm-origin: local
    charm-name: sse-consumer
    charm-rev: 1
    scale: 1
    exposed: false
    life: dying
    application-status:
      current: blocked
      message: Please add relationship to sse endpoint.
      since: 18 Apr 2019 11:17:59+02:00
    units:
      con3/0:
        workload-status:
          current: blocked
          message: Please add relationship to sse endpoint.
          since: 18 Apr 2019 11:17:59+02:00
        juju-status:
          current: failed
          message: resolver loop error
          since: 18 Apr 2019 11:19:18+02:00
        leader: true
    endpoint-bindings:
      sse-endpoint: ""
  con4:
    charm: local:kubernetes/sse-consumer-2
    series: kubernetes
    os: kubernetes
    charm-origin: local
    charm-name: sse-consumer
    charm-rev: 2
    scale: 1
    provider-id: 4c2435c2-61bc-11e9-bd63-0050562c47e7
    address: 10.152.183.118
    exposed: false
    life: dying
    application-status:
      current: active
      since: 18 Apr 2019 11:28:21+02:00
    relations:
      sse-endpoint:
      - ep4
    units:
      con4/0:
        workload-status:
          current: active
          since: 18 Apr 2019 11:28:21+02:00
        juju-status:
          current: failed
          message: resolver loop error
          since: 18 Apr 2019 14:31:56+02:00
        leader: true
        open-ports:
        - 8080/TCP
        address: 10.1.38.212
        provider-id: 4c1f05da-61bc-11e9-bd63-0050562c47e7
    endpoint-bindings:
      sse-endpoint: ""
  ep2:
    charm: local:kubernetes/sse-endpoint-2
    series: kubernetes
    os: kubernetes
    charm-origin: local
    charm-name: sse-endpoint
    charm-rev: 2
    scale: 1
    exposed: false
    life: dying
    application-status:
      current: waiting
      message: waiting for container
    units:
      ep2/1:
        workload-status:
          current: waiting
          message: waiting for container
        juju-status:
          current: failed
          message: resolver loop error
          since: 18 Apr 2019 11:18:43+02:00
        leader: true
    endpoint-bindings:
      sse-endpoint: ""
  ep4:
    charm: local:kubernetes/sse-endpoint-4
    series: kubernetes
    os: kubernetes
    charm-origin: local
    charm-name: sse-endpoint
    charm-rev: 4
    scale: 1
    exposed: false
    life: dying
    application-status:
      current: waiting
      message: waiting for container
    relations:
      sse-endpoint:
      - con4
    units:
      ep4/0:
        workload-status:
          current: waiting
          message: waiting for container
        juju-status:
          current: failed
          message: resolver loop error
          since: 18 Apr 2019 14:29:28+02:00
        leader: true
    endpoint-bindings:
      sse-endpoint: ""
  ep5:
    charm: local:kubernetes/sse-endpoint-mock-0
    series: kubernetes
    os: kubernetes
    charm-origin: local
    charm-name: sse-endpoint-mock
    charm-rev: 0
    scale: 1
    exposed: false
    life: dying
    application-status:
      current: blocked
      message: active (ep4.example.com)
      since: 18 Apr 2019 14:21:41+02:00
    units:
      ep5/0:
        workload-status:
          current: blocked
          message: active (ep4.example.com)
          since: 18 Apr 2019 14:21:41+02:00
        juju-status:
          current: failed
          message: resolver loop error
          since: 18 Apr 2019 14:28:23+02:00
        leader: true
    endpoint-bindings:
      sse-endpoint: ""
  mariadb-k8s:
    charm: cs:~juju/mariadb-k8s-0
    series: kubernetes
    os: kubernetes
    charm-origin: jujucharms
    charm-name: mariadb-k8s
    charm-rev: 0
    charm-version: 01ba1d0
    scale: 1
    provider-id: 39b356f2-612e-11e9-bd63-0050562c47e7
    address: 10.152.183.166
    exposed: false
    life: dying
    application-status:
      current: active
      since: 17 Apr 2019 18:31:37+02:00
    units:
      mariadb-k8s/0:
        workload-status:
          current: active
          since: 17 Apr 2019 18:31:37+02:00
        juju-status:
          current: failed
          message: resolver loop error
          since: 18 Apr 2019 10:55:45+02:00
        leader: true
        open-ports:
        - 3306/TCP
        address: 10.1.38.195
        provider-id: mariadb-k8s-0
    endpoint-bindings:
      server: ""
  mdb2:
    charm: cs:~juju/mariadb-k8s-0
    series: kubernetes
    os: kubernetes
    charm-origin: jujucharms
    charm-name: mariadb-k8s
    charm-rev: 0
    charm-version: 01ba1d0
    scale: 1
    provider-id: 58d0f0ae-612e-11e9-bd63-0050562c47e7
    address: 10.152.183.126
    exposed: false
    life: dying
    application-status:
      current: active
      since: 17 Apr 2019 18:32:14+02:00
    units:
      mdb2/0:
        workload-status:
          current: active
          since: 17 Apr 2019 18:32:14+02:00
        juju-status:
          current: failed
          message: resolver loop error
          since: 18 Apr 2019 10:57:04+02:00
        leader: true
        open-ports:
        - 3306/TCP
        address: 10.1.38.197
        provider-id: mdb2-0
    endpoint-bindings:
      server: ""
  mdb3:
    charm: cs:~juju/mariadb-k8s-0
    series: kubernetes
    os: kubernetes
    charm-origin: jujucharms
    charm-name: mariadb-k8s
    charm-rev: 0
    charm-version: 01ba1d0
    scale: 1
    provider-id: 6eb88beb-612e-11e9-bd63-0050562c47e7
    address: 10.152.183.172
    exposed: false
    life: dying
    application-status:
      current: active
      since: 17 Apr 2019 18:32:50+02:00
    units:
      mdb3/0:
        workload-status:
          current: active
          since: 17 Apr 2019 18:32:50+02:00
        juju-status:
          current: failed
          message: resolver loop error
          since: 18 Apr 2019 10:56:38+02:00
        leader: true
        open-ports:
        - 3306/TCP
        address: 10.1.38.199
        provider-id: mdb3-0
    endpoint-bindings:
      server: ""
storage:
  storage:
    database/0:
      kind: filesystem
      life: alive
      status:
        current: attached
        since: 17 Apr 2019 19:32:38+02:00
      persistent: false
      attachments:
        units:
          mariadb-k8s/0:
            life: dying
    database/1:
      kind: filesystem
      life: alive
      status:
        current: attached
        since: 17 Apr 2019 18:32:10+02:00
      persistent: false
      attachments:
        units:
          mdb2/0:
            life: dying
    database/2:
      kind: filesystem
      life: alive
      status:
        current: attached
        since: 17 Apr 2019 18:32:47+02:00
      persistent: false
      attachments:
        units:
          mdb3/0:
            life: dying
  filesystems:
    "0":
      provider-id: 39bd3ad9-612e-11e9-bd63-0050562c47e7
      volume: "0"
      storage: database/0
      attachments:
        containers:
          mariadb-k8s/0:
            mount-point: /var/lib/mysql
            read-only: false
            life: alive
        units:
          mariadb-k8s/0:
            life: dying
      pool: kubernetes
      size: 35
      life: alive
      status:
        current: attached
        since: 17 Apr 2019 19:32:38+02:00
    "1":
      provider-id: 58cf1104-612e-11e9-bd63-0050562c47e7
      volume: "1"
      storage: database/1
      attachments:
        containers:
          mdb2/0:
            mount-point: /var/lib/mysql
            read-only: false
            life: alive
        units:
          mdb2/0:
            life: dying
      pool: k8s-ceph
      size: 28
      life: alive
      status:
        current: attached
        since: 17 Apr 2019 18:32:10+02:00
    "2":
      provider-id: 6eb46f5a-612e-11e9-bd63-0050562c47e7
      volume: "2"
      storage: database/2
      attachments:
        containers:
          mdb3/0:
            mount-point: /var/lib/mysql
            read-only: false
            life: alive
        units:
          mdb3/0:
            life: dying
      pool: k8s-ceph
      size: 28
      life: alive
      status:
        current: attached
        since: 17 Apr 2019 18:32:47+02:00
  volumes:
    "0":
      provider-id: pvc-39bd3ad9-612e-11e9-bd63-0050562c47e7
      storage: database/0
      attachments:
        containers:
          mariadb-k8s/0:
            read-only: false
            life: alive
        units:
          mariadb-k8s/0:
            life: dying
      pool: kubernetes
      size: 856
      persistent: false
      life: alive
      status:
        current: attached
        since: 17 Apr 2019 18:31:35+02:00
    "1":
      provider-id: vol2
      storage: database/1
      attachments:
        containers:
          mdb2/0:
            read-only: false
            life: alive
        units:
          mdb2/0:
            life: dying
      pool: k8s-ceph
      size: 416
      persistent: false
      life: alive
      status:
        current: attached
        since: 17 Apr 2019 18:32:10+02:00
    "2":
      provider-id: vol1
      storage: database/2
      attachments:
        containers:
          mdb3/0:
            read-only: false
            life: alive
        units:
          mdb3/0:
            life: dying
      pool: k8s-ceph
      size: 416
      persistent: false
      life: alive
      status:
        current: attached
        since: 17 Apr 2019 18:32:47+02:00
controller:
  timestamp: 14:37:56+02:00
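
For a model with this many stuck entities, it can help to list them programmatically rather than reading the dump by eye. The following is a minimal Python sketch, not an official tool. It assumes the status is fetched with `juju status --format json` (which should use the same keys as the YAML above) and that "stuck" means an application with `life: dying` or a unit whose `juju-status` is `failed`. The names `find_wedged` and `load_status` are made up for illustration.

```python
import json
import subprocess


def find_wedged(status):
    """Return (dying_apps, failed_units) from a parsed `juju status` document."""
    dying_apps = []
    failed_units = []
    for app_name, app in sorted(status.get("applications", {}).items()):
        if app.get("life") == "dying":
            dying_apps.append(app_name)
        for unit_name, unit in sorted((app.get("units") or {}).items()):
            agent = unit.get("juju-status") or {}
            if agent.get("current") == "failed":
                failed_units.append((unit_name, agent.get("message", "")))
    return dying_apps, failed_units


def load_status(model):
    """Run `juju status` for one model and parse the JSON it prints."""
    result = subprocess.run(
        ["juju", "status", "-m", model, "--format", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Hand-written sample that mirrors the shape of the status shown above.
SAMPLE = {
    "applications": {
        "mdb2": {
            "life": "dying",
            "units": {
                "mdb2/0": {
                    "juju-status": {
                        "current": "failed",
                        "message": "resolver loop error",
                    },
                },
            },
        },
        "healthy": {
            "units": {"healthy/0": {"juju-status": {"current": "idle"}}},
        },
    }
}

if __name__ == "__main__":
    apps, units = find_wedged(SAMPLE)
    print("dying applications:", apps)
    print("failed unit agents:", units)
```

Against a live controller, `find_wedged(load_status("k8s-test2"))` would do the same for the real model, assuming the `juju` client is on the PATH.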

As an example, here are the logs of the mdb2 application:

application-mdb2: 18:32:02 INFO juju.cmd running jujud [2.5.4 gc go1.11.6]
application-mdb2: 18:32:02 DEBUG juju.cmd   args: []string{"./jujud", "caasoperator", "--debug", "--application-name", "mdb2"}
application-mdb2: 18:32:02 DEBUG juju.agent read agent config, format "2.0"
application-mdb2: 18:32:02 INFO juju.worker.upgradesteps upgrade steps for 2.5.4 have already been run.
application-mdb2: 18:32:02 INFO juju.cmd.jujud caas operator application-mdb2 start (2.5.4 [gc])
application-mdb2: 18:32:02 DEBUG juju.worker.dependency "agent" manifold worker started
application-mdb2: 18:32:02 DEBUG juju.worker.dependency "api-config-watcher" manifold worker started
application-mdb2: 18:32:02 DEBUG juju.worker.dependency "clock" manifold worker started
application-mdb2: 18:32:02 DEBUG juju.worker.apicaller connecting with old password
application-mdb2: 18:32:02 DEBUG juju.worker.dependency "upgrade-steps-gate" manifold worker started
application-mdb2: 18:32:02 DEBUG juju.worker.introspection introspection worker listening on "@jujud-application-mdb2"
application-mdb2: 18:32:02 DEBUG juju.worker.introspection stats worker now serving
application-mdb2: 18:32:02 DEBUG juju.api successfully dialed "wss://10.10.138.187:17070/model/8fa45ec2-777c-468f-8336-baf2b7d74495/api"
application-mdb2: 18:32:02 INFO juju.api connection established to "wss://10.10.138.187:17070/model/8fa45ec2-777c-468f-8336-baf2b7d74495/api"
application-mdb2: 18:32:02 DEBUG juju.worker.dependency "upgrade-steps-flag" manifold worker started
application-mdb2: 18:32:02 DEBUG juju.worker.dependency "migration-fortress" manifold worker started
application-mdb2: 18:32:03 INFO juju.worker.apicaller [8fa45e] "application-mdb2" successfully connected to "10.10.138.187:17070"
application-mdb2: 18:32:03 DEBUG juju.worker.dependency "api-caller" manifold worker started
application-mdb2: 18:32:03 DEBUG juju.worker.dependency "log-sender" manifold worker started
application-mdb2: 18:32:03 DEBUG juju.worker.dependency "migration-minion" manifold worker started
application-mdb2: 18:32:03 DEBUG juju.worker.dependency "migration-inactive-flag" manifold worker started
application-mdb2: 18:32:03 INFO juju.worker.migrationminion migration phase is now: NONE
application-mdb2: 18:32:03 DEBUG juju.worker.dependency "charm-dir" manifold worker started
application-mdb2: 18:32:03 DEBUG juju.worker.logger initial log config: "<root>=DEBUG"
application-mdb2: 18:32:03 DEBUG juju.worker.dependency "logging-config-updater" manifold worker started
application-mdb2: 18:32:03 DEBUG juju.worker.logger logger setup
application-mdb2: 18:32:03 DEBUG juju.worker.dependency "api-address-updater" manifold worker started
application-mdb2: 18:32:03 DEBUG juju.worker.logger reconfiguring logging from "<root>=DEBUG" to "<root>=INFO;juju.apiserver=INFO;juju.provider.vmware=TRACE;juju.provisioner=TRACE;juju.state=INFO;juju.state.cloudimagemetadata=TRACE;unit=DEBUG"
application-mdb2: 18:32:03 INFO juju.worker.uniter.charm downloading cs:~juju/mariadb-k8s-0 from API server
application-mdb2: 18:32:03 INFO juju.downloader downloading from cs:~juju/mariadb-k8s-0
application-mdb2: 18:32:03 INFO juju.downloader download complete ("cs:~juju/mariadb-k8s-0")
application-mdb2: 18:32:03 INFO juju.downloader download verified ("cs:~juju/mariadb-k8s-0")
application-mdb2: 18:32:06 INFO juju.worker.caasoperator operator "mdb2" started
application-mdb2: 18:32:06 INFO juju.agent.tools ensure jujuc symlinks in /var/lib/juju/tools/unit-mdb2-0
application-mdb2: 18:32:06 INFO juju.worker.leadership mdb2/0 promoted to leadership of mdb2
application-mdb2: 18:32:06 INFO juju.worker.uniter unit "mdb2/0" started
application-mdb2: 18:32:06 INFO juju.worker.uniter hooks are retried true
application-mdb2: 18:32:07 INFO juju.worker.uniter found queued "start" hook
application-mdb2: 18:32:07 INFO unit.mdb2/0.juju-log Reactive main running for hook start
application-mdb2: 18:32:07 INFO unit.mdb2/0.juju-log Invoking reactive handler: reactive/docker_resource.py:7:auto_fetch
application-mdb2: 18:32:08 INFO unit.mdb2/0.juju-log Invoking reactive handler: reactive/mysql.py:16:fetch_image
application-mdb2: 18:32:08 INFO unit.mdb2/0.juju-log Invoking reactive handler: ../../application-mdb2/charm/hooks/relations/mysql/provides.py:25:_handle_broken:server
application-mdb2: 18:32:08 INFO unit.mdb2/0.juju-log Invoking reactive handler: reactive/docker_resource.py:18:fetch
application-mdb2: 18:32:08 INFO unit.mdb2/0.juju-log status-set: maintenance: fetching resource: mysql_image
application-mdb2: 18:32:09 INFO unit.mdb2/0.juju-log Invoking reactive handler: reactive/mysql.py:26:config_mariadb
application-mdb2: 18:32:09 INFO unit.mdb2/0.juju-log status-set: maintenance: Configuring mysql container
application-mdb2: 18:32:09 INFO unit.mdb2/0.juju-log status-set failed: maintenance Configuring mysql container
application-mdb2: 18:32:09 INFO unit.mdb2/0.juju-log set pod spec:
application-mdb2: 18:32:09 INFO unit.mdb2/0.juju-log Invoking reactive handler: reactive/mysql.py:21:mariadb_active
application-mdb2: 18:32:09 INFO unit.mdb2/0.juju-log status-set failed: active 
application-mdb2: 18:32:10 INFO unit.mdb2/0.juju-log status-set: active: 
application-mdb2: 18:32:10 INFO juju.worker.uniter.operation ran "start" hook
application-mdb2: 18:32:10 INFO juju.worker.uniter found queued "leader-elected" hook
application-mdb2: 18:32:10 INFO unit.mdb2/0.juju-log Reactive main running for hook leader-elected
application-mdb2: 18:32:10 INFO unit.mdb2/0.juju-log Invoking reactive handler: reactive/mysql.py:21:mariadb_active
application-mdb2: 18:32:11 INFO unit.mdb2/0.juju-log status-set failed: active 
application-mdb2: 18:32:11 INFO unit.mdb2/0.juju-log Invoking reactive handler: ../../application-mdb2/charm/hooks/relations/mysql/provides.py:25:_handle_broken:server
application-mdb2: 18:32:11 INFO unit.mdb2/0.juju-log status-set: active: 
application-mdb2: 18:32:11 INFO juju.worker.uniter.operation ran "leader-elected" hook
application-mdb2: 18:32:11 INFO unit.mdb2/0.juju-log Reactive main running for hook database-storage-attached
application-mdb2: 18:32:12 INFO unit.mdb2/0.juju-log Invoking reactive handler: reactive/mysql.py:21:mariadb_active
application-mdb2: 18:32:12 INFO unit.mdb2/0.juju-log status-set failed: active 
application-mdb2: 18:32:12 INFO unit.mdb2/0.juju-log Invoking reactive handler: ../../application-mdb2/charm/hooks/relations/mysql/provides.py:25:_handle_broken:server
application-mdb2: 18:32:12 INFO unit.mdb2/0.juju-log status-set: active: 
application-mdb2: 18:32:12 INFO juju.worker.uniter.operation ran "database-storage-attached" hook
application-mdb2: 18:32:13 INFO unit.mdb2/0.juju-log Reactive main running for hook config-changed
application-mdb2: 18:32:13 INFO unit.mdb2/0.juju-log Invoking reactive handler: reactive/mysql.py:21:mariadb_active
application-mdb2: 18:32:13 INFO unit.mdb2/0.juju-log status-set failed: active 
application-mdb2: 18:32:13 INFO unit.mdb2/0.juju-log Invoking reactive handler: ../../application-mdb2/charm/hooks/relations/mysql/provides.py:25:_handle_broken:server
application-mdb2: 18:32:13 INFO unit.mdb2/0.juju-log status-set: active: 



... (more of the same config-changed and update-status hook output)

application-mdb2: 10:46:52 INFO juju.worker.uniter.operation ran "update-status" hook
application-mdb2: 10:52:48 INFO unit.mdb2/0.juju-log Reactive main running for hook update-status
application-mdb2: 10:52:48 INFO unit.mdb2/0.juju-log Invoking reactive handler: reactive/mysql.py:21:mariadb_active
application-mdb2: 10:52:48 INFO unit.mdb2/0.juju-log status-set failed: active 
application-mdb2: 10:52:48 INFO unit.mdb2/0.juju-log Invoking reactive handler: ../../application-mdb2/charm/hooks/relations/mysql/provides.py:25:_handle_broken:server
application-mdb2: 10:52:48 INFO unit.mdb2/0.juju-log status-set: active: 
application-mdb2: 10:52:48 INFO juju.worker.uniter.operation ran "update-status" hook
application-mdb2: 10:57:04 ERROR juju.worker.uniter resolver loop error: preparing operation "run update-status hook": writing state: open /var/lib/juju/agents/unit-mdb2-0/state/juju531775814: no such file or directory
application-mdb2: 10:57:04 INFO juju.worker.uniter unit "mdb2/0" shutting down: preparing operation "run update-status hook": writing state: open /var/lib/juju/agents/unit-mdb2-0/state/juju531775814: no such file or directory
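
Since almost all of this output is routine hook chatter, a small filter makes the failing line easier to spot. This is a rough Python sketch that assumes each line follows the `entity: HH:MM:SS LEVEL module message` layout seen above; the regular expression and the `errors_only` helper are illustrative guesses, not part of Juju.

```python
import re

# Layout assumed from the log excerpt: "<entity>: <time> <LEVEL> <module> <message>"
LOG_LINE = re.compile(
    r"^(?P<entity>\S+): (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<module>\S+) (?P<message>.*)$"
)


def errors_only(lines, levels=("ERROR", "CRITICAL")):
    """Return the parsed fields of every line logged at one of `levels`."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("level") in levels:
            hits.append(match.groupdict())
    return hits


# Two lines copied from the log above.
SAMPLE_LOG = [
    'application-mdb2: 10:52:48 INFO juju.worker.uniter.operation ran "update-status" hook',
    "application-mdb2: 10:57:04 ERROR juju.worker.uniter resolver loop error: "
    'preparing operation "run update-status hook": writing state: '
    "open /var/lib/juju/agents/unit-mdb2-0/state/juju531775814: no such file or directory",
]

if __name__ == "__main__":
    for hit in errors_only(SAMPLE_LOG):
        print(hit["time"], hit["module"], hit["message"])
```

Lines that do not match the assumed layout are skipped silently, which is fine for a quick scan but would hide multi-line messages.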

#2

If possible, set the following:

juju model-config -m controller "logging-config=<root>=INFO;juju.state.txn=TRACE"

That should log the specific transactions we are running to update the model, and show which one is failing.
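
The logging-config value is a single string of `module=LEVEL` pairs separated by semicolons, so setting it replaces whatever was configured before. If the target model already has per-module levels worth keeping, the transaction tracing can be merged in instead. The Python sketch below only manipulates the string; the helper names are hypothetical and nothing here talks to Juju.

```python
def parse_logging_config(value):
    """Split "a=INFO;b=TRACE" into an ordered {module: level} mapping."""
    levels = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        # Split on the last "=" so a module name like "<root>" stays intact.
        module, _, level = part.rpartition("=")
        levels[module] = level.upper()
    return levels


def format_logging_config(levels):
    """Join a {module: level} mapping back into the semicolon-separated form."""
    return ";".join(f"{module}={level}" for module, level in levels.items())


def with_txn_trace(current):
    """Return `current` with juju.state.txn forced to TRACE, other entries kept."""
    levels = parse_logging_config(current)
    levels["juju.state.txn"] = "TRACE"
    return format_logging_config(levels)


if __name__ == "__main__":
    print(with_txn_trace("<root>=INFO;juju.state=INFO"))
```

The resulting string can then be passed as the `logging-config=` value in the model-config command.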


#3

Thanks, I’ll enable that and report back when I have the next failure.


#4

In the upcoming 2.6 beta2 release (due out next week), we are introducing initial support for a --force option when running remove-application and remove-unit. The intent is that wedged units can be forcibly removed, which in turn unblocks removal of the parent application and even the model. The feature needs lots of testing because it’s hard to cover all corner cases in the lab. Sadly, you won’t be able to easily upgrade your 2.5.4 k8s model to 2.6 beta2, so right now the only real option is manual db surgery to remove the stuck entities.
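
Once a controller with the --force option is available, cleaning up a list of stuck applications could be scripted along these lines. This Python sketch only builds the command lines and prints them; it assumes the option is spelled `--force` on `remove-application` as described above, and it does not apply to the 2.5.4 model in this thread. The function name and the application list are placeholders, so check `juju help remove-application` on the target version before running anything.

```python
import shlex


def force_remove_commands(apps, model):
    """Build one `juju remove-application --force` argv list per application."""
    return [
        ["juju", "remove-application", "-m", model, "--force", app]
        for app in sorted(apps)
    ]


if __name__ == "__main__":
    # Example application names taken from the status output in this thread.
    for argv in force_remove_commands(["mdb2", "con3"], "k8s-test2"):
        # Print instead of executing, so the commands can be reviewed first.
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing rather than executing is deliberate: forced removal skips the normal teardown, so each command deserves a look before it is run.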