Problem with juju upgrade

I am trying to upgrade a model to Juju 2.7.1, and it fails with “ERROR some agents have not upgraded to the current model version 2.7.0” followed by a list of those units. The problem is that I cannot find a way to trigger the upgrade on them, even with an agent restart.

The same also applies to upgrade-charm when a unit agent is not working correctly for some reason. We have noticed this especially when there are load spikes on the Juju controllers: many agents do not seem to communicate correctly with the Juju server, although the status command shows everything working OK.

We do try to make sure the existing model has been fully upgraded to the current version before you start another upgrade.

One thing to check is to run juju status --format yaml and look at the agent versions to see which agents may not have been upgraded.
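
On a large model, scanning that output by eye gets tedious, so here is a rough sketch of a script that lists the agents whose version differs from the model version. It uses --format json rather than yaml only to avoid needing a YAML library. The key names it reads (model.version, juju-status.version, containers, subordinates) are my assumption about the status layout, so compare them against your own output before trusting the result.

```python
import json
import subprocess


def stale_agents(status):
    """Return {agent_name: version} for agents not on the model version.

    Assumed layout: status["model"]["version"] is the target, and each
    machine, container, unit and subordinate carries its agent version
    under ["juju-status"]["version"]. Agents with no version are reported too.
    """
    target = status["model"]["version"]
    found = {}

    def check(name, entry):
        version = entry.get("juju-status", {}).get("version")
        if version != target:
            found[name] = version

    def walk_machines(machines):
        for name, machine in machines.items():
            check("machine " + name, machine)
            # Containers are nested under their host machine.
            walk_machines(machine.get("containers", {}))

    def walk_units(units):
        for name, unit in units.items():
            check("unit " + name, unit)
            # Subordinate units are nested under their principal unit.
            walk_units(unit.get("subordinates", {}))

    walk_machines(status.get("machines", {}))
    for app in status.get("applications", {}).values():
        walk_units(app.get("units", {}))
    return found


def main():
    raw = subprocess.run(
        ["juju", "status", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    for name, version in sorted(stale_agents(json.loads(raw)).items()):
        print(name, version)


if __name__ == "__main__":
    main()
```

Run it with the affected model selected; each line of output is an agent that still needs to upgrade.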

Yes, I can confirm that many agents are still on 2.6.9 instead of 2.7.0. What I have not found is a way to trigger the upgrade for those agents so as to bring the model into a fully upgraded state.

Is it machines or units that aren’t upgrading? Is it the controller or another model?

For the machines that aren’t upgrading, do the machines still exist? Are the agents down?

It is agents, and they are up and running. I have restarted all of them with no luck.

Hmm… that is very strange.

Probably worth noting that the machine agent needs to have upgraded before any unit agents on that machine will upgrade.

Let’s pick a machine agent that is on 2.6.9 and look in the machine log file on that machine. We are looking for the “upgrader” logs. Does anything jump out?

Hmm, an interesting finding: in some cases some units have upgraded and others have not because their agent was dead, and after a restart all is OK. However, there are other cases where no agent is upgraded. A sample log from one unit is the following:

2020-01-20 13:43:17 INFO juju.worker.upgrader upgrader.go:155 abort check blocked until version event received
2020-01-20 13:43:17 INFO juju.worker.upgrader upgrader.go:161 unblocking abort check
2020-01-20 13:43:17 INFO juju.worker.upgrader upgrader.go:194 desired agent binary version: 2.6.9
2020-01-20 19:02:37 INFO juju.worker.upgrader upgrader.go:155 abort check blocked until version event received
2020-01-20 19:02:37 INFO juju.worker.upgrader upgrader.go:161 unblocking abort check
2020-01-20 19:02:37 INFO juju.worker.upgrader upgrader.go:194 desired agent binary version: 2.6.9
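
To check many agent logs for the same pattern, the “desired agent binary version” lines can be pulled out with a short script. This is only a sketch that assumes every log line has the layout of the sample above (timestamp, level, logger name, source location, message). The log path is passed on the command line, and the function name is my own.

```python
import re
import sys

# Matches lines such as:
# 2020-01-20 13:43:17 INFO juju.worker.upgrader upgrader.go:194 <message>
UPGRADER_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \w+ "
    r"juju\.worker\.upgrader \S+ (?P<msg>.*)$"
)
DESIRED = re.compile(r"desired agent binary version: (?P<version>\S+)")


def desired_versions(lines):
    """Yield (timestamp, version) for each 'desired agent binary version' line."""
    for line in lines:
        match = UPGRADER_LINE.match(line.strip())
        if not match:
            continue  # not an upgrader worker line
        desired = DESIRED.search(match.group("msg"))
        if desired:
            yield match.group("ts"), desired.group("version")


if __name__ == "__main__":
    # Usage: python3 upgrader_versions.py <path-to-agent-log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for timestamp, version in desired_versions(log):
            print(timestamp, version)
```

If every entry still reports 2.6.9 after the model was moved to 2.7.0, the agent is not being told about the new version at all, which is different from an agent that sees the new version and fails to fetch it.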

So it seems that I have to somehow send the “version event”.

Is that from a unit agent, or a machine agent?