Pending hooks counter and way to introspect it in juju status

davecore · 9 April 2019 16:23

I was working on a simple fix for the LP bug “keystone upgrade-charm not providing useful status updates”. The fix involved adding some status_set() and log() calls to the keystone charm to give a better idea of its current state in juju status.
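For illustration, that kind of fix can be sketched as a decorator that announces a long-running step in the workload status and logs when it starts and finishes. This is a minimal sketch, not the actual keystone change: in a real charmhelpers-based charm the status call would be charmhelpers.core.hookenv.status_set(), but here `set_status`, `reports_status` and `update_identity_relations` are hypothetical stand-ins so the example runs on its own.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("keystone-charm")

# Hypothetical stand-in for charmhelpers.core.hookenv.status_set(): it records
# and logs the status instead of invoking the real status-set hook tool.
status_history = []


def set_status(state, message):
    status_history.append((state, message))
    log.info("status-set %s: %s", state, message)


def reports_status(description):
    """Decorator: put `description` in the workload status while the wrapped
    step runs, and log when the step starts and finishes."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            set_status("maintenance", description)
            log.info("starting: %s", description)
            result = func(*args, **kwargs)
            log.info("finished: %s", description)
            return result
        return wrapper
    return decorator


@reports_status("Updating identity relations after upgrade")
def update_identity_relations(relation_ids):
    # Placeholder for the real per-relation work; returns how many were handled.
    return len(relation_ids)


update_identity_relations(["identity-service:12", "identity-service:15"])
```

The limitation that motivates the rest of this thread is visible here: the charm can only describe what it is doing right now, not how much work is still queued behind it.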

This then became a wider discussion with the original requestor and the OpenStack charm engineering team. We came to the conclusion that Juju should be keeping track of outstanding hooks to process for each unit, and display that information as part of juju status.

The purpose would be to give more visibility in Juju into the progress of an operation. For example, the upgrade-charm hook of the keystone charm can take up to 15-30 minutes to work through the identity_changed, admin_relation_changed and identity_credentials_changed hooks for all related services. A counter of how many of these were triggered by the upgrade-charm would allow a status like ‘waiting for identity-relation-changed hooks to complete after upgrade - completed X of Y’.
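The “completed X of Y” part of that status is simple arithmetic once something knows which hooks the upgrade triggered. A minimal sketch, assuming some hypothetical tracker is told which relation hooks to expect and which have finished; `HookProgress` and the relation ids are illustrative, not an existing Juju or charmhelpers API:

```python
from dataclasses import dataclass, field


@dataclass
class HookProgress:
    """Hypothetical tracker for the hooks triggered by an upgrade.

    `expected` holds the relation ids whose hooks the upgrade triggered;
    `completed` records those whose hooks have finished so far.
    """
    expected: set = field(default_factory=set)
    completed: set = field(default_factory=set)

    def mark_done(self, relation_id):
        # Only count hooks that the upgrade actually triggered.
        if relation_id in self.expected:
            self.completed.add(relation_id)

    def message(self):
        return ("waiting for identity-relation-changed hooks to complete "
                f"after upgrade - completed {len(self.completed)} of {len(self.expected)}")


progress = HookProgress(expected={"identity-service:12", "identity-service:15",
                                  "identity-service:20"})
progress.mark_done("identity-service:12")
progress.mark_done("identity-service:99")  # not triggered by the upgrade: ignored
print(progress.message())  # ... completed 1 of 3
```

The hard part, which the replies below get into, is knowing Y in the first place.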

Any thoughts? Thanks!

rick_h · 9 April 2019 20:07

Thanks, this is an interesting idea. I don’t think juju status is the right place for it, as space there is limited and the information isn’t useful to users who aren’t in the middle of debugging something like this. I could see it as part of a detailed show-unit view that lists what the system knows the unit still has to do. One thing I’d be curious to check is whether we ever collapse duplicate hooks in the queue. I don’t recall us doing that, but understanding how stable that to-do list is would be important for something like this.
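To see why collapsing matters for any counter, here is a small sketch over a hypothetical to-do list of (hook, relation id, remote unit) entries. It makes no claim about whether Juju actually collapses duplicates; it only shows that the reported count changes depending on whether it does.

```python
def collapse_duplicates(pending):
    """Drop repeated (hook, relation_id, remote_unit) entries, keeping the
    first occurrence of each and preserving the original order."""
    seen = set()
    collapsed = []
    for entry in pending:
        if entry not in seen:
            seen.add(entry)
            collapsed.append(entry)
    return collapsed


# Hypothetical to-do list; the entries are illustrative only.
pending = [
    ("monitors-relation-changed", "monitors:3", "nrpe/0"),
    ("monitors-relation-changed", "monitors:3", "nrpe/0"),
    ("monitors-relation-changed", "monitors:3", "nrpe/1"),
    ("update-status", None, None),
]
print(len(pending), len(collapse_duplicates(pending)))  # 4 3
```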

rick_h · 9 April 2019 20:07

I’d encourage you to file a wishlist bug for this additional work, as it is an interesting idea that would add a lot of transparency about “what’s it doing” when you’re not following what’s going on.

afreiberger · 9 April 2019 22:13

I actually instigated this discussion by way of working through openstack charm upgrades and wondering what in the world Keystone was doing in an “executing” state without any ($hookname) called out as to what it was working on.

I think some hooks are displayed, but it seems that hooks triggered in reaction to relation changes don’t get displayed in status: the unit just shows as “executing” with a “ready” message, even though it’s doing a lot of background update work that an operator may want to wait for before moving on to further charm upgrades.

I noticed this just now during a juju model upgrade: the large number of nrpe relations to nagios causes nagios to run a lot of monitors-relation-changed hooks that aren’t reflected in the juju status display, I suspect because they weren’t operator-initiated.

Could we display these reaction-triggered hooks in a --extended status display?

Here’s my nagios example:
```
nagios/0* active executing 33 10.55.65.148 80/tcp ready
```

```
ubuntu@nagios/0:~$ pstree -pals 23725
systemd,1 --system --deserialize 24
  └─bash,9425 /lib/systemd/system/jujud-unit-nagios-0/exec-start.sh
      └─jujud,9430 unit --data-dir /var/lib/juju --unit-name nagios/0 --debug
          └─monitors-relati,23725 /var/lib/juju/agents/unit-nagios-0/charm/hooks/monitors-relation-changed
```

I’d like juju status to have a flag, perhaps --extended or --verbose, that would display the running hook (monitors-relation-changed).
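Until something like that exists, the same information can be dug out on the unit’s machine, which is what the pstree output above does by hand. A minimal, Linux-only sketch that scans /proc for processes whose arguments point into a unit’s charm/hooks/ directory. The path layout is assumed from the pstree output, and this only sees the hook running right now, not any that are still pending.

```python
import os
import re

# Matches hook paths like the one in the pstree output, e.g.
# /var/lib/juju/agents/unit-nagios-0/charm/hooks/monitors-relation-changed
HOOK_PATH = re.compile(r"/var/lib/juju/agents/unit-([^/]+)/charm/hooks/([^/\s]+)$")


def parse_hook(argv):
    """Return (unit_dir_name, hook_name) if any argument is a hook path, else None.
    Checking every argument also covers interpreters such as python3 <hook>."""
    for arg in argv:
        match = HOOK_PATH.search(arg)
        if match:
            return match.group(1), match.group(2)
    return None


def running_hooks(proc="/proc"):
    """Scan /proc (Linux only) and return (pid, unit_dir_name, hook_name) tuples."""
    found = []
    for pid in filter(str.isdigit, os.listdir(proc)):
        try:
            with open(os.path.join(proc, pid, "cmdline"), "rb") as f:
                argv = f.read().decode(errors="replace").split("\0")
        except OSError:
            continue  # the process exited or its cmdline is not readable
        hook = parse_hook(argv)
        if hook:
            found.append((int(pid), *hook))
    return found


if __name__ == "__main__":
    for pid, unit, hook in running_hooks():
        print(pid, unit, hook)
```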

afreiberger · 9 April 2019 22:19

I also very much agree with the idea of a show-unit-level introspection command for showing pending task queues. Can the queued hooks be determined with the introspection tools available today?

stub · 10 April 2019 01:02

Yes, we have needed this for a long time. juju-wait has to rely on heuristics and is thus unreliable, because which units have hooks pending has never been exposed. Bug #1200267 “Expose when stable state is reached” (juju-core) was one of the original bugs, but unfortunately it was marked as a duplicate and dissolved when an unrelated feature landed. I’ve removed the incorrect duplicate marking and set it to wishlist. This might be enough for the OpenStack charm use cases, or maybe they need something more involved.
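To make the limitation concrete, the kind of check an external tool can do today is a snapshot over `juju status --format=json`. A minimal sketch, assuming the applications → units → "juju-status" → "current" layout with subordinates nested under their principal unit; this is not juju-wait’s actual code. A unit reported as idle here may still have hooks waiting, which is exactly why such tools fall back on heuristics.

```python
import json


def busy_units(status):
    """Return the names of units whose agent is not idle in a parsed
    `juju status --format=json` document, including subordinate units."""
    busy = []
    for app in status.get("applications", {}).values():
        for name, unit in app.get("units", {}).items():
            units = [(name, unit)] + list(unit.get("subordinates", {}).items())
            for unit_name, data in units:
                if data.get("juju-status", {}).get("current") != "idle":
                    busy.append(unit_name)
    return busy


# Trimmed, hypothetical status document mirroring the nagios example.
sample = json.loads('''{"applications": {"nagios": {"units": {"nagios/0": {
    "juju-status": {"current": "executing"},
    "subordinates": {"nrpe/0": {"juju-status": {"current": "idle"}}}}}}}}''')
print(busy_units(sample))  # ['nagios/0']
```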

thumper · 10 April 2019 03:45

Part of the problem here is that the controller doesn’t know which hooks a particular unit needs to run. Those are all determined locally and iteratively. There is no “queue” that is built up; rather, on each iteration through the loop the resolver compares the current state with the expected state and then works through an ordered list of potential hooks to run. The prioritisation follows a defined order, but there is no “queue”.
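A heavily simplified sketch of that pattern: each pass compares local state with expected state, walks an ordered list of rules, and returns only the first hook that applies. `UnitState`, the rule order and the state fields are hypothetical and much simpler than Juju’s real uniter (which, for example, also runs relation-joined). The point is structural: nothing is queued, so “how many are left” is never computed.

```python
from dataclasses import dataclass, field


@dataclass
class UnitState:
    """Hypothetical, heavily simplified view of the state a unit agent compares."""
    charm_url: str
    config_version: int
    relation_versions: dict = field(default_factory=dict)  # relation id -> settings version


def next_hook(local, remote):
    """Return the single highest-priority hook to run next, or None when idle.
    The rule order is illustrative, not Juju's actual resolver order."""
    if local.charm_url != remote.charm_url:
        return ("upgrade-charm", None)
    if local.config_version != remote.config_version:
        return ("config-changed", None)
    for rel_id in sorted(remote.relation_versions):
        if local.relation_versions.get(rel_id) != remote.relation_versions[rel_id]:
            return ("relation-changed", rel_id)
    return None


def run_until_idle(local, remote):
    """Recompute the next hook from scratch on every iteration; nothing is queued."""
    ran = []
    while (hook := next_hook(local, remote)) is not None:
        name, rel_id = hook
        ran.append(hook)
        # "Running" the hook brings the matching piece of local state up to date.
        if name == "upgrade-charm":
            local.charm_url = remote.charm_url
        elif name == "config-changed":
            local.config_version = remote.config_version
        else:
            local.relation_versions[rel_id] = remote.relation_versions[rel_id]
    return ran


local = UnitState("cs:keystone-1", 1, {"identity-service:12": 1})
remote = UnitState("cs:keystone-2", 1, {"identity-service:12": 2, "identity-service:15": 1})
print(run_until_idle(local, remote))
```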

I think trying to represent this would be quite complicated.

jameinel · 10 April 2019 06:15

I did look into it at one point, and it seems like the resolver loop could build a priority list of what to work on and record it with the controller.
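A sketch of that idea under the same simplified, hypothetical state model: evaluate every rule instead of stopping at the first match, which yields an ordered list of pending hooks and a summary line an agent could, hypothetically, report. `pending_hooks`, `summary` and the dict keys are illustrative names, not Juju APIs. The list is only a snapshot; if remote state changes mid-way it has to be recomputed, which is the stability concern raised earlier in the thread.

```python
def pending_hooks(local, remote):
    """Hypothetical: list every hook the unit currently expects to run, in
    priority order, rather than only the next one.

    `local` and `remote` are dicts with "charm", "config" and "relations"
    (relation id -> settings version) keys; the rule order is illustrative.
    """
    pending = []
    if local["charm"] != remote["charm"]:
        pending.append(("upgrade-charm", None))
    if local["config"] != remote["config"]:
        pending.append(("config-changed", None))
    for rel_id, version in sorted(remote["relations"].items()):
        if local["relations"].get(rel_id) != version:
            relation_name = rel_id.split(":")[0]
            pending.append((f"{relation_name}-relation-changed", rel_id))
    return pending


def summary(pending):
    """A short line the agent could, hypothetically, record with the controller."""
    if not pending:
        return "no hooks pending"
    return f"{len(pending)} hook(s) pending, next: {pending[0][0]}"


local = {"charm": "cs:keystone-1", "config": 3, "relations": {"identity-service:12": 1}}
remote = {"charm": "cs:keystone-2", "config": 3,
          "relations": {"identity-service:12": 2, "identity-service:15": 1}}
print(summary(pending_hooks(local, remote)))  # 3 hook(s) pending, next: upgrade-charm
```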

That said, it sounds like at least some of the proposed “what is going on” concerns Python hook handlers (identity_changed, etc.), not the Juju-initiated scripts in the hooks/ directory.
Juju has no visibility into the Python layer, so that could be part of charmhelpers, but it wouldn’t be part of Juju proper. And if charmhelpers gave you more detail, it would be more universally useful.

afreiberger · 10 April 2019 23:32

I think this call to monitors-relation-changed is an actual hook in the hooks directory being called, not a Python hook. If so, monitors-relation-changed should be reflected in what is “executing” in juju status, just as (update-status) is. Even just exposing relation hooks in what’s executing would go a long way towards visibility, if that information is available at the controller level.