Brainstorm - how do we make `juju status` faster?

Let’s use this thread to explore options for speeding up juju status.

Here’s my understanding of the problem: when a controller manages many models, say over 100, juju status can take multiple seconds to return. The problem is especially bad when the underlying MongoDB instance is under heavy I/O load.

As a starting suggestion, what about a --stale-okay flag? Hear me out …

If people are just calling juju status to check whether the system is alive, perhaps we don’t need to check the state of every running instance? Perhaps we’re mainly interested in the controller’s health. juju status --stale-okay would return a canned response saved as a file on disk (similar to the juju controllers command) but might also ping the controller via ps (or some other command; perhaps we hit the API server with a request).

juju status --stale-okay could additionally trigger a new canned response to be generated asynchronously. That way the stale info would never be too stale.
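To make the idea concrete, here is a minimal sketch of the "serve the canned response, refresh in the background" pattern. It is not how Juju works today: the `StaleOkayStatus` class, the `fetch_status` callable (standing in for the slow, full status query) and the cache file location are all hypothetical names made up for illustration.

```python
import json
import threading
import time
from pathlib import Path
from typing import Callable, Optional


class StaleOkayStatus:
    """Return the last saved status at once and refresh it in the background."""

    def __init__(self, cache_path: Path, fetch_status: Callable[[], dict]):
        self.cache_path = cache_path
        # Hypothetical: the slow call that gathers full, live status.
        self.fetch_status = fetch_status
        self._refresh: Optional[threading.Thread] = None
        self._lock = threading.Lock()

    def _refresh_cache(self) -> None:
        record = {"saved_at": time.time(), "status": self.fetch_status()}
        # Write to a temp file and rename it, so readers never see a partial file.
        tmp = self.cache_path.with_suffix(".tmp")
        tmp.write_text(json.dumps(record))
        tmp.replace(self.cache_path)

    def get(self) -> dict:
        if not self.cache_path.exists():
            # Nothing canned yet: take the slow path once.
            self._refresh_cache()
            return json.loads(self.cache_path.read_text())
        cached = json.loads(self.cache_path.read_text())
        # Start a background refresh unless one is already running.
        with self._lock:
            if self._refresh is None or not self._refresh.is_alive():
                self._refresh = threading.Thread(
                    target=self._refresh_cache, daemon=True
                )
                self._refresh.start()
        return cached

    def wait_for_refresh(self) -> None:
        """Block until any in-flight background refresh has finished."""
        if self._refresh is not None:
            self._refresh.join()
```

The `saved_at` timestamp lets the caller show how old the answer is. In a short-lived CLI process a daemon thread would be killed on exit, so a real implementation would need the refresh to happen somewhere longer-lived; the sketch only shows the shape of the behaviour.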

If you just want to check the controllers’ health, then why not just have juju ping (or perhaps juju health), which would just return whether the controllers are up? It could additionally report a response time, but that could just end up measuring the network, so I think we shouldn’t!
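As a rough sketch of what a ping-style check could boil down to, assuming it only needs to answer "is something accepting connections on the controller’s API port?". The `controller_is_up` function is a hypothetical name, and the default port of 17070 is an assumption based on Juju’s usual API port; adjust it for your setup. It deliberately returns only a yes/no and no timing.

```python
import socket


def controller_is_up(host: str, port: int = 17070, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # Assumption: reachability of the API port is a good enough liveness signal.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

A successful TCP connect only shows that the port is open, not that the API server can answer requests, so a real health command would probably want to make an actual API call as well.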

The fundamental question we should really be asking is: what are the use cases for status, and why do people want it so quickly? If we bottom that out, I think that will give us a better idea of how to change status (if at all)…

I don’t follow this. The vast majority of models aren’t the controller model and if I were looking for active controller health I’d run status on the controller model.

Most folks running status are looking for hook progress, live unit details, IP addresses, etc., and I don’t see the value of a stale set of data.

Maybe I’m misunderstanding the issue/solution here so feel free to correct me.

How can we confirm this? Like @simonrichardson mentions, gathering some solid use cases / user feedback sounds extremely helpful.

It sounds like shortcuts won’t work and this will have to remain a Difficult Problem™. But if we’re wrong and people are just asking Juju “Are you alive?”, then perhaps we can route around the problem of status being too slow.

So to be honest, we’re just asking a lot of “what are you doing?”, and the answer is going to be “it depends”: installing, testing, maintenance, etc.

I’m saying this based on using Juju and working with folks using Juju over the years: honestly, folks don’t status their models much unless they’re doing something, be it updating config, upgrading a charm, doing a deploy, etc. Each of those tends to involve validating that the moving parts are moving correctly and waiting for things to occur (like waiting for a charm upgrade to complete and get back to ready before running the config change or juju action).