Generational Configuration

Introduction

Work is under way to implement model generations via “branches”. Branches are a mechanism by which changes to charm URL, charm configuration and resources can be made, then applied selectively to application units by having them track the branch. When the changes have been assessed and deemed appropriate, the branch is “committed” to the model, whereupon the changes are applied throughout, affecting all units as normal.

This post deals with charm configuration only. Charm URL and resources will be considered at a later date.

It is intended to work toward answering the following questions.

  1. In what form should generational configuration deltas be written to state?
  2. How should a unit determine (and request configuration for) a generational branch that it is tracking?
  3. How should the multiwatcher notify of changes to generational configuration in addition to base configuration?
  4. How should historical configuration changes be recorded when a branch is committed to the model?

In what form should generational configuration deltas be written to state?

Charm configuration is stored as deltas - only values set by the operator are stored in settings. When charm configuration is read, the charm configuration defaults are retrieved, then the deltas from settings are applied, before being returned to the user.
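As a rough illustration of that read path, here is a minimal sketch in Python. The function name, keys and values are made up for the example and are not taken from the Juju code base.

```python
def effective_config(charm_defaults, settings):
    """Overlay operator-set values (the stored deltas) on the charm defaults."""
    config = dict(charm_defaults)   # start from the defaults
    config.update(settings)         # then apply only what the operator set
    return config


# Hypothetical charm defaults and stored settings.
charm_defaults = {"port": 8080, "log-level": "info", "workers": 4}
settings = {"log-level": "debug"}   # the only value the operator has set

config = effective_config(charm_defaults, settings)
```

Note that the settings document stays small: only `log-level` is stored, and everything else is resolved from the defaults at read time.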

The first implementation, currently seen on the develop branch, is to use a new settings document suffixed with the branch identifier. This document is created from the existing settings for charm configuration upon the first read or write of settings for the branch/application. It is worth noting that this does not track changes made under the branch - it starts as a snapshot of all non-default values.

Other options might be:

  • Keep storing in settings, but only generational deltas, meaning that for branches, configuration is defaults + settings + branch deltas.
  • Store generational configuration in the generations collection against its branch.
  • Create a new collection for generational configuration altogether.

How should a unit determine a generational branch that it is tracking?

When a unit is set to track a branch, it obviously needs to be made aware of this fact so that it can read (and watch for changes to) its charm configuration.

On the develop branch, applications with configuration changes and any tracking units are recorded against the branch in the generations collection.

Under this scheme every unit would have to watch the generations collection and interrogate all “in flight” branches to see if it has been set to track any of them. Any triggered watch would result in the unit re-requesting and watching a new location for configuration.

One alternative might be to record tracked branches against unit documents directly, which would avoid this search overhead. If tracking was only recorded against the unit, interrogating a branch would require all unit documents to be read. If it was stored in both places, care would be required to ensure that the two sources of data are always consistent.
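To make the "stored in both places" trade-off concrete, here is a small in-memory sketch in Python. The class and method names are hypothetical, and it assumes a unit tracks at most one branch. Both directions of the lookup are kept, and every change goes through a single method so that the two views cannot drift apart.

```python
from collections import defaultdict


class Generations:
    """Hypothetical in-memory stand-in for branch tracking data."""

    def __init__(self):
        self._units_by_branch = defaultdict(set)  # branch -> tracking units
        self._branch_by_unit = {}                 # unit -> tracked branch

    def track(self, branch, unit):
        # Drop any previous tracking first so both views stay consistent.
        self.untrack(unit)
        self._units_by_branch[branch].add(unit)
        self._branch_by_unit[unit] = branch

    def untrack(self, unit):
        branch = self._branch_by_unit.pop(unit, None)
        if branch is not None:
            self._units_by_branch[branch].discard(unit)

    def branch_for(self, unit):
        """Return the branch a unit tracks, or None for the model level."""
        return self._branch_by_unit.get(unit)

    def units_for(self, branch):
        return sorted(self._units_by_branch.get(branch, ()))


gens = Generations()
gens.track("new-feature", "mysql/0")
gens.track("new-feature", "mysql/1")
gens.track("hotfix", "mysql/1")   # mysql/1 moves to another branch
```

In a real store the two writes would need to happen in one transaction; that is exactly the consistency cost mentioned above.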

How should the multiwatcher notify of changes to generational configuration?

This remains an open question - no logic has been written to handle this.

At the same time as the generation feature is implemented, units will be changed to retrieve and watch configuration via the new model cache. The cache can be thought of as a watch multiplexer. It maintains a cache of all model entities whose types it is set to track (applications/units/machines etc) by using a multiwatcher, and entities themselves watch the cache for change notifications instead of using state-based watchers.

Wherever generational configuration is stored, the multiwatcher will also need to watch those documents in order to populate the model cache. This would tend to favour storing generational configuration in the settings collection, as a slightly modified version of the standard settings watcher would be easy to implement.

Units watching the model cache for charm configuration obviously need to supply any tracking branch as an argument.

How should historical configuration changes be recorded when a branch is committed?

As for the multiwatcher, no logic exists for this yet.

The obvious way of doing this is to ensure the configuration changes are in the generations collection against the branch at commit time. This would be implicit if we store generational configuration there all the time, or we could copy from another location upon commit.

The approach that I arrive at intuitively is:

  • Store generational configuration in a different settings document (as currently implemented).
  • Keep tracking units against the generations collection (current implementation), index appropriately and add a method that returns any tracked branch (possibly branches in future) for an input unit.
  • Add a watcher that reacts to changes against in-flight branches, including commits, so a unit knows when it reverts to the (master) model level configuration.
  • Massage the multiwatcher to react to changes against generational settings documents.
  • Upon commit, compare the generational settings document to the master, then write the deltas to the branch document in the generations collection.
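The last step, comparing the generational settings document to the master at commit time, could look something like the following Python sketch. The function name and the shape of the result are assumptions for illustration only.

```python
_MISSING = object()  # sentinel so that a stored None is not mistaken for "absent"


def commit_deltas(master, branch_settings):
    """Compare a branch settings snapshot to master and return the deltas."""
    changed = {
        key: value
        for key, value in branch_settings.items()
        if master.get(key, _MISSING) != value
    }
    # Keys set in master but absent from the branch were unset under the branch.
    removed = sorted(key for key in master if key not in branch_settings)
    return {"changed": changed, "removed": removed}


# Hypothetical settings documents (non-default values only).
master = {"log-level": "debug", "workers": 8}
branch_settings = {"log-level": "warning", "port": 9090}

deltas = commit_deltas(master, branch_settings)
```

This only gives a meaningful answer if master has not moved since the snapshot was taken; otherwise changes made to master in the meantime would show up as if the branch had made them.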

Functionally, the difference between a snapshot and a delta will come into play if we allow the base to get updates while a branch is open. If we do, then we have to be careful, because applying a branch would revert any base changes that didn’t end up in the snapshot (or we have to make sure that any change applied to the base also gets replayed into every branch that depends on it).
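A tiny worked example of that hazard, as a Python sketch with invented keys and values. The same branch edit is recorded both as a snapshot and as a delta, and the base is updated before the branch is committed.

```python
# Base (master) settings when the branch is created.
master = {"log-level": "debug"}

# Snapshot approach copies everything; delta approach starts empty.
snapshot = dict(master)
delta = {}

# The operator changes one value under the branch.
snapshot["workers"] = 8
delta["workers"] = 8

# Meanwhile, the base receives an update outside the branch.
master["log-level"] = "warning"

# Committing the snapshot replaces the base settings wholesale...
after_snapshot_commit = dict(snapshot)
# ...while committing the delta only overlays what the branch changed.
after_delta_commit = {**master, **delta}
```

With the snapshot, `log-level` silently goes back to `debug`; with the delta, the base update survives and only `workers` changes.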

I suppose that a tradeoff is that if you have the data split, then you have more places where you have to watch for possible changes. So carefully applying the changes is the inverse of carefully watching for changes.

I would have hoped that units wouldn’t have to know what generation they were on, or have to watch a different document; that it would all be abstracted away and the Unit Agent would never have to know.

That’s a good point. We ought to be able to hold unit branch tracking data in the model cache and respond to configuration reads/watches transparently.

My thoughts on this.

settings

Storing the configuration changes is a big problem. I’m trying to think of a meaningful way forwards, but let me first indicate the problems I see.

If we take a snapshot of the configuration as a starting point, even if we don’t allow direct modification of settings but force a branch, another branch could get created, modified, and committed before the first changes are finished, and this could easily result in settings being overridden.

I then thought that the better solution for branch config would be to only store the values that have changed from master. However, this raises the problem of how to indicate removed attributes, which is where we remove an explicit setting of a param and go back to the default. We allow this through the CLI, but I’m not yet sure how to represent it in the model.

unit branch determination

My initial thought was that a unit should know which branch it is following. However, on second thoughts, if we are using the model cache for all the watchers, then as long as the model cache has full information about the branches in flight, it is trivial for the cache code to check through each branch’s tracking components.

multiwatcher

We would need to add a Branch delta type to the multiwatcher. This would include everything relating to the branch:

  • identity, author etc
  • application settings changes
  • future application charm and resource changes

This would give the model cache all it needs to trigger the watchers and give the right information to the right unit.
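As a sketch of what such a delta might carry and how the cache could use it, here is a Python illustration. The type, its field names and the helper function are all hypothetical; in particular, sending the tracking units along with the delta is an assumption, and none of this reflects the actual multiwatcher types.

```python
from dataclasses import dataclass, field


@dataclass
class BranchDelta:
    """Hypothetical multiwatcher payload describing one in-flight branch."""
    name: str
    created_by: str
    # application name -> {config key: value} changed under the branch
    config_changes: dict = field(default_factory=dict)
    # names of units tracking the branch (assumed to travel with the delta)
    tracking_units: set = field(default_factory=set)


def config_for_unit(unit, application, base_config, branches):
    """Return the config a unit should see, given the in-flight branches."""
    for branch in branches:
        if unit in branch.tracking_units:
            overrides = branch.config_changes.get(application, {})
            return {**base_config, **overrides}
    return dict(base_config)  # not tracking anything: model-level config


branch = BranchDelta(
    name="new-feature",
    created_by="admin",
    config_changes={"mysql": {"max-connections": 500}},
    tracking_units={"mysql/1"},
)
base = {"max-connections": 100, "tuning-level": "safest"}

tracked = config_for_unit("mysql/1", "mysql", base, [branch])
untracked = config_for_unit("mysql/0", "mysql", base, [branch])
```

The unit only ever asks for its configuration; whether a branch applies is decided inside the cache.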

history

I think this needs a new collection. I’d be tempted to call it commits since we are following along with the branch theme.

The record in the commit should also directly include the deltas of config, which is another reason I think it is important to store the branch config changes as true deltas rather than snapshots.

The actual process of getting the relevant information into the collection should be relatively straightforward.

We’d then need the API server methods, API client and CLI to query the history.

storing deltas for config

I’d be extremely tempted to just store the application configuration changes on the branch document itself. That way we have the ability to store the explicit removal of config fields, as well as the updates. It also makes the creation of the multiwatcher branch delta much easier, along with moving the changes into the history collection.
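One possible shape for this, sketched in Python: the branch document holds a map of updated values plus a separate list of explicitly removed keys. The field names are invented, and treating a removal as "fall back to the charm default" is an assumption made for the example, not a settled decision.

```python
def branch_config(charm_defaults, master_settings, updates, removals):
    """Resolve config for a branch that stores updates and explicit removals."""
    # Start from master's operator-set values, dropping keys unset in the branch.
    settings = {k: v for k, v in master_settings.items() if k not in removals}
    # Apply values set under the branch (these win over a removal of the same key).
    settings.update(updates)
    # Anything not set anywhere resolves to the charm default.
    return {**charm_defaults, **settings}


charm_defaults = {"log-level": "info", "workers": 4}
master_settings = {"log-level": "debug", "workers": 8}

# Hypothetical branch document contents.
branch_doc = {"updates": {"workers": 16}, "removals": ["log-level"]}

config = branch_config(
    charm_defaults,
    master_settings,
    branch_doc["updates"],
    set(branch_doc["removals"]),
)
```

Here `log-level` was unset under the branch, so it resolves to the default `info` rather than master's `debug`, while `workers` takes the branch value.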

Clearly the API server would need to know the active branch when setting application config. It could then switch the logic if we are not working on the master branch.

This also simplifies the potential creation of additional settings documents.

Tim and I talked about this a bit today, and a few things came out of it.

  • We both do prefer storing just the delta. We already have mechanisms for watching multiple documents (state.newDocWatcher takes a list of docKeys and will generate a notification event if any of them change.)
  • One interesting idea is to store not just “this is what I’m setting it to” but “this is what I’m changing it from”. That also implies storing more than just “this is the current snapshot”. The big win here is that when inspecting history you get a bit more context (“it was changed to 5 from 10” is different from “it was changed to 5 from 1”).
  • Storing ‘from’ also allows a future version to handle conflicts. For now, we can just go with ‘last write wins’. But imagine if we do have multiple branches changing concurrently, it is hard to even notice that there is a problem. (branch A sets foo=bar, branch B sets foo=bling, it would be nice to warn the user that there were concurrent edits to the same field when they go to commit.) To reiterate, for now we should go with last-write-wins and not make the UX terribly complex. But storing the data in a way that we can notice that there were concurrent updates is a good win for the future.
  • We were both thinking that it feels much better to have 2 different collections, one for “this is the list of active branches” and one for “this is the list of committed history”. They have different lifecycles/use cases, and it feels better to keep them separate. (You probably need different indexing on them, may want snapshots vs deltas, etc.)
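To illustrate how storing ‘from’ alongside ‘to’ would let a later version notice concurrent edits, here is a Python sketch. The data layout and function names are assumptions; the commit itself still behaves as last-write-wins.

```python
def conflicts(master, changes):
    """Return keys whose current master value differs from the recorded 'from'."""
    return sorted(
        key for key, (old, _new) in changes.items() if master.get(key) != old
    )


def commit(master, changes):
    """Apply a branch's changes (last write wins) and report any conflicts."""
    found = conflicts(master, changes)
    for key, (_old, new) in changes.items():
        master[key] = new
    return found


master = {"foo": "baz"}

# Both branches were created while foo=baz; each records (from, to).
branch_a = {"foo": ("baz", "bar")}
branch_b = {"foo": ("baz", "bling")}

warnings_a = commit(master, branch_a)   # master still matches 'from'
warnings_b = commit(master, branch_b)   # master has moved on to "bar"
```

Branch B still wins, but its recorded ‘from’ no longer matches master, which is enough to warn the user about the concurrent edit to `foo` at commit time.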

As came up during our stand-up…

There is a quirk of storing branch config as deltas from the master config, because any user-set master config is itself a delta from charm defaults.

The issue is around unsetting a value. If there is a user-set config item in master, and the user changes it under a branch and then unsets it, we have to determine whether unsetting means a reversion to the master value or to the charm default.

My feeling is that unsetting should have a consistent meaning - reversion to charm default, and that reverting a branch change to the master value should explicitly be setting it back to that value.

Thoughts?