Better control for unit placement into availability zones


#1

There are currently two Launchpad bugs open against Juju related to AZs:

https://bugs.launchpad.net/juju/+bug/1743106
https://bugs.launchpad.net/juju/+bug/1777487

I wanted to start a discussion around these two, because right now the control over where units will land is fairly rudimentary.

For example, in a large MAAS environment with several availability zones configured, one might want to restrict a given model to only a subset of those zones.

Another problem in some environments is fair distribution of units across availability zones. For example, when Ceph is deployed with a certain replication factor, it is important that the number of nodes in each AZ is divisible by that replication factor in order to maintain failure domains.
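To make the balancing requirement concrete, here is a small sketch (my own illustration, not Juju code) that checks whether a set of nodes is spread across AZs in counts divisible by the replication factor:

```python
# Sketch: verify that each AZ holds a node count divisible by the Ceph
# replication factor, so failure domains stay balanced.
from collections import Counter

def balanced_for_replication(node_azs, replication_factor):
    """node_azs: list of AZ names, one entry per node (hypothetical input)."""
    counts = Counter(node_azs)
    return all(n % replication_factor == 0 for n in counts.values())

# 9 nodes, 3 per AZ, replication factor 3 -> balanced
print(balanced_for_replication(["az1"] * 3 + ["az2"] * 3 + ["az3"] * 3, 3))  # True
# 8 nodes spread unevenly -> not balanced
print(balanced_for_replication(["az1"] * 4 + ["az2"] * 2 + ["az3"] * 2, 3))  # False
```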

Adding tag constraints that mimic something like an availability-zones model-config option (not implemented yet) is rigid, because you cannot change that constraint afterwards to add more AZs.

While it is possible to do something like juju deploy cs:xenial/ceph-osd --to az1 via the CLI, there is no equivalent placement directive such as zone:<azname> for bundles; therefore, one cannot express “9 units, 3 into every AZ”.
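As an illustration of what is missing, a hypothetical bundle could look like this (zone:<azname> is an imagined directive here, not currently valid bundle syntax):

```yaml
# HYPOTHETICAL syntax -- zone:<azname> is not a supported bundle
# placement directive today; this expresses "9 units, 3 into every AZ".
applications:
  ceph-osd:
    charm: cs:xenial/ceph-osd
    num_units: 9
    to:
      - zone:az1
      - zone:az1
      - zone:az1
      - zone:az2
      - zone:az2
      - zone:az2
      - zone:az3
      - zone:az3
      - zone:az3
```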

From what I understand, Juju will currently retry deployment into a different AZ if it cannot find a matching machine in the AZ it selected during constraint matching.

https://github.com/juju/juju/blob/juju-2.4.2/provider/common/availabilityzones.go#L150-L180
https://github.com/juju/juju/blob/juju-2.4.2/provider/common/availabilityzones.go#L80-L146
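As a rough sketch of the idea in the linked provider code (simplified, and the names here are mine, not Juju's): candidate zones are ordered by how many instances of the distribution group they already hold, and start attempts fall through to the next zone on failure:

```python
# Simplified sketch of least-populated-zone ordering, mimicking the
# general approach in provider/common/availabilityzones.go (not the
# actual Juju implementation).
from collections import Counter

def zone_order(available_zones, instance_zones):
    """Return zones sorted least-populated first (ties broken by name)."""
    counts = Counter(instance_zones)
    return sorted(available_zones, key=lambda z: (counts.get(z, 0), z))

# az3 holds no instances yet, so it is tried first.
print(zone_order(["az1", "az2", "az3"], ["az1", "az1", "az2"]))
# -> ['az3', 'az2', 'az1']
```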

I think MAAS deployments are the ones that need more control over AZ placement.

What we currently do is:

machines:
  '0':
  '1':
  '2':
  '3':
  '4':
  '5':
  '6':
  '7':
  '8':
  '9':
  '10':
  '11':
  '12':
  '13':
  '14':
  '15':
  '16':
  '17':
applications:
  nvme-ceph-osd:
    charm: cs:xenial/ceph-osd
    num_units: 18
    bindings:
      # ...
    options:
      # ...
      customize-failure-domain: *customize-failure-domain
    to:
      - '0'
      - '1'
      - '2'
      - '3'
      - '4'
      - '5'
      - '6'
      - '7'
      - '8'
      - '9'
      - '10'
      - '11'
      - '12'
      - '13'
      - '14'
      - '15'
      - '16'
      - '17'

But there is no way to enforce that the distribution will be even, for example 6 units per AZ.

  nvme-ceph-mon:
    charm: cs:xenial/ceph-mon
    num_units: 3
    constraints: *oam-space-constr
    bindings:
      # ...
    options:
      expected-osd-count: *nvme-expected-osd-count
      monitor-count: *nvme-expected-mon-count
      customize-failure-domain: *customize-failure-domain
    to:
      - lxd:15
      - lxd:16
      - lxd:17

Likewise, there is no way to make sure that 3 ceph-mon units will be placed into 3 different availability zones.
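To show what we end up doing by hand, here is a sketch (my own helper, not anything Juju provides) that picks one container host per AZ from a machine-to-zone mapping, so that the three mons land in distinct failure domains:

```python
# Sketch: given a machine -> AZ mapping (e.g. gathered from MAAS), pick
# one host per AZ for ceph-mon LXD containers so each mon lands in a
# distinct failure domain.
def pick_one_per_zone(machine_zones, wanted=3):
    chosen, seen = [], set()
    for machine, zone in sorted(machine_zones.items()):
        if zone not in seen:
            chosen.append(f"lxd:{machine}")
            seen.add(zone)
        if len(chosen) == wanted:
            break
    return chosen

print(pick_one_per_zone({15: "az1", 16: "az2", 17: "az3", 18: "az1"}))
# -> ['lxd:15', 'lxd:16', 'lxd:17']
```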


#2

Hi Dmitry,

How about supporting placement into AZs on public clouds? I’m talking about the generic use case where one may want to span Juju-controlled services across multiple AZs.
Is that working at the moment?


#3

Yes, this would be the same for public clouds, and the Juju provider code handles that as long as AZs are exposed by a given provider. The design from way back resembles what is described here.