Ideal use-cases for MAAS (and pods) vs LXD (esp. with clustering) with Juju

I was having a discussion today with someone on IRC about the roles, relationships, and use-cases of MAAS and LXD when used with Juju. I thought it would be good to get feedback on my current understanding of where all the components stand relative to each other and what the proper use-cases are for each. The discussion started in the context of managing Kubernetes with Juju (CDK) on top of their hardware resources, and trying to work out the right tool for managing the hardware layer underneath Juju.

This is what I explained as my current understanding, and I’d appreciate corrections and clarifications on it:

MAAS is ideally suited for when you have a (possibly heterogeneous) set of hardware resources and you want a lot of control over how those resources are divvied up. The downside is that you are expected to make more decisions up-front and be more involved in the resource allocation. With Juju, you can use tag constraints to allocate specific applications to specific pools of machines or VMs, or even target specific units to specific machines if necessary. Or you can use constraints in a more traditional way and let Juju and MAAS figure out the right placement based on the resource needs defined by the charms.
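To make the constraint idea concrete, here is a toy Python model of constraint-based placement. It is not how MAAS or Juju actually select machines: the Machine class, the pick_machine helper and the smallest-fit policy are all hypothetical. It only illustrates the trade-off described above, where tags narrow the pool to machines you have labelled and resource minimums (cores, memory) filter what remains.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class Machine:
    """Hypothetical, simplified stand-in for one machine in a MAAS inventory."""
    hostname: str
    cores: int
    mem_gb: int
    tags: Set[str] = field(default_factory=set)
    allocated: bool = False


def pick_machine(inventory: List[Machine], cores: int = 0, mem_gb: int = 0,
                 tags: Iterable[str] = ()) -> Optional[Machine]:
    """Allocate the smallest free machine that meets every constraint.

    Every requested tag must be present on the machine; cores and mem_gb are
    minimums. The smallest-fit policy is an illustrative assumption only.
    """
    required = set(tags)
    candidates = [
        m for m in inventory
        if not m.allocated
        and required <= m.tags
        and m.cores >= cores
        and m.mem_gb >= mem_gb
    ]
    if not candidates:
        return None
    best = min(candidates, key=lambda m: (m.cores, m.mem_gb))
    best.allocated = True
    return best


if __name__ == "__main__":
    inventory = [
        Machine("node1", cores=8, mem_gb=32),
        Machine("node2", cores=16, mem_gb=64, tags={"gpu"}),
        Machine("node3", cores=4, mem_gb=16),
    ]
    # A tag constraint pins the workload to the (one-machine) GPU pool.
    print(pick_machine(inventory, tags=["gpu"]).hostname)       # node2
    # Resource-only constraints let the allocator choose among free machines.
    print(pick_machine(inventory, cores=4, mem_gb=8).hostname)  # node3
    # The only GPU machine is already taken, so this request cannot be met.
    print(pick_machine(inventory, tags=["gpu"]))                # None
```

In real use you would express the same intent with Juju constraints such as tags=gpu or cores=4 mem=8G; the model only mimics their filtering effect, and the last call shows the cost of tight control: once a tagged pool is exhausted, requests for it fail rather than spilling onto other hardware.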

LXD, especially with the newer clustering support, is ideally suited for when you have a more or less homogeneous set of hardware resources that you want to divvy up easily and efficiently, without having to decide up-front how those resources are allocated and used. The downside is that you don’t have nearly as much control over the resource allocation. With Juju, there’s not much you can do with respect to constraints, but as long as your hardware has the resources, the applications will use them efficiently according to their run-time needs.

The newer pods feature of MAAS seems to bridge those two use-cases by providing a more dynamic trade-off between flexibility and control. (I don’t know as much about this feature or how to leverage it properly with Juju.)

I’d also like to hear from both the MAAS and Juju teams on what they think the ideal, efficient controller layout is and how they are working together towards getting there.

Right now, if you’re using MAAS and Juju, you first need a MAAS controller (a region and rack controller), which can be installed from packages, as a snap, or in a LXD container. That usually takes up at least one physical machine.

Then, when you need to bootstrap some Juju controllers, your options are either to use more physical machines managed by MAAS, or to hack around on existing machines and manually add existing KVM machines to your MAAS inventory, which is just a pain and quickly becomes unmanageable.

What is the way forward?

Let me start then with my understanding of the various tools available.

MAAS

MAAS is really for scaling out your hardware. It has more requirements: the hardware must support remote control such as IPMI or an equivalent. It’s meant to put an API in front of a rack (or racks and racks, if you have them) of hardware, allocate it among users, deploy it with a choice of images and OS versions, etc.

LXD

The LXD cluster stuff is cool because you don’t need special hardware that MAAS can drive. The downside is that you need to configure and set up the cluster manually. There’s no web UI where you push a few buttons and have three machines ready to go for Juju; you have to SSH to each machine and join it to the cluster with lxd init. It also doesn’t support multiple user accounts, or mixing different hardware/tags (e.g. storage- or GPU-enabled hardware) in the cluster.

The good thing is that from there you can set up whatever random hardware you have lying around. You can set things up in HA spread across machines, and you can grow the cluster as you need. With the LXD constraint support coming in Juju 2.5, you can cap resources on those containers and run a pretty dense setup very cheaply for testing, QA, etc. if you want.
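As a rough illustration of why capping container resources makes a dense setup possible, the Python sketch below estimates how many capped containers a small cluster could hold. Everything in it is hypothetical: the Host class, the cluster_capacity helper, the host sizes and the per-container caps. It also assumes each container’s cap is fully reserved, which is conservative; LXD’s resource limits cap a container rather than reserving capacity for it, so a real cluster can usually be packed more tightly at the cost of contention.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    """Hypothetical LXD cluster member with its raw capacity."""
    name: str
    cores: int
    mem_gb: int


def cluster_capacity(hosts: List[Host], cap_cores: int, cap_mem_gb: int,
                     reserve_mem_gb: int = 0) -> int:
    """Count capped containers the cluster can hold with no overcommit.

    On each host the scarcer resource decides: the container count is the
    smaller of cores // cap_cores and usable memory // cap_mem_gb.
    reserve_mem_gb is memory held back on every host for the OS and LXD itself.
    """
    total = 0
    for h in hosts:
        usable_mem = max(h.mem_gb - reserve_mem_gb, 0)
        total += min(h.cores // cap_cores, usable_mem // cap_mem_gb)
    return total


if __name__ == "__main__":
    # Five identical hosts, e.g. machines handed out by MAAS (sizes are made up).
    hosts = [Host(f"node{i}", cores=8, mem_gb=32) for i in range(1, 6)]
    # Cap each container at 2 cores / 4 GB, holding 4 GB back per host:
    # CPU is the bottleneck (8 // 2 = 4 per host), so 5 * 4 containers.
    print(cluster_capacity(hosts, cap_cores=2, cap_mem_gb=4, reserve_mem_gb=4))  # 20
    # Halve the CPU cap and memory becomes the bottleneck (28 // 4 = 7 per host).
    print(cluster_capacity(hosts, cap_cores=1, cap_mem_gb=4, reserve_mem_gb=4))  # 35
```

The point of the two calls is that the caps are the knob: tightening whichever resource is currently the bottleneck raises density until the other resource takes over as the limit.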

Mixing the two

What’s interesting is that in developing cluster support I’m actually using my MAAS to allocate bionic machines, setting up the cluster on them, and then bootstrapping to the cluster. This way I can run as many controllers as I want, plus the workloads on them, using fewer resources. Say I deploy 5 machines in MAAS and set up the LXD cluster on them: I can operate 10 Juju controllers on there without a problem. More practically, it lets me have three machines in MAAS with an HA Juju controller and then deploy workloads on there. If I want to grow the cluster, I just ask MAAS for another machine and add it to the cluster manually.

As the cluster stuff is new, I’ve not looked further into collapsing things. In theory you could make your MAAS controller nodes members of the LXD cluster for some really dense work, but that’s not how I’d do things in production.
