Dense-Deploy, Dynamic Workload Management with Pet Support


#1

This post is primarily about private-cloud deployments. In what I’m working on, vendorclouds are treated as burst scale-out and backup options.

I’m tasked with delivering an elegant way to manage dynamic workloads across a medium-sized cluster. What makes this cluster unusual is that it has several ‘pet’ nodes with specific purposes; however, they’re used for those purposes infrequently and on a set schedule. When they’re idle (more than 70% of an average week), each of these nodes provides as much compute capacity as 8 to 16 of the regular cluster nodes. Because of other dependencies and characteristics of these nodes, VM usage needs to be minimized (preferably avoided entirely).

An additional goal is to deliver a ‘DCIB’ (data-center-in-a-box) script, a project framework the folks I’m helping out are used to working with.

After spending some time testing implementations with tools I’m already familiar with (Salt, Terraform, etc.), I revisited Juju for the first time since I first read about it. In my most recent attempt, I found a way to do what I want with the following high-level implementation:

  1. Install Juju on the initial control-plane host. From there, bootstrap a local LXD controller and spin up containers for the MAAS region and rack controllers. Ideally I’d use Juju to deploy maas-rack and maas-region, but their snaps are currently broken, and my attempts to build a MAAS charm that apt-installs MAAS ran into some unexpected (and expected) complexity.

  2. Netboot all remaining nodes via MAAS and make the necessary deployment changes to provide a standard set of common interfaces and storage across all nodes, tagging the pets according to their additional traits.

  3. Juju bootstrap a MAAS controller. In dev I have been bootstrapping the controller onto a local VM, then migrating it into the cluster with enable-ha and spinning down the VM.

  4. Juju deploy the LXD-Cluster charm to all MAAS nodes. Most cluster-level workloads will deploy here, or in the…

  5. Juju bootstrap the LXD cluster, then Juju deploy CDK to it. This part doesn’t work yet. Conjure-up is the recommended way, but it’s currently hamstrung by recent lxd-profile changes within Juju (I believe). I have attempted to build my own CDK bundle manually, adding the lxd-profile.yaml needed for Juju to deploy CDK to LXD with the profile tweaks CDK needs to run inside containers.

  6. Finally, bootstrap the CDK cluster.
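The six steps form a dependency chain, and a DCIB script has to respect it. A minimal sketch, assuming Python 3.9+ for the standard-library `graphlib`: the layer names below are hypothetical planning labels for the steps above (not Juju, MAAS, or LXD identifiers), and `TopologicalSorter` derives an order in which each layer comes after everything it depends on. It only plans; nothing is executed.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical layer labels for the steps above; each maps to the layers it
# depends on. These are planning names only, not Juju/MAAS/LXD identifiers.
LAYERS = {
    "lxd-controller": {"initial-host"},           # step 1
    "maas": {"lxd-controller"},                   # step 1: region + rack containers
    "metal": {"maas"},                            # step 2: netbooted, tagged nodes
    "maas-controller": {"maas", "metal"},         # step 3
    "ceph": {"maas-controller", "metal"},         # OSDs on metal, MONs/FS in containers
    "lxd-cluster": {"maas-controller"},           # step 4
    "lxd-cluster-controller": {"lxd-cluster"},    # step 5: bootstrap onto the LXD cluster
    "cdk": {"lxd-cluster-controller"},            # step 5: CDK deployed into LXD
    "cdk-controller": {"cdk"},                    # step 6
}


def bootstrap_order(layers):
    """Return every layer so that each one appears after all of its dependencies."""
    return list(TopologicalSorter(layers).static_order())


if __name__ == "__main__":
    for step, layer in enumerate(bootstrap_order(LAYERS), start=1):
        print(f"{step:2}. {layer}")
```

A side benefit of this design is that `TopologicalSorter` raises `graphlib.CycleError` if an edit to the plan introduces a circular dependency, so the mistake surfaces before anything is provisioned.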

Storage for the entire setup will be Ceph-backed. The Ceph cluster gets deployed through the MAAS controller, with OSDs on bare metal and MONs (and CephFS) in containers.
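One way to express that placement is with Juju placement directives: bare machine numbers for the OSDs and `lxd:N` (a new container on machine N) for the MONs and CephFS. This is a sketch that only builds Juju 2.x command lines; the machine numbers are hypothetical, `ceph-osd`, `ceph-mon` and `ceph-fs` are the charm names I’d assume, and OSD device configuration (the charm’s `osd-devices` option) is left out.

```python
import shlex


def ceph_plan(metal_machines, mon_hosts):
    """Build juju argument lists for the Ceph layout; nothing is executed here."""
    osd_to = ",".join(str(m) for m in metal_machines)
    mon_to = ",".join(f"lxd:{m}" for m in mon_hosts)
    return [
        # OSDs directly on the (hypothetical) bare-metal machine numbers.
        ["juju", "deploy", "ceph-osd", "-n", str(len(metal_machines)), "--to", osd_to],
        # MONs in new LXD containers on the listed machines.
        ["juju", "deploy", "ceph-mon", "-n", str(len(mon_hosts)), "--to", mon_to],
        # CephFS in a container on the first MON host.
        ["juju", "deploy", "ceph-fs", "--to", f"lxd:{mon_hosts[0]}"],
        ["juju", "add-relation", "ceph-osd", "ceph-mon"],
        ["juju", "add-relation", "ceph-fs", "ceph-mon"],
    ]


if __name__ == "__main__":
    for cmd in ceph_plan([0, 1, 2], [0, 1, 2]):
        print(shlex.join(cmd))
```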

Like I said, this doesn’t work yet. Some of the features I’m using here rely on freshly shipped or beta code in Juju, MAAS, and LXD. But once it does work, a very small script should be able to log in to an initial box and bootstrap the entire cluster, whilst providing instrumentation and tooling at each level to manage optimal deployment paths, so that an operator can easily throw workloads around and scale dynamically, either via scripting or manually.

But why?
This gives the flexibility to add services, via Juju, to a generally static set of hardware, deploying to bare metal, containers, or k8s, with a terrific set of management tools for every layer: MAAS for provisioning hardware, LXD clustering to assist with backup chores and host migration, and CDK for deployments that need process-level isolation or other k8s benefits beyond what LXD is best at. All one does is target a specific controller:model from a single prompt to work on the three major entry points that need regular maintenance to provide the most efficient compute capacity.
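Targeting a layer boils down to passing `-m controller:model` to the Juju CLI. A small sketch of a helper that pins every command to one target, assuming hypothetical controller and model names for the three layers; it builds commands and prints them rather than running them.

```python
import shlex

# Hypothetical controller:model names, one per layer.
TARGETS = {
    "metal": "maas-ctrl:infra",
    "containers": "lxd-cluster-ctrl:services",
    "k8s": "cdk-ctrl:workloads",
}


def juju_args(target, subcommand, *rest):
    """Build a juju command pinned to one controller:model with -m."""
    controller, sep, model = target.partition(":")
    if not (controller and sep and model):
        raise ValueError(f"expected 'controller:model', got {target!r}")
    return ["juju", subcommand, "-m", target, *rest]


if __name__ == "__main__":
    # Dry run: print one status command per layer instead of executing it.
    for layer, target in TARGETS.items():
        print(f"{layer:>10}: {shlex.join(juju_args(target, 'status'))}")
```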

As mentioned at the beginning, this can and will expand further to bootstrapping a vendorcloud (I’m testing with Azure) for scaling out, using cross-model relations (CMR) to hold the ship together.

The current implementation I’m looking to replace uses Docker Swarm and a lot of nasty, race-condition-ridden docker stack deploy scripts.

Anyway, this is a heads up on why I’m loitering around here and on IRC, being a general nuisance. Enjoy the rest of your holidays and see you in 2019!

*I wanted to title this topic ‘What’s he Building In There?’ after the Tom Waits song.


#2

A few bumps on this road:

  1. Juju’s provisioning of MAAS machines, including the LXD instrumentation it deploys, does not play nicely with the LXD-Cluster charm (by @mskalka). I’m assuming this is because I set up LXD the way I like it as part of that charm’s deployment, which stomps over Juju’s similar, but not identical, LXD configuration on each machine provisioned in a model. I understand this is tweakable with some wheelhouse-type stuff, but I’ve not wandered aimlessly into the darkness of that cave yet.

  2. It’s possible that a similar result could be achieved by simply deploying k8s to metal and deploying everything else as k8s workloads; the density k8s containers provide is similar to that of LXD, along with some other bonuses. My beef with this is that tying oneself to a particular orchestrator or scheduler often precludes using other systems well. That versatility is not only a safety net but also an opportunity to quickly deploy workloads where an alternate deployment path already exists outside the chosen orchestrator. As an example, take an app (MAAS) that has no maintained charm and no functioning snap. There are some messy Helm charts for MAAS out there, but their dependencies are generally a mess, or they’re unmaintained. However, I can build out a scalable MAAS cluster from scratch in under 30 minutes by spinning up a couple of containers, apt-installing the region to one container and the rack to another, and (as of MAAS 2.5) simply cloning the container across the cluster (and snapshotting it for teardown/reprovisioning) to scale out or migrate workloads rapidly. Maybe I could still do that without this layered base, and I’m missing or glossing over something. If reading this thread makes you want to scream at me for such a foolish endeavor, then please do 🙂
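The containers-first MAAS approach above maps onto a handful of `lxc` commands. A sketch that only builds them, assuming an `ubuntu:18.04` image and hypothetical container names; rack-to-region registration, database setup, and any package-source configuration are deliberately left out, and the clone is taken from the snapshot rather than the running container.

```python
import shlex


def maas_container_plan(image="ubuntu:18.04"):
    """Build lxc commands for region/rack containers plus a snapshot-based clone."""
    plan = []
    # Hypothetical container names; package names are the MAAS apt packages.
    for name, pkg in (("maas-region", "maas-region-controller"),
                      ("maas-rack", "maas-rack-controller")):
        plan.append(["lxc", "launch", image, name])
        # In practice you may need to wait for the container's network first.
        plan.append(["lxc", "exec", name, "--", "apt-get", "update"])
        plan.append(["lxc", "exec", name, "--", "apt-get", "install", "-y", pkg])
    plan += [
        # Snapshot the fresh rack install for teardown/reprovisioning...
        ["lxc", "snapshot", "maas-rack", "clean-install"],
        # ...and clone new racks from that snapshot to scale out.
        ["lxc", "copy", "maas-rack/clean-install", "maas-rack-2"],
        ["lxc", "start", "maas-rack-2"],
    ]
    return plan


if __name__ == "__main__":
    for cmd in maas_container_plan():
        print(shlex.join(cmd))
```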


#3

Wow, OK, cool stuff. You’re pushing things along the edge, but in some ways that seems to be where we’re headed in trying to provide flexible solutions across different layers.

Some notes, and feel free to bug us with anything else:

  • agree that having CDK use the new 2.5 lxd profile support will help, though it might cause issues at first since CDK is currently unaware of it
  • if you want to tweak the lxd setup, we recommend you use a unique profile string so it can build on top of what Juju’s putting down. One way to go about this might be to write a small subordinate charm that just provides an lxd profile tweak, which you relate to everything you’re running? I’m not sure what tweaks you’re making when you say “my setting up LXD the way I like it”.
  • I’d definitely chat with the MAAS folks around ways of getting that setup as you’d like. They’ve got a place to chat over at https://discourse.maas.io/
  • Did you get anywhere with the lxd cluster charm? I know the author was going to tinker with it some for Juju 2.5, and I’ve not tested it out yet, but it would be cool to have that humming along.
  • I agree that just going all k8s can be a bit limiting. There’s an array of tools out there because honestly it’s a “best tool for the job” world.
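To make the subordinate idea concrete, here is a hypothetical sketch that scaffolds such a charm: a `metadata.yaml` declaring a container-scoped `juju-info` relation (so it can attach to any principal) plus an `lxd-profile.yaml`. The charm name and the profile keys are placeholders, not recommended settings, and whether the subordinate’s profile gets applied to the principal’s container is exactly what would need testing.

```python
import tempfile
from pathlib import Path

# Minimal subordinate metadata: a container-scoped juju-info relation lets it
# attach to any principal application.
METADATA = """\
name: {name}
summary: Carries LXD profile tweaks (hypothetical)
description: Subordinate whose only job is to ship an lxd-profile.yaml.
subordinate: true
series:
  - bionic
requires:
  juju-info:
    interface: juju-info
    scope: container
"""

# Hypothetical tweaks, shown only as placeholders.
LXD_PROFILE = """\
description: Extra settings layered on top of Juju's own profile
config:
  security.nesting: "true"
  linux.kernel_modules: overlay
devices: {}
"""


def scaffold(parent, name="lxd-profile-tweak"):
    """Write metadata.yaml and lxd-profile.yaml into parent/name and return that path."""
    charm = Path(parent) / name
    charm.mkdir(parents=True, exist_ok=True)
    (charm / "metadata.yaml").write_text(METADATA.format(name=name))
    (charm / "lxd-profile.yaml").write_text(LXD_PROFILE)
    return charm


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        charm = scaffold(tmp)
        print(sorted(p.name for p in charm.iterdir()))
```

It would then be deployed from its local path and related to each principal with `juju add-relation`.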

We’re interested in following how this goes and what tools you end up finding are the right fit for the various jobs. Good luck!


#4

Thanks Rick. I’ve had some time over the holidays to take a few different lateral approaches to this, but I’m really mostly waiting for all the component backing services to play nice with one another.

Using a subordinate charm for lxd configuration. That’s interesting. I was looking for a way to apply an lxd-profile.yaml to an existing charm, to avoid having to roll my own charms (so I can deploy CDK to LXD without conjure-up). I think I briefly looked into that and it seemed like the lxd-profile.yaml in the related deployed charm didn’t get updated, but it’s likely I was doing it wrong.

I took a walk down memory lane and deployed the whole thing using Docker Swarm as orchestrator instead of k8s to see if I could replicate some functionality as an initial POC for demo purposes. I was surprised to be able to run Swarm in LXD almost flawlessly with just a few basic profile tweaks. Overlay networking isn’t behaving itself, however, so I might drop flannel over the top there.

I’d like to make fuller use of Juju’s configuration management to handle things like storage. Right now I’m manually mounting CephFS into containers with an LXD profile (perhaps subordinate charms will help here too), and I’ve also fallen back into my old habit of just deploying the ubuntu charm, customizing it, and then forgetting what I did, so I can’t replicate it. Too much eggnog!
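For reference, mounting CephFS through an LXD profile amounts to a `disk` device whose `source` is the host path and whose `path` is the mount point inside the container. A sketch that builds those commands, assuming CephFS is already mounted at the same path on every LXD host; the profile name, paths, and container names are hypothetical.

```python
import shlex


def cephfs_profile_cmds(profile, host_path, container_path, containers):
    """Return lxc commands that share a host CephFS mount via a profile."""
    cmds = [
        ["lxc", "profile", "create", profile],
        # A disk device maps host_path (source) to container_path (path).
        ["lxc", "profile", "device", "add", profile, "cephfs", "disk",
         f"source={host_path}", f"path={container_path}"],
    ]
    # Attach the profile to each target container.
    cmds += [["lxc", "profile", "add", name, profile] for name in containers]
    return cmds


if __name__ == "__main__":
    # Hypothetical names and paths; CephFS must already be mounted on each host.
    for cmd in cephfs_profile_cmds("cephfs", "/mnt/cephfs", "/mnt/cephfs", ["web1", "web2"]):
        print(shlex.join(cmd))
```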

Looking forward to a fun 2019!


#5

Yeah, the trouble is you have to add that file to the charm itself, so you’re going to be heading down the charm-forking path to test that out.
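The charm-forking path amounts to copying a local charm directory and dropping an `lxd-profile.yaml` into its root before deploying it from that path. A sketch of just that file-handling step, assuming you already have the charm source checked out locally (how it is fetched is left out); the directory names and profile content are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def fork_with_profile(charm_dir, dest_dir, profile_yaml):
    """Copy a local charm directory and add lxd-profile.yaml at its root."""
    src = Path(charm_dir)
    if not (src / "metadata.yaml").is_file():
        raise FileNotFoundError(f"{src} has no metadata.yaml; is it a charm?")
    dest = Path(dest_dir)
    # copytree refuses to overwrite an existing destination, protecting earlier forks.
    shutil.copytree(src, dest)
    (dest / "lxd-profile.yaml").write_text(profile_yaml)
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        upstream = Path(tmp) / "upstream"  # stand-in for a real charm checkout
        upstream.mkdir()
        (upstream / "metadata.yaml").write_text("name: demo\n")
        fork = fork_with_profile(upstream, Path(tmp) / "demo-fork", "config: {}\ndevices: {}\n")
        print(sorted(p.name for p in fork.iterdir()))
```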

Hah, well, if you do that in a container, at least try to snapshot it every so often and see if you can get a decent base to copy from later.
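A small sketch of that habit: timestamped `lxc snapshot` names that sort chronologically, and an `lxc copy container/snapshot new-name` command to start a fresh container from a known-good base. The container names and the `base-` prefix are hypothetical.

```python
from datetime import datetime, timezone


def snapshot_cmd(container, when=None):
    """Return an `lxc snapshot` command and the timestamped snapshot name it uses."""
    when = when or datetime.now(timezone.utc)
    name = when.strftime("base-%Y%m%d-%H%M")  # sorts chronologically
    return ["lxc", "snapshot", container, name], name


def clone_cmd(container, snapshot, new_container):
    """Return an `lxc copy` command that creates a new container from a snapshot."""
    return ["lxc", "copy", f"{container}/{snapshot}", new_container]


if __name__ == "__main__":
    cmd, name = snapshot_cmd("custom-ubuntu")
    print(" ".join(cmd))
    print(" ".join(clone_cmd("custom-ubuntu", name, "custom-ubuntu-2")))
```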


#6

That’s the conclusion I’d drawn even before trying it out myself. A desired future state, from my POV, would be more dynamic access to the container profile in a charm, perhaps per application. One of the best things about LXD is that much of its config can be hot-updated; it’d be great to see that functionality carry through to its implementation here.
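To make that wish a little more concrete, here is a hypothetical sketch (not a Juju feature) of the per-application reconciliation it implies: diff a desired profile config against the current one and emit only the `lxc profile set` and `lxc profile unset` commands needed. The profile name and keys are placeholders; in my understanding some keys, such as `limits.cpu`, apply to running containers, while others only take effect after a restart.

```python
import shlex


def profile_diff_cmds(profile, current, desired):
    """Return lxc profile set/unset commands that turn `current` config into `desired`.

    Both arguments are flat {key: string value} dicts of profile config.
    """
    cmds = []
    for key in sorted(desired):
        if current.get(key) != desired[key]:
            cmds.append(["lxc", "profile", "set", profile, key, desired[key]])
    for key in sorted(set(current) - set(desired)):
        cmds.append(["lxc", "profile", "unset", profile, key])
    return cmds


if __name__ == "__main__":
    current = {"security.nesting": "true", "limits.memory": "4GB"}
    desired = {"security.nesting": "true", "limits.cpu": "4"}
    for cmd in profile_diff_cmds("app-profile", current, desired):
        print(shlex.join(cmd))
```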