Juju on LXD + SSH machine provisioning

The use-case:

LXD is configured with lxdbr0 for networking. Juju is bootstrapped to localhost, and charms today are deployed to containers.

We now need to provision machines for Juju that are created elsewhere on the network, using juju add-machine ssh:user@host.

The problem is that, during provisioning, the script attempts to download the Juju agent onto the target machine (a VM) from the Juju controller, which sits on the private 10.x.x.x subnet used by lxdbr0. The VM has no route to that 10.x.x.x network managed by LXD, so the download times out.
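As a sanity check, the controller's advertised API endpoints (the addresses a new machine's agent will try to reach) can be listed with the stock Juju CLI:

    # Show controller details, including api-endpoints; for this setup
    # they will be on the 10.x.x.x lxdbr0 subnet:
    juju show-controller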

I can think of a couple of solutions, each with drawbacks, so I’m looking for more suggestions.

Potential solutions

Configure LXD to use a bridge

Configure the host networking to use a bridge, and allow each container to get its IP address via DHCP on the host’s network.

The downside is that it requires the host’s network to provide DHCP, and that every container will then get a routable IP on that network.
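For reference, a rough sketch of that setup, assuming an Ubuntu host using netplan; the NIC name enp3s0, the file name, and br0 are placeholders:

    # Create a host bridge br0 that takes over the physical NIC:
    cat <<'EOF' | sudo tee /etc/netplan/99-br0.yaml
    network:
      version: 2
      ethernets:
        enp3s0:
          dhcp4: false
      bridges:
        br0:
          interfaces: [enp3s0]
          dhcp4: true
    EOF
    sudo netplan apply

    # Point LXD's default profile at br0 instead of lxdbr0:
    lxc profile device remove default eth0
    lxc profile device add default eth0 nic nictype=bridged parent=br0

New containers then pick up addresses from the LAN’s DHCP server, so the controller container becomes directly reachable from the rest of the network.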

Static routing

A static route would be added to the network router, telling it which host should receive traffic for the 10.x.x.x network used by lxdbr0. Additionally, an iptables PREROUTING rule on the LXD host would direct incoming traffic to the controller’s container.
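A sketch of the idea, with placeholder addresses throughout (10.0.8.0/24 for lxdbr0, 192.168.1.50 for the LXD host, 10.0.8.10 for the controller container); 17070 is the Juju controller API port:

    # On the network router (exact syntax varies by vendor/OS):
    ip route add 10.0.8.0/24 via 192.168.1.50

    # On the LXD host: allow forwarding, and steer controller traffic
    # arriving at the host to the controller's container:
    sudo sysctl -w net.ipv4.ip_forward=1
    sudo iptables -t nat -A PREROUTING -p tcp --dport 17070 \
        -j DNAT --to-destination 10.0.8.10:17070

With the static route in place the DNAT rule may be redundant, but it also covers the case where the router cannot be changed.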

macvlan

Easier than using a bridge, since it doesn’t require changes to the host’s own network configuration, but it is unsuitable for use with Juju because macvlan cannot do “hairpinning”: the host cannot talk to containers attached to its own physical interface.
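For completeness, the macvlan variant would look something like this (enp3s0 is a placeholder parent interface), subject to the hairpin limitation above:

    # Attach containers via macvlan on the host's NIC (placeholder name).
    # Caveat: the LXD host itself cannot reach these containers.
    lxc profile device remove default eth0
    lxc profile device add default eth0 nic nictype=macvlan parent=enp3s0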

Other suggestions

There’ve been a few other suggestions so far, but they need more input/research.

  • OVS bridging
  • Enabling FAN (see the sketch after this list)
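For reference, a minimal FAN setup sketch (untested; 192.168.1.0/24 is a placeholder underlay subnet). Note that FAN mainly gives containers on different hosts routes to each other, so machines outside the FAN would still need a route in:

    # Sketch only: create a FAN-mode bridge over the LAN underlay and
    # point the default profile's NIC at it:
    lxc network create lxdfan0 bridge.mode=fan fan.underlay_subnet=192.168.1.0/24
    lxc network attach-profile lxdfan0 default eth0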

Fundamentally, all agents need to be able to talk to the controller. So if you are hosting a controller in an LXD container and want to control machines outside of that host, you need a route back to the controller, and not just for provisioning: the agents hold that connection all the time, as it is how workloads are managed.
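A quick way to check that from a prospective machine, using a placeholder controller address (17070 is the controller API port):

    # From the remote machine, verify the controller API is reachable:
    nc -vz 10.0.8.10 17070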

Also, if you are deploying some workloads in containers and want those applications to communicate with other applications, those containers need to be reachable from the other machines.

I would generally recommend bridging lxdbr0 onto the host network device and exposing your containers if you are going to mix local and remote machines. I will say that the LXD provider is generally not intended to host large, scalable workloads across many machines, and you’re running into its limitations here.

Another possibility on bionic and later is to set up an LXD cluster, and then put everything into containers across the machines. However, you’ll still need some sort of answer to the routing issue.
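For what it’s worth, the first node of an LXD cluster can be bootstrapped non-interactively with a preseed along these lines (server name and address are placeholders; additional nodes join with their own preseed):

    # Sketch only: initialise the first LXD cluster node via preseed.
    cat <<'EOF' | sudo lxd init --preseed
    config:
      core.https_address: 192.168.1.50:8443
    cluster:
      server_name: node1
      enabled: true
    EOF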