Container interfaces bridged to unnumbered host interfaces

Hi,

I really struggled with what to name this post. Hopefully the title captures the interest of those who might be able to help.

I’m trying to deploy our first production OpenStack with MaaS/Juju. I’ve successfully built a one-host (LXD localhost cloud) deployment after spending a couple of weeks figuring out how to get MaaS and Juju to play nicely with our corporate proxy. I’ve also successfully built a 5-host MaaS cloud with full HA (pretty much 3 of everything, using vips).

Now I’m trying to build up what will become a 21-host cloud, with potential to grow further, as a showcase for MaaS and Juju (and Ubuntu, as it happens), so I’m keen to get this working, and working well.

I’m working with a relatively limited number of routable IPs; I want to use a /24 for my “provider” network, which is fine (our application only needs a couple of routable IPs per instance), and a /26 for what I have dubbed our “service” network; this is where I want to provide access to MaaS and Juju (for us admins) and the OpenStack APIs and dashboard (for both us and our users).

The problem is that with 21 hosts, 35 containers, and 9 vips, I already need 65 IPs, plus a few others (e.g. network gateway, MaaS rack controller, Juju controller), which is more than the 62 usable addresses a /26 provides.

My “solution”, so I thought, was to have a third network, let’s say “internal”, which has all the hosts and containers on it, leaving the vips on the routable “service” network.

A little extra background: I have configured MaaS to give each machine a bond containing all four of its NICs (LACP, 802.3ad), and on top of that run three VLANs: one for “internal” (untagged/native), one for “service” (tagged), and one for “provider” (tagged). If I deploy a machine with MaaS, I can log in to it and see bond0, bond0.X, and bond0.Y, where X and Y are the “service” and “provider” VLAN IDs. I am using a non-routable subnet for the “internal” VLAN, and MaaS has given bond0 an appropriate IP.

However, if I deploy an application - say, mysql (percona-cluster) with mysql-ha (hacluster) - to an lxd container (necessary in order to safely deploy several applications to the same host), I run into difficulties; the lxd container only has an interface bridged to “internal”, but I need an interface on “service” on which to place the vip for the application.
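
For reference, the deployment pattern is roughly the following (a sketch from memory; the machine number, application names, and VIP are placeholders):

# rough sketch of the pattern (placeholder machine number and VIP):
juju deploy percona-cluster mysql --to lxd:0    # unit inside an lxd container on machine 0
juju config mysql vip=192.0.2.10                # the vip I want to live on the "service" network
juju deploy hacluster mysql-ha
juju add-relation mysql mysql-ha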

I have configured spaces in MaaS with the relevant VLANs and subnets, and ensured Juju has the spaces too. I have experimented with --constraints "spaces=..." and --bind options, and this results in Juju failing to deploy the container with unable to setup network: host machine "X" has no available device in space(s) "service" errors. It seems that since the host doesn’t have an IP address on the “service” interface (bond0.X), Juju can’t relate the interface to the space (is this the relevant code?). The connection between space and VLAN exists in MaaS, however, should Juju need it.
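
For the record, the attempts looked something like this (endpoint names are from memory, so they may not match the charm exactly):

juju deploy percona-cluster mysql --to lxd:0 --constraints "spaces=service"
juju deploy percona-cluster mysql --to lxd:0 --bind "service"
juju deploy percona-cluster mysql --to lxd:0 --bind "ha=internal access=service"

Each of these ended with the "no available device in space(s)" error described above.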

An experiment of adding a (static) IP to the bond0.X interface via MaaS got a little further, but since MaaS isn’t providing DHCP on that subnet, LXD seemed to assign a random IP in that subnet to the container, which clashed with an existing IP, and all hell broke loose! Regardless, this wastes IPs on the service subnet, defeating the object.

A second experiment of deploying directly to the machine worked perfectly (the vip was added to the “service” interface), so the charm appears to support this.

I don’t think what I’m attempting is a particularly wacky or unusual idea, as separating service endpoints (on routable IPs) from internal infrastructure (on non-routable IPs) is a common security concept (and more efficient on IP usage).

I’ve read the Juju and IP Addresses development post which seems to be in a related area, so if my use case isn’t currently possible, perhaps it could be considered in that piece of work.

If anyone has any suggestions on how I can achieve my goal, I’d welcome them. For now, I’m having to proceed with a larger subnet (probably a /24) as a combined “service” and “internal” network which should allow sufficient room for this deployment and some further expansion.

Thanks,

D.


Thanks for this detailed post.

We’re doing a lot of work on Juju’s networking internals at the moment and we definitely want to improve our handling of L2 devices in scenarios such as yours.

Can you provide an example of what /etc/netplan/<your-cfg>.yaml looks like on one of the MAAS machines with the bond connected to all 3 vlans?

Thanks for the response; glad to hear you’re looking in this area, and happy to help where I can.

Unfortunately, we’ve had to move on with a 2-VLAN solution, so I don’t currently have a 3-VLAN box to grab the config from, but I’ve grabbed a 2-VLAN version and adjusted it to include the 3rd VLAN, based on my memory of it; hopefully it’s correct, or close enough!

# This file is generated from information provided by
# the datasource.  Changes to it will not persist across an instance.
# To disable cloud-init's network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    version: 2
    ethernets:
        eno1:
            match:
                macaddress: xx:xx:xx:xx:xx:30
            mtu: 1500
            set-name: eno1
        eno2:
            match:
                macaddress: xx:xx:xx:xx:xx:31
            mtu: 1500
            set-name: eno2
        eno3:
            match:
                macaddress: xx:xx:xx:xx:xx:32
            mtu: 1500
            set-name: eno3
        eno4:
            match:
                macaddress: xx:xx:xx:xx:xx:33
            mtu: 1500
            set-name: eno4
    bonds:
        bond0:
            addresses:
            - 10.0.0.136/24
            interfaces:
            - eno1
            - eno2
            - eno3
            - eno4
            macaddress: xx:xx:xx:xx:xx:30
            mtu: 1500
            nameservers:
                addresses:
                - 10.0.0.12
                search:
                - maas
            parameters:
                down-delay: 0
                lacp-rate: fast
                mii-monitor-interval: 100
                mode: 802.3ad
                transmit-hash-policy: layer3+4
                up-delay: 0
    vlans:
        bond0.41:
            id: 41
            link: bond0
            mtu: 1500
        bond0.42:
            id: 42
            link: bond0
            mtu: 1500

I also forgot to mention earlier that the MaaS host runs squid, and Juju is configured to use this proxy for all deployments (via http-proxy and apt-http-proxy settings supplied to the --config and --model-default options on the juju bootstrap invocation). This lets Juju deploy new machines and units without any external routing; everything downloads via the proxy.
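
The bootstrap invocation is along these lines (the cloud and controller names and the address are placeholders, and I’m assuming squid’s default port of 3128 here):

juju bootstrap mymaas maas-controller \
    --config http-proxy=http://10.0.0.12:3128 \
    --config apt-http-proxy=http://10.0.0.12:3128 \
    --model-default http-proxy=http://10.0.0.12:3128 \
    --model-default apt-http-proxy=http://10.0.0.12:3128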

It also occurs to me that we’ll need a default gateway to be added (for the bond0.42 interface) so we can route traffic from the VIPs back to remote clients.
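
In netplan terms I imagine that would look something like the following added to the bond0.42 stanza (the gateway address is a placeholder, and this assumes the interface ends up with an address in the service subnet so the route is usable):

    vlans:
        bond0.42:
            id: 42
            link: bond0
            mtu: 1500
            routes:
            - to: 0.0.0.0/0
              via: 10.1.0.1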

To elaborate on this a bit: Juju currently uses L3 information to determine spaces, so it needs an IP in the subnet to decide which interface on the host machine is in the associated space, in order to bridge that interface and allow the container to use that space too.

There isn’t a conceptual reason to need it; it’s just how the code operates today. We are currently fixing a couple of things in this area, but we haven’t yet gotten to supporting interfaces without IPs.

I believe the existing workaround is either to assign IPs to the host machines and then remove them, or to create a fake subnet that is part of the same space but uses an IP range that isn’t actually in use on the network. I think you’ll run into issues with the latter, because the charm itself probably uses the subnet range to make the VIP work, so if the VIP isn’t in the same subnet as the original IP then the VIP doesn’t work either.
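
Very roughly, and assuming Juju will pick up an address added on the live host (configuring it through MAAS before deployment, as in your earlier experiment, may be the more reliable route), the first workaround looks something like this with placeholder addresses:

# temporary address in the "service" subnet so the device maps to the space
sudo ip addr add 10.1.0.250/26 dev bond0.41
# ... deploy the unit to a container on this host, letting Juju bridge bond0.41 ...
# then remove the temporary address once the container is up
sudo ip addr del 10.1.0.250/26 dev bond0.41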