Juju-provisioned machines hang at boot waiting for DHCP


#1

We’re experiencing a regression related to how Juju configures the network interfaces.

We have a VMware cluster where each model has two networks: an internal one with a DHCP server and an external one without. In the past, Juju left the interface on the network without a DHCP server unconfigured, so users could manually add an IP address to that interface to use the external network.

However, Juju’s behavior recently changed: on every boot except the first, it tries DHCP on the external network and blocks the boot until it either receives an IP address or times out. As a result, every machine hangs for 2 or 10 minutes on every boot while waiting for the network to come up.

Is this behavior intentional, or is it a bug? Are we misusing the external-network function?
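A hedged way to confirm which link is holding up the boot: in the transcript below, the `systemd-networkd-wait-online` log starts at 14:32:51 and fails with a timeout at 14:34:51. That gap matches the tool's default 120-second timeout. `networkctl` also still reports ens224 as `configuring`. The sketch below filters `networkctl list` output for links in that setup state. It assumes the five-column IDX/LINK/TYPE/OPERATIONAL/SETUP layout, and `stuck_links` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the names of links whose systemd-networkd setup
# state is still "configuring". Reads `networkctl list` output on stdin and
# assumes the columns IDX LINK TYPE OPERATIONAL SETUP (field 5 = setup state).
stuck_links() {
  awk '$5 == "configuring" { print $2 }'
}

# Usage on the affected machine:
#   networkctl list --no-legend | stuck_links
```

Header and footer lines never have `configuring` in the fifth field, so the filter gives the same result with or without `--no-legend`.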

These are the details of a Kubernetes master that hangs for 2 minutes on every boot.

$ systemd-analyze blame
       2min 63ms systemd-networkd-wait-online.service
          9.011s docker.service
          8.449s cloud-init-local.service
          7.311s dev-sda1.device
          6.793s snapd.service
          5.872s lxd-containers.service
          4.479s networkd-dispatcher.service
          3.042s ssh.service
          2.806s lvm2-monitor.service
          2.608s systemd-journal-flush.service
          2.423s cloud-init.service
          2.205s accounts-daemon.service
          1.880s cloud-config.service
          1.793s apparmor.service
          1.763s cloud-final.service
          1.715s rbdmap.service
          1.634s apport.service
          1.606s keyboard-setup.service
          1.602s ubuntu-fan.service
          1.337s run-rpc_pipefs.mount
          ...
$ systemctl status systemd-networkd-wait-online.service
● systemd-networkd-wait-online.service - Wait for Network to be Configured
   Loaded: loaded (/lib/systemd/system/systemd-networkd-wait-online.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2018-10-17 14:34:51 UTC; 28min ago
     Docs: man:systemd-networkd-wait-online.service(8)
  Process: 912 ExecStart=/lib/systemd/systemd-networkd-wait-online (code=exited, status=1/FAILURE)
 Main PID: 912 (code=exited, status=1/FAILURE)

Oct 17 14:32:51 juju-37156e-26 systemd-networkd-wait-online[912]: ignoring: lo
Oct 17 14:32:51 juju-37156e-26 systemd-networkd-wait-online[912]: ignoring: lo
Oct 17 14:32:52 juju-37156e-26 systemd-networkd-wait-online[912]: ignoring: lo
Oct 17 14:32:52 juju-37156e-26 systemd-networkd-wait-online[912]: ignoring: lo
Oct 17 14:32:53 juju-37156e-26 systemd-networkd-wait-online[912]: ignoring: lo
Oct 17 14:32:53 juju-37156e-26 systemd-networkd-wait-online[912]: ignoring: lo
Oct 17 14:34:51 juju-37156e-26 systemd-networkd-wait-online[912]: Event loop failed: Connection timed out
Oct 17 14:34:51 juju-37156e-26 systemd[1]: systemd-networkd-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Oct 17 14:34:51 juju-37156e-26 systemd[1]: systemd-networkd-wait-online.service: Failed with result 'exit-code'.
Oct 17 14:34:51 juju-37156e-26 systemd[1]: Failed to start Wait for Network to be Configured.
$ networkctl status -a 
● 1: lo
       Link File: /lib/systemd/network/99-default.link
    Network File: n/a
            Type: loopback
           State: carrier (unmanaged)
         Address: 127.0.0.1
                  ::1

● 2: ens192
       Link File: /run/systemd/network/10-netplan-ens192.link
    Network File: /run/systemd/network/10-netplan-ens192.network
            Type: ether
           State: routable (configured)
            Path: pci-0000:0b:00.0
          Driver: vmxnet3
          Vendor: VMware
           Model: VMXNET3 Ethernet Controller
      HW Address: 00:50:56:39:74:79 (VMware, Inc.)
         Address: 10.10.139.125
                  fe80::250:56ff:fe39:7479
         Gateway: 10.10.136.1 (Super Micro Computer, Inc.)
             DNS: 10.10.139.95
  Search Domains: tenguvmware

● 3: ens224
       Link File: /lib/systemd/network/99-default.link
    Network File: /run/systemd/network/10-netplan-eth1.network
            Type: ether
           State: degraded (configuring)
            Path: pci-0000:13:00.0
          Driver: vmxnet3
          Vendor: VMware
           Model: VMXNET3 Ethernet Controller
      HW Address: 00:50:56:0b:5a:b3 (VMware, Inc.)
         Address: fe80::250:56ff:fe0b:5ab3

● 4: flannel.1
       Link File: /lib/systemd/network/99-default.link
    Network File: n/a
            Type: ether
           State: routable (unmanaged)
          Driver: vxlan
      HW Address: c2:52:e8:cb:0a:0a
         Address: 10.1.39.0
                  fe80::c052:e8ff:fecb:a0a

● 5: docker0
       Link File: /lib/systemd/network/99-default.link
    Network File: n/a
            Type: ether
           State: no-carrier (unmanaged)
          Driver: bridge
      HW Address: 02:42:64:e8:7b:df
         Address: 172.17.0.1

● 6: cni0
       Link File: /lib/systemd/network/99-default.link
    Network File: n/a
            Type: ether
           State: routable (unmanaged)
          Driver: bridge
      HW Address: 2e:5c:8c:3a:c1:40
         Address: 10.1.39.1
                  fe80::2c5c:8cff:fe3a:c140

● 7: vethfb006122
       Link File: /lib/systemd/network/99-default.link
    Network File: n/a
            Type: ether
           State: degraded (unmanaged)
          Driver: veth
      HW Address: ea:e5:3b:dd:35:c3
         Address: fe80::e8e5:3bff:fedd:35c3

● 8: veth346da004
       Link File: /lib/systemd/network/99-default.link
    Network File: n/a
            Type: ether
           State: degraded (unmanaged)
          Driver: veth
      HW Address: ce:59:53:e5:86:c1
         Address: fe80::cc59:53ff:fee5:86c1

● 9: vethbb8c2fc1
       Link File: /lib/systemd/network/99-default.link
    Network File: n/a
            Type: ether
           State: degraded (unmanaged)
          Driver: veth
      HW Address: 3a:8e:fe:3f:55:a0
         Address: fe80::388e:feff:fe3f:55a0

● 10: veth59157a24
       Link File: /lib/systemd/network/99-default.link
    Network File: n/a
            Type: ether
           State: degraded (unmanaged)
          Driver: veth
      HW Address: 3e:f8:45:36:e1:88
         Address: fe80::3cf8:45ff:fe36:e188

● 11: veth50399a2d
       Link File: /lib/systemd/network/99-default.link
    Network File: n/a
            Type: ether
           State: degraded (unmanaged)
          Driver: veth
      HW Address: 4a:f3:ee:a7:ca:2c
         Address: fe80::48f3:eeff:fea7:ca2c

● 12: veth9f2acbf8
       Link File: /lib/systemd/network/99-default.link
    Network File: n/a
            Type: ether
           State: degraded (unmanaged)
          Driver: veth
      HW Address: c6:e9:80:1e:06:5d
         Address: fe80::c4e9:80ff:fe1e:65d
$ ls /etc/netplan/
50-cloud-init.yaml  99-juju.yaml
$ cat /etc/netplan/50-cloud-init.yaml 
# This file is generated from information provided by
# the datasource.  Changes to it will not persist across an instance.
# To disable cloud-init's network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    version: 2
    ethernets:
        ens192:
            dhcp4: true
            match:
                macaddress: 00:50:56:39:74:79
            set-name: ens192
$ cat /etc/netplan/99-juju.yaml 
network:
  version: 2
  ethernets:
    eth0:
      match:
        macaddress: 00:50:56:39:74:79
      dhcp4: true
    eth1:
      match:
        macaddress: 00:50:56:0b:5a:b3
      dhcp4: true
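As a stopgap until the bug is resolved, one option may be netplan's `optional` key. The netplan reference documents it as telling networkd not to wait for a device to become configured before continuing to boot. The sketch below writes a second netplan file that amends the Juju-generated `eth1` stanza (the one matching ens224's MAC) with `optional: true`. This is a hypothetical local workaround, not a Juju fix. It assumes three things: the device ID in 99-juju.yaml stays `eth1`; netplan applies a later-sorting file on top of an earlier definition with the same ID; and Juju does not rewrite the netplan directory in a way that undoes it. The file name `99-zz-optional-eth1.yaml` and the function name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical workaround sketch: amend the Juju-generated "eth1" stanza with
# `optional: true` so systemd-networkd-wait-online does not block boot on it.
# Assumes the ID stays "eth1" and that later netplan files amend earlier ones.
write_optional_override() {
  netplan_dir=${1:-/etc/netplan}
  # "99-zz-..." sorts after "99-juju.yaml", so netplan reads it afterwards.
  override="$netplan_dir/99-zz-optional-eth1.yaml"
  cat > "$override" <<'EOF' || return 1
network:
  version: 2
  ethernets:
    eth1:
      optional: true
EOF
  echo "wrote $override"
}

# Usage on the affected machine (as root), then regenerate the networkd units:
#   write_optional_override /etc/netplan && netplan generate && netplan apply
```

Marking the device optional leaves DHCP running on ens224 in the background. It only stops the boot from waiting for it, so a manually added address on the external network should still work as before.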

#2

Merlijn, this sounds like a bug. As you note, it’s odd that the first boot behaves differently. Can you file it on Launchpad? I’ll get someone to look into it. You mention specifying the external-network; can you also share the juju bootstrap command you’re using with it?

Thank you!


#3

https://bugs.launchpad.net/juju/+bug/1799958

I don’t know the bootstrap command anymore, but I included the output of juju model-config in the bug report.


#4

Thanks, Merlijn!
