Deploying K8S/Ceph to OVH


#1

I’m starting this post to note down my travails in deploying CDK (the Canonical Distribution of Kubernetes) and a few other bits, backed by Ceph, to OVH. I tried it earlier and failed, so it seems a good idea to write it all down and see where there’s room for improvement (or broken stuff) in the manual deployment process…

Updates due shortly.

Background

We have a bunch of infrastructure living at OVH on dedicated servers: they’re quick, reliable and nowhere near as pricey as the public cloud services. We run a lot of long-running services, plus CI/CD and test builds. We’re also investigating Knative and other Kubernetes frameworks, so it’s a good place for us to host things without having to buy our own hardware, while keeping a pretty good degree of flexibility. Our servers are all bought with vRack capability, so we can put them on a virtual network that routes solely between our own hardware.

For this experiment, we’ve got:

5 new servers, each with 16 GB RAM, Xeon processors and 2x 4 TB SATA hard drives.

3 will be Ceph nodes, each using one disk for the OS and one as the storage drive. The other 2 will be one Kubernetes master and one Kubernetes worker, with the OS mirrored across both disks, giving 4 TB of usable storage on each.

All servers are installed with Bionic and brought up to date with apt.

We already have a Juju controller running 2.4.7.


Post Deployment Tasks

For whatever reason, OVH gives you servers with a root account login. So first up we created an ubuntu account on each node for Juju to connect as:

adduser ubuntu
usermod -a -G admin ubuntu

(If the admin group doesn’t exist on your image, the sudo group does the same job on Bionic.) We also added our public key to ~ubuntu/.ssh/authorized_keys so that Juju can log in.
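In shell terms, that amounts to something like the following. This is a sketch only: it uses a temp directory as a stand-in for /home/ubuntu, and the key string is a placeholder for your actual Juju client public key.

```shell
# Stand-in for /home/ubuntu -- on a real node, work in the ubuntu user's home.
DEMO_HOME="$(mktemp -d)"

# Create ~/.ssh with the permissions sshd insists on, then append the key.
install -d -m 700 "$DEMO_HOME/.ssh"
printf '%s\n' "ssh-ed25519 AAAA... juju-client" >> "$DEMO_HOME/.ssh/authorized_keys"
chmod 600 "$DEMO_HOME/.ssh/authorized_keys"
```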

Because we use a vRack, we don’t have to route our traffic via the external network, but we do need to configure a second ethernet adaptor:

nano /etc/netplan/01.yaml

network:
  version: 2
  renderer: networkd
  ethernets:
    eno2:
      dhcp4: no
      dhcp6: no
      addresses: [10.0.0.x/16]

Here x is incremented on each server to the next spare IP.
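Since the file is identical on every server apart from that final octet, you can render it from a small template. A minimal sketch, writing to a temp file rather than the real /etc/netplan/01.yaml:

```shell
OCTET=15           # differs per server: 15, 16, 17, ...
OUT="$(mktemp)"    # on a real node this would be /etc/netplan/01.yaml

cat > "$OUT" <<EOF
network:
  version: 2
  renderer: networkd
  ethernets:
    eno2:
      dhcp4: no
      dhcp6: no
      addresses: [10.0.0.${OCTET}/16]
EOF
```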

Then:

netplan apply
ifconfig

And you should see the new interface up with its vRack address (if ifconfig is missing, ip a shows the same thing). We also had to assign each server to the relevant vRack in the OVH control panel.

Commissioning The Machines

With all that out of the way, we then added the machines to the models. As it’s a manual deployment, this has to be done to gain access to them…

juju add-model kubernetes
juju switch kubernetes
juju add-machine ssh:ubuntu@10.0.0.15
juju add-machine ssh:ubuntu@10.0.0.19

juju add-model ceph
juju switch ceph
juju add-machine ssh:ubuntu@10.0.0.16
juju add-machine ssh:ubuntu@10.0.0.17
juju add-machine ssh:ubuntu@10.0.0.18

At this point you need to update /etc/hosts on each node so the machines can resolve each other’s hostnames.
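The entries look something like this, using the addresses from above (the ceph hostnames are illustrative; match whatever names you gave the nodes):

```
10.0.0.15   kubernetes-master1
10.0.0.19   kubernetes-worker1
10.0.0.16   ceph1
10.0.0.17   ceph2
10.0.0.18   ceph3
```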

Deploy Ceph

juju switch ceph
juju deploy ceph-mon --to 0
juju add-unit ceph-mon --to 1
juju add-unit ceph-mon --to 2
juju deploy ceph-osd --to 0
juju add-unit ceph-osd --to 1
juju add-unit ceph-osd --to 2
juju add-relation ceph-osd ceph-mon

Set up the disks

juju run-action ceph-osd/0 zap-disk devices=/dev/sdb i-really-mean-it=true
juju run-action ceph-osd/1 zap-disk devices=/dev/sdb i-really-mean-it=true
juju run-action ceph-osd/2 zap-disk devices=/dev/sdb i-really-mean-it=true
juju config ceph-osd osd-devices="/dev/sdb"

Deploying Kubernetes

As we’re experimenting, we’ll stick the master stuff onto one node… don’t shoot me.

juju deploy ~containers/kubernetes-master --to 0 --series bionic --config channel="1.13/stable"
juju deploy ~containers/etcd --to 0 --series bionic --config channel="3.2/stable"
juju deploy ~containers/easyrsa --to 0 --series bionic
juju deploy ~containers/kubernetes-worker --to 1 --series bionic --config channel="1.13/stable"
juju deploy ~containers/flannel --series bionic --config iface=eno2

juju add-relation kubernetes-master:kube-api-endpoint kubernetes-worker
juju add-relation kubernetes-master:kube-control kubernetes-worker
juju add-relation kubernetes-master:certificates easyrsa
juju add-relation kubernetes-master:etcd etcd:db
juju add-relation kubernetes-worker:certificates easyrsa
juju add-relation etcd:certificates easyrsa
juju add-relation flannel:etcd etcd
juju add-relation flannel:cni kubernetes-worker
juju add-relation flannel:cni kubernetes-master

Issues

ALL THE ISSUES BELOW HAVE BEEN RESOLVED BY PICKING THE RIGHT SNAP CHANNEL. THIS IS KEPT JUST IN CASE!
So far, for some reason I had to set

juju config kubernetes-master storage-backend=etcd3

I don’t know if that was because I’d upgraded from an earlier version and had set it incorrectly, but it claimed etcd2 was an invalid option. This confused me for a while, as I’d actually had etcd aliased as etcd2 and etcd3 at one point, and I thought the error was referring to the application, not the storage protocol.

At this point though, it tries to spin up everything and grinds to a complete halt with:

Feb 10 22:21:22 ubuntu kube-apiserver.daemon[23990]: logging error output: "k8s\x00\n\f\n\x02v1\x12\x06Status\x12(\n\x06\n\x00\x12\x00\x1a\x00\x12\aFailure\x1a\x10context canceled\"\x000\xf4\x03\x1a\x00\"\x00"
Feb 10 22:21:22 ubuntu kube-apiserver.daemon[23990]:  [kube-controller-manager/v1.13.3 (linux/amd64) kubernetes/721bfa7/leader-election 127.0.0.1:59412]
Feb 10 22:21:35 ubuntu kube-controller-manager.daemon[17170]: E0210 22:21:35.123620   17170 leaderelection.go:270] error retrieving resource lock kube-system/kube-controller-manager: Get http://127.0.0.1:8080/api/v1/namespaces/kube-system/endpoints/kube-controller-manager?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Feb 10 22:21:35 ubuntu kube-apiserver.daemon[23990]: E0210 22:21:35.137920   23990 status.go:64] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}

So we’ve got a networking issue somewhere; time to figure out where…

root@kubernetes-master1:~# curl http://127.0.0.1:8080/api/v1/namespaces/kube-system/endpoints/kube-controller-manager?timeout=10s
{"metadata":{},"status":"Failure","message":"Timeout: request did not complete within 1m0s","reason":"Timeout","details":{},"code":504}

Kubelet on the worker node is also throwing errors because, among other things, swap is on, and that makes Kubelet sad…
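The fix for that is to turn swap off now and keep it off across reboots by commenting out the swap line in /etc/fstab. A sketch of that edit, demonstrated against a sample fstab in a temp file rather than the real one:

```shell
# On the real worker (as root) you would run:
#   swapoff -a
#   sed -i '/\bswap\b/s/^/#/' /etc/fstab
# Demonstrated here against a sample fstab in a temp file:
FSTAB="$(mktemp)"
cat > "$FSTAB" <<'EOF'
UUID=abc-123 /         ext4 defaults 0 1
/swap.img    none      swap sw       0 0
EOF
sed -i '/\bswap\b/s/^/#/' "$FSTAB"
```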

Relating K8S and Ceph

So next we need to relate K8S and Ceph. To do so, we’re going to use a cross-model relation.

juju switch ceph
juju offer ceph-mon:admin

Which should then say something like:

Application "ceph-mon" endpoints [admin] available at "serveradmin/ceph.ceph-mon"

Then

juju switch kubernetes
juju add-relation kubernetes-master serveradmin/ceph.ceph-mon
juju config kubernetes-master allow-privileged=true

juju switch ceph
juju run-action ceph-mon/0 create-pool name=ext4-pool --wait
juju run-action ceph-mon/0 create-pool name=xfs-pool --wait

juju switch kubernetes
juju ssh kubernetes-master/0
kubectl get sc,po

This should return something like:

root@kubernetes-worker1:~# kubectl get sc,po
NAME                                             PROVISIONER     AGE
storageclass.storage.k8s.io/ceph-ext4            csi-rbdplugin   5m54s
storageclass.storage.k8s.io/ceph-xfs (default)   csi-rbdplugin   5m54s

NAME                              READY   STATUS    RESTARTS   AGE
pod/csi-rbdplugin-attacher-0      1/1     Running   0          5m53s
pod/csi-rbdplugin-drv48           2/2     Running   0          5m53s
pod/csi-rbdplugin-provisioner-0   1/1     Running   0          5m53s

Test the whole setup with:

snap install helm --classic
helm init
helm install stable/phpbb

Finally! You should see something like:

root@kubernetes-worker1:~# kubectl get sc,po,pvc,pv
NAME                                             PROVISIONER     AGE
storageclass.storage.k8s.io/ceph-ext4            csi-rbdplugin   10m
storageclass.storage.k8s.io/ceph-xfs (default)   csi-rbdplugin   10m

NAME                                           READY   STATUS    RESTARTS   AGE
pod/csi-rbdplugin-attacher-0                   1/1     Running   0          10m
pod/csi-rbdplugin-drv48                        2/2     Running   0          10m
pod/csi-rbdplugin-provisioner-0                1/1     Running   0          10m
pod/inclined-elephant-mariadb-0                1/1     Running   0          95s
pod/inclined-elephant-phpbb-7f7cf5d99b-bp99b   0/1     Running   0          95s

NAME                                                     STATUS   VOLUME                 CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/data-inclined-elephant-mariadb-0   Bound    pvc-914919fd2e5211e9   8Gi        RWO            ceph-xfs       95s
persistentvolumeclaim/inclined-elephant-phpbb-apache     Bound    pvc-911788ff2e5211e9   1Gi        RWO            ceph-xfs       96s
persistentvolumeclaim/inclined-elephant-phpbb-phpbb      Bound    pvc-911830232e5211e9   8Gi        RWO            ceph-xfs       96s

NAME                                    CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                      STORAGECLASS   REASON   AGE
persistentvolume/pvc-911788ff2e5211e9   1Gi        RWO            Delete           Bound    default/inclined-elephant-phpbb-apache     ceph-xfs                86s
persistentvolume/pvc-911830232e5211e9   8Gi        RWO            Delete           Bound    default/inclined-elephant-phpbb-phpbb      ceph-xfs                86s
persistentvolume/pvc-914919fd2e5211e9   8Gi        RWO            Delete           Bound    default/data-inclined-elephant-mariadb-0   ceph-xfs                87s

So now you should have the phpbb Helm chart running, with three volumes claimed from the ceph-xfs storage class.
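If you’d rather test the storage path without Helm, a minimal PVC against the default class looks something like this (names here are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim          # illustrative name
spec:
  storageClassName: ceph-xfs
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```

Apply it with kubectl apply -f and it should go Bound, backed by an rbd volume in the pool.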

Now with that done, throw Tiller in the bin and write some K8S compatible charms!


#2

A few notes on the Ceph bits, based on my experience.

  1. SATA DOM
    Very little I/O happens on /. If you don’t care about redundancy on your root filesystem, a SATA DOM is a great way to free up your available drive bays for storage devices.

  2. Ceph IOPS
    Ceph runs several different processes, each producing IOPS. With legacy journal/OSD Ceph you have the journal IOPS and the OSD IOPS; with BlueStore you have the bluestore-wal, bluestore-db and OSD device IOPS. In either case, you won’t get much further than a POC if you don’t decouple the different functions onto their own dedicated device(s).
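With the ceph-osd charm, that decoupling can be expressed through its config. A hypothetical bundle fragment, assuming the charm’s bluestore-wal/bluestore-db options and with illustrative device paths:

```yaml
ceph-osd:
  options:
    osd-devices: /dev/sdb /dev/sdc    # data devices
    bluestore-wal: /dev/nvme0n1       # dedicated WAL device (illustrative path)
    bluestore-db: /dev/nvme0n1        # dedicated DB device (illustrative path)
```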