I’m starting this post to note down my travails in deploying CDK and a few other bits, backed by Ceph, to OVH, because I tried it earlier and failed… soooo I figure it’s a good thing to write down to see if there’s room for improvement, or broken stuff, in the manual deployment…
Updates due shortly.
Background
We have a bunch of infrastructure living at OVH on dedicated servers: they’re quick, reliable and nowhere near as pricey as the public cloud services. We run a lot of long-running services there, plus CI/CD and test builds. We’re also investigating Knative and other Kubernetes frameworks, so it’s a good place for us to host those without having to buy our own hardware, while keeping a pretty good degree of flexibility. Our servers are all bought with vRack capability, so we can stick them onto a virtual network that routes solely between our own hardware.
For this experiment, we’ve got:
5 new servers, each with 16 GB of RAM, a Xeon processor and 2 x 4 TB SATA hard drives.
3 will be Ceph nodes, each using one disk for the OS and the other as the storage drive. The remaining 2 will be a Kubernetes master and a Kubernetes worker, with the OS mirrored across both disks, giving 4 TB of storage on each.
All servers are installed with Bionic and brought up to date with apt.
We already have a Juju controller, running 2.4.7, to hold the models.
–
Post Deployment Tasks
For whatever reason OVH gives you servers with a root account login, so first up we created an ubuntu account on each for Juju to connect to.
adduser ubuntu
usermod -a -G admin ubuntu
We also put the Juju controller’s public key into the ubuntu user’s SSH setup so that Juju can log in.
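On each server, that boils down to something like the following, assuming the controller’s public key has been copied over as controller.pub (a file name made up for this example):
mkdir -p /home/ubuntu/.ssh
cat controller.pub >> /home/ubuntu/.ssh/authorized_keys
chmod 700 /home/ubuntu/.ssh
chmod 600 /home/ubuntu/.ssh/authorized_keys
chown -R ubuntu:ubuntu /home/ubuntu/.ssh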
Because we use a vRack, we don’t have to route our traffic via the external network; to take advantage of that we need to configure a second ethernet interface:
nano /etc/netplan/01.yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    eno2:
      dhcp4: no
      dhcp6: no
      addresses: [10.0.0.x/16]
Clearly, x is incremented to the next spare IP on each server.
Then:
netplan apply
ifconfig
And you should see a new ethernet interface for the vRack. We also had to put each server into the relevant vRack in the OVH control panel.
–
Commissioning The Machines
With all that out of the way, we then added the machines to the models; as it’s a manual deployment, this has to be done to gain access to them…
juju add-model kubernetes
juju switch kubernetes
juju add-machine ssh:ubuntu@10.0.0.15
juju add-machine ssh:ubuntu@10.0.0.19
juju add-model ceph
juju switch ceph
juju add-machine ssh:ubuntu@10.0.0.16
juju add-machine ssh:ubuntu@10.0.0.17
juju add-machine ssh:ubuntu@10.0.0.18
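A quick sanity check at this point with juju machines (a standard Juju command) confirms all five landed in the right model and show as started:
juju machines -m kubernetes
juju machines -m ceph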
At this point you need to update the /etc/hosts file on each node so that every server can resolve the others by name.
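As a rough sketch, the entries look something like this, matching the IPs above (the Ceph hostnames here are made up, so use whatever naming scheme you prefer):
10.0.0.15 kubernetes-master1
10.0.0.19 kubernetes-worker1
10.0.0.16 ceph1
10.0.0.17 ceph2
10.0.0.18 ceph3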
Deploy Ceph
juju switch ceph
juju deploy ceph-mon --to 0
juju add-unit ceph-mon --to 1
juju add-unit ceph-mon --to 2
juju deploy ceph-osd --to 0
juju add-unit ceph-osd --to 1
juju add-unit ceph-osd --to 2
juju add-relation ceph-osd ceph-mon
Set up the disks
juju run-action ceph-osd/0 zap-disk devices=/dev/sdb i-really-mean-it=true
juju run-action ceph-osd/1 zap-disk devices=/dev/sdb i-really-mean-it=true
juju run-action ceph-osd/2 zap-disk devices=/dev/sdb i-really-mean-it=true
juju config ceph-osd osd-devices="/dev/sdb"
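Before moving on, it’s worth checking the cluster has formed and the three OSDs are up and in; ceph status and ceph osd tree are standard Ceph commands, run here on a monitor via Juju:
juju ssh ceph-mon/0 sudo ceph status
juju ssh ceph-mon/0 sudo ceph osd tree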
–
Deploying Kubernetes
As we’re experimenting, we’ll stick the master stuff onto one node… don’t shoot me.
juju deploy ~containers/kubernetes-master --to 0 --series bionic --config channel="1.13/stable"
juju deploy ~containers/etcd --to 0 --series bionic --config channel="3.2/stable"
juju deploy ~containers/easyrsa --to 0 --series bionic
juju deploy ~containers/kubernetes-worker --to 1 --series bionic --config channel="1.13/stable"
juju deploy ~containers/flannel --series bionic --config iface=eno2
juju add-relation kubernetes-master:kube-api-endpoint kubernetes-worker
juju add-relation kubernetes-master:kube-control kubernetes-worker
juju add-relation kubernetes-master:certificates easyrsa
juju add-relation kubernetes-master:etcd etcd:db
juju add-relation kubernetes-worker:certificates easyrsa
juju add-relation etcd:certificates easyrsa
juju add-relation flannel:etcd etcd
juju add-relation flannel:cni kubernetes-worker
juju add-relation flannel:cni kubernetes-master
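Once the relations are in place, you can watch everything settle and then pull the kubeconfig down from the master to your own machine; this is the usual CDK approach, though adjust the paths to taste:
watch -c juju status --color
mkdir -p ~/.kube
juju scp kubernetes-master/0:config ~/.kube/config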
Issues
ALL OF THE ISSUES BELOW WERE RESOLVED BY PINNING THE SNAP CHANNEL; THIS IS KEPT JUST IN CASE!
So far, for some reason I had to set
juju config kubernetes-master storage-backend=etcd3
I don’t know if that was because I’d updated from an earlier version and had set it incorrectly, but it claimed etcd2 was an invalid option. This confused me for a while, as I’d had etcd aliased as etcd2 and etcd3 at one point and thought it was referring to the application, not the protocol.
At this point though, it tries to spin up everything and grinds to a complete halt with:
Feb 10 22:21:22 ubuntu kube-apiserver.daemon[23990]: logging error output: "k8s\x00\n\f\n\x02v1\x12\x06Status\x12(\n\x06\n\x00\x12\x00\x1a\x00\x12\aFailure\x1a\x10context canceled\"\x000\xf4\x03\x1a\x00\"\x00"
Feb 10 22:21:22 ubuntu kube-apiserver.daemon[23990]: [kube-controller-manager/v1.13.3 (linux/amd64) kubernetes/721bfa7/leader-election 127.0.0.1:59412]
Feb 10 22:21:35 ubuntu kube-controller-manager.daemon[17170]: E0210 22:21:35.123620 17170 leaderelection.go:270] error retrieving resource lock kube-system/kube-controller-manager: Get http://127.0.0.1:8080/api/v1/namespaces/kube-system/endpoints/kube-controller-manager?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Feb 10 22:21:35 ubuntu kube-apiserver.daemon[23990]: E0210 22:21:35.137920 23990 status.go:64] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
So we’ve got a networking issue somewhere, but time to figure out where…
root@kubernetes-master1:~# curl http://127.0.0.1:8080/api/v1/namespaces/kube-system/endpoints/kube-controller-manager?timeout=10s
{"metadata":{},"status":"Failure","message":"Timeout: request did not complete within 1m0s","reason":"Timeout","details":{},"code":504}
Kubelet on the worker node is also throwing up because, among other things, swap is on, and that makes Kubelet sad…
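Turning swap off on the worker looks roughly like this; the sed line is just one way of commenting out the swap entry in /etc/fstab, so check the file afterwards:
sudo swapoff -a
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab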
Relating K8S and Ceph
So next we need to relate K8S and Ceph. To do so we’re going to use a cross model relation.
juju switch ceph
juju offer ceph-mon:admin
Which should then say something like:
Application "ceph-mon" endpoints [admin] available at "serveradmin/ceph.ceph-mon"
Then
juju switch kubernetes
juju add-relation kubernetes-master serveradmin/ceph.ceph-mon
juju config kubernetes-master allow-privileged=true
juju switch ceph
juju run-action ceph-mon/0 create-pool name=ext4-pool --wait
juju run-action ceph-mon/0 create-pool name=xfs-pool --wait
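To double-check the pools actually exist on the Ceph side, ceph osd lspools (a standard Ceph command) should list them both:
juju ssh ceph-mon/0 sudo ceph osd lspools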
juju switch kubernetes
juju ssh kubernetes-master/0
kubectl get sc,po
Should return
root@kubernetes-worker1:~# kubectl get sc,po
NAME PROVISIONER AGE
storageclass.storage.k8s.io/ceph-ext4 csi-rbdplugin 5m54s
storageclass.storage.k8s.io/ceph-xfs (default) csi-rbdplugin 5m54s
NAME READY STATUS RESTARTS AGE
pod/csi-rbdplugin-attacher-0 1/1 Running 0 5m53s
pod/csi-rbdplugin-drv48 2/2 Running 0 5m53s
pod/csi-rbdplugin-provisioner-0 1/1 Running 0 5m53s
Test the whole setup with:
snap install helm --classic
helm init
helm install stable/phpbb
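For reference, helm init here installs Tiller (this is Helm 2) into the kube-system namespace, and Tiller needs to be running before helm install will do anything; something like this shows whether it came up, assuming the default deployment name:
kubectl -n kube-system get deploy tiller-deploy
helm ls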
Finally! You should see something like:
root@kubernetes-worker1:~# kubectl get sc,po,pvc,pv
NAME PROVISIONER AGE
storageclass.storage.k8s.io/ceph-ext4 csi-rbdplugin 10m
storageclass.storage.k8s.io/ceph-xfs (default) csi-rbdplugin 10m
NAME READY STATUS RESTARTS AGE
pod/csi-rbdplugin-attacher-0 1/1 Running 0 10m
pod/csi-rbdplugin-drv48 2/2 Running 0 10m
pod/csi-rbdplugin-provisioner-0 1/1 Running 0 10m
pod/inclined-elephant-mariadb-0 1/1 Running 0 95s
pod/inclined-elephant-phpbb-7f7cf5d99b-bp99b 0/1 Running 0 95s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/data-inclined-elephant-mariadb-0 Bound pvc-914919fd2e5211e9 8Gi RWO ceph-xfs 95s
persistentvolumeclaim/inclined-elephant-phpbb-apache Bound pvc-911788ff2e5211e9 1Gi RWO ceph-xfs 96s
persistentvolumeclaim/inclined-elephant-phpbb-phpbb Bound pvc-911830232e5211e9 8Gi RWO ceph-xfs 96s
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-911788ff2e5211e9 1Gi RWO Delete Bound default/inclined-elephant-phpbb-apache ceph-xfs 86s
persistentvolume/pvc-911830232e5211e9 8Gi RWO Delete Bound default/inclined-elephant-phpbb-phpbb ceph-xfs 86s
persistentvolume/pvc-914919fd2e5211e9 8Gi RWO Delete Bound default/data-inclined-elephant-mariadb-0 ceph-xfs 87s
So now you should have a phpbb Helm chart running, with three volumes claimed from the ceph-xfs storage class.
Now with that done, throw Tiller in the bin and write some K8S compatible charms!