Failing to bootstrap on K8s

help-needed

#1

I’m trying to bootstrap onto a bare metal K8s deploy that’s backed by Ceph for persistent storage.

I’m getting this error message when trying to bootstrap.
https://pastebin.ubuntu.com/p/kQKFkcjZB8/

If it helps this is the kubectl information.

https://pastebin.ubuntu.com/p/54N3H47cXC/

It’s very possible I’ve set up the K8s deployment wrong for this environment, but at this point it’s not obvious to me where it’s failing. Any help is appreciated.


#2

These kinds of commands can help illuminate:

kubectl -n <k8s-model> get sc,pv,pvc
kubectl -n <k8s-model> describe pods,sc,pv,pvc

#3

Debug logging which fails to query the cloud when trying to run add-k8s: Ubuntu Pastebin

Because of that failure, I added the cloud with: juju add-k8s k8s --storage=rbd.csi.ceph.com --cloud=bos01 --region=bos01

I’ve provided kubectl get all --all-namespaces above in the original post. Since the controller can’t bootstrap, there isn’t a k8s-model namespace at all. The repeated error is juju saying it can’t find the DNS service, which you’ll see is running in the above kubectl output.

I can describe any/all of the pods, sc, pv, pvc as well; if there’s something specific that I missed, let me know.

Here are the sc,pv,pvc: Ubuntu Pastebin

Here is the describe output: Ubuntu Pastebin

It appears that all of the services and pods are running, and ceph storage is available. I’m not clear why juju can’t add-k8s itself, and maybe manually adding the cloud as I did masked the actual root issue.

I’ve redeployed and removed the manually added k8s cloud (which is where all of this output is from), but as before add-k8s fails.


#4

I’m really stuck here. Is there something more I can do to get some insight into what juju is doing when it tries to bootstrap and reports that it can’t find the DNS service?


#5

Hi @chris.sanders, --cloud takes the cloud type, not the cloud name.
The cloud types we support are
azure, ec2, gce, microk8s, openstack, maas, lxd (the controller svc will use ClusterIP, so it’s accessible only locally on the host instance), etc.

If it’s CDK, please ensure the related integrator charms are ready, because Juju needs them to provision storage, load balancers, etc.


#6

Is it expected that add-k8s can’t provision on its own when kubectl works fine?

I can re-add the cloud with a ‘cloud type’ although I’m not clear why juju would need to know that. This is CDK on top of OpenStack, there is no integrator charm as I do not want K8s using LBaaS or Storage from OpenStack. I’ve deployed Ceph and that’s the storage K8s is using.

Given this, I should use --cloud=openstack when manually doing the add-k8s? I’ll try that, can you explain why juju needs to know the provider that CDK is deployed on when I’m not using an integrator?

I suspect this is going to try and pick storage for me that isn’t available or desired. Should I use this with --storage=rbd.csi.ceph.com as I did before but set openstack as the cloud type?
Am I supposed to provide the OpenStack credentials (OpenStack is already available as a cloud type)?

If it’s of any help this is the bundle currently deployed: Ubuntu Pastebin

It’s 4 nodes of ceph with CDK intending to use ceph for the CDK storage.


#7

Alright, the openstack cloud is running unreasonably slowly, so I just redeployed onto GCE and passed that through with the manual flags on add-k8s (I tried it w/o the flags first; it fails as before).

https://pastebin.ubuntu.com/p/Rb5SCDk8Ts/
Here you’ll see the commands to add-k8s and the bootstrap attempt. This produces the same error about locating the DNS service as before. I am very curious about this line “k8s substrate “gce/us-east1” added as cloud “gce-k8s” with GCE Persistent Disk default storage provisioned by the existing “rbd.csi.ceph.com” storage class.”

I don’t want GCE Persistent Disks to be used, that’s why I have Ceph. I feel like there’s some sort of disconnect here with not using the provider for persistent storage. If, for example, I had a MAAS environment, can I not just use a standalone Ceph to provide storage?

Does anyone have an example of how to use K8s with Ceph for persistent storage, or any idea why this is still failing?


#8

Hi @chris.sanders, Juju needs the cloud type to decide which service type to use for the controller and which storage to use.
The add-k8s command reads the cloud and credentials from the kubeconfig file and adds them for you.
They are the k8s cluster’s cloud and credential, not openstack’s.

Also, the cloud type is openstack if it’s CDK running on openstack, not gce/google (Juju relies on the integrator charm and never talks to Google directly for storage and LB provisioning).


#9

Was the storage class rbd.csi.ceph.com created correctly?
Juju will fall back to the default storage class if the given storage class wasn’t found or wasn’t set up correctly.
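
One way to check the first condition is to ask the cluster for the class by name. This is only a sketch: storage_class_exists is a made-up helper name, and it relies on kubectl exiting non-zero when the named resource does not exist.

```shell
# Hypothetical helper: succeeds only if a storage class with exactly
# this name exists in the cluster that kubectl currently points at.
storage_class_exists() {
  kubectl get storageclass "$1" >/dev/null 2>&1
}
```

For example, `storage_class_exists rbd.csi.ceph.com || echo "no storage class with that name"` prints the message when no class carries that name.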


#10

From the log provided, it failed at load balancer provisioning.
As said, Juju requires the integrator charm to provision the load balancer svc for the controller.
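
To see this from the cluster side rather than from the Juju log, something like the following can list LoadBalancer services that never received an external IP. It is a rough sketch: pending_load_balancers is a hypothetical helper, and it assumes the default column layout of kubectl get svc --all-namespaces (NAMESPACE, NAME, TYPE, CLUSTER-IP, EXTERNAL-IP, ...).

```shell
# Hypothetical helper: print namespace/name for every LoadBalancer
# service whose EXTERNAL-IP column still shows <pending>.
pending_load_balancers() {
  kubectl get svc --all-namespaces --no-headers |
    awk '$3 == "LoadBalancer" && $5 == "<pending>" { print $1 "/" $2 }'
}
```

If the controller service shows up in that output during bootstrap, nothing in the cluster is fulfilling the load balancer request.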


#11

Kelvin, thanks for all the info!

As for the storage class rbd.csi.ceph.com, it’s created by the K8s charms when related to ceph. I presume it’s “doing the right thing” although I’m not sure how to validate that. That’s actually what I’m wanting to dig into with this environment, and if I can deploy a CAAS charm with storage I’ll see how that goes.

So Juju is choosing LoadBalancer as the service type when on specific cloud types, and that’s what’s failing in the log. The messages about the DNS service then are actually waiting for the LoadBalancer to show up, and when it doesn’t it bails. That seems to cleanly explain what’s going on. Perhaps some additional logging around that creation process, showing the LoadBalancer failure messages, would help others in the future.

In an example like the one I have here, where I do not want the K8s cluster exposed publicly, what are my options? For example, the goal may be to create a K8s cluster in a private network space attached to an on-prem network, where I do not want Juju to create a LoadBalancer (heck, it would be nice if I could stop it from adding public IPs to the instances as well).

If I add an integrator charm, I have to give Juju permission to use resources, and it sounds like it will set up and expose a LoadBalancer with a public IP that I don’t want. Can I instead add this with another ‘cloud’ specified so that the service type is Daemon or Daemonset and it doesn’t need or set up the LB? I’ll give that a shot and see what it does, though I’m curious if there’s a recommended way to do it.

If none of the above works, is my only option to set up an integrator, give juju access to my cloud account, and have it set up a LoadBalancer in order to bootstrap onto the cluster? That seems fairly restrictive, I don’t need that to use Helm for example (do I?).


#12

From testing yesterday, it appears that the bootstrap process requires the cloud native storage. I was able to simply create a PVC and the cluster filled it with a PV from Ceph. However, even after setting the cloud to lxd, which bypassed the LoadBalancer requirement, the service is still created with a Google persistent storage PVC which is never filled.

I’m not actually clear how bootstrapping onto a K8s cluster, where I do not provide information about the under-cloud it is running on (other than to tell it this is a LXD cluster), ends up trying to create a Google PV. However, this seems to be a hard limitation of the bootstrap process currently: LoadBalancers and persistent storage from the under-cloud are required for bootstrapping a controller onto a K8s cluster.

Can anyone confirm this is by design? I’m curious whether this would mean that, for example, if someone offered a managed K8s cloud which was charm deployed, consumers of that K8s would be unable to use juju on it due to lack of access to the undercloud.
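
For anyone wanting to reproduce that observation, a sketch along these lines lists the claims that are still unbound together with the storage class each one asked for. pending_pvcs is a hypothetical helper name; the custom-columns paths are standard PVC fields.

```shell
# Hypothetical helper: print "namespace/name -> storage class" for every
# PersistentVolumeClaim that is still in the Pending phase.
pending_pvcs() {
  kubectl get pvc --all-namespaces --no-headers \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,STATUS:.status.phase,CLASS:.spec.storageClassName |
    awk '$3 == "Pending" { print $1 "/" $2 " -> " $4 }'
}
```

The class printed on the right is the one the claim requested, which shows whether the Ceph class or some other class was picked.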


#13

Hi @chris.sanders
I think the openstack integrator supports internal load balancers.
But Juju does not let the user customize the LB annotation for now (we could potentially consider adding this later).

Now, if you want to make the cluster totally internal facing, I am not sure if there is a config in the integrator to force all load balancers to be provisioned as internal LBs.


#14

The storage logic here is:
user-provided storage class (--storage) > CDK-labeled storage class (juju.io/operator-storage / juju.io/workload-storage) > default cluster storage class.

I guess your test fell back to the last one, the default cluster storage class.
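
That precedence can be sketched as "first candidate that actually resolves wins". The snippet below is purely illustrative and is not Juju's code: pick_storage_class is a hypothetical helper, and the caller is assumed to pass an empty string for any candidate that was not given or does not match an existing class.

```shell
# Hypothetical helper: echo the first non-empty candidate, in precedence order.
#   $1 = class named via --storage
#   $2 = class carrying a juju.io/operator-storage or juju.io/workload-storage label
#   $3 = cluster default class
pick_storage_class() {
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

So if the value given to --storage does not resolve to a class and no labeled class exists, the cluster default is what ends up being used.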


#15

Hi @chris.sanders
Unfortunately, Juju needs to know the under-cloud type to decide the storage and service type.
But Juju doesn’t actually talk to the under-cloud directly.


#16

@kelvin.liu I saw the documentation on the priority of storage. I thought by providing the --storage flag with the rbd storage that would be the chosen storage for juju. Isn’t that the top of the list?

This is the default storage, and that works fine when I create a PVC. I’m not clear why using the --storage flag to specify “rbd.csi.ceph.com” doesn’t trump the Google PV and create the controller. Maybe I’m misunderstanding the --storage flag and I need to use another phrase to indicate that I want the ceph created storage to be my user provided storage. It looks like the plumbing is there, I’m just missing something b/c instead of the user provided storage class, it’s moving onto the Google PV (which is not the cluster default either).

You can see what the cluster has available here: Ubuntu Pastebin
Should I be using the name: “storageclass.storage.k8s.io/ceph-xfs”
instead of the provisioner: “rbd.csi.ceph.com”?


#17

The --storage argument to add-k8s takes the name of the storage class, in this case ceph-xfs.
The provisioner used by the k8s storage class is merely an internal k8s detail which Juju does not have any need to know about.
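
If only the provisioner is known, the matching class names can be looked up from the cluster. A minimal sketch, where classes_for_provisioner is a hypothetical helper name and the column paths are standard StorageClass fields:

```shell
# Hypothetical helper: print the name of every storage class that is
# backed by the given provisioner (e.g. rbd.csi.ceph.com).
classes_for_provisioner() {
  kubectl get storageclass --no-headers \
    -o custom-columns=NAME:.metadata.name,PROVISIONER:.provisioner |
    awk -v p="$1" '$2 == p { print $1 }'
}
```

One of the names it prints (ceph-xfs in this cluster) is what goes after --storage= on the add-k8s command line.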


#18

That explains why the bootstrap was attempting to use the Google storage. Hopefully I’ll have some time to retry this using just the name “ceph-xfs” with the --storage flag. I think that’s going to let this bundle deploy w/o use of the integrator charm!