Encountering network error while attempting to install Kubernetes via conjure-up

atdhrhs · 1 August 2019 05:54

I am experiencing the same issue as https://github.com/jetstack/cert-manager/issues/641#issuecomment-399999436 , but I don’t know how to apply that fix after a conjure-up (which uses Juju) kubernetes-core deployment.

None of my pods can communicate with anything over HTTPS. My cert-manager logs cert-manager/controller/clusterissuers "msg"="error setting up issuer" "error"="Get https://acme-staging-v02.api.letsencrypt.org/directory: x509: certificate is valid for ingress.local, not acme-staging-v02.api.letsencrypt.org" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt-staging" "resource_namespace"="", and my other pods report similar errors, such as https://github.com/...: x509: certificate is valid for ingress.local

Can anyone help please?

timClicks · 1 August 2019 20:42

Copying some of the backlog from the #juju channel on freenode. We need to figure out how to ask Juju to tell the k8s apiserver that its configuration settings should be changed.

[14:08:19]<kelvinliu_> it's like you need customise some options for k8s api-server,
[14:10:38]<atdprhs> kelvinliu_I am already checking with them but no answer, but as far as I know, conjure-up is using juju
[14:10:57]<atdprhs> so my best guess on such issue, it needs juju involvement
[14:11:41]<kelvinliu_> it's more like you need config the deployment.
[14:15:47]<kelvinliu_> i m not sure if u can find the config option from here, https://jaas.ai/u/containers/kubernetes-master
[14:17:06]<kelvinliu_> juju config kubernetes-master apiserver-cert-extra-xxxx=xxxxx
[14:17:28]<kelvinliu_> u just need to set the config like this
[14:20:33]<atdprhs> Could this help with the DNS issue?
[14:22:53]<kelvinliu_> from the link u give me, they fix it by customising the api-server option.
[14:24:23]<atdprhs> yes, I see `kubeadm init --apiserver-cert-extra-sans="mydomainhere.com" --pod-network-cidr="10.244.0.0/16" --service-cidr="10.96.0.0/12" --apiserver-advertise-address="0.0.0.0"`
[14:24:45]<atdprhs> I don't know how or to what I configure `--pod-network-cidr="10.244.0.0/16" --service-cidr="10.96.0.0/12"`
[14:25:03]<kelvinliu_> so it's not an issue with juju at all,
[14:25:56]<atdprhs> On kubernetes chat, I have received a response from one of the guys there `I used conjure-up to deploy my k8s and use cert manager. What is wrong that you're trying to fix here? Do you have the same issue as the bug? Do you know what is actually happening to get an odd cert like that? It looks like the solution was just to change or define network stuff and extra sans. You can do all that with juju, but shouldn't have to do it.`
[14:26:04]<atdprhs> This guy is currently offline
[14:26:09]<kelvinliu_> as i just said, u will need find the relevant options in the doc of kubernetes master then run the cmd above to config it
[14:26:18]<atdprhs> but based on him, it look like it's all juju
[14:27:21]<atdprhs> From the document you sent `DNS for the cluster` might help I guess as I know it's DNS issue, cuz all of my pods can't communicate with any HTTPs, my cert-manager gets `cert-manager/controller/clusterissuers "msg"="error setting up issuer" "error"="Get https://acme-staging-v02.api.letsencrypt.org/directory: x509: certificate is valid for ingress.local, not acme-staging-v02.api.letsencrypt.org" "resource_kind"="ClusterIssuer" "resource_name"="letsencrypt-staging" "resource_namespace"=""` and my other pods are also reporting similar issue like `https://github.com/...: x509: certificate is valid for ingress.local`
[14:33:10]<kelvinliu_> sorry, im not an expert of k8s api-server, it's better to wait him online or ask others in k8s channel.

timClicks · 1 August 2019 20:46

Pinging @adam-stokes, @cory_fu. What is the suggested approach for passing config values through to a charm/application?

cory_fu · 7 August 2019 14:09

Apologies for the delayed response; I was travelling.

From the error you’re getting, it does look like the only thing you should need is the extra_sans config option on the kubernetes-master charm. If you do end up needing to tweak the other options manually, there is the service-cidr option on kubernetes-master and, while there is no explicit option for the --pod-network-cidr param, you can provide that to kubernetes-worker via the kubelet-extra-args config option.

All of these options can be configured during deployment with conjure-up by clicking the Configure button next to the relevant charm on the Configure Applications screen. The extra_sans option can also be changed after deployment with juju config (conjure-up drives Juju in the background and provides an interactive walk-through for it). The other two options must be set at deployment time, so they need to be set either via conjure-up or, if deploying manually, in a bundle overlay passed to juju deploy.
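
As a rough sketch of the manual route, assuming the kubernetes-core bundle from the original post and the placeholder values from the kubeadm command quoted earlier (mydomainhere.com and 10.96.0.0/12); the overlay file name is made up for illustration, and the juju commands are left as comments so you can adapt them before running anything:

```shell
# Write a bundle overlay that sets the two kubernetes-master options.
# File name and values are illustrative placeholders, not recommendations.
cat > k8s-overlay.yaml <<'EOF'
applications:
  kubernetes-master:
    options:
      extra_sans: "mydomainhere.com"
      service-cidr: "10.96.0.0/12"
EOF

# At deploy time (assumes the kubernetes-core bundle):
#   juju deploy kubernetes-core --overlay k8s-overlay.yaml
#
# After deployment, extra_sans can still be changed:
#   juju config kubernetes-master extra_sans="mydomainhere.com"
```

The overlay only carries the options you want to override; everything else comes from the bundle itself.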

knobby · 7 August 2019 14:35

I would point out that the issue seems to be networking. Specifically, DNS is resolving to intra-cluster addresses for that error. I wonder if there is something like a service with the same name as the external address, since the “bug” was fixed by removing the ndots search in the host file. I would start up an ubuntu pod, poke around with name resolution, and see what is going on.

kubectl run test -it --rm --image ubuntu -- bash

Then inside there, do things like dig acme-v01.api.letsencrypt.org and curl acme-v01.api.letsencrypt.org and see what you can learn.
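
It is also worth looking at the resolver settings the pod was given. Here is a small sketch for that; the function name show_resolver_settings is made up for illustration, and it only reads a resolv.conf-style file (by default /etc/resolv.conf) and prints the search list and ndots value:

```shell
# Hypothetical helper: print the search list and ndots value from a
# resolv.conf-style file. Defaults to /etc/resolv.conf.
show_resolver_settings() {
  file=${1:-/etc/resolv.conf}
  awk '
    $1 == "search" {
      printf "search:"
      for (i = 2; i <= NF; i++) printf " %s", $i
      printf "\n"
    }
    $1 == "options" {
      # "ndots:" is six characters, so the value starts at position 7.
      for (i = 2; i <= NF; i++)
        if ($i ~ /^ndots:/) print "ndots: " substr($i, 7)
    }
  ' "$file"
}

# Inside the test pod you would run:
#   show_resolver_settings
# If dig is not installed, getent goes through the libc resolver,
# which honors the search list:
#   getent hosts acme-v01.api.letsencrypt.org
```

Any search domain that is not one of the cluster's own is a good candidate for closer inspection.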

atdhrhs · 23 August 2019 14:06

$ kubectl run test -it --rm --image ubuntu -- bash
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
If you don't see a command prompt, try pressing enter.
root@test-654cdfc5d5-4tw8k:/# dig acme-v01.api.letsencrypt.org and curl acme-v01.api.letsencrypt.org
bash: dig: command not found
root@test-654cdfc5d5-4tw8k:/# dig google.com @8.8.8.8
bash: dig: command not found
root@test-654cdfc5d5-4tw8k:/#

knobby · 23 August 2019 15:16

We spoke a bit on IRC about this, and I suggested installing the dig utility from the apt package dnsutils. This resulted in:

# apt install dnsutils
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package dnsutils
root@test-654cdfc5d5-4tw8k:/# apt update
Ign:1 http://security.ubuntu.com/ubuntu bionic-security InRelease
Ign:2 http://archive.ubuntu.com/ubuntu bionic InRelease
Ign:3 http://archive.ubuntu.com/ubuntu bionic-updates InRelease
Err:4 http://security.ubuntu.com/ubuntu bionic-security Release
  404  Not Found [IP: <my.ip> 80]
Ign:5 http://archive.ubuntu.com/ubuntu bionic-backports InRelease
Err:6 http://archive.ubuntu.com/ubuntu bionic Release
  404  Not Found [IP: <my.ip> 80]
Err:7 http://archive.ubuntu.com/ubuntu bionic-updates Release
  404  Not Found [IP: <my.ip> 80]
Err:8 http://archive.ubuntu.com/ubuntu bionic-backports Release
  404  Not Found [IP: <my.ip> 80]
Reading package lists... Done
E: The repository 'http://security.ubuntu.com/ubuntu bionic-security Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: The repository 'http://archive.ubuntu.com/ubuntu bionic Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: The repository 'http://archive.ubuntu.com/ubuntu bionic-updates Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: The repository 'http://archive.ubuntu.com/ubuntu bionic-backports Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

This indicates to me a general DNS issue inside the pod as you reported. DNS should hit the coredns pod, which defers to your dns server for anything outside the cluster. I find it interesting that kubelet could download the ubuntu image, but then internally you can’t resolve DNS.

From here, I would be interested in looking at the running pod status and the logs of the coredns pods.

kubectl get po -A
and then for each coredns pod:
kubectl logs -n kube-system po/coredns-<hash>

It would also be good to see the output of juju status.

atdhrhs · 23 August 2019 16:35

1st coredns:

.:53
2019-08-23T12:19:58.692Z [INFO] plugin/reload: Running configuration MD5 = 237092302b89f9cda6678c240e5e171c
2019-08-23T12:19:58.692Z [INFO] CoreDNS-1.5.1
2019-08-23T12:19:58.692Z [INFO] linux/amd64, go1.12.6, 6c33397
CoreDNS-1.5.1
linux/amd64, go1.12.6, 6c33397
[INFO] Reloading
2019-08-23T12:22:51.053Z [INFO] plugin/reload: Running configuration MD5 = 76dd40a8d85f49bc080c15939532be01
[INFO] Reloading complete

2nd coredns:

.:53
2019-08-23T12:19:58.421Z [INFO] plugin/reload: Running configuration MD5 = 237092302b89f9cda6678c240e5e171c
2019-08-23T12:19:58.421Z [INFO] CoreDNS-1.5.1
2019-08-23T12:19:58.421Z [INFO] linux/amd64, go1.12.6, 6c33397
CoreDNS-1.5.1
linux/amd64, go1.12.6, 6c33397
[INFO] Reloading
2019-08-23T12:21:41.877Z [INFO] plugin/reload: Running configuration MD5 = 76dd40a8d85f49bc080c15939532be01
[INFO] Reloading complete

juju status:

$ juju status
Model                         Controller             Cloud/Region  Version  SLA          Timestamp
conjure-charmed-kubernet-xyz  conjure-up-region-xyz  regionX-01    2.6.5    unsupported  02:28:30+10:00

App                    Version  Status  Scale  Charm                  Store       Rev  OS      Notes
containerd                      active      2  containerd             jujucharms   20  ubuntu
easyrsa                3.0.1    active      1  easyrsa                jujucharms  270  ubuntu
etcd                   3.2.10   active      1  etcd                   jujucharms  449  ubuntu
flannel                0.10.0   active      2  flannel                jujucharms  438  ubuntu
kubeapi-load-balancer  1.14.0   active      1  kubeapi-load-balancer  jujucharms  664  ubuntu  exposed
kubernetes-master      1.15.2   active      1  kubernetes-master      jujucharms  724  ubuntu
kubernetes-worker      1.15.2   active      1  kubernetes-worker      jujucharms  571  ubuntu  exposed

Unit                      Workload  Agent  Machine  Public address  Ports           Message
easyrsa/0*                active    idle   0        k8s.easyrsaip                   Certificate Authority connected.
etcd/0*                   active    idle   1        k8s.etcd.ip     2379/tcp        Healthy with 1 known peer
kubeapi-load-balancer/0*  active    idle   2        k8s.kubeap.ip   443/tcp         Loadbalancer ready.
kubernetes-master/0*      active    idle   3        k8s.master.ip   6443/tcp        Kubernetes master running.
  containerd/1*           active    idle            k8s.master.ip                   Container runtime available.
  flannel/1*              active    idle            k8s.master.ip                   Flannel subnet 10.1.66.1/24
kubernetes-worker/0*      active    idle   4        k8s.worker.ip   80/tcp,443/tcp  Kubernetes worker running.
  containerd/0            active    idle            k8s.worker.ip                   Container runtime available.
  flannel/0               active    idle            k8s.worker.ip                   Flannel subnet 10.1.12.1/24

Machine  State    DNS            Inst id      Series  AZ       Message
0        started  k8s.easyrsaip  k8s.easyrsa  bionic  default  Deployed
1        started  k8s.etcd.ip    k8s.etcd     bionic  default  Deployed
2        started  k8s.kubeap.ip  k8s.kubeap   bionic  default  Deployed
3        started  k8s.master.ip  k8s.master   bionic  default  Deployed
4        started  k8s.worker.ip  k8s.worker   bionic  default  Deployed

atdhrhs · 23 August 2019 18:17

Thanks a lot @knobby, you really made my day! As we discovered, the search domain being set by MAAS was the root cause of the DNS failures. I’ll create another post about MAAS and mention you in it.

knobby · 27 August 2019 13:59

For anyone following along at home, there was a default search domain that included a personal domain name, and the DNS server was set to resolve *.domain.com to a single address. This domain was added to MAAS, and as a result the resolv.conf had “search domain.com” in it. This in turn meant a dig www.google.com resulted in a resolve request for www.google.com.domain.com, which was happily resolved to that wildcard address.

To confuse matters some more, there were 3 upstream DNS servers, so depending on which one you asked you either got the proper IP or the wildcard IP.
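
To make the lookup order concrete, here is a small shell sketch of how a glibc-style resolver picks candidate names from the search list. The function name candidates is made up for illustration, and the cluster search domains and the ndots value of 5 are the usual Kubernetes pod defaults, assumed here rather than taken from this cluster.

```shell
# Hypothetical helper: list the names a stub resolver would try, in order.
# usage: candidates NAME NDOTS SEARCH_DOMAIN...
candidates() {
  name=$1
  ndots=$2
  shift 2

  # A trailing dot marks the name as fully qualified: no search list.
  case $name in
    *.) printf '%s\n' "$name"; return 0 ;;
  esac

  # Count the dots in the name.
  dots=$(printf '%s' "$name" | tr -cd '.' | wc -c | tr -d ' ')

  # With at least ndots dots, the literal name is tried first.
  if [ "$dots" -ge "$ndots" ]; then
    printf '%s\n' "$name"
  fi

  # Each search domain is appended in turn.
  for domain in "$@"; do
    printf '%s.%s\n' "$name" "$domain"
  done

  # With fewer than ndots dots, the literal name is only tried last.
  if [ "$dots" -lt "$ndots" ]; then
    printf '%s\n' "$name"
  fi
  return 0
}

candidates www.google.com 5 \
  default.svc.cluster.local svc.cluster.local cluster.local domain.com
```

With these inputs, www.google.com.domain.com is the fourth candidate and is tried before the bare www.google.com, so a wildcard record for *.domain.com answers first. Writing the name with a trailing dot (www.google.com.) skips the search list entirely.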