Allow `hostname` in K8S Yaml

charming

#1

Running into a few situations where it would be really handy to be able to set hostname in the pod spec.

See here

@wallyworld Is this something we can easily add to the supported features?


#2

So there’s a tension here between:

  1. supporting existing upstream images as they are today
  2. enforcing adoption of the Juju way and requiring upstream changes

It seems that k8s workloads use DNS names as a service discovery mechanism. And the reason you’d want to set the hostname is so that some other workload can find you using that “well known” name.

Of course, the Juju way to do this is using a relation to pass data between applications. I’m guessing you’re trying to consume upstream OCI images directly and these use DNS names? Is it an option for you to adopt the use of Juju relations?

An issue with mechanisms like this, which rely on a hard-coded name outside of workload config, is that the approach doesn’t scale well. There’s certainly a trade-off in allowing Juju to be used today whilst catering for existing practices, in the expectation that things will move to the Juju way over time; this unblocks the use of Juju and allows all the good things it brings to be adopted immediately, but at the cost of possibly (likely) enshrining non-Juju practices that will become difficult or impossible to move away from.

Trying to grow an ecosystem around the new kid on the block (Juju) when upstream is catering for the status quo (eg helm) is difficult and requires this balancing act. So every concession to a non-Juju approach requires thought. For truly k8s-specific things like custom resource definitions, or k8s-specific pod features like liveness/readiness probes, we’ve put a separate “kubernetes” section in the YAML the charm produces. Do we do the same with “hostname” (and “subdomain” for that matter)? Or can we push back and see if we can adopt a more Juju-centric approach?
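If it did go that route, a purely illustrative sketch might look like the following; the keys below are NOT part of the current schema, they just show k8s-only fields living under a kubernetes-specific section the way CRDs and probes do today:

```yaml
# Hypothetical, not implemented: hostname/subdomain under a
# k8s-specific section of the charm-produced YAML.
kubernetes:
  pod:
    hostname: slurmctld   # would map to PodSpec.hostname
    subdomain: slurm      # would map to PodSpec.subdomain
```

(`hostname` and `subdomain` are real fields on the k8s PodSpec; only their placement in the charm YAML here is invented.)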


#3

I am modeling SLURM on k8s using Juju. I think for my situation it’s more a matter of config nuances: the config only allows the equivalent of the output of hostname -s to be used as the machine identifier (see the slurm config example).

SLURM config ref

  1. I need to get the container hostname into the config of the app running inside the container before the entrypoint executes.

This was a challenge because make_pod_spec() needs to run before the container hostname even exists.

Just a thought - it may be useful in this situation (and generally) to have a charm helper that allows a command to be executed inside a running container.

Possibly ^ could be a step in facilitating my use case here.

Onward.

I initially thought using the container IP would suffice in the config, so I called make_pod_spec to start the container, got the network info, then called make_pod_spec again to set the IP of the container into its config … barf. Here is the code, which worked, and would have worked for my use case if my service was capable of taking an IP address as config! BLAST!

I’ve tried going at it from another angle too - a myriad of different approaches inside the entrypoint, with no success. See one below:

HOSTNAME=$(hostname -f)
awk -v var="SlurmctldHost" -v new_val="$HOSTNAME" 'BEGIN{FS=OFS="="} match($1, "^[[:space:]]*" var "[[:space:]]*") {$2=new_val} 1' /mnt/slurm-init.conf > /etc/slurm/slurm.conf

I feel like I’m getting super close with my hacks^ but it would be nice to have a more supported way of going about this. Possibly the answer here is to have a helper that allows Juju application operator charms to execute code inside running workload containers?


#4

The solution I came up with to manage this: 1) here, and 2) here.

The main idea is that the charm writes the config to /mnt, and the docker entrypoint modifies the config, substituting in the hostname, then writes it to its target/final location.
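A minimal sketch of that pattern, assuming temp files as stand-ins for /mnt/slurm-init.conf and /etc/slurm/slurm.conf (this is not the actual charm code, just the shape of it):

```shell
#!/bin/sh
# Sketch only: temp files stand in for /mnt/slurm-init.conf and
# /etc/slurm/slurm.conf so this runs anywhere.
set -e

template=$(mktemp)   # what the charm would write to /mnt
target=$(mktemp)     # the config's final location

printf 'ClusterName=demo\nSlurmctldHost=PLACEHOLDER\n' > "$template"

# At container start, substitute the runtime hostname; a real
# entrypoint would then exec the workload against the finished config.
sed "s/^SlurmctldHost=.*/SlurmctldHost=$(hostname -s)/" "$template" > "$target"

cat "$target"
```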

Thoughts?


#5

Seems like it could work. I guess you could use /tmp instead of /mnt? Or could the charm write the config to its intended location and simply have the entrypoint modify it in situ?

A key issue with hard coding is that you can’t deploy the same charm twice into a model. So a solution like this, which looks up the hostname at runtime, is preferable IMO.

As an aside - it seems that many k8s workloads suffer from this scalability issue? ie they use a hard-coded DNS name or config map or some other k8s resource. That’s fine if there’s just one of them in a namespace, but it means you can’t deploy > 1 of something, right? I guess it comes down to more of a social convention? ie don’t do “that” because it will break. Using Juju relations would go a long way towards solving such issues, but that needs fundamental changes to many upstream images.


#6

My thoughts here were that the charm code could be responsible for generating a prefix/suffix to be used in combination with the application name and unit #, so that the charm author could generate the naming convention in a predictable way.

Honestly I don’t like any of it - putting this much emphasis/effort into creating pet containers feels like the wrong thing to do, for sure. On the other hand, giving users/charm authors the capability to facilitate the use case seems like the right thing.
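For illustration, one way a charm could derive such a predictable name is from the unit name Juju already exposes in hook environments (JUJU_UNIT_NAME has the form app/N, and the stateful-set pod naming convention is <application-name>-<unit-number>). The hard-coded value below is just an example stand-in:

```shell
#!/bin/sh
# Derive a predictable "<application-name>-<unit-number>" name from the
# Juju unit name. Example value hard-coded here; in a real hook you
# would read "$JUJU_UNIT_NAME" instead.
unit_name="slurmctld/0"

app="${unit_name%/*}"    # part before the slash -> application name
num="${unit_name#*/}"    # part after the slash  -> unit number

predictable_name="${app}-${num}"
echo "$predictable_name"
```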


#7

I agree this is not something we should want to do. I am assuming that the need to do it comes from how the upstream image has been architected, right? But changing it is a chicken-and-egg thing. If the image needs to work in a way that is compatible with the other images it collaborates with, then it can’t be changed in isolation to do things in a better way.

It’s also hard to make informed decisions about such things from a small sample size. So if there’s a workaround for now, that’s great. We can continue to look at new workloads being charmed and where the gaps are, and make decisions once we have more data.


#8

While we are on the topic, @wallyworld do you know why it is that some pods get named with what seems to be a uuid, and others get named with <application-name>-<unit-number> convention?

slurmd and mysql pods get the <application-name>-<unit-number> while slurmctld and slurmdbd get the <application-name>-<uuid>.

$ microk8s.kubectl get pods --namespace bdx
NAME                         READY   STATUS             RESTARTS   AGE
mysql-0                      1/1     Running            0          2m26s
mysql-operator-0             1/1     Running            0          2m43s
slurmctld-6676b65659-ggkcr   1/1     Running            0          105s
slurmctld-operator-0         1/1     Running            0          2m38s
slurmd-0                     0/1     CrashLoopBackOff   3          97s
slurmd-operator-0            1/1     Running            0          2m32s
slurmdbd-79487ff874-lwmct    1/1     Running            0          119s
slurmdbd-operator-0          1/1     Running            0          2m28s

#9

There are two ways k8s manages pods - a stateful set or a deployment controller. With the former, pod names need to be stable so k8s can properly manage pod restarts. For the latter, k8s just assigns UUIDs.

Juju will use a stateful set where a charm is deployed with storage. You can also force the use of a stateful set for (stateless) charms without storage by adding a deployment block to the charm’s metadata.yaml, eg

deployment:
  type: stateful