Charmhelpers for K8S Charms

The following is an example of a spark context initialized in a jupyter-notebook cell. The jupyter-notebook runs as a container/juju charm on k8s and executes against the same k8s cluster it runs on, by passing the k8s cluster ip to setMaster and the driver host ip to spark.driver.host:

from pyspark import SparkConf
from pyspark.sql import SparkSession


conf = SparkConf()\
    .setAppName('JUJU_PI_TEST')\
    .setMaster('k8s://https://10.152.183.1:443')\
    .set('spark.kubernetes.container.image',
         'docker.io/omnivector/spark-2.4.1-hadoop-3.2.0:v1.0.0')\
    .set('spark.driver.host', '10.1.74.16')\
    .set('spark.driver.port', '41049')

spark = SparkSession.builder.config(conf=conf).getOrCreate()

sc = spark.sparkContext

In this configuration the jupyter-notebook (where the spark code is executed) serves as the spark driver host, and so must pass its own host/port details into the spark context configuration so that the executors know where to talk back to.

The configs I am focusing on automating are:

    .set('spark.driver.host', '10.1.74.16')

and

    .setMaster('k8s://https://10.152.183.1:443')\

I currently get the value for spark.driver.host by manually running ip a in a notebook cell to get the container ip, and I look at kubectl get services to get the CLUSTER-IP to pass to setMaster.
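
Both values could presumably be looked up programmatically from inside the notebook pod instead of by hand. A rough sketch, assuming the standard env vars that kubernetes injects into every pod and that the pod's hostname resolves to its own ip:

import os
import socket

from pyspark import SparkConf

# the pod's own ip - a candidate value for spark.driver.host
driver_host = socket.gethostbyname(socket.gethostname())

# kubernetes injects the api server's cluster ip/port into every pod
master_url = 'k8s://https://{}:{}'.format(
    os.environ['KUBERNETES_SERVICE_HOST'],
    os.environ['KUBERNETES_SERVICE_PORT'])

conf = SparkConf()\
    .setMaster(master_url)\
    .set('spark.driver.host', driver_host)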

I want to discuss how I might track these values via juju and provide them through the notebook container's runtime environment, so that the user doesn't have to look them up and fill them in by hand in the notebook cell.

The jupyter-k8s charm can be found here.

Setting the $SPARK_DRIVER_BIND_ADDRESS docker env var can be used in place of specifying 'spark.driver.host' inline in the notebook cell/spark context configuration (see the pyspark docker image entrypoint.sh). I’m thinking that if I can get the ip address of the container via charm code, I could render the env var into the pod spec.
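
A rough sketch of that idea, assuming the reactive k8s charm pattern where the charm builds a pod spec dict, and assuming a container's config block gets rendered as environment variables; get_container_ip() is a hypothetical helper standing in for question A below:

def make_pod_spec():
    # hypothetical helper - obtaining this ip is exactly question A below
    driver_ip = get_container_ip()
    return {
        'containers': [{
            'name': 'jupyter-notebook',
            # imageDetails, ports, etc. omitted for brevity
            'config': {
                # assumed to be rendered as a container env var
                'SPARK_DRIVER_BIND_ADDRESS': driver_ip,
            },
        }],
    }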

This leaves me with two solid questions.

From a charm’s perspective:

A) How do I get the ip address of the container the charm is written for (guessing that unit_get('private-address') isn’t yet adapted to work with kubernetes charms)?

B) How can I get the cluster-ip as provided by kubectl get services and/or run administrative operations (kubectl commands) from a charm?

I’m thinking about writing wrappers that would run as charm code on the juju operator pod and return what I want to know about each container/pod by parsing the output of kubectl. Seems a bit hack-y though … I don’t really know.
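
A sketch of that wrapper idea, assuming kubectl is available on the operator pod with credentials for the cluster:

import json
import subprocess


def get_service_cluster_ip(service_name, namespace):
    """Return a service's CLUSTER-IP by parsing kubectl's json output."""
    out = subprocess.check_output(
        ['kubectl', 'get', 'service', service_name,
         '--namespace', namespace, '-o', 'json'])
    return json.loads(out)['spec']['clusterIP']


def get_pod_ip(pod_name, namespace):
    """Return a pod's ip by parsing kubectl's json output."""
    out = subprocess.check_output(
        ['kubectl', 'get', 'pod', pod_name,
         '--namespace', namespace, '-o', 'json'])
    return json.loads(out)['status']['podIP']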

Looking for some feedback.

Thanks!

The CDK deploys a kubernetes api service at kubernetes.default.svc.cluster.local:443, so you can just hardcode that.


You should also be able to do the same for the spark driver ip: during the deployment of the jupyter charm, specify a service type of cluster ip or load balancer, and it will create a service based on your charm and model name. You can then hardcode the driver host to something like jupyter-charm-name.k8s-model-name.svc.cluster.local, though port forwarding may be an issue…
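
Putting both suggestions together, the notebook cell could then look something like this (the driver host below is just the placeholder pattern from the reply above, not a verified value):

conf = SparkConf()\
    .setAppName('JUJU_PI_TEST')\
    .setMaster('k8s://https://kubernetes.default.svc.cluster.local:443')\
    .set('spark.kubernetes.container.image',
         'docker.io/omnivector/spark-2.4.1-hadoop-3.2.0:v1.0.0')\
    .set('spark.driver.host', 'jupyter-charm-name.k8s-model-name.svc.cluster.local')\
    .set('spark.driver.port', '41049')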