The following is an example of a Spark context initialized in a jupyter-notebook cell, where the jupyter-notebook runs as a container/Juju charm on k8s and executes against the same k8s cluster it runs on, by passing the k8s cluster IP to `setMaster` and the driver host IP to `spark.driver.host`:
```python
from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf()\
    .setAppName('JUJU_PI_TEST')\
    .setMaster('k8s://https://10.152.183.1:443')\
    .set('spark.kubernetes.container.image', 'docker.io/omnivector/spark-2.4.1-hadoop-3.2.0:v1.0.0')\
    .set('spark.driver.host', '10.1.74.16')\
    .set('spark.driver.port', '41049')

spark = SparkSession.builder.config(conf=conf).getOrCreate()
sc = spark.sparkContext
```
In this configuration the jupyter-notebook (where the spark code is executed) serves as the spark driver host, and thus must pass the host/port details about itself to the spark context configuration so that the executors know who/where to talk back to.
The configs I am focusing on automating are `spark.driver.host` and the master URL. I currently get the value for `spark.driver.host` by manually running `ip a` in a notebook cell to get the container IP, and I look at `kubectl get services` to get the `CLUSTER-IP` to set in `setMaster`.
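The container-IP half of this could be done from Python instead of running `ip a` by hand. A minimal sketch using only the standard library (the `10.152.183.1` address is just the k8s service IP from the example above; any routable address works, since no packets are actually sent):

```python
import socket

def get_container_ip() -> str:
    """Return the IP this container would use to reach the cluster.

    Opens a UDP socket "toward" the k8s service address (connect() on a
    UDP socket only sets the default destination, it sends nothing) and
    reads back the local address the kernel picked for that route.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect(('10.152.183.1', 443))  # assumed cluster address from above
        return s.getsockname()[0]
```

This avoids parsing `ip a` output and naturally picks the interface that faces the cluster when the container has more than one.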
I want to discuss how I might track these values via Juju and provide them through the notebook container's runtime environment, so that the user doesn't have to look them up and fill them in in the notebook cell by hand.
The jupyter-k8s charm can be found here.
The `$SPARK_DRIVER_BIND_ADDRESS` docker env var can be used in place of specifying `spark.driver.host` inline in the notebook cell/Spark context configuration (see the pyspark docker image's entrypoint.sh). I'm thinking that if I can get the IP address of the container via charm code, I could render the env var into the pod spec.
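To illustrate the idea, here is a sketch of what rendering that env var into a Juju pod spec might look like. The `render_pod_spec` helper is hypothetical (not part of any charm library), and the exact pod-spec schema depends on the Juju version; only the env var name comes from the pyspark image's entrypoint.sh:

```python
def render_pod_spec(image: str, driver_ip: str) -> dict:
    # Hypothetical charm helper: builds the container spec the charm
    # would hand to Juju, injecting the driver address so the image's
    # entrypoint.sh picks it up at startup.
    return {
        'containers': [{
            'name': 'jupyter',
            'imageDetails': {'imagePath': image},
            'config': {
                # Consumed by the pyspark image entrypoint in place of
                # setting spark.driver.host inline in the notebook.
                'SPARK_DRIVER_BIND_ADDRESS': driver_ip,
            },
        }]
    }

spec = render_pod_spec(
    'docker.io/omnivector/spark-2.4.1-hadoop-3.2.0:v1.0.0', '10.1.74.16')
```

With this in place the notebook cell could drop the `spark.driver.host` line and let the container environment supply it.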
This leaves me with two solid questions. From a charm's perspective:

A) How do I get the IP address of the container the charm is written for? (I'm guessing that `unit_get('private-address')` isn't yet adapted to work with kubernetes charms.)

B) How can I get the cluster IP as provided by `kubectl get services`, and/or run administrative operations (`kubectl` commands) from a charm?
I'm thinking about writing wrappers that would run as charm code on the Juju operator pod and return what I want to know about each container/pod by parsing the output of `kubectl`. That seems a bit hacky though … I don't really know.
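For what it's worth, such a wrapper is less fragile if it asks `kubectl` for JSON rather than scraping the table output. A sketch (the helper names are hypothetical; the parsing is split out so it can be exercised without a cluster):

```python
import json
import subprocess

def extract_cluster_ip(kubectl_json: str) -> str:
    # Pull spec.clusterIP out of `kubectl get service ... -o json` output.
    return json.loads(kubectl_json)['spec']['clusterIP']

def get_cluster_ip(service: str, namespace: str = 'default') -> str:
    # Hypothetical wrapper the charm could run on the operator pod.
    out = subprocess.check_output(
        ['kubectl', 'get', 'service', service, '-n', namespace, '-o', 'json'])
    return extract_cluster_ip(out)
```

For example, `get_cluster_ip('kubernetes')` would return the same `CLUSTER-IP` that `kubectl get services` prints for the `kubernetes` service.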
Looking for some feedback.