The kubeflow charms from the juju-solutions repo have been uploaded to the charm store under the ~juju
namespace.
Overview
Kubeflow is a collection of loosely related components that make it easy to run machine learning code in various forms on Kubernetes. JupyterHub provides an interactive notebook interface with TensorFlow and other libraries pre-installed; TensorFlow Training and PyTorch Training let you train models, which can then be served with either TensorFlow Serving or Seldon; and TensorFlow Dashboard provides a management interface for TensorFlow Training and Serving jobs.
The Juju Kubeflow Charms provide an alternative to Ksonnet for deploying Kubeflow.
Prerequisites
You will need a Kubernetes cluster, plus a Juju 2.5 controller (2.5-rc1 or later).
Two suggested configurations are: a LXD controller with microk8s, or Kubernetes deployed to a cloud with the integrator charm for that cloud.
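For example, the microk8s route could be set up roughly as follows (a sketch; it assumes the microk8s and juju snaps are installed, and the cloud name myk8s is arbitrary):

```shell
# Bootstrap a LXD controller, register microk8s as a cloud,
# and create a Kubernetes model to deploy Kubeflow into.
juju bootstrap localhost
microk8s.config | juju add-k8s myk8s
juju add-model kubeflow myk8s
```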
Deploying Kubeflow
The easiest way to get things going is to deploy the kubeflow bundle.
juju deploy cs:kubeflow
The following is a list of all the available Kubeflow charms:
cs:~juju/kubeflow-tf-hub JupyterHub with Kubeflow libraries and settings
cs:~juju/kubeflow-tf-job-dashboard TensorFlow Dashboard
cs:~juju/kubeflow-tf-serving TensorFlow Serving
cs:~juju/kubeflow-seldon-cluster-manager Seldon Serving
cs:~juju/kubeflow-seldon-api-frontend Seldon Serving Frontend
cs:~juju/kubeflow-tf-job-operator TensorFlow Training
cs:~juju/kubeflow-pytorch-operator PyTorch Training
cs:~juju/kubeflow-ambassador Ambassador API Gateway
Some of the charms have additional options you may want to set, in particular the notebook-storage-size option for cs:~juju/kubeflow-tf-hub to attach persistent storage for the notebooks. These options can be set in the bundle or via the deploy CLI.
juju deploy cs:~juju/kubeflow-tf-hub \
    --config notebook-storage-size=10Mi
Note: TensorFlow Serving is a special case, in that you will deploy a separate instance of the charm for each model you wish to serve, and provide that model via either a resource or charm config as a URL.
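Supplying the model as a charm resource instead of a URL might look like the following (a sketch; the resource name "model" and the archive name are assumptions about your setup):

```shell
juju deploy cs:~juju/kubeflow-tf-serving mymodel
juju attach mymodel model=./saved_model.tar.gz
```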
Note: depending on the Kubernetes cluster and the underlying cloud, you may also need to deploy the Hub with a LoadBalancer service to allow ingress. This is not necessary for CDK deployed on AWS with the integrator charm, for example.
juju deploy cs:~juju/kubeflow-tf-hub \
    --config notebook-storage-size=10Mi \
    --config kubernetes-service-type=LoadBalancer
Using Kubeflow
Once deployed, go to the JupyterHub service endpoint, log in with any username and password, and click the Start My Server button. Run juju status to see the JupyterHub application address; connect to that IP address on port 8000.
If desired, use the form to choose the version, CPU / GPU, or memory resources. If the form is left blank, the latest version and reasonable defaults will be used. Then click Submit.
A new Jupyter Notebook pod will be created for your user (with persistent storage attached, if configured) and you will be taken to a file browser interface.
Select New -> Notebook from the top right to create a notebook.
You can then run some ML code. The example from the Kubeflow User Guide is:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
import tensorflow as tf
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
When run, this should result in something around 0.9014 being printed as the calculated accuracy.
TensorFlow Training
To submit models to be trained, you must create a TFJob custom resource in Kubernetes. For example, to submit the distributed mnist model, which is used for e2e testing, first follow the instructions here to build the docker image locally:
https://github.com/kubeflow/tf-operator/tree/master/examples/v1alpha2/dist-mnist
Then:
kubectl create -n $namespace -f https://raw.githubusercontent.com/kubeflow/tf-operator/master/examples/v1alpha2/dist-mnist/tf_job_mnist.yaml
Note: The namespace is the name of the Kubernetes model in Juju that this charm is deployed into.
You can then check on the status of the job via either the TensorFlow Dashboard, or kubectl:
kubectl get -o yaml -n $namespace tfjobs dist-mnist-for-e2e-test
PyTorch Training
To submit models to be trained, you must create a PyTorchJob custom resource in Kubernetes. For example, to submit the distributed mnist model, which is used for e2e testing, you can use:
kubectl create -n $namespace -f https://raw.githubusercontent.com/kubeflow/pytorch-operator/master/examples/mpi-dist/mnist/cpu/v1beta1/mpi_mnist_job_cpu.yaml
More details about setting up PyTorch jobs can be found here:
https://github.com/kubeflow/pytorch-operator
Note: The namespace is the name of the Kubernetes model in Juju that this charm is deployed into.
You can then check the status of the job via either the TensorFlow Dashboard, or kubectl:
kubectl get -o yaml -n $namespace pytorchjobs dist-mnist-for-e2e-test
TensorFlow Serving
A separate instance of this charm should be deployed for each model to serve, with the model being provided as either a URL in charm config, or via a resource. For example, to serve the inception model, you would deploy it as:
juju deploy cs:~juju/kubeflow-tf-serving inception \
--config model=gs://kubeflow-models/inception
You would then point the inception_client at port 9000 on the load balancer address.
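A sketch of that, using the inception_client example script from the TensorFlow Serving repository (the load balancer address and image file are placeholders):

```shell
python inception_client.py --server=<lb-address>:9000 --image=./cat.jpg
```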
Seldon Serving
This charm must be deployed to a Kubernetes model in Juju and related to redis:
juju deploy cs:~juju/kubeflow-seldon-cluster-manager
juju deploy cs:~juju/redis-k8s
juju add-relation kubeflow-seldon-cluster-manager redis-k8s
To submit models to be trained or served, you must create a SeldonDeployment custom resource. Currently, the custom resource definition for this must be loaded manually via:
kubectl create -n $juju_model_name -f https://raw.githubusercontent.com/juju-solutions/charm-kubeflow-seldon-cluster-manager/start/files/crd-v1alpha1.yaml
The specific SeldonDeployment that you create will depend on how and what image you are wanting to serve, but a simple example might look like:
apiVersion: machinelearning.seldon.io/v1alpha1
kind: SeldonDeployment
metadata:
  labels:
    app: seldon
  name: mymodel
  namespace: default
spec:
  annotations:
    deployment_version: v1
    project_name: mymodel
  name: mymodel
  predictors:
  - annotations:
      predictor_version: v1
    componentSpec:
      spec:
        containers:
        - image: seldonio/mock_classifier:1.0
          imagePullPolicy: Always
          name: mymodel
          volumeMounts: []
        terminationGracePeriodSeconds: 1
        volumes: []
    graph:
      children: []
      endpoint:
        type: REST
      name: mymodel
      type: MODEL
    name: mymodel
    replicas: 1
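Once the SeldonDeployment above is running, a prediction request through the Seldon API frontend might look like the following (a sketch; the frontend address, port, OAuth client credentials, and feature values are all assumptions about your setup):

```shell
# Obtain an OAuth token from the API frontend, then POST a prediction.
TOKEN=$(curl -s -u oauth-key:oauth-secret \
    http://<frontend-address>:8080/oauth/token \
    -d grant_type=client_credentials | jq -r .access_token)
curl -s -H "Authorization: Bearer $TOKEN" \
    -H 'Content-Type: application/json' \
    http://<frontend-address>:8080/api/v0.1/predictions \
    -d '{"data": {"ndarray": [[1.0]]}}'
```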