This is the first in a series of posts that will detail my journey in provisioning Spark workloads using Juju.
In this post we will cover the very basics of deploying a singleton Spark instance that you can execute jobs against. In the following posts we will explore different Spark cluster modes, backends, and different ways of interfacing with the Spark backend, whichever it may be.
Spark Standalone Singleton - Up and Running
juju deploy cs:~omnivector/spark --constraints "cores=4 mem=8G root-disk=10G"
This will get you the latest stable version of the Spark charm.
Note: The Spark charm can be deployed with fewer resources, but it becomes less usable the further you slim it down.
Following a successful deployment, your juju status output should resemble:
Model  Controller  Cloud/Region   Version  SLA          Timestamp
bdx00  pdl-aws     aws/us-west-2  2.5.4    unsupported  22:13:39Z

App    Version  Status  Scale  Charm  Store       Rev  OS      Notes
spark  2.4.1    active  1      spark  jujucharms  31   ubuntu

Unit     Workload  Agent  Machine  Public address  Ports                                          Message
spark/0* active    idle   1       172.31.102.151  7077/tcp,7078/tcp,8080/tcp,8081/tcp,18080/tcp  Running: master,worker,history

Machine  State    DNS             Inst id              Series  AZ          Message
1        started  172.31.102.151  i-0ed5ca81962bb9fb2  bionic  us-west-2a  running
Access the spark GUIs at their respective ports:
- master ui: http://172.31.102.151:8080
- worker ui: http://172.31.102.151:8081
- history server ui: http://172.31.102.151:18080
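If you don't want to copy the address out of the status output by hand, you can pull it from the JSON form of juju status and build the UI URLs from it. A minimal sketch, assuming the Juju 2.x JSON status layout; here a sample of the status shown above is parsed inline, but in practice you would pipe in `juju status --format=json` instead of the here-string.

```shell
# Sample of the juju status JSON for spark/0 (in real use, replace with:
#   STATUS_JSON=$(juju status --format=json)
STATUS_JSON='{"applications":{"spark":{"units":{"spark/0":{"public-address":"172.31.102.151"}}}}}'

# Extract the unit's public address with sed (a hypothetical one-liner;
# jq would be the more robust choice if it is installed).
ADDR=$(printf '%s' "$STATUS_JSON" | sed -n 's/.*"public-address":"\([^"]*\)".*/\1/p')

# Ports taken from the status output above (Spark's default web UI ports).
echo "master ui:         http://$ADDR:8080"
echo "worker ui:         http://$ADDR:8081"
echo "history server ui: http://$ADDR:18080"
```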
This juju action command will kick off the spark-pi example job.
juju run-action spark/0 spark-pi --wait
Inspect the spark master, worker, and history server UIs to see the application status after running the action.
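Because the action is run with --wait, its result is printed directly as YAML, and you can scrape the Pi estimate out of it. A hypothetical sketch: the exact results keys depend on the charm, so inspect the real output of `juju run-action spark/0 spark-pi --wait` on your deployment before relying on any of them. The sample output below is an assumption, not captured from a real run.

```shell
# Hypothetical sample of the action's YAML result (in real use, replace with:
#   ACTION_OUTPUT=$(juju run-action spark/0 spark-pi --wait)
ACTION_OUTPUT='results:
  output: "Pi is roughly 3.140355701778509"
status: completed'

# Pull the Pi estimate line out of the result.
echo "$ACTION_OUTPUT" | grep -o 'Pi is roughly [0-9.]*'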
More Than One
To expand the size of the Spark cluster, simply add more units:
juju add-unit -n 5 spark
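After scaling out, you can spot-check that all units came up by counting spark units in the status output. A minimal sketch against a sample three-unit status; in practice you would pipe in the real `juju status spark` output instead of the here-string (after add-unit -n 5 above you would expect six units in total).

```shell
# Hypothetical sample of the Unit section of juju status (in real use:
#   SAMPLE_STATUS=$(juju status spark)
SAMPLE_STATUS='spark/0* active idle 1 172.31.102.151
spark/1  active idle 2 172.31.102.152
spark/2  active idle 3 172.31.102.153'

# Count lines that start with a spark unit name.
UNITS=$(echo "$SAMPLE_STATUS" | grep -c '^spark/')
echo "spark units: $UNITS"
```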
Using Juju Storage
A user can provision the SPARK_LOCAL_DIR via Juju storage by passing the --storage argument with the appropriate parameters to the juju deploy command.
AWS EBS Example
juju deploy cs:~omnivector/spark \
    --constraints "spaces=nat root-disk=20G instance-type=t3.xlarge" \
    --storage spark-local=ebs,50G \
    --storage spark-work=ebs,100G