[Tutorial] Spark Workloads Using Juju - Part 1

This is the first in a series of posts that will detail my journey in provisioning Spark workloads using Juju.

In this post we will cover the very basics of deploying a singleton Spark instance that you can execute jobs against. In the following posts we will explore different Spark cluster modes and backends, and different ways of interfacing with the Spark backend, whatever it may be.

Spark Standalone Singleton - Up and Running

juju deploy cs:~omnivector/spark --constraints "cores=4 mem=8G root-disk=10G"

This will get you the latest stable version of the Spark charm.
Note: The Spark charm can be deployed with fewer resources, but it becomes less usable the further you slim it down.
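
The deploy can take a few minutes while the machine is provisioned and the charm installs Spark. If you want to follow along, the standard Juju status and log commands work well (the 5-second watch interval is just a suggestion):

watch -n 5 juju status

juju debug-log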

Following a successful deployment, your juju status output should resemble:

Model  Controller  Cloud/Region   Version  SLA          Timestamp
bdx00  pdl-aws     aws/us-west-2  2.5.4    unsupported  22:13:39Z

App               Version  Status   Scale  Charm             Store       Rev  OS      Notes
spark             2.4.1    active       1  spark             jujucharms   31  ubuntu

Unit                 Workload  Agent  Machine  Public address  Ports                                          Message
spark/0*             active    idle   1        172.31.102.151  7077/tcp,7078/tcp,8080/tcp,8081/tcp,18080/tcp  Running: master,worker,history

Machine  State    DNS             Inst id              Series  AZ          Message
1        started  172.31.102.151  i-0ed5ca81962bb9fb2  bionic  us-west-2a  running

Access the Spark web UIs at their respective ports:

  • master UI: http://172.31.102.151:8080
  • worker UI: http://172.31.102.151:8081
  • history server UI: http://172.31.102.151:18080
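
If you would rather sanity-check a UI from the terminal before opening a browser, a plain HTTP request against the port should return the UI's HTML (the address is the unit's public address from the juju status output above):

curl -s http://172.31.102.151:8080 | head -n 5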

Run the spark-pi Example

The following juju action command will kick off the spark-pi example job:

juju run-action spark/0 spark-pi --wait

Inspect the Spark master, worker, and history server UIs to see the application status after running the spark-pi action.
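
spark-pi is only one of the actions the charm ships; you can list everything it exposes with Juju's standard actions command (the exact set of actions may vary between charm revisions):

juju actions spark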

More Than One

To expand the size of the Spark cluster, simply add more units!

juju add-unit -n 5 spark
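
Once the new machines come up, juju status should show units spark/1 through spark/5 registering as workers against the master. Scaling back down works the same way; remove-unit takes a unit name (spark/5 here is just an example):

juju remove-unit spark/5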

Using Juju Storage

You can provision the storage backing SPARK_WORKER_DIRS and/or SPARK_LOCAL_DIR via Juju storage by passing the --storage argument with the appropriate parameters to the juju deploy command.

AWS EBS Example

juju deploy cs:~omnivector/spark \
    --constraints "spaces=nat root-disk=20G instance-type=t3.xlarge" \
    --storage spark-local=ebs,50G --storage spark-work=ebs,100G
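
After the deploy settles, you can confirm that the EBS volumes were provisioned and attached using Juju's storage listing (a standard Juju command; the output shows each storage instance along with its pool, size, and attachment status):

juju storage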
