.. index::
   single: Spark

.. _mdoc_spark:

===========
Spark
===========

Apache Spark is a fast and general engine for large-scale data
processing. It has Scala, Java, and Python APIs and can run in either a
single-node or a multi-node configuration. In both cases, it is
recommended to request exclusive access to the node(s) in Slurm.

Single Node Examples
====================

Here are the SparkPi and pi.py examples from the Spark distribution
running on a single node:

sbatch script :download:`spark-single-node.sbatch`

.. literalinclude:: spark-single-node.sbatch
   :language: bash

Multi-node Examples
===================

For multi-node Spark jobs, a helper script was written to launch the
master and worker tasks within the Slurm allocation. Here are the same
examples as above, but with Spark running on multiple nodes:

sbatch script :download:`spark-multi-node.sbatch`

.. literalinclude:: spark-multi-node.sbatch
   :language: bash
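
For orientation, below is a minimal sketch of what a multi-node Spark
sbatch script typically looks like when using Spark's standalone cluster
mode. This is an illustrative example only, not the contents of
``spark-multi-node.sbatch``: the module name, node count, wall time, and
Spark paths are assumptions and will differ per site.

.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name=spark-multi-node
   #SBATCH --nodes=4
   #SBATCH --exclusive
   #SBATCH --time=01:00:00

   # Assumption: the site provides Spark through an environment module.
   module load spark

   # The first node in the allocation hosts the Spark master.
   MASTER_HOST=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
   MASTER_URL="spark://${MASTER_HOST}:7077"

   # Launch the master in the foreground on the first node, backgrounded
   # within this script so the job can continue.
   srun --nodes=1 --ntasks=1 --nodelist="$MASTER_HOST" \
        "$SPARK_HOME/bin/spark-class" org.apache.spark.deploy.master.Master &
   sleep 10

   # Launch one worker per allocated node, each connecting to the master.
   srun --nodes="$SLURM_JOB_NUM_NODES" --ntasks="$SLURM_JOB_NUM_NODES" \
        "$SPARK_HOME/bin/spark-class" org.apache.spark.deploy.worker.Worker \
        "$MASTER_URL" &
   sleep 10

   # Run the SparkPi example against the standalone cluster.
   spark-submit --master "$MASTER_URL" \
       --class org.apache.spark.examples.SparkPi \
       "$SPARK_HOME"/examples/jars/spark-examples_*.jar 1000

The Python example runs the same way: replace the final ``spark-submit``
invocation with ``spark-submit --master "$MASTER_URL"
"$SPARK_HOME"/examples/src/main/python/pi.py 1000``. The ``sleep`` calls
are a crude way to wait for the daemons to come up; a production helper
script would poll the master instead.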