.. index::
   single: Spark

.. _mdoc_spark:

===========
Spark
===========

Apache Spark is a fast and general engine for large-scale data
processing. It has Scala, Java, and Python APIs and can run in either a
single-node or a multi-node configuration. In both cases, it is
recommended to request exclusive access to the node(s) in Slurm.

Single Node Examples
====================

Here are the SparkPi and pi.py examples from the Spark distribution
running on a single node:

sbatch script :download:`spark-single-node.sbatch`

.. literalinclude:: spark-single-node.sbatch
   :language: bash

Multi-node Examples
===================

For multi-node Spark jobs, a helper script was written to launch the
master and worker tasks within the Slurm allocation. Here are the same
examples as above, but with Spark running on multiple nodes:

sbatch script :download:`spark-multi-node.sbatch`

.. literalinclude:: spark-multi-node.sbatch
   :language: bash
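
For orientation, below is a minimal sketch of what a multi-node Spark
sbatch script typically looks like when using Spark's standalone cluster
mode. This is an illustrative example only, not the contents of
``spark-multi-node.sbatch``: the module name, node count, wall time, and
Spark paths are assumptions and will differ per site.

.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name=spark-multi-node
   #SBATCH --nodes=4
   #SBATCH --exclusive
   #SBATCH --time=01:00:00

   # Assumption: the site provides Spark through an environment module.
   module load spark

   # The first node in the allocation hosts the Spark master.
   MASTER_HOST=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
   MASTER_URL="spark://${MASTER_HOST}:7077"

   # Launch the master in the foreground on the first node, backgrounded
   # within this script so the job can continue.
   srun --nodes=1 --ntasks=1 --nodelist="$MASTER_HOST" \
        "$SPARK_HOME/bin/spark-class" org.apache.spark.deploy.master.Master &
   sleep 10

   # Launch one worker per allocated node, each connecting to the master.
   srun --nodes="$SLURM_JOB_NUM_NODES" --ntasks="$SLURM_JOB_NUM_NODES" \
        "$SPARK_HOME/bin/spark-class" org.apache.spark.deploy.worker.Worker \
        "$MASTER_URL" &
   sleep 10

   # Run the SparkPi example against the standalone cluster.
   spark-submit --master "$MASTER_URL" \
       --class org.apache.spark.examples.SparkPi \
       "$SPARK_HOME"/examples/jars/spark-examples_*.jar 1000

The Python example runs the same way: replace the final ``spark-submit``
invocation with ``spark-submit --master "$MASTER_URL"
"$SPARK_HOME"/examples/src/main/python/pi.py 1000``. The ``sleep`` calls
are a crude way to wait for the daemons to come up; a production helper
script would poll the master instead.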