.. index::
      pair: Array Jobs; Index

.. _array_jobs:

================
Job Arrays
================

According to the `Slurm Job Array Documentation`_, "job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily."  In general, job arrays are useful for applying the same processing routine to a collection of multiple input data files.  Job arrays offer a very simple way to submit a large number of independent processing jobs.

By submitting a single job array sbatch script, a specified number of "array-tasks" will be created based on this "master" sbatch script.  An example job array script is given below :download:`array.sbatch`:

.. literalinclude:: array.sbatch
  :language: bash

In the above example, The :option:`--array=1-16` option will cause 16 array-tasks (numbered 1, 2, ..., 16) to be spawned when this master job script is submitted.  The "array-tasks" are simply copies of this master script that are automatically submitted to the scheduler on your behalf.
However, in each array-tasks an environment variable called :option:`SLURM_ARRAY_TASK_ID` will be set to a unique value (in this example, a number in the range 1, 2, ..., 16).  In your script, you can use this value to select, for example, a specific data file that each array-tasks will be responsible for processing.

Job array indices can be specified in a number of ways.  For example::

	#A job array with index values between 0 and 31:
	#SBATCH --array=0-31

	#A job array with index values of 1, 2, 5, 19, 27:
	#SBATCH --array=1,2,5,19,27

	#A job array with index values between 1 and 7 with a step size of 2 (i.e. 1, 3, 5, 7):
	#SBATCH --array=1-7:2


The ``%A_%a`` construct in the output and error file names is used to generate unique output and error files based on the master job ID (``%A``) and the array-tasks ID (``%a``).  In this fashion, each array-tasks will be able to write to its own output and error file.

The remaining ``#SBATCH`` options are used to configure each array-tasks.  All of the standard ``#SBATCH`` options are available here.  In the :download:`array.sbatch` example, we are requesting that each array-task be allocated 1 CPU core (:option:`--ntasks=1`) and 4 GB (4000 MB) of memory (:option:`--mem-per-cpu=4000`) in the sandyb partition (:option:`--partition=sandyb`), and be allowed to run for up to 1 hour (:option:`--time=01:00:00`).  To be clear, the overall collection of 16 array-tasks will be allowed to take more than 1 hour to complete, but we have specified that each individual array-task will run for no more than 1 hour.

The total number of array-tasks that are allowed to run in parallel will be governed by the QoS of the partition to which you are submitting.  In most cases, this will limit users to a maximum of 64 concurrently running array-tasks.  To achieve a higher throughput of array-tasks, see :ref:`parallel_batch`

More information about Slurm job arrays can be found in the `Slurm Job Array Documentation`_.

.. _Slurm Job Array Documentation: http://slurm.schedmd.com/job_array.html