MPI Jobs

For more information about which MPI libraries are available on Midway1 and Midway2, see Message Passing Interface (MPI).

Let’s look at an example MPI hello world program and explain the steps needed to compile and submit it to the queue.

Here is the example MPI hello world program: hello-mpi.c

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[], char *envp[]) {
  int numprocs, rank, namelen;
  char processor_name[MPI_MAX_PROCESSOR_NAME];

  /* initialize MPI and query the communicator size, this process's rank,
     and the name of the node it is running on */
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Get_processor_name(processor_name, &namelen);

  /* each process reports its rank and the node it is running on */
  printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);

  MPI_Finalize();
  return 0;
}

Place hello-mpi.c in your home directory and compile it interactively by entering the following commands in a terminal on either a Midway1 or a Midway2 login node:

module load openmpi
mpicc hello-mpi.c -o hello-mpi

If you plan to use hello-mpi_midway1.sbatch to submit your job to Midway1, run the above commands on one of the Midway1 login nodes. Likewise, if you plan to use hello-mpi_midway2.sbatch to submit your job to Midway2, run them on one of the Midway2 login nodes.

The reason the same commands work on both Midway1 and Midway2 login nodes is that we are using the default version of the OpenMPI module, which wraps the system GCC compiler. Please note that the default version of a module can differ between Midway1 and Midway2: for example, the default OpenMPI module is version 1.6 on Midway1 and 2.0.1 on Midway2. Any available MPI compiler should be able to compile and run this example.
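
If you want to double-check which modules are loaded and which compiler the mpicc wrapper calls, the following commands are one way to do it (the exact output depends on the versions installed on each cluster):

# list the modules currently loaded in your environment
module list

# OpenMPI's compiler wrapper can print the full compile command it would run
mpicc --showme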

hello-mpi_midway1.sbatch is a submission script that can be used to submit a job to Midway1 to run the hello-mpi program:

#!/bin/bash
# a sample job submission script to submit an MPI job to the sandyb partition on Midway1
# please change the --partition option if you want to use another partition on Midway1

# set the job name to hello-mpi
#SBATCH --job-name=hello-mpi

# send output to hello-mpi.out
#SBATCH --output=hello-mpi.out

# receive an email when job starts, ends, and fails
#SBATCH --mail-type=BEGIN,END,FAIL

# this job requests 32 cores. Cores can be selected from various nodes.
#SBATCH --ntasks=32

# there are many partitions on Midway1 and it is important to specify which
# partition you want to run your job on. If this option is omitted, the
# sandyb partition on Midway1 will be selected as the default partition
#SBATCH --partition=sandyb

# load the openmpi module
module load openmpi

# Run the process with mpirun. Notice -n is not required. mpirun will
# automatically figure out how many processes to run from the slurm options
mpirun ./hello-mpi

hello-mpi_midway2.sbatch is a submission script that can be used to submit a job to Midway2 to run the hello-mpi program:

#!/bin/bash
# a sample job submission script to submit an MPI job to the broadwl partition on Midway2

# set the job name to hello-mpi
#SBATCH --job-name=hello-mpi

# send output to hello-mpi.out
#SBATCH --output=hello-mpi.out

# receive an email when job starts, ends, and fails
#SBATCH --mail-type=BEGIN,END,FAIL

# this job requests 32 cores. Cores can be selected from various nodes.
#SBATCH --ntasks=32

# there are a few partitions on Midway2 and it is important to specify which
# partition you want to run your job on. If this line is omitted, the
# sandyb partition on Midway1 will be selected as the default partition
#SBATCH --partition=broadwl

# the --constraint=fdr or --constraint=edr options (available on Midway2 only) 
# could be given to guarantee a job will be using nodes with the FDR or EDR 
# Infiniband interconnects. Without this option, the scheduler will select 
# available nodes without considering their interconnect types. 
#SBATCH --constraint=fdr

# load the openmpi module
module load openmpi

# Run the process with mpirun. Notice -n is not required. mpirun will
# automatically figure out how many processes to run from the slurm options
mpirun ./hello-mpi

Note: Midway1 and Midway2 have different sets of modules. Please make sure you use the correct module name and version when submitting your job to each cluster.
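
To see which MPI module versions are available on the cluster you are currently logged into, you can list them with module avail (the names and versions you see will differ between Midway1 and Midway2):

# list the available OpenMPI builds on the current cluster
module avail openmpi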

The inline comments describe what each line does, but it is important to emphasize the following points for MPI jobs:

  • The --constraint=fdr or --constraint=edr options are only available on Midway2, and using them on Midway1 will result in a job submission error. MPI jobs submitted without this option on Midway2 could run on nodes with FDR, EDR, or a combination of both (see the sinfo example after this list).
  • The --partition option determines whether your job runs on Midway1 or Midway2. If you do not have this option in your job submission script, your job will be submitted to the sandyb partition on Midway1.
  • mpirun does not need to be given -n. All supported MPI environments automatically determine the proper layout based on the Slurm options.
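
If you are unsure which partitions or node features (such as fdr or edr) exist on the cluster you are logged into, Slurm can list them for you; the output format string below is just one possible choice:

# summarize the partitions you can submit to
sinfo --summarize

# show the feature tags (e.g. fdr, edr) attached to the nodes in each partition
sinfo -o "%P %N %f"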

You can submit hello-mpi_midway1.sbatch to Midway1 using the following command from one of the Midway1 login nodes:

sbatch hello-mpi_midway1.sbatch

Similarly, you can submit hello-mpi_midway2.sbatch to Midway2 using the following command from one of the Midway2 login nodes:

sbatch hello-mpi_midway2.sbatch
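
After submitting, sbatch prints a job ID that you can use to monitor the job; once it finishes, the program output appears in the file given by the --output option (hello-mpi.out in these scripts):

# check the state of your pending and running jobs
squeue --user=$USER

# after the job completes, view the program output
cat hello-mpi.out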

Here is an example output of this program submitted to the sandyb partition on Midway1 (please note that MPI processes can run on any node that has available cores and memory):

Process 4 on midway100 out of 32
Process 0 on midway100 out of 32
Process 1 on midway103 out of 32
Process 2 on midway103 out of 32
Process 5 on midway223 out of 32
Process 15 on midway223 out of 32
Process 12 on midway443 out of 32
Process 7 on midway223 out of 32
Process 9 on midway353 out of 32
Process 14 on midway353 out of 32
Process 8 on midway323 out of 32
Process 24 on midway323 out of 32
Process 10 on midway323 out of 32
Process 11 on midway323 out of 32
Process 3 on midway081 out of 32
Process 6 on midway304 out of 32
Process 13 on midway304 out of 32
Process 17 on midway194 out of 32
Process 20 on midway194 out of 32
Process 19 on midway082 out of 32
Process 25 on midway082 out of 32
Process 27 on midway078 out of 32
Process 26 on midway078 out of 32
Process 29 on midway222 out of 32
Process 28 on midway222 out of 32
Process 31 on midway204 out of 32
Process 30 on midway204 out of 32
Process 18 on midway204 out of 32
Process 22 on midway312 out of 32
Process 21 on midway312 out of 32
Process 23 on midway420 out of 32
Process 16 on midway420 out of 32

It is possible to control the number of tasks run per node with the --ntasks-per-node option. Submitting the job like this to Midway1:

sbatch --ntasks-per-node=1 hello-mpi_midway1.sbatch

results in output like this (each MPI process from your job runs on a different node):

Process 4 on midway100 out of 32
Process 0 on midway101 out of 32
Process 1 on midway103 out of 32
Process 2 on midway104 out of 32
Process 5 on midway223 out of 32
Process 15 on midway224 out of 32
Process 12 on midway443 out of 32
Process 7 on midway225 out of 32
Process 9 on midway353 out of 32
Process 14 on midway354 out of 32
Process 8 on midway324 out of 32
Process 24 on midway325 out of 32
Process 10 on midway326 out of 32
Process 11 on midway427 out of 32
Process 3 on midway423 out of 32
Process 6 on midway304 out of 32
Process 13 on midway305 out of 32
Process 17 on midway194 out of 32
Process 20 on midway195 out of 32
Process 19 on midway082 out of 32
Process 25 on midway083 out of 32
Process 27 on midway078 out of 32
Process 26 on midway079 out of 32
Process 29 on midway222 out of 32
Process 28 on midway193 out of 32
Process 31 on midway204 out of 32
Process 30 on midway205 out of 32
Process 18 on midway206 out of 32
Process 22 on midway312 out of 32
Process 21 on midway313 out of 32
Process 23 on midway420 out of 32
Process 16 on midway421 out of 32
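
The same layout can also be requested inside the submission script rather than on the sbatch command line; a minimal sketch of the relevant lines (the task count here simply mirrors the --ntasks=32 used above):

# request 32 tasks with at most one task per node
#SBATCH --ntasks=32
#SBATCH --ntasks-per-node=1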

Advanced Usage

Both OpenMPI and IntelMPI can launch MPI programs directly with the Slurm command srun. It is not necessary to use this mode for most jobs, but it may allow job launch options that would not otherwise be possible. For example, from a Midway1 login node it is possible to launch the hello-mpi program compiled above with OpenMPI as 16 MPI processes in the sandyb partition with this command:

srun -n16 hello-mpi

For IntelMPI, it is necessary to set an environment variable for this to work:

export I_MPI_PMI_LIBRARY=/software/slurm-current-$DISTARCH/lib/libpmi.so
srun -n16 hello-mpi

If you want to submit an MPI job to Midway2 using OpenMPI with srun, use the following command:

srun -n16 --partition=broadwl hello-mpi

For IntelMPI on Midway2, you need to set the I_MPI_PMI_LIBRARY variable and then run srun:

export I_MPI_PMI_LIBRARY=/software/slurm-current-$DISTARCH/lib/libpmi.so
srun -n16 --partition=broadwl hello-mpi
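
If you prefer to keep this launch mode in a batch script rather than typing srun interactively, a minimal sketch for OpenMPI on Midway2 might look like the following (the resource requests are only an example and should be adjusted to your job):

#!/bin/bash
#SBATCH --job-name=hello-mpi
#SBATCH --output=hello-mpi.out
#SBATCH --ntasks=16
#SBATCH --partition=broadwl

# load the same OpenMPI module used to compile the program
module load openmpi

# srun inherits the task count from the --ntasks option above
srun ./hello-mpi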