Hybrid MPI/OpenMP Jobs

MPI and OpenMP can be used at the same time to create a Hybrid MPI/OpenMP program.

Let’s look at an example hybrid MPI/OpenMP hello world program and walk through the steps needed to compile it and submit it to the queue. The example program, hello-hybrid.c:

#include <stdio.h>
#include "mpi.h"
#include <omp.h>

int main(int argc, char *argv[]) {
  int numprocs, rank, namelen;
  char processor_name[MPI_MAX_PROCESSOR_NAME];
  int iam = 0, np = 1;

  /* initialize MPI and query this process's rank, the total number of
     processes, and the name of the node it is running on */
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Get_processor_name(processor_name, &namelen);

  /* each MPI process spawns a team of OpenMP threads; every thread
     reports its thread id, the team size, and its MPI rank */
  #pragma omp parallel default(shared) private(iam, np)
  {
    np = omp_get_num_threads();
    iam = omp_get_thread_num();
    printf("Hello from thread %d out of %d from process %d out of %d on %s\n",
           iam, np, rank, numprocs, processor_name);
  }

  MPI_Finalize();
  return 0;
}

Place hello-hybrid.c in your home directory and compile it interactively by entering the following commands in a terminal on a Midway1 or Midway2 login node:

module load openmpi
mpicc -fopenmp hello-hybrid.c -o hello-hybrid
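
Since the module defaults differ between the two clusters, it can be useful to confirm which Open MPI and backend compiler versions the wrapper picked up before submitting anything. A quick check with standard module and Open MPI commands (the version strings you see will depend on the cluster you are logged into):

module list                 # show the currently loaded modules, including openmpi
mpicc --version             # report the backend compiler used by the MPI wrapper
ompi_info | head -n 3       # report the Open MPI version itself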

If you choose hello-hybrid_midway1.sbatch to submit your job to Midway1, you must run the above commands on one of the Midway1 login nodes. Likewise, if you choose hello-hybrid_midway2.sbatch to submit your job to Midway2, you must run them on one of the Midway2 login nodes.

The reason the same commands work on both the Midway1 and Midway2 login nodes is that we are using the default version of the OpenMPI module, which defaults to the system GCC compiler. Note that the default version of a module can differ between Midway1 and Midway2; for example, the default OpenMPI module is version 1.6 on Midway1 and 2.0.1 on Midway2. It should be possible to compile and run this example with any available MPI compiler. The extra option -fopenmp is required to compile a program with OpenMP directives (-openmp for the Intel compiler and -mp for the PGI compiler).
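
For example, if the mpicc wrapper on your cluster is backed by the Intel or PGI compiler rather than GCC (which compilers the available MPI modules use is site-specific, so check module avail first), the compile line from above would simply swap the flag:

# Intel-backed MPI wrapper
mpicc -openmp hello-hybrid.c -o hello-hybrid

# PGI-backed MPI wrapper
mpicc -mp hello-hybrid.c -o hello-hybrid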

hello-hybrid_midway1.sbatch is a submission script that can be used to submit a job to Midway1 to run the hello-hybrid program.

#!/bin/bash
# a sample job submission script to submit a hybrid MPI/OpenMP job to the sandyb
# partition on Midway1. Please change the --partition option if you want to use
# another partition on Midway1.

# set the job name to hello-hybrid
#SBATCH --job-name=hello-hybrid

# send output to hello-hybrid.out
#SBATCH --output=hello-hybrid.out

# this job requests 4 MPI processes
#SBATCH --ntasks=4


# and request 8 cpus per task for OpenMP threads
#SBATCH --cpus-per-task=8

# this job will run in the sandyb partition on Midway1
#SBATCH --partition=sandyb

# load the openmpi default module
module load openmpi

# set OMP_NUM_THREADS to the number of --cpus-per-task we asked for
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Run the process with mpirun. Notice -n is not required. mpirun will
# automatically figure out how many processes to run from the slurm options
mpirun ./hello-hybrid
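
Depending on how the Open MPI module was built, the processes can often also be launched with Slurm's own srun instead of mpirun. This is a hedged alternative rather than something the script above requires; a minimal sketch of the last line under that assumption:

# launch one process per Slurm task, giving each task its allocated CPUs
srun --cpus-per-task=$SLURM_CPUS_PER_TASK ./hello-hybrid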

hello-hybrid_midway2.sbatch is a submission script that can be used to submit a job to Midway2 to run the hello-hybrid program.

#!/bin/bash

# a sample job submission script to submit a hybrid MPI/OpenMP job to Midway2

# set the job name to hello-hybrid
#SBATCH --job-name=hello-hybrid

# send output to hello-hybrid.out
#SBATCH --output=hello-hybrid.out

# this job requests 4 MPI processes
#SBATCH --ntasks=4


# and request 8 cpus per task for OpenMP threads. On Midway2, you could ask 
# for up to 28 cpus per task.
#SBATCH --cpus-per-task=8

# this job will run on Midway2
#SBATCH --partition=broadwl

# this job will run on nodes connected with the EDR interconnect. Comment out or
# delete the following line if the type of interconnect is not important to you.
#SBATCH --constraint=edr

# load the openmpi default module
module load openmpi

# set OMP_NUM_THREADS to the number of --cpus-per-task we asked for
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Run the process with mpirun. Notice -n is not required. mpirun will
# automatically figure out how many processes to run from the slurm options
mpirun ./hello-hybrid

Note: Midway1 and Midway2 have different sets of modules. Please make sure you use the correct module name and version when submitting your job to each cluster.
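
To see which versions of a module are available on the cluster you are currently logged into, you can run the following (the versions listed are cluster-specific, so the output will differ between Midway1 and Midway2):

module avail openmpi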

The options are similar to running an MPI job, but with notable additions:

  • --ntasks=4 specifies the number of MPI processes.
  • --cpus-per-task=8 allocates 8 CPUs to each task. This number cannot be greater than the number of cores on a single node (see the sinfo example after this list).
  • export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK sets the number of OpenMP threads to the number of cores requested with --cpus-per-task.
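
With these settings the job requests 4 × 8 = 32 cores in total, spread across as many nodes as Slurm needs. Before picking a value for --cpus-per-task you can check how many cores the nodes in a partition have by querying Slurm directly (the partition name below is just the broadwl example used above):

sinfo --partition=broadwl --Node --format="%N %c"   # node name and CPUs per node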

You can submit hello-hybrid_midway1.sbatch to Midway1 using the following command from one of the Midway1 login nodes:

sbatch hello-hybrid_midway1.sbatch

Alternatively, you can submit hello-hybrid_midway2.sbatch to Midway2 using the following command from one of the Midway2 login nodes:

sbatch hello-hybrid_midway2.sbatch
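
Once the job has been submitted, you can check its state and, after it finishes, inspect the output file named in the --output option. A quick sketch using standard Slurm and shell commands:

squeue -u $USER           # check whether the job is pending or running
cat hello-hybrid.out      # view the program output once the job has finished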

Here is example output from this program when submitted to the broadwl partition on Midway2:

Hello from thread 0 out of 8 from process 1 out of 4 on midway2-0087.rcc.local
Hello from thread 2 out of 8 from process 1 out of 4 on midway2-0087.rcc.local
Hello from thread 3 out of 8 from process 1 out of 4 on midway2-0087.rcc.local
Hello from thread 6 out of 8 from process 1 out of 4 on midway2-0087.rcc.local
Hello from thread 7 out of 8 from process 1 out of 4 on midway2-0087.rcc.local
Hello from thread 4 out of 8 from process 1 out of 4 on midway2-0087.rcc.local
Hello from thread 5 out of 8 from process 1 out of 4 on midway2-0087.rcc.local
Hello from thread 1 out of 8 from process 1 out of 4 on midway2-0087.rcc.local
Hello from thread 0 out of 8 from process 3 out of 4 on midway2-0088.rcc.local
Hello from thread 6 out of 8 from process 3 out of 4 on midway2-0088.rcc.local
Hello from thread 1 out of 8 from process 3 out of 4 on midway2-0088.rcc.local
Hello from thread 4 out of 8 from process 3 out of 4 on midway2-0088.rcc.local
Hello from thread 7 out of 8 from process 3 out of 4 on midway2-0088.rcc.local
Hello from thread 5 out of 8 from process 3 out of 4 on midway2-0088.rcc.local
Hello from thread 3 out of 8 from process 3 out of 4 on midway2-0088.rcc.local
Hello from thread 2 out of 8 from process 3 out of 4 on midway2-0088.rcc.local
Hello from thread 0 out of 8 from process 0 out of 4 on midway2-0087.rcc.local
Hello from thread 2 out of 8 from process 0 out of 4 on midway2-0087.rcc.local
Hello from thread 5 out of 8 from process 0 out of 4 on midway2-0087.rcc.local
Hello from thread 1 out of 8 from process 0 out of 4 on midway2-0087.rcc.local
Hello from thread 4 out of 8 from process 0 out of 4 on midway2-0087.rcc.local
Hello from thread 6 out of 8 from process 0 out of 4 on midway2-0087.rcc.local
Hello from thread 3 out of 8 from process 0 out of 4 on midway2-0087.rcc.local
Hello from thread 7 out of 8 from process 0 out of 4 on midway2-0087.rcc.local
Hello from thread 0 out of 8 from process 2 out of 4 on midway2-0087.rcc.local
Hello from thread 6 out of 8 from process 2 out of 4 on midway2-0087.rcc.local
Hello from thread 1 out of 8 from process 2 out of 4 on midway2-0087.rcc.local
Hello from thread 5 out of 8 from process 2 out of 4 on midway2-0087.rcc.local
Hello from thread 3 out of 8 from process 2 out of 4 on midway2-0087.rcc.local
Hello from thread 2 out of 8 from process 2 out of 4 on midway2-0087.rcc.local
Hello from thread 4 out of 8 from process 2 out of 4 on midway2-0087.rcc.local
Hello from thread 7 out of 8 from process 2 out of 4 on midway2-0087.rcc.local