KICP

The KICP has exclusive access to a number of compute nodes associated with the RCC Midway cluster. KICP users access those nodes through the same login nodes and interfaces as the primary cluster, making it trivial to move computational work between the two sets of resources. Most of the general Midway documentation applies to KICP members and the KICP nodes; however, there are some specific differences, which are described here.

Email sent to kicp@rcc.uchicago.edu will be assigned a trouble ticket and reviewed by Igor Yakushin and the RCC helpdesk. Please don't hesitate to ask questions if you encounter any issues or have any requests for software installation. The RCC helpdesk can also be reached by phone at 773-795-2667 during normal business hours.

Get an account

Please complete the RCC User Account Request form to request an account and put KICP as the PI (note: this request will be reviewed for KICP membership). Please clearly state your connection to the KICP, particularly if you are not a local or senior member. If you are requesting access for someone not at the University of Chicago (i.e., someone who does not have a CNetID), please contact Igor Yakushin directly.

To access the rest of the Midway cluster you will need to be added to a different account than KICP, typically provided by a faculty member acting as PI. All KICP faculty are eligible to act as PI for themselves and others, and many already have PI accounts on the Midway cluster.

Submit A Job

As a shared resource, Midway uses a batch queueing system to allocate nodes to individuals and their jobs. Midway uses the Slurm batch queuing system, which is similar to the possibly more familiar PBS batch system.

Please see Using Midway and Running Jobs on Midway for information on using Midway to perform computational tasks, typically by submitting batch jobs. The Slurm commands, reiterated below, can be used as described in that documentation; however, KICP users may need to point to one of the two KICP partitions, kicp and kicp-ht, and select the kicp account.

Note

Specifying --account=kicp and --partition=kicp is optional for users who belong to the KICP group and no other; specifying them explicitly is nevertheless good practice.

Useful Commands            Description
sbatch -p kicp -A kicp     Submit a job to the Slurm scheduling system.
sinteractive -p kicp       Run an interactive job on a KICP compute node.
squeue -p kicp             List the submitted and running jobs in the KICP partition.
squeue -u $USER            List the current user's own submitted and running jobs.
sinfo -p kicp              List the number of available and allocated KICP nodes.
scancel job_id             Cancel the job identified by the given job_id (e.g. 3636950).
scancel -u $USER           Cancel all jobs submitted by the current user.
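
If your login belongs to more than one group and you are unsure which accounts you can submit against, the standard sacctmgr utility can list your Slurm associations (the format fields shown are one possible choice):

sacctmgr show associations user=$USER format=Account,Partition,User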

There are many ways to submit a batch job, depending on what that job requires (number of processors, number of nodes, etc.). Slurm will automatically start your job in the directory from which it was submitted. To submit a job, create a batch script, say my_job.sh, and submit it with the command sbatch my_job.sh. The script below illustrates commonly used sbatch options; a more complete list can be found in the sbatch man page.

The following is a good example batch script:

#!/bin/bash

#SBATCH --job-name=my_job
#SBATCH --output=my_job_%j.out
#SBATCH --time=24:00:00
#SBATCH --partition=kicp
#SBATCH --account=kicp
#SBATCH --nodes=1
#SBATCH --exclusive

echo $SLURM_JOB_ID starting execution `date` on `hostname`

# load required modules (change to your requirements!)
# example: module load openmpi/1.8

# uncomment below if your code uses OpenMP to properly set the number of threads
# export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# the commands to run your job
# example: mpirun ./my_task
# Note: Slurm will automatically tell MPI how many tasks and tasks per node to use
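
Once saved, the script can be submitted and monitored from the command line; the job ID shown is illustrative:

sbatch my_job.sh     # prints something like: Submitted batch job 3636950
squeue -u $USER      # check the status of your jobs
scancel 3636950      # cancel the job if necessary, using the real job ID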

KICP Queues

KICP has access to the following queues (partitions in Slurm terminology):

Partition    Wallclock limit    Job limits
kicp         48h                256 cores, 64 jobs per user
kicp-ht      36h                64 cores/job, 32 jobs per user
kicp-long    100h               128 cores/queue, 64 cores/user

Access to kicp-long requires special authorization.

If you are running jobs with significant I/O or communication between nodes (typically MPI jobs), you should use the tightly coupled InfiniBand nodes accessed through the kicp and kicp-long partitions. Purely serial or embarrassingly parallel jobs with a large computation-to-I/O ratio (say, MCMC likelihood sampling) should use the high-throughput nodes in the kicp-ht queue. The limits for kicp-ht were relaxed to encourage use; if users start to conflict, the limits may be tightened to prevent a single user from dominating those nodes.
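
As an illustration, a set of independent sampler chains could be submitted to kicp-ht as a Slurm job array; the executable name and its command-line flag are placeholders:

#!/bin/bash

#SBATCH --job-name=mcmc_chains
#SBATCH --output=chain_%A_%a.out
#SBATCH --partition=kicp-ht
#SBATCH --account=kicp
#SBATCH --array=1-32
#SBATCH --ntasks=1
#SBATCH --time=12:00:00

# each array task runs one independent chain, indexed by SLURM_ARRAY_TASK_ID
./my_sampler --chain $SLURM_ARRAY_TASK_ID    # placeholder executable and flag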

Midway also includes two large-memory (256 GB) nodes and four GPU-enabled nodes, as well as a significantly larger set of nodes shared with the rest of the University. Accessing these resources requires a separate allocation; please contact Igor Yakushin for more details.

Storage

You will have access to three different storage locations on Midway. Your home directory has a 25 GB quota and should be used for small files and codes.

KICP has a 50 TB allocation in /project/kicp/, and each user is initially given a 1 TB quota and their own subdirectory (/project/kicp/$USER). If you require more space, please let Igor Yakushin know; your quota may be increased on a case-by-case basis. Both home and project space are backed up hourly to disk and daily to tape.

Finally, there is a high-performance filesystem mounted on /scratch, which should be used during runs and has a 5 TB quota. A symlink to this directory is placed in your home directory at $HOME/midway-scratch. This directory is not backed up and should not be used for long-term storage. In the future, files older than a to-be-determined age may be removed automatically, so please practice good data management.
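
A typical pattern is to run from scratch and copy back only the results you want to keep; the directory and file names here are illustrative:

cd $HOME/midway-scratch
mkdir -p my_run && cd my_run
# ... run your job from this directory ...
cp results.dat /project/kicp/$USER/    # project space is backed up; scratch is not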

Snapshots and Backups

We all inadvertently delete or overwrite files from time to time. Snapshots are automated backups that are accessible through a separate path. Snapshots of a user’s home directory can be found in /snapshots/*/home/cnetid/ where the subdirectories refer to the frequency and time of the backup, e.g. daily-2012-10-04.06h15 or hourly-2012-10-09.11h00.
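
For example, a deleted file can be recovered by copying it out of a snapshot; the snapshot name and file name below are illustrative:

ls /snapshots/                                            # list the available snapshots
ls /snapshots/daily-2012-10-04.06h15/home/$USER/          # browse an older copy of your home directory
cp /snapshots/daily-2012-10-04.06h15/home/$USER/lost_file.txt ~/    # restore a single file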

Software

Many common astrophysical codes and libraries have been installed or built on Midway; see the complete Software Module List and the astrophysics category. Other common astrophysics packages require configuration at compile time that prevents them from being installed system-wide. The sections below collect program-specific compilation flags, installation instructions, and related notes for these packages. Please contact Igor Yakushin if you notice any problems with the software or instructions.
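
Software modules are managed with the standard module command; for example:

module avail               # list all installed software modules
module avail cfitsio       # list available versions of a particular package
module load gsl/1.15       # load a specific version
module list                # show the currently loaded modules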

IDL

IDL is installed on Midway; however, the RCC is unable to provide licenses to the entire community. Users who have their own licenses or license servers may configure them in order to use IDL on Midway's login and compute nodes. Contact RCC for more details.
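
If you have access to a FlexNet license server, pointing IDL at it typically amounts to setting an environment variable along these lines; the server name and port are placeholders, and you should confirm the exact variable with RCC or your license administrator:

export LM_LICENSE_FILE=1700@license.example.edu    # placeholder server and port
module load idl                                    # assumes an IDL module exists; check `module avail idl`
idl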

CosmoMC

CosmoMC is a Markov chain Monte Carlo (MCMC) code that is integrated with the theoretical power spectrum code CAMB. Since it is often modified by users, we do not install a system-wide version; however, we have verified that the following parameters give good performance. (Warning: under construction; do not use these instructions without first talking to Igor Yakushin.)

Specific installation instructions for a non-MPI build using the Intel compiler and the Intel Math Kernel Library (MKL) 10.3 (note: these differ slightly from those provided by the CosmoMC README):

  • Download CosmoMC from cosmologist.info and untar on Midway
  • Load the modules appropriate for the compiler you intend to use. In this case, a non-MPI build with the Intel compiler: module load intel/12.1 cfitsio/3+intel-12.1 mkl/10.3
  • Edit the CosmoMC Makefile located in the source subdirectory

The WMAP7 likelihood code and data are already configured and installed on Midway in the /project/kicp/opt/WMAP/ directory. The stock v4 and v4p1 versions from Lambda are installed, as is a patched version of v4 with special optimizations from Cora Dvorkin & Wayne Hu. Select the version you wish to use and change the WMAP variable to point to the full directory, e.g. WMAP = /project/kicp/opt/WMAP/likelihood_v4p1.

  • Modify the compiler and optimization options
F90C = ifort
FFLAGS = -O2 -openmp -fpp
LAPACKL = $(MKLROOT)/lib/intel64/libmkl_lapack95_lp64.a \
              -Wl,--start-group \
              $(MKLROOT)/lib/intel64/libmkl_intel_lp64.a \
              $(MKLROOT)/lib/intel64/libmkl_sequential.a \
              $(MKLROOT)/lib/intel64/libmkl_core.a \
              -Wl,--end-group -lpthread
  • Run make
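
Putting the steps together, and subject to the same caveats noted above, a build session might look like the following; the archive and directory names are illustrative:

module load intel/12.1 cfitsio/3+intel-12.1 mkl/10.3
tar xzf CosmoMC.tar.gz && cd cosmomc/source    # illustrative archive and directory names
# edit the Makefile: set F90C, FFLAGS, LAPACKL, and WMAP as described above
make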

Note: this section is under construction. Updated build instructions for a variety of compilers and MPI support will come soon.

ART

ART has default compilation flags for Midway. Set the environment variable PLATFORM to midway. This platform file will automatically detect the MPI environment and compiler you are using and configure the code accordingly.
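
For example, in bash (the make invocation is illustrative; use whatever build command your copy of ART expects):

export PLATFORM=midway    # selects the Midway platform file
make                      # illustrative build step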

RAMSES

RAMSES is a cosmological hydrodynamic adaptive mesh refinement code originally written by Romain Teyssier. It is public and freely available for download. Oscar Agertz has compiled and run RAMSES on Midway and reports good performance with the following Makefile configuration:

F90 = mpif90 -O3
FFLAGS = -cpp -DNVAR=$(NVAR) -DNDIM=$(NDIM) -DNPRE=$(NPRE) \
            -DSOLVER$(SOLVER) -DNOSYSTEM -DNVECTOR=$(NVECTOR)
LIBMPI =
LIBS = $(LIBMPI)
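
For reference, a build with those settings might proceed as follows; the module version is illustrative and should match your compiler, and the directory layout is that of the public distribution:

module load openmpi/1.6+intel-12.1    # illustrative; pick the MPI module matching your compiler
cd ramses/bin                         # in the public distribution the Makefile is typically in bin/
make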

Gadget

This refers to the public version of Gadget 2.0.7. The following Makefile configuration should work under all combinations of compilers, MPI libraries, and Gadget options (including HDF5 support):

CC = mpicc
OPTIMIZE = -O3
MPICHLIB =
HDF5INCL = -DH5_USE_16_API
HDF5LIB = -lhdf5

The code requires that modules be loaded for fftw2, gsl, and MPI, with hdf5 optional. Make sure to load compiler- and MPI-library-specific versions of the modules as necessary. Some examples are given below:

  • Intel compiler + Intel MPI (note, loading intelmpi will automatically load intel/12.1):
module load intelmpi/4.0+intel-12.1 fftw2/2.1.5+intelmpi-4.0+intel-12.1 hdf5/1.8 gsl/1.15
  • Intel compiler + OpenMPI:
module load openmpi/1.6+intel-12.1 fftw2/2.1.5+openmpi-1.6+intel-12.1 hdf5/1.8 gsl/1.15
  • GCC + OpenMPI:
module load openmpi/1.6 fftw2/2.1.5+openmpi-1.6 hdf5/1.8 gsl/1.15
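
With one of the module sets above loaded, a minimal MPI batch job for Gadget might look like the following; the node and task counts, executable name, and parameter file are illustrative:

#!/bin/bash

#SBATCH --job-name=gadget
#SBATCH --partition=kicp
#SBATCH --account=kicp
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --time=24:00:00

module load openmpi/1.6 fftw2/2.1.5+openmpi-1.6 hdf5/1.8 gsl/1.15
mpirun ./Gadget2 param.txt    # illustrative executable and parameter file names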