.. _intro-to-RCC-CHEM268:

.. index::
  Introduction to RCC for CHEM 268

################################
Introduction to RCC for CHEM 268
################################

As part of CHEM 268 you have been given access to the University of Chicago Research Computing Center (RCC) Midway compute cluster.  Below are the basics you will need to know in order to connect to and use the cluster.

Where to go for help
====================

For technical questions (help logging in, etc.), send a help request to help@rcc.uchicago.edu.

Technical documentation is available at http://docs.rcc.uchicago.edu

RCC's walk-in lab is open during business hours and is located in Regenstein Library room 216.  Feel free to drop by to chat with one of our staff members if you get stuck.


Logging into Midway
==============================

Access to RCC is provided via secure shell (SSH) login.

All users must have a UChicago CNetID to log in to any RCC systems. Your RCC account credentials are your CNetID and password:

    +----------+---------------------------+
    | Username | CNetID                    |
    +----------+---------------------------+
    | Password | CNetID password           |
    +----------+---------------------------+
    | Hostname | midway.rcc.uchicago.edu   |
    +----------+---------------------------+

.. note::

    RCC does not store your CNet password and we are unable to reset your password.
    If you require password assistance, please contact UChicago IT Services.

Most UNIX-like operating systems (Mac OS X, Linux, etc.) provide an SSH utility by default that can be accessed by typing the command :command:`ssh` in a terminal.  To log in to Midway from a Linux/Mac computer, open a terminal and at the command line enter:

.. code:: bash

    ssh <username>@midway.rcc.uchicago.edu

Windows users will first need to download an SSH client, such as PuTTY_, which will allow you to interact with the remote Unix server. Use the hostname :file:`midway.rcc.uchicago.edu` and your CNetID username and password to access Midway through PuTTY.


Accessing Software on Midway
============================

When you first log into Midway, you will be placed in a bare-bones user environment with minimal software available.

Why isn't everything installed and available by default? The need for

* multiple versions of software (version number, add-ons, custom software) and
* multiple build configurations (compiler choice, options, and MPI library)

would lead to a hopelessly polluted namespace and PATH problems. Additionally, most of the applications used on HPC machines are research codes under constant development; testing, stability, compatibility, and other usability concerns are often not a primary consideration of their authors.

The ``module`` system is a script-based system used to manage the user environment and to "activate" software packages.  To access software that is installed on Midway, you must first load the corresponding software module.

Basic ``module`` commands:

  +----------------------+-----------------------------------------------------+
  | Command              | Description                                         |
  +======================+=====================================================+
  | module avail         | lists all available software modules                |
  +----------------------+-----------------------------------------------------+
  | module avail [name]  | lists modules matching [name]                       |
  +----------------------+-----------------------------------------------------+
  | module load [name]   | loads the named module                              |
  +----------------------+-----------------------------------------------------+
  | module unload [name] | unloads the named module                            |
  +----------------------+-----------------------------------------------------+
  | module list          | lists the modules currently loaded for the user     |
  +----------------------+-----------------------------------------------------+

Examples
--------

Obtain a list of the currently loaded modules:

.. code:: bash

    $ module list
    
    Currently Loaded Modulefiles:
    1) slurm/2.4       3) subversion/1.6  5) env/rcc         7) tree/1.6.0
    2) vim/7.3         4) emacs/23.4      6) git/1.7


Obtain a list of all available modules:

.. code:: bash

  $ module avail

  -------------------------------- /software/modulefiles ---------------------------------
  Minuit2/5.28(default)                               intelmpi/4.0
  Minuit2/5.28+intel-12.1                             intelmpi/4.0+intel-12.1(default)
  R/2.15(default)                                     jasper/1.900(default)
  ...
  ifrit/3.4(default)                                  x264/stable(default)
  intel/11.1                                          yasm/1.2(default)
  intel/12.1(default)
  ------------------------- /usr/share/Modules/modulefiles -------------------------------
  dot         module-cvs  module-info modules     null        use.own     
  ----------------------------------- /etc/modulefiles -----------------------------------
  env/rcc            samba/3.6          slurm/2.3          slurm/2.4(default) 
  --------------------------------------- Aliases ----------------------------------------
  

Obtain a list of available versions of a particular piece of software:

.. code:: bash

  $ module avail python
  
  ---------------------------- /software/modulefiles -----------------------------
  python/2.7(default) python/2.7-2013q4   python/2.7-2014q1   python/3.3          
  ------------------------------- /etc/modulefiles -------------------------------

Load the default python version:

.. code:: bash

  $ module load python

  $ python --version
  Python 2.7.3


List the currently loaded modules:

.. code:: bash

  $ module list

  Currently Loaded Modulefiles:
   1) vim/7.4         4) env/rcc         7) netcdf/4.2     10) python/2.7     
   2) subversion/1.8  5) mkl/10.3        8) texlive/2012   11) slurm/current  
   3) emacs/24        6) hdf5/1.8        9) graphviz/2.28  


Unload the python/2.7 module and load python/2.7-2014q1:

.. code:: bash
  
  $ module unload python/2.7

  $ module load python/2.7-2014q1

  $ python --version
  Python 2.7.6


The Midway Cluster Environment
==============================

Midway is a Linux cluster with approximately 10,000 CPU cores and 1.5 PB of storage.  It is a shared resource used by the entire University community, and sharing computational resources creates unique challenges:

* Jobs must be scheduled in a fair manner.
* Resource consumption needs to be accounted for.
* Access needs to be controlled.

Thus, a **scheduler** is used to manage job submissions to the cluster.  RCC uses the Slurm_ resource manager to schedule jobs and provide interactive access to compute nodes.

When you first log into Midway you will be connected to a login node (midway-login1 or midway-login2).  Login nodes are not intended for computationally intensive work; instead, use them for managing files, submitting jobs, and similar light tasks.  If you are going to run a computationally intensive program, you must do that work on a compute node, either by obtaining an interactive session or by submitting a job through the scheduler.  You are, however, free to run very short, non-computationally-intensive jobs on the login nodes, as is often necessary while you are writing and debugging your code.  If you are unsure whether your job will be computationally intensive (large memory or CPU usage, long running time, etc.), get a session on a compute node and work there.

There are two ways to send your work to a Midway compute node:

#. ``sinteractive`` - Request access to a compute node and log into it
#. ``sbatch`` - Write a script that defines the commands to be executed and let Slurm run them on your behalf


Working interactively on a compute node
=======================================

To request an interactive session on a compute node use the :command:`sinteractive` command:

.. code:: bash

    sinteractive

When this command is executed, you will be connected to one of Midway's compute nodes, where you can then run your programs.  By default, the :command:`sinteractive` command provides 2 hours of access to a compute node with 1 CPU and 2 GB of memory.  The :command:`sinteractive` command provides many more options for configuring your session. For example, to get access to a compute node with 1 CPU and 4 GB of memory for 5 hours, use the command:

.. code:: bash

    sinteractive --cpus-per-task=1 --mem-per-cpu=4096 --time=05:00:00

It may take up to 60 seconds for your interactive session to be initialized (assuming there is an available compute node that meets your specified requirements).
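
Once the session starts, your shell prompt will be on a compute node rather than a login node.  A quick way to confirm this, and to end the session when you are finished, is sketched below (the node name you see will vary):

.. code:: bash

    # Print the name of the node you are on; compute nodes have names like midway231
    hostname

    # When you are done working, exit the session to release the node back to the scheduler
    exit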


Submitting a job to the scheduler
=================================

An alternative to working interactively with a compute node is to submit the work you want carried out to the scheduler through an **sbatch** script.  An example sbatch script is shown below:

.. code:: bash

  #!/bin/bash
  #SBATCH --job-name=example
  #SBATCH --output=example-%j.out
  #SBATCH --error=example-%j.err
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=1
  #SBATCH --mem-per-cpu=4096

  # load your modules here
  module load python/2.7-2014q1

  # execute your tasks here
  python myScript.py


SBATCH scripts contain two major elements.  After the **#!/bin/bash** line, a series of **#SBATCH** parameters is defined.  These are read by the scheduler, Slurm, and tell it what hardware is required to execute the job, how long that hardware is required, and where the output and error (stdout and stderr) streams should be written. If resources are available, the job may start less than one second after submission; when the queue is busy and the resource request is substantial, the job will wait in line with other jobs awaiting execution.

The %j pattern in the output and error file names is replaced by the job's unique ID (for example, example-3518933.out), so each run of the script produces distinct file names.  This prevents your output and error files from being overwritten if the script is submitted multiple times from the same directory.

The second major element of an sbatch script is the user-defined commands. When the resource request is granted, the script is executed just as if you had typed the commands one after another at the command line.

Sbatch scripts execute in the directory from which they were submitted.  The example above assumes that the script is located in the same directory as myScript.py.
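
To submit the job, save the script to a file (the name :file:`example.sbatch` used below is just an illustration) and hand it to the :command:`sbatch` command.  Slurm responds with a line reporting the job ID it assigned:

.. code:: bash

  $ sbatch example.sbatch
  Submitted batch job 3518933

The job ID you receive will differ; it is the number you will later use with **squeue** and **scancel**.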


Interact With Your Submitted Jobs
=================================

The status of submitted jobs can be viewed and altered in several ways.  The primary monitoring command is **squeue**.

Example:

.. code:: bash

  squeue 

  JOBID  PARTITION      NAME     USER  ST       TIME  NODES NODELIST(REASON)
  3518933   sandyb  polyA6.0   ccchiu  PD       0:00      1 (QOSResourceLimit)
  3519981   sandyb     R_AFM mcgovern  PD       0:00      1 (Resources)
  3519987   sandyb     R_AFM mcgovern  PD       0:00      1 (Priority)
  3519988   sandyb     R_AFM mcgovern  PD       0:00      1 (Priority)
  ...
  ...
  3539126       gpu _interac  jcarmas   R      45:52      1 midway231
  3538957       gpu test.6.3 jwhitmer   R      58:52      1 midway230
  3525370  westmere phase_di   khaira   R    4:50:02      1 midway008
  3525315  westmere phase_di   khaira   R    4:50:03      1 midway004
  3525316  westmere phase_di   khaira   R    4:50:03      1 midway004

The above tells us:

  +------------------+-------------------------------------------------------------------+
  | Name             | Description                                                       |
  +==================+===================================================================+
  | JOBID            | Job ID #, unique reference number for each job                    |
  +------------------+-------------------------------------------------------------------+
  | PARTITION        | Type of node job is running/will run on                           |
  +------------------+-------------------------------------------------------------------+
  | NAME             | Name for the job; defaults to the name of the batch script        |
  +------------------+-------------------------------------------------------------------+
  | USER             | User who submitted job                                            |
  +------------------+-------------------------------------------------------------------+
  | ST               | State of the job                                                  |
  +------------------+-------------------------------------------------------------------+
  | TIME             | Time used by the job in D-HH:MM:SS                                |
  +------------------+-------------------------------------------------------------------+
  | NODES            | Number of Nodes consumed                                          |
  +------------------+-------------------------------------------------------------------+
  | NODELIST(REASON) | List of Nodes consumed, or reason the job has not started running |
  +------------------+-------------------------------------------------------------------+

Because there are usually a very large number of jobs in the queue, the output of **squeue** often needs to be filtered to show only the jobs of interest to you.  To view only the jobs that you have submitted, use the command:

.. code:: bash

  squeue -u <yourCNetID>


To cancel a job that you have submitted, first obtain the job's JobID number by using the squeue command.  Then issue the command:

.. code:: bash

    scancel <JobID>

or cancel ALL of your jobs at the same time (be sure you really want to do this!) with the command:

.. code:: bash

     scancel -u <yourCNetID>


Accessing and Transferring Files
================================

RCC provides a number of methods for transferring data in/out of Midway.  For
relatively small amounts of data, we recommend the :command:`scp` command.  For
non-trivial file transfers, we recommend using `Globus Online`_ for fast,
secure and reliable transfers. When working on the UChicago network it is also
possible to mount the Midway file systems using **Samba**.

Command Line - SCP
------------------

Most UNIX-like operating systems (Mac OS X, Linux, etc.) provide an :command:`scp`
command that can be used from the command line.  To transfer files from
your local computer to your home directory on Midway, open a terminal window
and issue the command::

    Single files: $ scp file1 ... <CNetID>@midway.rcc.uchicago.edu:
    Directories:  $ scp -r dir1 ... <CNetID>@midway.rcc.uchicago.edu:

When prompted, enter your CNet password.
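
To copy files in the other direction, from Midway back to your local machine, reverse the order of the arguments (run these commands from your local computer; :file:`file1` and :file:`dir1` are just placeholders)::

    Single files: $ scp <CNetID>@midway.rcc.uchicago.edu:file1 .
    Directories:  $ scp -r <CNetID>@midway.rcc.uchicago.edu:dir1 .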

Windows users will need to download an SCP client, such as WinSCP, which provides a graphical interface for transferring files via scp.


Windows GUI - WinSCP
--------------------

WinSCP is an SCP client that can be used to move files between Midway and a Windows machine.  WinSCP can be obtained from http://www.winscp.net.

Use the hostname **midway.rcc.uchicago.edu** and your CNet credentials when connecting.

.. image:: winscp-login.png
   :width: 40 %

If prompted to accept the server's host key, select "yes." 

The main WinSCP window allows you to move files from your local machine (left side) to Midway (right side).

.. image:: winscp-main.png
   :width: 40 %


Mac GUI - SFTP Clients
----------------------

There are a number of graphical SFTP clients available for Mac.  FileZilla, for example, is a freely available SFTP client (https://filezilla-project.org/).

Use the hostname **midway.rcc.uchicago.edu** and your CNet credentials when connecting.

Samba
-----

Samba allows users to connect to (or "mount") their Midway home directory on their local computer so that the file system on Midway appears as if it were directly attached to the local machine.  This method of accessing your RCC home and project space is only available from within the UChicago campus network. From off campus you will need to connect through the `UChicago virtual private network`_.

.. _UChicago virtual private network: https://cvpn.uchicago.edu/

Your Samba account credentials are your CNetID and password::

    Username: ADLOCAL\<CNetID>
    Password: CNet password
    Hostname: midwaysmb.rcc.uchicago.edu

.. note::

    Make sure to prefix your username with **ADLOCAL**\\

On a Windows computer, use the "Map Network Drive" functionality and the following UNC paths::

    Home:    \\midwaysmb.rcc.uchicago.edu\homes
    Project: \\midwaysmb.rcc.uchicago.edu\project

On a Mac OS X, use these URLs to connect::

    Home:    smb://midwaysmb.rcc.uchicago.edu/homes
    Project: smb://midwaysmb.rcc.uchicago.edu/project

To connect on a Mac OS X computer:

* Use the **Connect to Server** utility in Finder

.. image:: finder-connect_to_server.jpg
   :width: 40 %

* Enter one of the URLs from above in the input box for **Server Address**.
* When prompted for a username and password, select **Registered User**.
* Enter :samp:`ADLOCAL\\{YourCNetID}` for the username and enter your CNet password.

Gaussian
========

For the duration of this class, you will have access to a molecular modelling software package called Gaussian. You can load Gaussian with ``module load gaussian``.
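
For example (the list of installed versions may differ when you run this):

.. code:: bash

    # See which Gaussian modules are installed
    module avail gaussian

    # Load the default version
    module load gaussian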

While Gaussian is running, it creates scratch files for intermediate work. These scratch files can get extremely large, to the point that they exceed the disk quota for individual users. Midway has scratch space intended precisely for this purpose: holding intermediate data generated by a computation. Scratch space is high-performance, is not backed up, and has a 5 TB quota.

Scratch space is available at :file:`$HOME/scratch-midway`. To use a scratch folder with Gaussian, first create a folder in scratch space. For example:

.. code:: bash

    mkdir -p ~/scratch-midway/gaussian-work

Then, add the following line to your sbatch submission scripts:

.. code:: bash

    export GAUSS_SCRDIR=$HOME/scratch-midway/gaussian-work

This sets and exports the environment variable :envvar:`GAUSS_SCRDIR`, which Gaussian reads in order to decide where to put its scratch files.
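
Putting the pieces together, a minimal sbatch script for a Gaussian job might look like the sketch below.  The input file name :file:`water.com` and the :command:`g09` executable are assumptions for illustration; substitute the input file and Gaussian command that apply to your assignment.

.. code:: bash

    #!/bin/bash
    #SBATCH --job-name=gaussian-example
    #SBATCH --output=gaussian-example-%j.out
    #SBATCH --error=gaussian-example-%j.err
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=1
    #SBATCH --mem-per-cpu=4096

    # load Gaussian
    module load gaussian

    # point Gaussian's scratch files at scratch space
    export GAUSS_SCRDIR=$HOME/scratch-midway/gaussian-work
    mkdir -p $GAUSS_SCRDIR

    # run Gaussian on the (hypothetical) input file water.com
    g09 water.com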