.. _intro-to-RCC-MENG34200:

.. index:: Introduction to RCC for Materials Engineering 34200

##################################
Introduction to RCC for MENG 34200
##################################

With your enrollment in the MENG34200 course you have been given access to an
education allocation on the Research Computing Center (RCC) Midway compute cluster.
Some useful information about the Midway compute environment is listed below.

Where to go for help
====================

For technical questions (help logging in, etc.), send a help request to help@rcc.uchicago.edu

The User Guide is available at http://docs.rcc.uchicago.edu

Logging into Midway
==============================

Access to RCC is provided via secure shell (SSH) login.

All users must have a UChicago CNetID to log in to any RCC systems. Your RCC account
credentials are your CNetID and password:

+----------+---------------------------+
| Username | CNetID                    |
+----------+---------------------------+
| Password | CNetID password           |
+----------+---------------------------+
| Hostname | midway.rcc.uchicago.edu   |
+----------+---------------------------+

.. note::
   RCC does not store your CNet password and we are unable to reset your password.
   If you require password assistance, please contact UChicago IT Services.

Most UNIX-like operating systems (Mac OS X, Linux, etc.) provide an SSH utility by
default that can be accessed by typing the command :command:`ssh` in a terminal. To
log in to Midway from a Linux/Mac computer, open a terminal and at the command line
enter, replacing ``<CNetID>`` with your own CNetID:

.. code:: bash

    ssh <CNetID>@midway.rcc.uchicago.edu

Windows users will first need to download an SSH client, such as PuTTY_, which will
allow you to interact with the remote Unix server. Use the hostname
:file:`midway.rcc.uchicago.edu` and your CNetID username and password to access
Midway through PuTTY.

Accessing Software on Midway
============================

When you first log into Midway, you start in a bare-bones user environment with
minimal software available.

The ``module`` system is a script-based system that configures the user environment
to make selected software packages available. To access software that is installed
on Midway, use the ``module`` system to load the corresponding software module into
your environment.

Basic ``module`` commands:

+----------------------+-----------------------------------------------------+
| Command              | Description                                         |
+======================+=====================================================+
| module avail         | lists all available software modules                |
+----------------------+-----------------------------------------------------+
| module avail [name]  | lists modules matching [name]                       |
+----------------------+-----------------------------------------------------+
| module load [name]   | loads the named module                              |
+----------------------+-----------------------------------------------------+
| module unload [name] | unloads the named module                            |
+----------------------+-----------------------------------------------------+
| module list          | lists the modules currently loaded for the user     |
+----------------------+-----------------------------------------------------+

Examples
--------

Obtain a list of the currently loaded modules:

.. code:: bash

    $ module list

    Currently Loaded Modulefiles:
      1) vim/7.4           3) emacs/24          5) env/rcc
      2) subversion/1.8    4) git/2.7           6) slurm/current

Obtain a list of all available modules:

.. code:: bash

    $ module avail

    -------------------------------- /software/modulefiles ---------------------------------
    Minuit2/5.34(default)                              lammps/cuda-9Dec2014+intelmpi-5.0+intel-15.0
    Minuit2/5.34+intel-12.1                            lammps/trunk(default)
    Minuit2/5.34+intel-15.0                            leveldb/1(default)
    ...
    knitro/9.1.0-z                                     yt/3.0(default)
    lammps/10Aug2015+intelmpi-5.0+intel-15.0           yt/3.1
    lammps/9Dec2014+intelmpi-5.0+intel-15.0            zlib/1.2(default)
    -------------------------------- /etc/modulefiles --------------------------------------
    condor/7.8(default)    module-info    slurm/2.4    slurm/current(default)
    env/rcc                samba/3.6      slurm/2.5    use.own
    -------------------------------- Aliases -----------------------------------------------
    -------------------------------- Versions ----------------------------------------------

Load the default python version:

.. code:: bash

    $ module load python
    $ python --version
    Python 2.7.9

List the currently loaded modules:

.. code:: bash

    $ module list

    Currently Loaded Modulefiles:
      1) vim/7.4           5) env/rcc           9) netcdf/4.2       13) postgresql/9.2      17) slurm/current
      2) subversion/1.8    6) mkl/11.2         10) graphviz/2.28    14) proj/4.8
      3) emacs/24          7) texlive/2012     11) qt/4.8           15) gdal/1.11
      4) git/2.7           8) hdf5/1.8         12) geos/3.4         16) python/2.7-2015q2

Unload the python module:

.. code:: bash

    $ module unload python

The Midway Cluster Environment
==============================

Midway is a Linux cluster with approximately 10,000 CPU cores and 1.5PB of storage.
Midway is a shared resource used by the entire University community. Sharing
computational resources creates unique challenges:

* Jobs must be scheduled in a fair manner.
* Resource consumption needs to be accounted for.
* Access needs to be controlled.

Thus, a **scheduler** is used to manage job submissions to the cluster. RCC uses the
Slurm_ resource manager to schedule jobs and provide interactive access to compute
nodes.

When you first log into Midway you will be connected to a login node (midway-login1
or midway-login2). Login nodes are not intended to be used for computationally
intensive work. Instead, login nodes should be used for managing files, submitting
jobs, etc.
If you are going to be running a computationally intensive program, you must do this
work on a compute node by either obtaining an interactive session or submitting a job
through the scheduler. However, you are free to run very short, non-computationally
intensive jobs on the login nodes, as is often necessary when you are working on and
debugging your code. If you are unsure whether your job will be computationally
intensive (large memory or CPU usage, long running time, etc.), get a session on a
compute node and work there.

There are two ways to send your work to a Midway compute node:

#. ``sinteractive`` - Request access to a compute node and log into it
#. ``sbatch`` - Write a script which defines commands that need to be executed and
   let Slurm run them on your behalf (this is generally what will be done for this
   course).

Working interactively on a compute node
=======================================

To request an interactive session on a compute node use the :command:`sinteractive`
command:

.. code:: bash

    sinteractive

When this command is executed, you will be connected to one of Midway's compute nodes
where you can then go about running your programs. By default,
:command:`sinteractive` gives you access for 2 hours to a compute node with 1 CPU and
2GB of memory. The :command:`sinteractive` command provides many more options for
configuring your session. For example, to get access to a compute node with 1 CPU and
4GB of memory for 3 hours, use the command:

.. code:: bash

    sinteractive --account=meng34200 --cpus-per-task=1 --mem-per-cpu=4096 --time=03:00:00

It may take 60 seconds or more for your interactive session to be initialized
(assuming there is an available resource that meets your specified requirements).

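
Once the session starts, it can be useful to confirm that you really are on a compute
node and not still on a login node. The snippet below is a minimal sketch, not an
RCC-provided tool: ``node_kind`` is a hypothetical helper name, and it relies only on
the login node names mentioned earlier (midway-login1 and midway-login2).

.. code:: bash

    # node_kind is a hypothetical helper: it classifies a hostname using the
    # login node naming (midway-login1, midway-login2) described earlier.
    node_kind() {
        case "$1" in
            midway-login*) echo "login node" ;;
            *)             echo "not a login node" ;;
        esac
    }

    # Report which kind of machine the current shell is running on
    echo "$(hostname): $(node_kind "$(hostname)")"

If this reports a login node, the interactive session has not started yet or has
already ended.
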
Submitting a job to the scheduler
=================================

An alternative to working interactively with a compute node is to submit the work you
want carried out to the scheduler through an **sbatch** script. An example sbatch
script is shown below:

.. code:: bash

    #!/bin/bash
    #SBATCH --job-name=h2o
    #SBATCH --output=h2o-%j.out
    #SBATCH --error=h2o-%j.err
    #SBATCH --time=0:10:00
    #SBATCH --account=meng34200
    #SBATCH --partition=sandyb
    #SBATCH --ntasks-per-node=16
    #SBATCH --nodes=1

    # set your file prefix
    file=h2o
    input="$file.i"
    output="$file.r"

    # load your modules here
    module load qbox

    # execute your tasks here
    mpirun -n 16 qb < "$input" > "$output"

Sbatch scripts contain two major elements. After the **#!/bin/bash** line, a series
of **#SBATCH** parameters are defined. These are read by the scheduler, Slurm, and
relay information about what specific hardware is required to execute the job, how
long that hardware is required, and where the output and error (stdout and stderr
streams) should be written. If resources are available the job may start less than
one second following submission. When the queue is busy and the resource request is
substantial the job may be placed in line with other jobs awaiting execution.

The ``%j`` placeholder in the output and error file names is replaced by Slurm with
the job's unique ID number. This prevents your output and error files from being
overwritten if the script is run multiple times in the same directory.

The second major element of an sbatch script is the user-defined commands. When the
resource request is granted the script is executed just as if it were run
interactively (i.e. as if you had typed in the commands one after the next at the
command line).

Sbatch scripts execute in the directory from which they were submitted. In the above
example, we are assuming that the script is located in the same directory as the
input file ``h2o.i``.

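
To hand the script to the scheduler, pass its file name to :command:`sbatch`. The
snippet below is a sketch that assumes the example above was saved as ``h2o.sbatch``
(a hypothetical file name) next to the input file; the guard only keeps it harmless
on a machine where Slurm is not installed.

.. code:: bash

    # h2o.sbatch is an assumed file name for the example script above
    script=h2o.sbatch

    if command -v sbatch >/dev/null 2>&1 && [ -f "$script" ]; then
        # Queue the job; sbatch reports the job ID assigned by Slurm
        sbatch "$script"
    else
        echo "sbatch or $script not found; run this from a Midway login node"
    fi

On Midway only the ``sbatch "$script"`` line is needed. The job ID it reports is the
number that replaces ``%j`` in the output and error file names.
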
Interact With Your Submitted Jobs
=================================

The status of submitted jobs can be viewed and altered in several ways. The primary
Slurm command **squeue** allows the user to monitor jobs. For example, running squeue
without any options lists all pending jobs on Midway, followed by all running jobs.

.. code:: bash

    squeue

       JOBID PARTITION     NAME     USER ST       TIME NODES NODELIST(REASON)
    18572274    sandyb  west-te  mgovoni PD       0:00    48 (Resources)
    18570549   gagalli    water    rozsa PD       0:00     1 (Resources)
    18570573      ivyb      run  jhskone PD       0:00     1 (Priority)
    18571969    sandyb     lmp- mpulighe PD       0:00     2 (Resources)
    18572076    sandyb  qb_test  agaiduk PD       0:00     2 (Priority)
    18571537    sandyb west-te-  gagalli PD       0:00     8 (Resources)
    18572094    sandyb meng3420 rlmcavoy PD       0:00     4 (Priority)
    ...
    ...
    18563596    sandyb conv_L11  semenov  R 1-07:46:58     4 midway[305,469-471]
    18563595    sandyb conv_L10  semenov  R 1-07:49:00     4 midway[384,407,472-473]
    18563594    sandyb conv_L9_  semenov  R 1-07:52:00     4 midway[350,367,369,468]
    18568833    sandyb est_03_3 ericjsle  R    9:03:46    14 midway[106,216,318,326-327,382-383,413,426,428-430,447,462]
    18568683    sandyb     test  mgovoni  R    9:35:27    16 midway[199,212,252,352,365,370,403,417,423-424,436,442,444,449,463,508]
    18561060   gagalli  wv_si_b     mahe  R   10:40:09    16 midway[680-695]
    18554967   gagalli  par2.sh    rozsa  R    6:32:35     4 midway[710-713]

The above tells us:

+------------------+-------------------------------------------------------------------+
| Name             | Description                                                       |
+==================+===================================================================+
| JOBID            | Job ID #, unique reference number for each job                    |
+------------------+-------------------------------------------------------------------+
| PARTITION        | Type of node job is running/will run on                           |
+------------------+-------------------------------------------------------------------+
| NAME             | Name for the job, defaults to the batch script name               |
+------------------+-------------------------------------------------------------------+
| USER             | User who submitted job                                            |
+------------------+-------------------------------------------------------------------+
| ST               | State of the job                                                  |
+------------------+-------------------------------------------------------------------+
| TIME             | Time used by the job in D-HH:MM:SS                                |
+------------------+-------------------------------------------------------------------+
| NODES            | Number of Nodes consumed                                          |
+------------------+-------------------------------------------------------------------+
| NODELIST(REASON) | List of Nodes consumed, or reason the job has not started running |
+------------------+-------------------------------------------------------------------+

As there are usually a very large number of jobs in the queue, the output of
**squeue** must often be filtered to show only the jobs that are of interest to you.
To view only the jobs that you have submitted use the command:

.. code:: bash

    squeue -u $USER

For further control over which fields are displayed, use the **-O** flag. For
example:

.. code:: bash

    squeue -u $USER -O jobid,partition,state,name,reason,prioritylong

    JOBID     PARTITION  STATE    NAME     REASON       PRIORITY
    18572274  sandyb     PD       west-te  (Resources)  143035
    18570573  ivyb       RUNNING  test     None         100119

To cancel a job that you have submitted, first obtain the job's JobID number by using
the ``squeue -u $USER`` command. Then issue the command, replacing ``<JobID>`` with
that number:

.. code:: bash

    scancel <JobID>

or you can cancel ALL of your jobs at the same time (be sure you really want to do
this!) with the command:

.. code:: bash

    scancel -u <CNetID>

Accessing and Transferring Files
================================

RCC provides a number of methods for transferring data in/out of Midway. For
relatively small amounts of data, we recommend the :command:`scp` command. For
non-trivial file transfers, we recommend using `Globus Online`_ for fast, secure and
reliable transfers.
When working on the UChicago network it is also possible to mount the Midway file
systems using **Samba**.

Command Line - SCP
------------------

Most UNIX-like operating systems (Mac OS X, Linux, etc.) provide a :command:`scp`
command which can be accessed from the command line. To transfer files from your
local computer to your home directory on Midway, open a terminal window and issue the
command::

    Single files: $ scp file1 ... midway.rcc.uchicago.edu:
    Directories:  $ scp -r dir1 ... midway.rcc.uchicago.edu:

When prompted, enter your CNet password.

Windows users will need to download an SCP client such as WinSCP that provides a
graphical interface for transferring files via scp.

Windows GUI - WinSCP
--------------------

WinSCP is an SCP client that can be used to move files between Midway and a Windows
machine. WinSCP can be obtained from http://www.winscp.net.

Use the hostname **midway.rcc.uchicago.edu** and your CNet credentials when
connecting.

.. image:: winscp-login.png
   :width: 40 %

If prompted to accept the server's host key, select "yes."

The main WinSCP window allows you to move files from your local machine (left side)
to Midway (right side).

.. image:: winscp-main.png
   :width: 40 %

Mac GUI - SFTP Clients
----------------------

There are a number of graphical SFTP clients available for Mac. FileZilla, for
example, is a freely available SFTP client (https://filezilla-project.org/).

Use the hostname **midway.rcc.uchicago.edu** and your CNet credentials when
connecting.

Samba
-----

Samba allows users to connect to (or "mount") their home directory on their local
computer so that the file system on Midway appears as if it were directly connected
to the local machine.

This method of accessing your RCC home and project space is only available from
within the UChicago campus network. From off-campus you will need to connect through
the `UChicago virtual private network`_.

.. _UChicago virtual private network: https://cvpn.uchicago.edu/

Your Samba account credentials are your CNetID and password::

    Username: ADLOCAL\<CNetID>
    Password: CNet password
    Hostname: midwaysmb.rcc.uchicago.edu

.. note::
   Make sure to prefix your username with **ADLOCAL**\\

On a Windows computer, use the "Map Network Drive" functionality and the following
UNC paths::

    Home:    \\midwaysmb.rcc.uchicago.edu\homes
    Project: \\midwaysmb.rcc.uchicago.edu\project

On Mac OS X, use these URLs to connect::

    Home:    smb://midwaysmb.rcc.uchicago.edu/homes
    Project: smb://midwaysmb.rcc.uchicago.edu/project

To connect on a Mac OS X computer:

* Use the **Connect to Server** utility in Finder

  .. image:: finder-connect_to_server.jpg
     :width: 40 %

* Enter one of the URLs from above in the input box for **Server Address**.
* When prompted for a username and password, select **Registered User**.
* Enter :samp:`ADLOCAL\\{YourCNetID}` for the username and enter your CNet password.