Stata

Stata is a powerful statistical software package that is widely used in scientific computing. RCC users are licensed to use Stata on all RCC resources. Stata can be used interactively or as a submitted script. Please note that if you would like to run it interactively, you must still run it on a compute node, in order to keep the login nodes free for other users. Stata can be run in parallel on up to 16 nodes.

Note

Stata examples in this document are adapted from a Princeton tutorial. You may find it useful if you are new to Stata or want a refresher.

Getting Started

If you need to use the Stata GUI, connect to Midway with Connecting with ThinLinc.

Obtain an interactive session on a compute node. This is necessary so that your computation doesn’t interrupt other users on the login node. Now, load Stata:

sinteractive
module load stata
xstata

This will open up a Stata window. The middle pane has a text box to enter commands at the bottom, and a box for command results on top. On the left there’s a box called “Review” that shows your command history. The right-hand box contains information about variables in the currently-loaded data set.

One way Stata can be used is as a fancy desktop calculator. Type the following code into the command box:

display 2+2

Stata can do much more if data is loaded into it. The following code loads census data that ships with Stata, prints a description of the data, then creates a graph of life expectancy over GNP:

sysuse lifeexp
describe
graph twoway scatter lexp gnppc

Running Stata from the command line

This is very similar to running graphically; the command-line interface is equivalent to the “Results” pane in the graphical interface. Again, please use a compute node if you are running computationally-intensive calculations:

sinteractive
module load stata
stata

Running Stata Jobs with SLURM

You can also submit Stata jobs to SLURM, the scheduler. A Stata script is called a “do-file,” which contains a list of Stata commands that the interpreter will execute. You can write a do-file in any text editor, or in the Stata GUI’s do-file editor: click “Do-File Editor”” in the “Window” menu. If your do-file is named “example.do,” you can run it with either of the following commands:

stata < example.do
stata -b do example.do

Here is a very simple do-file, which computes a regression on the sample data set from above:

version 13 // current version of Stata, this is optional but recommended.

sysuse lifeexp
gen loggnppc = log(gnppc)
regress lexp loggnppc

Here is a submission script that submits the Stata program to the default queue on Midway:

#!/bin/bash

#SBATCH --job-name=stataEx
#SBATCH --output=stata_example.out
#SBATCH --error=stata_example.err
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1

module load stata

stata -b stata_example.do

stata_example.do is our example do-file, and stata_example.sbatch is the submission script.

To run this example, download both files to a directory on Midway. Enter the following command to submit the program to the scheduler:

sbatch stata_example.sbatch

Output from this example can be found in the file named stata_example.log, which will be created automatically in your current directory.

Running Parallel Stata Jobs

The parallel version of Stata, Stata/MP, can speed up computations and make effective use of RCC’s resources. When running Stata/MP, you are limited to 16 cores and 5000 variables. Run an interactive Stata/MP session:

sinteractive
module load stata
stata-mp
# or, for the graphical interface:
xstata-mp

Here is a sample do-file that would benefit from parallelization. It runs bootstrap estimation on another data set that ships with Stata.

version 13

sysuse auto
expand 10000
bootstrap: logistic foreign price-gear_ratio

Here is a submission script that will run the above do-file with Stata/MP:

#!/bin/bash

#SBATCH --job-name=stataMP
#SBATCH --output=stata_parallel.out
#SBATCH --error=stata_parallel.err
#SBATCH --nodes=1
#SBATCH --tasks-per-node=16

module load stata
stata-mp -b stata_parallel.do

Download stata_parallel.do and stata_parallel.sbatch to Midway, then run the program with:

sbatch stata_parallel.sbatch