.. index::
   single: LAMMPS

.. _mdoc_lammps:

=======
LAMMPS_
=======

-----------------
Table of Contents
-----------------

- `Quick Start`_
- `Intro of LAMMPS`_
- `Get Best Performance`_
- `Benchmarks`_
- `Copyright`_

-----------
Quick Start
-----------

If you are already familiar with LAMMPS, this section gives you the quick steps for using LAMMPS as installed and optimized on the RCC Midway cluster.

LAMMPS is installed with the RCC module system. You can use either of the following two commands to load it into the shell environment. The first is::

    module load lammps

This module is built from the SVN source hosted at svn://svn.icms.temple.edu/lammps-ro/trunk, version 30Sep14. The SVN trunk provides the most up-to-date code from the LAMMPS developers, and the optimization package "OPT" is compiled along with this binary. The second is::

    module load lammps-plumed

This module is built from the tarball source hosted at http://lammps.sandia.gov/download.html, version 5Sep14, which is the latest stable distribution. There are two important features in this installation: (1) the packages USER-OMP and USER-INTEL were added for optimization; (2) the off-site package "USER-PLUMED" was added to provide free-energy techniques.

After loading either of the two modules, you can run LAMMPS with the binary "lmp_intelmpi". Although the two modules were compiled with different versions of the Intel MPI libraries, the module system automatically loads the correct one.

A typical SLURM script for running LAMMPS jobs follows:

.. literalinclude:: lammps.sbatch
   :language: bash
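If you cannot view the included file, the sketch below shows the general shape of such a script. This is a minimal, hypothetical example: the job name, partition, resource counts, wall time, and the input file ``in.lj`` are placeholders to adjust for your own runs.

.. code-block:: bash

    #!/bin/bash
    # Hypothetical sketch of a LAMMPS batch script; all values are placeholders.
    #SBATCH --job-name=lammps-test
    #SBATCH --partition=sandyb      # 16-core Sandy Bridge nodes
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=16    # one MPI task per core
    #SBATCH --time=01:00:00

    # load the LAMMPS module (or lammps-plumed)
    module load lammps

    # run one MPI task per allocated core; in.lj is your LAMMPS input script
    mpirun lmp_intelmpi < in.lj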
GPU support has also been patched into the lammps-plumed module. Two packages, GPU and USER-CUDA, were compiled with CUDA 4.2. The module is named with the suffix "-cuda"; load and run it as follows::

    module load lammps-plumed/5Sep14-cuda+intelmpi-5.0+intel-15.0
    mpirun lmp_intelmpi-cuda -c on -sf cuda < in.lj

---------------
Intro of LAMMPS
---------------

LAMMPS is simulation software for particle systems. It is specially designed for the molecular dynamics technique and for large-scale parallel simulations. It is an open-source code developed and maintained by Sandia National Laboratories (SNL). It has been widely used both for methodology and algorithm development and for simulations in materials science, chemistry, physics, and biology.

For more information about LAMMPS, please visit its official website: http://lammps.sandia.gov/

For advice on using LAMMPS efficiently, please read the acceleration section of its documentation: http://lammps.sandia.gov/doc/Section_accelerate.html

--------------------
Get Best Performance
--------------------

The module lammps-plumed is installed with the following packages:

ASPHERE BODY CLASS2 COLLOID DIPOLE FLD GRANULAR MANYBODY KSPACE MC MISC MOLECULE REPLICA RIGID SHOCK SRD USER-CG-CMM USER-EFF USER-FEP USER-LB USER-MISC USER-MOLFILE USER-OMP USER-SPH USER-PLUMED USER-INTEL

To learn more about these packages, please read http://lammps.sandia.gov/doc/Section_start.html#start_3

If you need other packages to be installed in this module, please contact yuxing@uchicago.edu

The LAMMPS binaries compiled here aim to provide RCC users with optimized solutions. Therefore, three important packages are discussed in detail: OPT, USER-OMP, and USER-INTEL.

**OPT**
============

Quoting the LAMMPS website: "*The OPT package was developed by James Fischer (High Performance Technologies), David Richie, and Vincent Natoli (Stone Ridge Technologies). It contains a handful of pair styles whose compute() methods were rewritten in C++ templated form to reduce the overhead due to if tests and other conditional code.*"

To use the OPT acceleration, you just need to add "-sf opt" to your job command::

    lmp_intelmpi -sf opt < lj.in

Note, however, that only a subset of the pair styles is optimized in this package.

**USER-OMP**
============

The USER-OMP package was developed by Axel Kohlmeyer at Temple University. Its purpose is to introduce an OpenMP/MPI hybrid parallel scheme into LAMMPS to benefit from state-of-the-art multicore processors. Parallel jobs therefore run as a combination of **SMP threads** x **MPI tasks**. For example, if you request 64 cores (4 nodes) on the sandyb partition through SLURM, you can choose among different combinations (16x4, 8x8, 4x16, etc.) to find the optimal performance. To do this, set the SLURM options accordingly. For example, to run 8 MPI tasks in total with 8 threads each, i.e., one MPI task per Sandy Bridge processor::

    --nodes=4              # allocate 4 nodes in total
    --ntasks-per-node=2    # run 2 MPI tasks per node (1 per processor)
    --cpus-per-task=8      # allocate 8 OpenMP threads per MPI task

You can also set the environment variable OMP_NUM_THREADS=8, although this is not necessary. In addition, you need to turn on the OMP suffix in the job command::

    lmp_intelmpi -sf omp < lj.in

At the beginning of the LAMMPS input script (lj.in in this example), you also need to load the package with::

    package omp $N

where $N is the number of OpenMP threads, which equals 8 in this example.

Unfortunately, we did not observe a speed boost from this hybrid scheme. In most cases, OMP_NUM_THREADS=1 gives the best performance, which means the hybrid mode is not actually used. **However, the USER-OMP package does optimize much of the code, from force calculations to integration, which yields a significant acceleration even when OMP_NUM_THREADS=1 is used.**
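Putting these settings together, a complete job script for the 8 x 8 layout above might look like the following sketch. The values are hypothetical; the partition name and wall time in particular are placeholders.

.. code-block:: bash

    #!/bin/bash
    # Hypothetical sketch: 8 MPI tasks x 8 OpenMP threads on 4 sandyb nodes
    #SBATCH --partition=sandyb
    #SBATCH --nodes=4               # 4 nodes in total
    #SBATCH --ntasks-per-node=2     # 2 MPI tasks per node (1 per processor)
    #SBATCH --cpus-per-task=8       # 8 OpenMP threads per MPI task
    #SBATCH --time=01:00:00

    module load lammps-plumed

    # optional; "package omp 8" in the input script has the same effect
    export OMP_NUM_THREADS=8

    # -sf omp switches supported styles to their USER-OMP variants
    mpirun lmp_intelmpi -sf omp < lj.in

As noted above, it is worth benchmarking the pure-MPI layout (OMP_NUM_THREADS=1) against the hybrid layouts, since it often performs best.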
**USER-INTEL**
==============

This is a very new package developed by engineers at Intel. Its purpose is to implement MIC (Intel Many Integrated Core) support in LAMMPS. However, even without a MIC card, the code can be substantially accelerated on CPU-only clusters, because it was rewritten to exploit the Intel AVX vectorization technique. USER-INTEL also provides a large number of optimized implementations of LAMMPS functions. To use the INTEL acceleration, you just need to add "-sf intel" to your job command::

    lmp_intelmpi -sf intel < lj.in

At the beginning of the LAMMPS input script (lj.in in this example), you also need to load the package with::

    package intel

----------
Benchmarks
----------

Testing information::

    Hardware: Haswell E5-2660 v3 (2.6 GHz, 20 cores), DDR4 2133 MHz
    System:   pure water box (100 x 100 x 100 A^3), 95,577 atoms
    PPPM:     double-precision FFTW3, 1E-4 accuracy
    Cutoff:   10 A
    Steps:    1000

+---------------+---------------+---------------+---------------+---------------+
| Timings       | Plain         | OPT           | USER-OMP      | USER-INTEL    |
+===============+===============+===============+===============+===============+
| Total         | 50.07         | 45.20         | 41.41         | 33.75         |
+---------------+---------------+---------------+---------------+---------------+
| Pair          | 37.55         | 33.11         | 29.07         | 21.54         |
+---------------+---------------+---------------+---------------+---------------+
| Kspace        | 6.48          | 6.86          | 5.89          | 5.75          |
+---------------+---------------+---------------+---------------+---------------+
| Neighbor      | 3.18          | 2.29          | 3.08          | 2.95          |
+---------------+---------------+---------------+---------------+---------------+
| Communication | 0.90          | 0.85          | 1.02          | 1.01          |
+---------------+---------------+---------------+---------------+---------------+
| Output        | 0.01          | 0.01          | 0.01          | 0.01          |
+---------------+---------------+---------------+---------------+---------------+
| Other         | 1.60          | 1.76          | 2.07          | 2.12         |
+---------------+---------------+---------------+---------------+---------------+

*Numbers are in seconds. Lower is better.*

---------
Copyright
---------

.. include:: copyright.txt

.. _LAMMPS: http://lammps.sandia.gov/