Introduction to the Penrose Cluster


 

Description

Cluster Basics

Home Directory

Scratch Space

Applications

Chemistry Applications

Bioinformatics

Visualization

Compilers

Accessing Penrose Cluster and Graphical Desktop Environment

SGE (Sun Grid Engine)

qsub - Submit scripts to queue

qdel - Delete job from queue

qstat - Show job/queue status

qhost - Show job/host status

Job Command Files

Reviewing Job Output

Application Specific Notes

GAMESS

GROMACS

NAMD

Espresso-3.2/PWscf

AutoDock4/MGLTools

Links to Useful Software and Documentation

Acknowledgements


Description

 

The Penrose cluster is a Linux cluster that appears to users as a single large multiprocessing compute server capable of running many computationally intensive jobs. The system currently consists of a front-end node server and eight (9) compute nodes.  Researchers and educators in the department can take advantage of this computing power by running a range of jobs, from shell scripts to larger computational jobs and batch processes.  The Penrose cluster uses the Rocks 4.3 cluster software, which is based on CentOS.  A number of chemistry, biochemistry and materials science computational packages are installed and configured.

 

Cluster Basics

 

Researchers should log in to the Penrose head node, penrose.nsm.iup.edu, using an SSH client program (e.g. putty.exe) and their username and password.  The username and password can be obtained by contacting Dr. LeBlond.  You must use the SSH2 protocol.  Files can be transferred between your PC and your home directory with SFTP (e.g. Filezilla).  To display graphical programs running on the cluster on your desktop PC, you will also need an X server.  Both Xming and Cygwin/X have been used successfully.

 

Home Directory

Your home directory on the Penrose cluster is /home/$USER. The initial quota limit for your home directory is set at XXXGB.

 

Scratch Space

The recommended node-local temporary workspace, or scratch space, is /state/partition1/$USER.  Its contents are subject to automatic deletion after 30 days.  The directory is configured by the system administrator during account setup.  It is your responsibility to clean up your scratch space if a job crashes.  An out-of-control job could fill the disk and crash the node.
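
One way to meet this responsibility is to have the job script remove its own scratch directory when it exits.  The following is a minimal sketch, not a site-provided script: the SCRATCH_ROOT variable and the job-<id> directory name are assumptions made for illustration, and the fallback to a temporary directory exists only so the sketch can be tried on a machine without /state/partition1.

```shell
#!/bin/bash
# Sketch: create a per-job scratch directory and remove it on exit.
# SCRATCH_ROOT and the job-<id> naming are illustrative assumptions.
SCRATCH_ROOT="${SCRATCH_ROOT:-/state/partition1/$USER}"

# Fall back to a temporary directory when the cluster path is unavailable
# (for example, when trying this sketch on a desktop machine).
if ! mkdir -p "$SCRATCH_ROOT" 2>/dev/null; then
    SCRATCH_ROOT="$(mktemp -d)"
fi

# JOB_ID is set by SGE inside a running job; use the shell PID otherwise.
SCRATCH_DIR="$SCRATCH_ROOT/job-${JOB_ID:-$$}"
mkdir -p "$SCRATCH_DIR"

# Remove the scratch directory when the script exits, normally or on error.
# A job killed outright (SIGKILL) cannot run this handler.
cleanup() {
    rm -rf "$SCRATCH_DIR"
}
trap cleanup EXIT

# ... run the application here, writing temporary files to $SCRATCH_DIR ...
echo "temporary data" > "$SCRATCH_DIR/work.tmp"
echo "Scratch directory in use: $SCRATCH_DIR"
```

Because the handler cannot run when a job is killed outright, it is still worth checking the scratch directory by hand after a crash.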

 


Applications

 

The following computational chemistry, bioinformatics and visualization software packages are available on the Penrose cluster.

Chemistry Applications

 

  • GAMESS - General Atomic and Molecular Electronic Structure System - can compute SCF wavefunctions including RHF, ROHF, UHF, GVB, and MCSCF. Correlation corrections to these SCF wavefunctions include Configuration Interaction, second order perturbation theory, and coupled-cluster approaches, as well as the density functional theory approximation.
  • GROMACS - GROningen MAchine for Chemical Simulation - is a software suite for molecular dynamics simulation. The version of GROMACS installed is 3.3.1.
  • NAMD - http://www.ks.uiuc.edu/Research/namd/ - is a parallel molecular dynamics code for large biomolecular systems.  The version of NAMD installed is 2.6 (compiled with the GNU compiler and linked against OpenMPI, also compiled with GNU).
  • Espresso-3.2.3 - An integrated suite of computer codes for electronic structure calculations and materials modelling.  It is based on density-functional theory, plane waves, and pseudopotentials (both norm-conserving and ultrasoft).  Compiled with the Intel 10 Fortran compiler and linked against MPICH-1.2.7p1 (also compiled with the Intel 10 compiler).
  • LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator. Compiled with the Intel 10 C++ compiler and linked against MPICH-1.2.7p1. If you wish to use this software, you should obtain your own license.
  • SIESTA-2.0.1 - http://www.uam.es/departamentos/ciencias/fismateriac/siesta/ (Spanish Initiative for Electronic Simulations with Thousands of Atoms) is both a method and a computer program implementation for performing electronic structure calculations and ab initio molecular dynamics simulations of molecules and solids.  Compiled with Intel 10 Fortran and linked against MPICH2 (1.0.4p1).  If you wish to use this software, you should obtain your own license.
  • ECCE/NWCHEM
  • AutoDock4/MGLTools - AutoDock is a suite of automated docking tools. It is designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure.



Bioinformatics

 

 

Visualization

 

Note:  Using the visualization software described below from a remote Windows PC requires Xming or an equivalent X server.  Some of these applications require OpenGL and will not work with Xming on some video cards.  If a graphical application does not work remotely, you will need to log in at the front-end node in person.

 

  • XCrySDen – (type xcrysden to start) - Useful for viewing Espresso and other input and output files.
  • PWGui – (type pwgui to start) - A graphical interface for Espresso, useful for building input files and submitting Espresso jobs interactively.
  • Rasmol – (type rasmol_32BIT to start)
  • VMD – (type vmd to start) – Useful for viewing NAMD, GROMACS and many output file formats.
  • MGLTools – (type adt to start) - Graphical front-end for setting up and running AutoDock4 simulations interactively.

 


Compilers

The following compilers are available on the Penrose cluster.  In addition, a variety of other programming languages are available, including Python and Perl.

 

  • C/C++ Compilers
      • GNU: gcc / g++
      • Intel: icc 10.1
  • Fortran Compilers
      • GNU: g77 (for FORTRAN 77)
      • Intel: ifort 10.1 (FORTRAN 77, 90, 95)


Accessing Penrose Cluster and Graphical Desktop Environment

 

SSH (Secure Shell) can be used to connect to the Penrose cluster.  Once connected to the front-end node (penrose), you can begin your computations.  If you would like to use graphical applications, you will need to install NXClient for Windows or Linux on your computer.

 

 

Access to terminal via SSH (putty)

1) Download and install PuTTY from http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html

 

 

2) Configure Putty.

Run the PuTTY Windows SSH client, and enter 'penrose.nsm.iup.edu' as the Host Name as shown below.  Enter a name to call this session such as ‘Penrose-VNC’ and click the save button.

 


 


Next, go back to the [Session] window, click on [Save] and then click on the [Open] button.   Log in to Penrose using your username and password.

Your tunnel is now active. You do not need to run anything else in this PuTTY terminal window.   You are connected to the Penrose cluster and could begin computing at this point.  If you would like to run X Windows and graphical applications, install NXClient as outlined below.

 

Accessing Graphical Applications on the Penrose Cluster

1)    Download the NXClient version appropriate for your computer from http://www.nomachine.com/download.php.

2)    Email Dr. Carl LeBlond for the key required to access the Penrose cluster.

3)    Install NXClient.

Use the following settings:

a) Host:  penrose.nsm.iup.edu

b) Desktop:  unix/gnome

c) Choose the appropriate network setting, e.g. LAN if you are on campus, or ADSL if you are setting it up at home over a DSL connection.

4)    Install the key (click Key/Import and point to the key).

5)    Log in to the cluster using your assigned login and password.

Note:  The performance of the NX client depends on your graphics card and network connection.


SGE (Sun Grid Engine)

This section describes the Sun Grid Engine 6.0 queue system on the Penrose Linux cluster and provides an introduction to help users get started with SGE.  For more details, consult the appropriate man pages. The sge_intro man page (type "man sge_intro") gives a brief description of all of the SGE commands. There is only one queue, and users interact with it through the commands summarized below.  Typically a job command file is prepared and submitted to the queue with the qsub command.

qsub - Submit scripts to queue

  • -cwd Run the job from the current working directory
  • -v VAR Pass the environment variable VAR to the job (-V passes all environment variables)
  • -o Redirect standard output (Default: Home directory)
  • -e Redirect standard error (Default: Home directory)

       e.g. To submit the serial gromacs job command file, type:

   qsub gromacs-ser
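
Several qsub options can be combined on one command line.  The sketch below assembles such a command and only runs it when qsub is actually installed, so it can also be read or tried off the cluster.  The output file names are placeholders chosen for illustration, not site conventions.

```shell
#!/bin/bash
# Sketch: submit the serial GROMACS job file with a few common options.
# The output file names are placeholders, not site conventions.
JOB_FILE="gromacs-ser"
QSUB_ARGS=(-cwd -V -o gromacs-ser.out -e gromacs-ser.errout "$JOB_FILE")

if command -v qsub >/dev/null 2>&1; then
    # On the cluster: submit the job; qsub reports the assigned job ID.
    qsub "${QSUB_ARGS[@]}"
else
    # Off the cluster: just show the command that would be run.
    echo "qsub ${QSUB_ARGS[*]}"
fi
```

Options given on the command line can equally be written as "#$" lines inside the job command file, which keeps the submission command short.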

qdel – Delete job from queue

       e.g. To delete the job with job ID 201, type:

   qdel 201

qstat - Show job/queue status

  • no arguments Show currently running/pending jobs
  • -f Show full listing of all queues
  • -j job_id Show detailed information on a pending/running job
  • -u user Show current jobs belonging to the given user

qhost - Show job/host status

  • no arguments Show a table of all execution hosts and information about their configuration
  • -l attr=val Show only certain hosts
  • -j Show the jobs running on each host
  • -q Shows detailed information on queues at each host
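
The status commands can also be used from a script.  As a small sketch, the function below polls qstat until a job has left the queue.  It relies on "qstat -j" returning a non-zero exit status once the job ID is no longer known to SGE; the function name and the default polling interval are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch: block until an SGE job has left the queue.
# Relies on "qstat -j <id>" returning a non-zero status for unknown jobs.
wait_for_job() {
    local job_id="$1"
    local interval="${2:-30}"   # seconds between checks (assumed default)

    while qstat -j "$job_id" >/dev/null 2>&1; do
        sleep "$interval"
    done
    echo "Job $job_id is no longer in the queue."
}

# Example use on the cluster (201 is the job ID from the qdel example):
#   wait_for_job 201 60
```

A job that has left the queue may have finished or failed, so the output and error files still need to be checked.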

Job Command Files

 

An SGE job is defined by a job command file.  There are many options available for job command files.  An example job command file that runs the GROMACS programs grompp and mdrun is provided below.  This file runs the job in parallel on 2 processors using MPI.  The job's error and output files will be written to the current working directory and will be called gromacs-par.errout and gromacs-par.out.

 

Example job command file: gromacs-par
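#!/bin/bash
# Run the job from the current working directory
#$ -cwd
# Request 2 slots in the "mpi" parallel environment
#$ -pe mpi 2
# Job name
#$ -N gromacs-par
# Shell used to run the job
#$ -S /bin/bash
# Wall-clock time limit of 24 hours
#$ -l h_rt=24:00:00
# Files for standard error and standard output
#$ -e gromacs-par.errout
#$ -o gromacs-par.out
# Address for e-mail notifications
#$ -M youremail@iup.edu

# Preprocess the input files for $NSLOTS processes
/opt/Bio/gromacs/bin/grompp -np $NSLOTS -v
# Run mdrun in parallel under MPI
/opt/mpich/gnu/bin/mpirun -np $NSLOTS \
-machinefile $TMP/machines -v /opt/Bio/gromacs/bin/mdrun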
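
The serial counterpart, gromacs-ser, is not reproduced in this guide.  As a rough sketch of what a serial job command file could look like, the script below writes one to a file named gromacs-ser.example (a hypothetical name) and checks its syntax with bash -n.  It assumes the same installation path as the parallel example and drops the parallel environment request and mpirun.  Compare it against the site's actual gromacs-ser file before use.

```shell
#!/bin/bash
# Sketch: write a serial SGE job command file and syntax-check it.
# "gromacs-ser.example" is a hypothetical file name used for illustration.
cat > gromacs-ser.example <<'EOF'
#!/bin/bash
#$ -cwd
#$ -N gromacs-ser
#$ -S /bin/bash
#$ -l h_rt=24:00:00
#$ -e gromacs-ser.errout
#$ -o gromacs-ser.out

# No parallel environment request and no mpirun: the job uses a single slot.
/opt/Bio/gromacs/bin/grompp -v
/opt/Bio/gromacs/bin/mdrun -v
EOF

# bash -n parses the file without executing it.
bash -n gromacs-ser.example && echo "gromacs-ser.example: syntax OK"
```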

Reviewing Job Output

When a job from the SGE queue has completed, the job's error and output files will be available for you to review.  These are standard ASCII text files which may be viewed with the cat or more commands.  They can also be opened in an editor such as vi or emacs.  To learn more about these commands, see their man pages (e.g. type "man vi").  There are various applications for reviewing output from computational chemistry packages.  See the Visualization software overview and the Application Specific Notes for details of their operation.
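
A quick first check after a job finishes is to look at the end of the output file and to see whether anything was written to the error file.  The helper below is a sketch only: the function name review_job is made up for illustration, and it assumes the <name>.out and <name>.errout file names used in the gromacs-par example.

```shell
#!/bin/bash
# Sketch: summarize the output and error files of a finished job.
# Assumes the <name>.out / <name>.errout naming of the gromacs-par example.
review_job() {
    local name="$1"
    local out_file="$name.out"
    local err_file="$name.errout"

    if [ -f "$out_file" ]; then
        echo "== Last 20 lines of $out_file =="
        tail -n 20 "$out_file"
    else
        echo "No output file named $out_file in $(pwd)"
    fi

    # -s is true only when the file exists and is not empty.
    if [ -s "$err_file" ]; then
        echo "== $err_file contains $(wc -l < "$err_file") line(s) =="
        cat "$err_file"
    else
        echo "== $err_file is empty or missing =="
    fi
}

# Example use from the directory the job was submitted from:
#   review_job gromacs-par
```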


Application Specific Notes

 

GAMESS

Submitting a GAMESS job to SGE with GAMESS shell script:

 

1)    Construct input file - The input file contains the system and control parameters, the type of calculation, and the molecular geometry and charge.  An input file can easily be prepared with the Linux program gabedit or the Windows application Chemcraft.  You could also prepare the file with a text editor (e.g. vi or emacs on Unix systems or Notepad in Windows); however, gabedit and Chemcraft allow you to construct molecules and obtain their coordinates.

 

2)    Submit job – A script file named gamess should be invoked.  This script will start the GAMESS command shell and should be started from your current directory (see below).  From this shell you can submit GAMESS jobs to the SGE queue.

 

GAMESS COMMANDS
  1. gamess fname
     *fname is the name of the input file.
  2. menu This command will redisplay this menu at any time.
  3. stop This command will allow you to exit the GAMESS shell.

 ** For parallel execution use 'gamess -n X fname'
    Where X is the number of nodes

gamess>

For serial execution type

gamess>gamess inputfilename    

Note:  You should not include the .inp extension

 

For parallel execution on 4 nodes you would type the following

gamess>gamess -n 4 inputfilename

 

3)    Analysis and visualization - The output from the GAMESS job will be located in the text file, inputfilename.out, in your current directory.  The output can be visualized with either molden or VMD locally or using a remote X server.  You can expect very poor performance when visualizing with a remote X server.
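
For orientation, an input file of the kind described in step 1 can be quite short.  The sketch below writes a single-point RHF/STO-3G energy calculation on water to a file named water.inp.  The file name, the approximate geometry and the choice of method are illustrative assumptions, and the GAMESS manual remains the reference for the input groups.

```shell
#!/bin/bash
# Sketch: write a small GAMESS input file (RHF/STO-3G energy of water).
# Geometry is approximate, in Angstroms; water.inp is a placeholder name.
# Each $GROUP keyword starts in column 2, as GAMESS expects.
cat > water.inp <<'EOF'
 $CONTRL SCFTYP=RHF RUNTYP=ENERGY COORD=CART $END
 $BASIS  GBASIS=STO NGAUSS=3 $END
 $DATA
Water, single-point energy
C1
O   8.0   0.0000   0.0000   0.0000
H   1.0   0.0000   0.7572   0.5865
H   1.0   0.0000  -0.7572   0.5865
 $END
EOF

echo "Wrote water.inp; from the gamess shell it would be run as: gamess water"
```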


GROMACS

 

GROMACS is a collection of programs for molecular dynamics simulations.  In the simplest case you would prepare a topology file (.top) and a molecular dynamics parameter file (.mdp) before executing the program.  The topology file (.top) is first converted to a binary topology file (.tpr) with the command grompp.  Once you have created the binary topology file, the molecular dynamics simulation can be started with the mdrun command.  Read the getting started section of the manual and work your way through the examples given at http://www.gromacs.org/documentation/reference/online/getting_started.html.

 

You should run these from the command prompt in the directory where your topology and parameter files are located.  An example of this procedure is given below.  You could also submit these to SGE queue for parallel or serial execution with the gromacs-par and gromacs-ser scripts.

 

Example:  Copying GROMACS examples and tutorials to your home directory and executing from the Unix command prompt.

First copy the tutor directory from the GROMACS installation to your home directory:

[cleblond@penrose ~]$ cp -r /opt/Bio/gromacs/share/gromacs/tutor ~

Change to the first demo directory, called water:

[cleblond@penrose ~]$ cd tutor/water

Convert the topology file to a binary topology file:

[cleblond@penrose water]$ grompp -v -np 2

Initiate the molecular dynamics simulation:

[cleblond@penrose water]$ mdrun -v -np 2

Notes:  The -v option means verbose (i.e. print all output), while -np 2 indicates that the simulation will run on two compute nodes.  When run in this way, the first computer is the front-end server node and the second is compute-0-0.  Be warned that this is not the typical way to run programs on the cluster, but it can be used for demos and practice.  If your job does not complete before you log out, the simulation will halt.

Example:  Submitting parallel and serial GROMACS jobs to the SGE queue.

For parallel execution:

[cleblond@penrose ~]$ qsub gromacs-par

For serial execution:

[cleblond@penrose ~]$ qsub gromacs-ser

Note: When jobs are submitted to the SGE queue, the computations are processed on the compute nodes only.  This has two advantages: 1) it does not take resources away from the front-end server node, and 2) you can log out and your simulations will continue processing.


NAMD

Running a NAMD job requires at least four files: a CHARMM force field file, an X-PLOR format PSF file describing the molecular structure, an initial coordinate file in PDB format, and a NAMD configuration file.  Please refer to the NAMD user guide for more detailed information on constructing these files.

 

Example:  Copying NAMD example files to your home directory and executing in parallel or serial.
Make a directory to hold the input files for the demo:
[cleblond@penrose ~]$ mkdir namd-demo
Change to the namd-demo directory:
[cleblond@penrose ~]$ cd namd-demo
Copy the alanin demo input files to the namd-demo directory:
[cleblond@penrose namd-demo]$ cp /opt/NAMD/NAMD_2.6_Source/src/alani* ~/namd-demo
A parallel or serial NAMD run is initiated using the SGE command files namd-par and namd-ser.  You must edit these files (e.g. vi namd-par) and change the name alanin to match your input files.  Since you are running the alanin example, you can leave the files unchanged for now.  You must run the job from within the namd-demo directory.
To submit to the queue as a parallel job, type:
[cleblond@penrose namd-demo]$ qsub namd-par
To submit to the queue as a serial job, type:
[cleblond@penrose namd-demo]$ qsub namd-ser


 

Espresso-3.2/PWscf

 

Running an Espresso job requires an input file (e.g. filename.in).  This input file can be constructed either by hand (e.g. with vi or emacs) or with the program pwgui.  The SGE command file, pwsub-par, should be used for submission of jobs.  The file should be edited to point to your input file and to reflect the number of processors you wish to run on.  Output files can be viewed with the program XCrySDen (type xcrysden).

 

Example:  Copying the Espresso demos and running the examples interactively.
Make a directory to hold the input files for the demo and one for temporary files:
[cleblond@penrose ~]$ mkdir ~/PWscf
[cleblond@penrose ~]$ mkdir ~/tmp
Change to the PWscf directory:
[cleblond@penrose ~]$ cd PWscf
Copy the demo examples to the PWscf directory:
[cleblond@penrose PWscf]$ cp -r /opt/espresso-3.2/examples ~/PWscf
Run an example on the front-end node (e.g. example01):
[cleblond@penrose PWscf]$ cd examples/example01
[cleblond@penrose example01]$ ./run_example
Note:  There are over 30 examples to choose from.  Read the README file in the examples directory for more details.  More advanced users should continue to the next section for parallel execution.


 

The above example scripts will run interactively on the head node.  If you wish to submit jobs to the SGE queue, use the pwscf4 scripts (type pwscf4).  Read the Espresso manual for definitions of X and Y.

PWSCF COMMANDS
  1. pwscf fname
     *fname is the name of the input file.
  2. menu This command will redisplay this menu at any time.
  3. stop This command will allow you to exit the PWSCF shell.

 ** For parallel execution use 'pwscf -n X -npool Y fname'
    Where X is the number of processors and Y is the number of pools

PWscf>

Example:  Submitting Espresso batch jobs using SGE scripts
A parallel or serial Espresso run is initiated using the SGE command file pwsub-par.  You must edit this file (e.g. with vi) and change the name of the input file.  You should also change the line near the top to reflect the number of processors you plan to run on (e.g. "#$ -pe mpi 2" requests two processors).  To submit the job to the queue, type:
[cleblond@penrose ~]$ qsub pwsub-par
For a serial job, edit the script so that the line reads "#$ -pe mpi 1".

Alternatively you could also write your own script to submit jobs to the SGE queue.  The pwsub-par script is an example.  If you would like other example scripts for optimizing K-points, Celldm or energy cutoffs, contact Dr. LeBlond.
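
As one illustration of scripting around pwsub-par, the processor count can be changed without opening an editor.  The sketch below works on a stand-in file, pwsub-par.example, whose contents are invented for the demonstration.  Only the "#$ -pe mpi N" line format is taken from this guide, and the real pwsub-par may differ.

```shell
#!/bin/bash
# Sketch: change the slot count on the "#$ -pe mpi N" line of a job file.
# pwsub-par.example is a stand-in; its contents are invented for this demo.
cat > pwsub-par.example <<'EOF'
#!/bin/bash
#$ -cwd
#$ -pe mpi 2
#$ -N pwscf-demo
EOF

NPROCS=4   # number of processors to request (illustrative value)

# Rewrite only the parallel-environment line and write the result to a
# new file, so the original stays untouched.
sed 's/^#\$ -pe mpi [0-9][0-9]*/#$ -pe mpi '"$NPROCS"'/' \
    pwsub-par.example > "pwsub-par.$NPROCS"

# Show the edited line.
grep '^#\$ -pe' "pwsub-par.$NPROCS"
```

Writing to a new file rather than editing in place makes it easy to keep one job file per processor count when comparing run times.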

 

AutoDock4/MGLTools

 

AutoDock4 (and AutoGrid4) must be run in the directory where the rigid macromolecule, ligand and parameter files are located.  If you plan to use ADT for interactive jobs, you must also start ADT from the directory where these files are located.  AutoDock and AutoGrid are not parallel applications (i.e. they run in serial on a single node).  Begin by working through the tutorial at the link provided below.

 

 

Example:  Copying the AutoDock tutorials and running interactively with MGLTools (ADT).
Copy the tutorials to your home directory:
[cleblond@penrose ~]$ cp -r /share/apps/autodocksuite-4.0.1/tutorial4 ~
Change to the tutorial4/Results4 directory:
[cleblond@penrose ~]$ cd tutorial4/Results4
Start ADT:
[cleblond@penrose Results4]$ adt &
You can now begin the step-by-step tutorial located at
http://autodock.scripps.edu/faqs-help/tutorial/using-autodock-4-with-autodocktools/UsingAutoDock4WithADT_1.4.5b.pdf

Example:  Submitting AutoDock batch jobs using SGE
A batch AutoDock4 run is initiated using the SGE command file autodock-bat.  You must edit this file (e.g. with vi) and change the name of the input file.
[cleblond@penrose ~]$ qsub autodock-bat


Links to Useful Software and Documentation

Provided here are links to the software packages' websites.

 

Cluster Software

ROCKS Cluster Software - http://www.rocksclusters.org

 

SSH, FTP and X servers

Putty SSH client software - http://www.chiark.greenend.org.uk/~sgtatham/putty/

 

PortaPutty – A portable version (i.e. can run from USB drive) - http://socialistsushi.com/portaputty

 

Filezilla FTP client software - http://filezilla-project.org/

 

Filezilla Portable - A portable version of Filezilla - http://portableapps.com/apps/internet/filezilla_portable

 

XMing X server - http://www.straightrunning.com/XmingNotes/

 

Computational Chemistry Software

 

ROCKS Bio Roll Software - http://www.rocksclusters.org/roll-documentation/bio/4.3/

 

GAMESS  - http://www.msg.ameslab.gov/GAMESS/GAMESS.html

 

GROMACS - http://www.gromacs.org.  See the getting started with GROMACS tutorials at http://www.gromacs.org/documentation/reference/online/getting_started.html.

 

NAMD - http://www.ks.uiuc.edu/Research/namd/.

 

Espresso-3.2 - http://www.pwscf.org/


AutoDock4/MGLTools - http://mgltools.scripps.edu/


Acknowledgements

 

CRL is grateful to John Draganosky, Joe Shyrok, Tom Kirkpatrick and Paul Grieggs, for equipment donations and configuration help.  Special thanks to NEETC for equipment donations.