Parallel PAW on the Cobalt-Cluster



The code:

The original PAW for the IBM included a parallel execution mode (essentially for the IBM SP2), which mainly means a parallel FFT algorithm. It uses the MPI library for communication between the nodes.
This has now been ported to the DEC platform (lots of credit to Serguei!).

Instead of one 3D FFT (performed by the fftw library), the program uses a number of 1D FFT calls carried out in parallel on different nodes. In a single-node run this is slightly slower than the "serial" version.

The memory overhead for the parallel mode is about 20 MB per node. This means the memory requirement for each node can be estimated by dividing the serial memory requirement by the number of nodes and adding 20 MB. Because the parallel FFT algorithm makes heavy use of the network (especially for smaller systems), a significant speedup can only be reached up to about 5 nodes. As a rule of thumb, 4 nodes give a speedup of about 3.
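
As a worked example of the memory estimate, assume a serial requirement of 128 MB (the figure requested by the serial example script further down; it is used here for illustration only, the real value depends on the system studied) and a run on 3 nodes. The symbols M_node, M_serial, N, S and E are notation introduced for this sketch and do not appear in the program or its scripts.

```latex
% Per-node memory: serial requirement divided by the node count, plus ~20 MB overhead
M_{\text{node}} \approx \frac{M_{\text{serial}}}{N} + 20\,\text{MB}
               = \frac{128\,\text{MB}}{3} + 20\,\text{MB}
               \approx 63\,\text{MB}

% Parallel efficiency implied by the rule of thumb (speedup S of about 3 on N = 4 nodes)
E = \frac{S}{N} \approx \frac{3}{4} = 0.75
```

The result of roughly 63 MB per node is in line with the mem.ge.64 request in the 3-node example script. The efficiency of about 75% on 4 nodes should be read as a rough guide only, since the actual speedup depends on the size of the system.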
 

Limitations:

The parallel version does not currently allow continuum or QMMM calculations. Work on parallelizing the continuum part is in progress. Stay tuned.
 

Runscripts / Queueing System:

Running the parallel version of the program is essentially no different from running the serial version (there are a number of scripts in the background that do the dirty work, of course). However, this ONLY works within the framework of the DQS queueing system, so don't try to run interactive parallel jobs or anything like that :-)

The runscript paw_run has been modified so that it runs the executable named in the environment variable PAW_EXE.
This defaults to "paw", which means that a serial run is performed if the variable is not set.
To run a parallel PAW calculation, set this variable to "pawp" and specify the right switches for the queueing system. (PLEASE read the excellent documentation for the COBALT cluster at http://www.cobalt.chem.ucalgary.ca/cobalt/#ParallelJobs to understand the following example.)

To illustrate this, here is an example that submits a job to the queue with the input file test_paw.1.inp in the directory
/usr/people/myself/test_it/. This works with the runscript paw_run as well as with the strategy script paw_opt (because paw_opt uses paw_run).

To run it serially, submit the following script to the queue with qsub:

#!/usr/local/bin/tcsh
#$ -N Serial_PAW_Job
#$ -S /usr/local/bin/tcsh
#$ -j y
#$ -o OutputOfMySerialJob.out
#$ -l mem.ge.128 .and. PAW

cd /usr/people/myself/test_it/
/usr/people/program/paw/scripts/paw_run test_paw.1 > test_paw.1.echo

To run the same job in parallel on 3 nodes, submit the following script to the queue:

#!/usr/local/bin/tcsh
#$ -N Parallel_PAW_Job
#$ -S /usr/local/bin/tcsh
#$ -j y
#$ -o OutputOfMyParallelJob.out
#$ -l mem.ge.64 .and. PAW .and. PARALLEL .and. qty.eq.3

setenv PAW_EXE pawp

cd /usr/people/myself/test_it/
/usr/people/program/paw/scripts/paw_run test_paw.1 > test_paw.1.echo
 



Rochus Schmid    (26. 02. 98)