[AMBER-Developers] beta of pmemd.cuda.MPI available in git tree

From: Ross Walker <ross.rosswalker.co.uk>
Date: Fri, 20 Aug 2010 18:32:22 -0700

Hi All,

Thanks largely to Scott Le Grand's tireless work we now have a MPI version
of the CUDA code available in the AMBER Git tree. You can build this as
follows - note you MUST USE an MPI installation that supports MPI v2.0 - an
example would be mpich2.

cd $AMBERHOME/AmberTools/src
./configure -cuda -mpi intel
cd ../../src/
make cuda_parallel

This will build the executable pmemd.cuda_SPDP.MPI and a link to it called pmemd.cuda.MPI.

You can then test this with:

cd $AMBERHOME/test/
export DO_PARALLEL='mpirun -np 2'
./test_amber_cuda_parallel.sh -1 SPDP

You can also build the other precision models with ./configure -cuda_DPDP -mpi
intel (etc...)
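As a sketch (assuming the same directory layout as the SPDP build above), building and testing the DPDP model end to end would look like:

```shell
# Build the double/double precision model and run its parallel tests.
# Assumes $AMBERHOME is set and an MPI 2.0 implementation (e.g. mpich2)
# is on your path.
cd $AMBERHOME/AmberTools/src
./configure -cuda_DPDP -mpi intel
cd ../../src/
make cuda_parallel
cd $AMBERHOME/test/
export DO_PARALLEL='mpirun -np 2'
./test_amber_cuda_parallel.sh -1 DPDP
```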

Right now, if you just select the default GPU option (which is the same as
-gpu -1), the code will iterate through the available GPUs on each node
defined by your machinefile. Thus, if your machinefile has (hostnames here
are just examples):

node0
node1

it will use the first GPU on each node. If it has:

node0
node0

then it will use the first and second GPUs on node0. You can also specify
the GPU IDs on the command line using the -gpu X command line option;
however, you will need to provide command line arguments for every MPI
process you are firing up with mpirun - check the documentation for your MPI
implementation for how to do this.
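For instance, with mpich2's colon-separated MPMD syntax (the input file names here are just placeholders), pinning rank 0 to GPU 0 and rank 1 to GPU 1 might look like:

```shell
# Each ":"-separated section launches one MPI process with its own
# command-line arguments, so each rank can get its own -gpu value.
# mdin/prmtop/inpcrd are placeholders for your own input files.
mpirun -np 1 ./pmemd.cuda_SPDP.MPI -gpu 0 -O -i mdin -o mdout.0 -p prmtop -c inpcrd : \
       -np 1 ./pmemd.cuda_SPDP.MPI -gpu 1 -O -i mdin -o mdout.1 -p prmtop -c inpcrd
```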

I am interested in feedback and any performance numbers from people who are
considerably richer than me and so have access to hardware I don't have.
Dave C, for example?

Right now the sweet spot is really using 1 GPU per node with QDR IB between
nodes. However, it should also run fine across multiple GPUs within a node -
please let me know the performance you see.

Currently known issues are:

1) ntt=3 - these test cases will appear to fail because the random number
streams are different.
2) NTP (ntb=2) is not currently working in parallel; the virial is broken.
3) Boxes with angles Alpha/=Beta/=Gamma/=90.0 currently do not work in
parallel; the virial is broken.
4) Minimization with PME does not work in parallel. There is a race
condition, so for now you will need to comment out the .min test cases in
the Makefile for PME.
5) There will be lots of roundoff in the test cases for SPSP and SPDP
because of the sheer number of summations going on. Consider the DPDP
version to be the standard for testing.

Let me know how you get on. Once we have the bugs ironed out I plan to
release this as a patch against amber11.

All the best

|\oss Walker

| Assistant Research Professor |
| San Diego Supercomputer Center |
| Adjunct Assistant Professor |
| Dept. of Chemistry and Biochemistry |
| University of California San Diego |
| http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.

AMBER-Developers mailing list
Received on Fri Aug 20 2010 - 19:00:03 PDT