Re: [AMBER-Developers] more pmemd.cuda.MPI issues

From: Scott Le Grand <SLeGrand.nvidia.com>
Date: Sun, 5 Dec 2010 11:25:38 -0800

Can you try installing the latest OpenMPI and using that instead? I am seeing all sorts of sensitivity to MPI libraries, and even to specific builds of them.
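
If it helps, here is a rough sketch of the kind of swap I mean (the install
prefix and the compiler choices are just examples, adjust for Lincoln):

# Build OpenMPI from a source tarball with the Intel compilers
./configure --prefix=$HOME/openmpi CC=icc CXX=icpc F77=ifort FC=ifort
make -j4 && make install
# Point the build environment at the new install before reconfiguring
export PATH=$HOME/openmpi/bin:$PATH
export LD_LIBRARY_PATH=$HOME/openmpi/lib:$LD_LIBRARY_PATH
# If your config.h uses MPI_HOME, set it too
export MPI_HOME=$HOME/openmpi
# Then redo the configure / make cuda_parallel steps from Ross's recipe below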


-----Original Message-----
From: Jason Swails [mailto:jason.swails.gmail.com]
Sent: Sunday, December 05, 2010 11:13
To: AMBER Developers Mailing List
Subject: Re: [AMBER-Developers] more pmemd.cuda.MPI issues

Hi Ross,

A couple of differences between our config.h files: it doesn't appear that you
set MPI_HOME, so where you have -I/include, I have
-I/usr/local/mvapich2-1.2-intel-ofed-1.2.5.5/include. Also, I set
-DNO_NTT3_SYNC; would this break things? Using my config.h, I'm getting
20 ns/day in serial (compared to your 23), and in parallel I was getting junk
at a rate of ~35 ns/day, which is considerably different from your 23.
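
For concreteness, the difference I mean looks roughly like this (the variable
names here are only illustrative, not the exact config.h keys):

# My config.h (MPI_HOME pointing at the MVAPICH2 install):
MPI_HOME=/usr/local/mvapich2-1.2-intel-ofed-1.2.5.5
...  -I$(MPI_HOME)/include  ...    # include path resolves to the real MPI headers
...  -DNO_NTT3_SYNC  ...           # the extra define I added
# Your config.h (MPI_HOME apparently unset):
...  -I/include  ...               # include path collapses to a bare -I/include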

I'm trying again without -DNO_NTT3_SYNC, but I'm curious what effect not
setting MPI_HOME has on your build, although the Fortran compiler should still
be picking up the mpif.h header... Is MPI_HOME completely unnecessary for
pmemd?
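
In the meantime, a quick way to check which MPI the compiler wrappers actually
resolve to (the flag depends on the stack: MVAPICH2/MPICH wrappers take -show,
OpenMPI's take -showme):

which mpif90 mpicc
mpif90 -show       # prints the underlying compile line, including the -I/-L paths
# mpif90 -showme   # OpenMPI equivalent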

Thanks!
Jason

On Sat, Dec 4, 2010 at 11:33 PM, Ross Walker <ross.rosswalker.co.uk> wrote:

> Hi Jason,
>
> Works fine for me. The files I used to build, along with my environment
> configuration files, are attached.
>
> I did:
>
> tar xvjf AmberTools-1.4.tar.bz2
> tar xvjf Amber11.tar.bz2
> cd $AMBERHOME
> wget http://ambermd.org/bugfixes/AmberTools/1.4/bugfix.all
> patch -p0 < bugfix.all
> rm -f bugfix.all
> wget http://ambermd.org/bugfixes/11.0/bugfix.all
> wget http://ambermd.org/bugfixes/apply_bugfix.x
> chmod 755 apply_bugfix.x
> ./apply_bugfix.x bugfix.all
> cd AmberTools/src/
> ./configure -cuda -mpi intel
> cd ../../src
> make cuda_parallel
>
> cd ~/
> mkdir parallel_fail
> cd parallel_fail
> tar xvzf ../parallel_fail.tgz
>
> qsub -I -l walltime=0:30:00 -q Lincoln_debug
>
> cd parallel_fail
>
> mpirun -np 2 ~/amber11/bin/pmemd.cuda.MPI -O -p hairpin_0.mbondi2.parm7 \
>   -ref hairpin_0.mbondi2.heat.rst7 -c hairpin_0.mbondi2.heat.rst7 < /dev/null
>
> Output file is attached.
>
> All the best
> Ross
>
> > -----Original Message-----
> > From: Jason Swails [mailto:jason.swails.gmail.com]
> > Sent: Saturday, December 04, 2010 3:21 PM
> > To: AMBER Developers Mailing List
> > Subject: [AMBER-Developers] more pmemd.cuda.MPI issues
> >
> > Hello,
> >
> > I ran a GB simulation on NCSA Lincoln using 2 GPUs with a standard nucleic
> > acid system, and every energy term was printed as asterisks (***********).
> > Running in serial, all results were reasonable. I've attached the mdin,
> > restart, and prmtop files for this error.
> >
> > All the best,
> > Jason
> >
> > --
> > Jason M. Swails
> > Quantum Theory Project,
> > University of Florida
> > Ph.D. Graduate Student
> > 352-392-4032
>


-- 
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Graduate Student
352-392-4032
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Sun Dec 05 2010 - 11:30:04 PST