Re: [AMBER-Developers] more pmemd.cuda.MPI issues

From: Ross Walker <ross.rosswalker.co.uk>
Date: Sun, 5 Dec 2010 20:25:08 -0800

Hi Jason,

Okay, that is VERY weird since the code related to NO_NTT3_SYNC is not used
when running on GPUs and so should have no effect whatsoever. On the GPU it
uses its own random number stream. My guess is thus that the setting of
different random seeds on each MPI thread when using NO_NTT3_SYNC is what is
causing the problem. I'll modify this now so it has no effect when running
on the GPU.
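
To make the guess above concrete, here is a toy sketch in Python. It is not AMBER code: the function names and the rank-offset seeding rule are assumptions made purely for illustration. It only shows that if every rank is expected to generate the same random kicks, giving each rank its own seed makes the copies disagree.

```python
import random


def langevin_kicks(seed, n_atoms, n_steps):
    """Return the random kicks one rank would generate (toy model)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_atoms)]
            for _ in range(n_steps)]


def rank_seed(base_seed, rank, synced):
    # Hypothetical seeding rule, assumed for illustration only:
    # synced ranks share one seed, unsynced ranks offset it by their rank.
    return base_seed if synced else base_seed + rank


def ranks_agree(base_seed, n_ranks, synced, n_atoms=4, n_steps=3):
    """True if every rank ends up with an identical random stream."""
    streams = [
        langevin_kicks(rank_seed(base_seed, r, synced), n_atoms, n_steps)
        for r in range(n_ranks)
    ]
    return all(s == streams[0] for s in streams)


if __name__ == "__main__":
    print("shared seed, ranks agree:  ", ranks_agree(71277, 2, synced=True))
    print("per-rank seed, ranks agree:", ranks_agree(71277, 2, synced=False))
```

With a shared seed the two ranks hold identical streams. With per-rank seeds they do not, which is the kind of mismatch I suspect is behind the bad energies.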

All the best
Ross

> -----Original Message-----
> From: Jason Swails [mailto:jason.swails.gmail.com]
> Sent: Sunday, December 05, 2010 1:38 PM
> To: AMBER Developers Mailing List
> Subject: Re: [AMBER-Developers] more pmemd.cuda.MPI issues
>
> Looks like it's related to the -DNO_NTT3_SYNC flag. I got the same garbage
> results using OpenMPI with -DNO_NTT3_SYNC, but turning off that flag reduced
> me to what Ross was getting. I'll see if this was the cause of the problems
> that Bill was seeing before.
>
> One thing worth considering is to make that flag fatal for CUDA builds, but
> it's not really documented, anyway. In any case, I thought I would follow
> up. I'll report back on the performance of mvapich2 (the default on
> Lincoln) without -DNO_NTT3_SYNC.
>
> All the best,
> Jason
>
> On Sun, Dec 5, 2010 at 2:25 PM, Scott Le Grand <SLeGrand.nvidia.com> wrote:
>
> > Can you try installing the latest OpenMPI and using that instead? I am
> > seeing all sorts of sensitivity to MPI libraries and even specific builds
> > of them.
> >
> >
> > -----Original Message-----
> > From: Jason Swails [mailto:jason.swails.gmail.com]
> > Sent: Sunday, December 05, 2010 11:13
> > To: AMBER Developers Mailing List
> > Subject: Re: [AMBER-Developers] more pmemd.cuda.MPI issues
> >
> > Hi Ross,
> >
> > A couple of differences between our config.h files. It doesn't appear that
> > you set MPI_HOME. Where you have -I/include, I have
> > -I/usr/local/mvapich2-1.2-intel-ofed-1.2.5.5/include . Also, I set
> > -DNO_NTT3_SYNC; would this break things? Using my config.h file, I'm
> > getting 20 ns/day in serial (compared to your 23), and in parallel, I was
> > getting junk at a rate of ~35 ns/day, which is considerably different from
> > your 23.
> >
> > I'm trying again without -DNO_NTT3_SYNC, but I'm curious as to what effect
> > not setting MPI_HOME has on your build, although the Fortran compiler
> > should be picking up the mpif.h includes... Is MPI_HOME completely
> > unnecessary for pmemd?
> >
> > Thanks!
> > Jason
> >
> > On Sat, Dec 4, 2010 at 11:33 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
> >
> > > Hi Jason,
> > >
> > > Works fine for me. The files I used to build, along with my environment
> > > config files, are attached.
> > >
> > > I did:
> > >
> > > tar xvjf AmberTools-1.4.tar.bz
> > > tar xvjf Amber11.tar.bz2
> > > cd $AMBERHOME
> > > wget http://ambermd.org/bugfixes/AmberTools/1.4/bugfix.all
> > > patch -p0 < bugfix.all
> > > rm -f bugfix.all
> > > wget http://ambermd.org/bugfixes/11.0/bugfix.all
> > > wget http://ambermd.org/bugfixes/apply_bugfix.x
> > > chmod 755 apply_bugfix.x
> > > ./apply_bugfix.x bugfix.all
> > > cd AmberTools/src/
> > > ./configure -cuda -mpi intel
> > > cd ../../src
> > > make cuda_parallel
> > >
> > > cd ~/
> > > mkdir parallel_fail
> > > cd parallel_fail
> > > tar xvzf ../parallel_fail.tgz
> > >
> > > qsub -I -l walltime=0:30:00 -q Lincoln_debug
> > >
> > > cd parallel_fail
> > >
> > > mpirun -np 2 ~/amber11/bin/pmemd.cuda.MPI -O -p hairpin_0.mbondi2.parm7 \
> > >   -ref hairpin_0.mbondi2.heat.rst7 -c hairpin_0.mbondi2.heat.rst7 </dev/null
> > >
> > > Output file is attached.
> > >
> > > All the best
> > > Ross
> > >
> > > > -----Original Message-----
> > > > From: Jason Swails [mailto:jason.swails.gmail.com]
> > > > Sent: Saturday, December 04, 2010 3:21 PM
> > > > To: AMBER Developers Mailing List
> > > > Subject: [AMBER-Developers] more pmemd.cuda.MPI issues
> > > >
> > > > Hello,
> > > >
> > > > I ran a GB simulation on NCSA Lincoln using 2 GPUs with a standard
> > > > nucleic acid system, and every energy term was ***********. Running in
> > > > serial, all results were reasonable. I've attached the mdin, restart,
> > > > and prmtop files for this error.
> > > >
> > > > All the best,
> > > > Jason
> > > >
> > > > --
> > > > Jason M. Swails
> > > > Quantum Theory Project,
> > > > University of Florida
> > > > Ph.D. Graduate Student
> > > > 352-392-4032
> > >
> > > _______________________________________________
> > > AMBER-Developers mailing list
> > > AMBER-Developers.ambermd.org
> > > http://lists.ambermd.org/mailman/listinfo/amber-developers
> > >
> > >
> >
> >
> > --
> > Jason M. Swails
> > Quantum Theory Project,
> > University of Florida
> > Ph.D. Graduate Student
> > 352-392-4032
> >
> >
> >
> >
>
>
>
> --
> Jason M. Swails
> Quantum Theory Project,
> University of Florida
> Ph.D. Graduate Student
> 352-392-4032


Received on Sun Dec 05 2010 - 20:30:02 PST