It looks like it's related to the -DNO_NTT3_SYNC flag. I got the same garbage
results using OpenMPI with -DNO_NTT3_SYNC, but turning that flag off brought
me back down to what Ross was getting. I'll see whether this was the cause of
the problems Bill was seeing before.
One thing worth considering is making that flag fatal for CUDA builds, though
it's not really documented anyway. In any case, I thought I would follow up.
I'll report back on the performance of mvapich2 (the default on Lincoln)
without -DNO_NTT3_SYNC.
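As for making the flag fatal: a compile-time guard in the CUDA code path would
probably be enough. A rough sketch (the CUDA and NO_NTT3_SYNC macro spellings
here are my guesses from the configure flags, not necessarily what the source
actually uses):

#if defined(CUDA) && defined(NO_NTT3_SYNC)
#  error "NO_NTT3_SYNC is not supported for CUDA builds; remove -DNO_NTT3_SYNC from config.h"
#endif
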
All the best,
Jason
On Sun, Dec 5, 2010 at 2:25 PM, Scott Le Grand <SLeGrand.nvidia.com> wrote:
> Can you try installing the latest OpenMPI and use that instead? I am
> seeing all sorts of sensitivity to MPI libraries and even specific builds of
> them.
>
>
> -----Original Message-----
> From: Jason Swails [mailto:jason.swails.gmail.com]
> Sent: Sunday, December 05, 2010 11:13
> To: AMBER Developers Mailing List
> Subject: Re: [AMBER-Developers] more pmemd.cuda.MPI issues
>
> Hi Ross,
>
> There are a couple of differences between our config.h files. It doesn't
> appear that you set MPI_HOME. Where you have -I/include, I have
> -I/usr/local/mvapich2-1.2-intel-ofed-1.2.5.5/include. Also, I set
> -DNO_NTT3_SYNC; would this break things? Using my config.h file, I'm getting
> 20 ns/day in serial (compared to your 23), and in parallel I was getting
> junk at a rate of ~35 ns/day, which is considerably different from your 23.
>
> I'm trying again without -DNO_NTT3_SYNC, but I'm curious what effect not
> setting MPI_HOME has on your build, although the Fortran compiler should be
> picking up the mpif.h includes... Is MPI_HOME completely unnecessary for
> pmemd?
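>
> (A quick way to check the include question: if the build goes through an MPI
> wrapper compiler, the wrapper can print the flags it adds, e.g.
>
> mpif90 -show      # MVAPICH2/MPICH-style wrappers
> mpif90 -showme    # OpenMPI
>
> If the MPI include directory already shows up there, an explicit
> -I$MPI_HOME/include would be redundant. The wrapper name is my assumption;
> whichever Fortran wrapper the mvapich2 module on Lincoln provides should
> behave the same.)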
>
> Thanks!
> Jason
>
> On Sat, Dec 4, 2010 at 11:33 PM, Ross Walker <ross.rosswalker.co.uk>
> wrote:
>
> > Hi Jason,
> >
> > Works fine for me. The files I used to build, along with my environment
> > config files, are attached.
> >
> > I did:
> >
> > tar xvjf AmberTools-1.4.tar.bz2
> > tar xvjf Amber11.tar.bz2
> > cd $AMBERHOME
> > wget http://ambermd.org/bugfixes/AmberTools/1.4/bugfix.all
> > patch -p0 < bugfix.all
> > rm -f bugfix.all
> > wget http://ambermd.org/bugfixes/11.0/bugfix.all
> > wget http://ambermd.org/bugfixes/apply_bugfix.x
> > chmod 755 apply_bugfix.x
> > ./apply_bugfix.x bugfix.all
> > cd AmberTools/src/
> > ./configure -cuda -mpi intel
> > cd ../../src
> > make cuda_parallel
> >
> > cd ~/
> > mkdir parallel_fail
> > cd parallel_fail
> > tar xvzf ../parallel_fail.tgz
> >
> > qsub -I -l walltime=0:30:00 -q Lincoln_debug
> >
> > cd parallel_fail
> >
> > mpirun -np 2 ~/amber11/bin/pmemd.cuda.MPI -O -p hairpin_0.mbondi2.parm7 \
> >   -ref hairpin_0.mbondi2.heat.rst7 -c hairpin_0.mbondi2.heat.rst7 </dev/null
> >
> > Output file is attached.
> >
> > All the best
> > Ross
> >
> > > -----Original Message-----
> > > From: Jason Swails [mailto:jason.swails.gmail.com]
> > > Sent: Saturday, December 04, 2010 3:21 PM
> > > To: AMBER Developers Mailing List
> > > Subject: [AMBER-Developers] more pmemd.cuda.MPI issues
> > >
> > > Hello,
> > >
> > > I ran a GB simulation on NCSA Lincoln using 2 GPUs with a standard
> > > nucleic acid system, and every energy term was ***********. Running in
> > > serial, all results were reasonable. I've attached the mdin, restart,
> > > and prmtop files for this error.
> > >
> > > All the best,
> > > Jason
> > >
> > > --
> > > Jason M. Swails
> > > Quantum Theory Project,
> > > University of Florida
> > > Ph.D. Graduate Student
> > > 352-392-4032
> >
>
>
> --
> Jason M. Swails
> Quantum Theory Project,
> University of Florida
> Ph.D. Graduate Student
> 352-392-4032
--
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Graduate Student
352-392-4032
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Sun Dec 05 2010 - 14:00:04 PST