Removing this flag also fixed Bill's problem.
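For anyone reproducing this, a minimal sketch of stripping the flag from an existing config.h before rebuilding (the PMEMD_CU_DEFINES variable name below is an assumption for illustration, not copied from the real Amber11 config.h):

```shell
# Hypothetical config.h fragment with the problematic flag set:
cat > config.h <<'EOF'
PMEMD_CU_DEFINES=-DCUDA -DMPI -DNO_NTT3_SYNC
EOF

# Remove the flag; a subsequent 'make clean && make cuda_parallel'
# would then rebuild without it.
sed -i 's/ -DNO_NTT3_SYNC//' config.h
cat config.h
```

After editing config.h a full rebuild is needed, since the flag changes conditionally compiled code.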
On Sun, Dec 5, 2010 at 4:38 PM, Jason Swails <jason.swails.gmail.com> wrote:
> Looks like it's related to the -DNO_NTT3_SYNC flag. I got the same garbage
> results using OpenMPI with -DNO_NTT3_SYNC, but turning off that flag
> brought me down to the numbers Ross was getting. I'll see if this was the
> cause of the problems Bill was seeing before.
>
> One thing worth considering is making that flag fatal for CUDA builds,
> though it's not really documented anyway. In any case, I thought I would
> follow up. I'll report back on the performance of mvapich2 (the default on
> Lincoln) without -DNO_NTT3_SYNC.
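Making the flag fatal for CUDA builds could look something like the sketch below. This is a hypothetical configure-time guard, not the actual Amber configure script; the function name and argument layout are assumptions for illustration:

```shell
# Abort configuration when NO_NTT3_SYNC is combined with a CUDA build.
# Arguments: build type ("cuda", "serial", ...) and the defines string.
check_cuda_flags() {
  build_type="$1"
  defines="$2"
  case "$defines" in
    *NO_NTT3_SYNC*)
      if [ "$build_type" = "cuda" ]; then
        echo "Error: -DNO_NTT3_SYNC is not supported for CUDA builds" >&2
        return 1
      fi
      ;;
  esac
  return 0
}

# Example: a CUDA build with the flag set would be rejected.
check_cuda_flags cuda "-DCUDA -DNO_NTT3_SYNC" || echo "configure aborted"
```

A check like this in configure would have turned the silent garbage-energies failure into an immediate, explicit error.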
>
> All the best,
> Jason
>
>
> On Sun, Dec 5, 2010 at 2:25 PM, Scott Le Grand <SLeGrand.nvidia.com> wrote:
>
>> Can you try installing the latest OpenMPI and using that instead? I am
>> seeing all sorts of sensitivity to MPI libraries, and even to specific
>> builds of them.
>>
>>
>> -----Original Message-----
>> From: Jason Swails [mailto:jason.swails.gmail.com]
>> Sent: Sunday, December 05, 2010 11:13
>> To: AMBER Developers Mailing List
>> Subject: Re: [AMBER-Developers] more pmemd.cuda.MPI issues
>>
>> Hi Ross,
>>
>> A couple of differences between our config.h files. It doesn't appear
>> that you set MPI_HOME. Where you have -I/include, I have
>> -I/usr/local/mvapich2-1.2-intel-ofed-1.2.5.5/include . Also, I set
>> -DNO_NTT3_SYNC; would this break things? Using my config.h file, I'm
>> getting 20 ns/day in serial (compared to your 23), and in parallel I was
>> getting junk at a rate of ~35 ns/day, which is considerably different
>> from your 23.
>>
>> I'm trying again without -DNO_NTT3_SYNC, but I'm curious as to what
>> effect not setting MPI_HOME has on your build, although the fortran
>> compiler should be picking up the mpif.h includes... Is MPI_HOME
>> completely unnecessary for pmemd?
>>
>> Thanks!
>> Jason
>>
>> On Sat, Dec 4, 2010 at 11:33 PM, Ross Walker <ross.rosswalker.co.uk>
>> wrote:
>>
>> > Hi Jason,
>> >
>> > Works fine for me. The files I used to build, along with my environment
>> > config files, are attached.
>> >
>> > I did:
>> >
>> > tar xvjf AmberTools-1.4.tar.bz2
>> > tar xvjf Amber11.tar.bz2
>> > cd $AMBERHOME
>> > wget http://ambermd.org/bugfixes/AmberTools/1.4/bugfix.all
>> > patch -p0 < bugfix.all
>> > rm -f bugfix.all
>> > wget http://ambermd.org/bugfixes/11.0/bugfix.all
>> > wget http://ambermd.org/bugfixes/apply_bugfix.x
>> > chmod 755 apply_bugfix.x
>> > ./apply_bugfix.x bugfix.all
>> > cd AmberTools/src/
>> > ./configure -cuda -mpi intel
>> > cd ../../src
>> > make cuda_parallel
>> >
>> > cd ~/
>> > mkdir parallel_fail
>> > cd parallel_fail
>> > tar xvzf ../parallel_fail.tgz
>> >
>> > qsub -I -l walltime=0:30:00 -q Lincoln_debug
>> >
>> > cd parallel_fail
>> >
>> > mpirun -np 2 ~/amber11/bin/pmemd.cuda.MPI -O -p hairpin_0.mbondi2.parm7 \
>> >   -ref hairpin_0.mbondi2.heat.rst7 -c hairpin_0.mbondi2.heat.rst7 </dev/null
>> >
>> > Output file is attached.
>> >
>> > All the best
>> > Ross
>> >
>> > > -----Original Message-----
>> > > From: Jason Swails [mailto:jason.swails.gmail.com]
>> > > Sent: Saturday, December 04, 2010 3:21 PM
>> > > To: AMBER Developers Mailing List
>> > > Subject: [AMBER-Developers] more pmemd.cuda.MPI issues
>> > >
>> > > Hello,
>> > >
>> > > I ran a GB simulation on NCSA Lincoln using 2 GPUs with a standard
>> > > nucleic acid system, and every energy term was ***********. Running
>> > > in serial, all results were reasonable. I've attached the mdin,
>> > > restart, and prmtop files for this error.
>> > >
>> > > All the best,
>> > > Jason
>> > >
>> > > --
>> > > Jason M. Swails
>> > > Quantum Theory Project,
>> > > University of Florida
>> > > Ph.D. Graduate Student
>> > > 352-392-4032
>> >
>> > _______________________________________________
>> > AMBER-Developers mailing list
>> > AMBER-Developers.ambermd.org
>> > http://lists.ambermd.org/mailman/listinfo/amber-developers
>> >
>> >
>>
>>
>> --
>> Jason M. Swails
>> Quantum Theory Project,
>> University of Florida
>> Ph.D. Graduate Student
>> 352-392-4032
>>
>>
>>
>
>
>
> --
> Jason M. Swails
> Quantum Theory Project,
> University of Florida
> Ph.D. Graduate Student
> 352-392-4032
>
--
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Graduate Student
352-392-4032
Received on Sun Dec 05 2010 - 15:00:04 PST