Hi Mark,
The data distribution time that is highly variable is an "all reduce" in
gb_ene.fpp for some Born-radii-related data. The actual force distribution
is showing up as pretty trivial in this overall calculation. I probably
should have guessed this - GB is an example of a method that is fairly
compute-bound, with some specific requirements that put the bottlenecks in
different places. Good to know that this particular mod is absolutely not a
problem, at least for GB, though; thanks for taking the trouble to check. The
variability you do see in the all reduce times is indeed pretty bad, but I am
guessing the cluster is periodically hammered or some such; all reduces are
just plain bad news, but at least for now unavoidable.
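For anyone not familiar with that code path, the collective in question boils
down to something like the sketch below (this is not the actual gb_ene.fpp
code; the subroutine name, the array name reff, the count natom, and the use
of MPI_IN_PLACE are all placeholders for illustration):

   subroutine reduce_born_radii(reff, natom)
      ! Sketch: sum each task's partial Born-radii contributions so that
      ! every task ends up with the complete array. All tasks must reach
      ! this call before any can continue, so one slow task (or a busy
      ! interconnect) stalls the whole job - which is why the time spent
      ! here is so sensitive to cluster load.
      use mpi
      implicit none
      integer, intent(in) :: natom
      double precision, intent(inout) :: reff(natom)
      integer :: ierr
      call mpi_allreduce(MPI_IN_PLACE, reff, natom, MPI_DOUBLE_PRECISION, &
                         MPI_SUM, MPI_COMM_WORLD, ierr)
   end subroutine reduce_born_radii

Whether the real code uses MPI_IN_PLACE or a separate receive buffer, the
effect is the same: a full-length array reduction that doubles as a global
synchronization point.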
Regards - Bob
----- Original Message -----
From: "Mark Williamson" <mjw.sdsc.edu>
To: "AMBER Developers Mailing List" <amber-developers.ambermd.org>
Sent: Monday, April 19, 2010 2:28 PM
Subject: Re: [AMBER-Developers] Suggestions for dealing with mpich2-1.2.1p1
> Robert Duke wrote:
>> I would try Alexey Onufriev's monster GB benchmark if you really want to
>> see what happens. This thing has lots of atoms so the arrays are big. I
>> expect Ross has it lying around; if not I can forward you a copy but will
>> have to dig out the mdin from another machine (it is a big
>> prmtop/inpcrd - at least 2 MB zipped).
>
> Ok, I've got some results. First off, I applied this modification to PMEMD
> 10, since I had a copy of it handy on the Triton cluster and a compile
> method already in place. For the test case, I used the test at
> amber11/test/cuda/nucleosome; the mdin from Run_md.1 was used, but changed
> to run for 1000 steps. Two PMEMD 10 binaries were compiled, one with and
> one without the patch mentioned in this thread. Ten runs of each binary
> were carried out over 256 processors. The full results can be found at:
>
> http://www.wmd-lab.org/mjw/for_amber_dev/mpich2_1.2.0_pmemd_testing/results/
>
> Looking at the "GB NonBond Parallel Profiling - NonSetup CPU Seconds:"
> section in the logfile outputs for the two binaries, and using the avg
> row as the metric here:
>
>    Task   Radii    Diag  OffDiag  Distrib    Total
>  ------------------------------------------------------
>       0     8.8    13.8    131.4     20.0    174.0
>  <..snip..>
>     255     8.6    13.5    129.4     22.2    173.6
>  ------------------------------------------------------
>     avg     8.6    13.5    129.2     22.3    173.6   <-----
>  ------------------------------------------------------
>
>
> NORMAL
> ======
>
> grep avg *logfile* | grep -v 0.0 | awk '{print $2,$3,$4,$5,$6,$7}'
>
>
> avg 8.6 13.5 128.8 25.3 176.3
> avg 8.6 13.5 128.9 24.4 175.4
> avg 8.6 13.5 128.7 22.5 173.3
> avg 8.6 13.5 128.8 106.8 257.7
> avg 8.6 13.5 128.7 122.5 273.3
> avg 8.6 13.5 128.6 21.8 172.4
> avg 8.6 13.5 128.8 24.7 175.6
> avg 8.6 13.5 128.6 35.7 186.4
> avg 8.6 13.5 128.8 25.8 176.8
> avg 8.6 13.5 128.8 23.8 174.8
>
>
> MODDED
> ======
>
> avg 8.6 13.5 129.1 113.5 264.7
> avg 8.6 13.5 129.2 22.3 173.6
> avg 8.6 13.5 129.2 22.1 173.4
> avg 8.6 13.5 129.3 19.8 171.2
> avg 8.6 13.5 129.2 23.9 175.3
> avg 8.6 13.5 129.2 22.0 173.3
> avg 8.6 13.5 129.3 22.0 173.4
> avg 8.6 13.5 129.1 21.5 172.7
> avg 8.6 13.5 129.2 21.1 172.5
> avg 8.6 13.5 129.2 125.8 277.0
>
> I'm putting the random jumps in the "Distrib" values, which show up for
> both binaries, down to transient cluster traffic. The Modded "OffDiag"
> value seems to be, on average, about 0.5 seconds greater than the Normal
> one. I think my change has not had a significantly detrimental effect on
> performance. Would you agree with this?
>
> regards,
>
> Mark
>
>
>
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers