Re: [AMBER-Developers] Suggestions for dealing with mpich2-1.2.1p1

From: Mark Williamson <mjw.sdsc.edu>
Date: Mon, 19 Apr 2010 11:28:54 -0700

Robert Duke wrote:
> I would try Alexey Onofriev's monster GB benchmark if you really want to
> see what happens. This thing has lots of atoms so the arrays are big.
> I expect Ross has it lying around; if not I can forward you a copy but
> will have to dig out the mdin from another machine (it is a big
> prmtop/inpcrd - at least 2 MB zipped).

Ok, I've got some results. First off, I applied this modification to
PMEMD 10, since I had a copy of it handy on the Triton cluster with a
compile method already in place. For the test case, I used the test at
amber11/test/cuda/nucleosome. The mdin from Run_md.1 was used, but
changed to run for 1000 steps. Two PMEMD 10 binaries were compiled, one
with and one without the patch mentioned in this thread. Ten runs of
each binary were carried out over 256 processors. The full results can
be found at:

http://www.wmd-lab.org/mjw/for_amber_dev/mpich2_1.2.0_pmemd_testing/results/

For each run of the two binaries, I looked at the "GB NonBond Parallel
Profiling - NonSetup CPU Seconds:" section of the logfile and used the
avg row (marked below) as the metric:

  Task    Radii     Diag  OffDiag  Distrib    Total
------------------------------------------------------
     0      8.8     13.8    131.4     20.0    174.0
<..snip..>
   255      8.6     13.5    129.4     22.2    173.6
------------------------------------------------------
   avg      8.6     13.5    129.2     22.3    173.6 <-----
------------------------------------------------------


NORMAL
======

  grep avg *logfile* | grep -v 0.0 | awk '{print $2,$3,$4,$5,$6,$7}'


avg 8.6 13.5 128.8 25.3 176.3
avg 8.6 13.5 128.9 24.4 175.4
avg 8.6 13.5 128.7 22.5 173.3
avg 8.6 13.5 128.8 106.8 257.7
avg 8.6 13.5 128.7 122.5 273.3
avg 8.6 13.5 128.6 21.8 172.4
avg 8.6 13.5 128.8 24.7 175.6
avg 8.6 13.5 128.6 35.7 186.4
avg 8.6 13.5 128.8 25.8 176.8
avg 8.6 13.5 128.8 23.8 174.8


MODDED
======

avg 8.6 13.5 129.1 113.5 264.7
avg 8.6 13.5 129.2 22.3 173.6
avg 8.6 13.5 129.2 22.1 173.4
avg 8.6 13.5 129.3 19.8 171.2
avg 8.6 13.5 129.2 23.9 175.3
avg 8.6 13.5 129.2 22.0 173.3
avg 8.6 13.5 129.3 22.0 173.4
avg 8.6 13.5 129.1 21.5 172.7
avg 8.6 13.5 129.2 21.1 172.5
avg 8.6 13.5 129.2 125.8 277.0

I'm putting the random jumps in the "Distrib" values, which show up in
both tests, down to transient cluster traffic issues. The Modded
"OffDiag" value is, on average, about 0.5 seconds (roughly 0.35%)
greater than the Normal one. I think my change has not had a
significantly detrimental effect on performance. Would you agree with
this?
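
To put numbers on the comparison, here is a minimal Python sketch that takes the avg rows listed above and reports the mean "OffDiag" difference and the median "Distrib" per binary. It is not part of the original run scripts; the helper names (parse, column) are made up for illustration. The median is used for "Distrib" only because it is less sensitive than the mean to the occasional runs above 100 seconds.

```python
from statistics import mean, median

# "avg" rows copied from the ten runs of each binary listed above.
# Columns after the "avg" label: Radii, Diag, OffDiag, Distrib, Total.
NORMAL = """\
avg 8.6 13.5 128.8 25.3 176.3
avg 8.6 13.5 128.9 24.4 175.4
avg 8.6 13.5 128.7 22.5 173.3
avg 8.6 13.5 128.8 106.8 257.7
avg 8.6 13.5 128.7 122.5 273.3
avg 8.6 13.5 128.6 21.8 172.4
avg 8.6 13.5 128.8 24.7 175.6
avg 8.6 13.5 128.6 35.7 186.4
avg 8.6 13.5 128.8 25.8 176.8
avg 8.6 13.5 128.8 23.8 174.8
"""

MODDED = """\
avg 8.6 13.5 129.1 113.5 264.7
avg 8.6 13.5 129.2 22.3 173.6
avg 8.6 13.5 129.2 22.1 173.4
avg 8.6 13.5 129.3 19.8 171.2
avg 8.6 13.5 129.2 23.9 175.3
avg 8.6 13.5 129.2 22.0 173.3
avg 8.6 13.5 129.3 22.0 173.4
avg 8.6 13.5 129.1 21.5 172.7
avg 8.6 13.5 129.2 21.1 172.5
avg 8.6 13.5 129.2 125.8 277.0
"""

# Column indices within a parsed row (hypothetical names for readability).
OFFDIAG, DISTRIB, TOTAL = 2, 3, 4


def parse(block):
    """Turn lines of the form 'avg r d o x t' into tuples of floats."""
    rows = []
    for line in block.splitlines():
        fields = line.split()
        if fields and fields[0] == "avg":
            rows.append(tuple(float(x) for x in fields[1:6]))
    return rows


def column(rows, idx):
    """Extract one column from the parsed rows."""
    return [row[idx] for row in rows]


normal = parse(NORMAL)
modded = parse(MODDED)

# Mean OffDiag difference (Modded minus Normal), in CPU seconds.
offdiag_delta = mean(column(modded, OFFDIAG)) - mean(column(normal, OFFDIAG))

# Median Distrib per binary, which damps the effect of the outlier runs.
distrib_median_normal = median(column(normal, DISTRIB))
distrib_median_modded = median(column(modded, DISTRIB))

print(f"OffDiag mean delta (Modded - Normal): {offdiag_delta:.2f} s")
print(f"Distrib median, Normal: {distrib_median_normal:.2f} s")
print(f"Distrib median, Modded: {distrib_median_modded:.2f} s")
```

With only ten runs per binary and two or three outliers in each set, this is a rough summary rather than a significance test.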

regards,

Mark



_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Mon Apr 19 2010 - 11:30:02 PDT