Re: amber-developers: Verlet update time and ntt=3 parallel scaling

From: Robert Duke <rduke.email.unc.edu>
Date: Tue, 6 May 2008 18:38:41 -0400

Hi Carlos -
This probably is more of an issue on a BG/L simply because its processors
have about 1/4 to 1/5 the compute capability of most of the other CPUs we
deal with, running at only 700 MHz. You have to add more CPUs to get
anything done, so scaling issues bite you worse, and this is a pure compute
scaling issue - i.e., redundant CPU work - so the better interconnect of the
BG/L is not going to compensate for it. In that sense, I guess, this is
something you may notice a bit more on a BG/L (all theoretical blather on my
part; I have not done rock-solid comparisons, but it makes sense).
Regards - Bob

----- Original Message -----
From: "Carlos Simmerling" <carlos.simmerling.gmail.com>
To: <amber-developers.scripps.edu>
Sent: Tuesday, May 06, 2008 6:07 PM
Subject: Re: amber-developers: Verlet update time and ntt=3 parallel scaling


> Hi Bob,
> thanks for the comments. Yes, I knew this was an issue in principle;
> I just didn't realize it became the dominant factor in scaling even
> past 64 CPUs.
> I was wondering whether the BG is particularly slow at this calculation;
> I don't have data points from other machines to know.
> Thanks for the reminder to all on the seeding; we're careful with that,
> especially with things like replica exchange.
> thanks
> Carlos
>
> On Tue, May 6, 2008 at 5:42 PM, Robert Duke <rduke.email.unc.edu> wrote:
>> This is a very well-known issue with ntt=3, caused by the nonscaling
>> nature of the random number generator in use. Running separate RNGs in
>> each process would potentially be theoretically unsound, so every
>> processor runs through the full sequence of random numbers. I am
>> probably going to look into doing a proper parallel implementation of
>> an RNG for 11 that will prevent this; in the meantime there are big
>> issues with being careful about how you seed the RNG for ntt=3 anyway.
>> A paper will be coming out on this, I believe. I am, as usual, really
>> surprised that no one has noticed this, given that I have been saying
>> things about it for about two, going on three, years (all this is not a
>> nice feature of the BG/L, but you have so many others you can enjoy!)
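[Editor's note: the synchronized-stream pattern described above can be sketched as follows. This is a minimal illustration, not Amber code; the function name, block decomposition, and use of Python's `random.Random` in place of sander's actual generator are all assumptions made for the sketch.]

```python
import random

def langevin_gauss_sync(natom, my_first, my_last, seed=42):
    """Sketch of the synchronized-RNG pattern: every rank draws the
    Gaussians for ALL atoms so the stream stays identical across ranks,
    but keeps only the values for the atoms it owns."""
    rng = random.Random(seed)
    kept = {}
    for i in range(natom):
        # three Gaussian draws per atom (x, y, z) happen on EVERY rank,
        # whether or not this rank owns atom i
        g = (rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
        if my_first <= i <= my_last:
            kept[i] = g
    return kept

# Two "ranks" splitting 8 atoms: each still loops over all 8 atoms,
# so the per-rank RNG cost is O(natom), independent of rank count -
# which is why the Verlet update time does not shrink as CPUs are added.
rank0 = langevin_gauss_sync(8, 0, 3)
rank1 = langevin_gauss_sync(8, 4, 7)
```

The payoff of the redundancy is that every rank sees exactly the same random sequence, so the trajectory is independent of the decomposition; the cost is that the work never scales.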
>> Regards - Bob
>>
>> ----- Original Message ----- From: "Carlos Simmerling"
>> <carlos.simmerling.gmail.com>
>>
>> To: <amber-developers.scripps.edu>
>> Sent: Tuesday, May 06, 2008 5:27 PM
>>
>> Subject: amber-developers: Verlet update time and ntt=3 parallel scaling
>>
>>
>>
>>
>> > Hi all,
>> > I've noticed that while running explicit-water sander jobs on the Blue
>> > Gene with ntt=3, the time is nearly dominated by Verlet update time,
>> > which tends to be >50% of the total time and, most important, is
>> > nearly completely independent of the number of CPUs.
>> > In contrast, the ntt=1 Verlet time is small (1%), meaning that my
>> > simulations scale much, much better. Has anyone else noticed this?
>> >
>> > From looking at the code, my guess is the runmd.f gauss calls that
>> > are done for all atoms regardless of parallelism (to keep the random
>> > number generators in sync). Preliminary testing on 16 nodes on my
>> > cluster seems to confirm this, though Verlet takes a much smaller
>> > fraction there since there are not nearly as many CPUs, so the
>> > nonbonds dominate.
>> > When I disable the extra gauss calls, the Verlet time drops back to
>> > what it is for ntt=1.
>> > I can't test that easily on the Blue Gene since wait times are days
>> > and there is no debug queue.
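[Editor's note: one way a "proper parallel implementation" could avoid the redundant draws is to give each atom its own stream derived from a base seed, so that any decomposition reproduces the same numbers while each rank draws only for the atoms it owns. The sketch below is purely hypothetical - the seed arithmetic is illustrative, and a production version would need a statistically sound parallel RNG (e.g., a counter-based generator), not ad hoc seed mixing.]

```python
import random

def langevin_gauss_parallel(natom, my_first, my_last, base_seed=42):
    """Hypothetical per-atom streams: derive a seed from (base_seed,
    atom index) so each rank draws only for the atoms it owns, yet the
    union over ranks is identical no matter how atoms are split."""
    kept = {}
    for i in range(my_first, my_last + 1):
        # illustrative seed mixing only; NOT statistically rigorous
        rng = random.Random(base_seed * 2654435761 + i)
        kept[i] = (rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
    return kept

# Each rank now does O(natom / nproc) RNG work instead of O(natom),
# and the combined result is decomposition-independent.
left = langevin_gauss_parallel(8, 0, 3)
right = langevin_gauss_parallel(8, 4, 7)
```

This restores per-step reproducibility across processor counts, which is exactly the property the synchronized full-sequence scheme was protecting.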
>> >
>> > Has anyone else noticed similar problems at high processor counts for
>> > sander and ntt=3,
>> > or is this another nice feature of the BG?
>> > Carlos
>> >
>> > Verlet is the dominating factor in overall sander scaling for ntt=3
>> > at 256 CPUs:
>> >
>> >
>> > | Other 0.73 ( 8.40% of Recip)
>> > | Recip Ewald time 8.68 (49.97% of Ewald)
>> > | Force Adjust 1.13 ( 6.49% of Ewald)
>> > | Virial junk 1.34 ( 7.71% of Ewald)
>> > | Start sycnronization 0.00 ( 0.01% of Ewald)
>> > | Other 0.02 ( 0.09% of Ewald)
>> > | Ewald time 17.36 (90.19% of Nonbo)
>> > | IPS excludes 0.00 ( 0.01% of Nonbo)
>> > | Other 0.01 ( 0.05% of Nonbo)
>> > | Nonbond force 19.25 (76.12% of Force)
>> > | Bond/Angle/Dihedral 0.13 ( 0.52% of Force)
>> > | FRC Collect time 4.92 (19.45% of Force)
>> > | Other 0.99 ( 3.92% of Force)
>> > | Force time 25.29 (40.39% of Runmd)
>> > | Shake time 0.33 ( 0.53% of Runmd)
>> > | Verlet update time 33.53 (53.55% of Runmd)
>> > | CRD distribute time 3.45 ( 5.50% of Runmd)
>> > | Other 0.02 ( 0.02% of Runmd)
>> > | Runmd Time 62.61 (95.78% of Total)
>> > | Other 2.76 ( 4.22% of Total)
>> > | Total time 65.37 (100.0% of ALL )
>> >
>> >
>>
>>
>
>
>
> --
> ===================================================================
> Carlos L. Simmerling, Ph.D.
> Associate Professor Phone: (631) 632-1336
> Center for Structural Biology Fax: (631) 632-1555
> CMM Bldg, Room G80
> Stony Brook University E-mail: carlos.simmerling.gmail.com
> Stony Brook, NY 11794-5115 Web: http://comp.chem.sunysb.edu
> ===================================================================
>
Received on Wed May 07 2008 - 06:07:50 PDT