Re: amber-developers: Verlet update time and ntt=3 parallel scaling

From: Carlos Simmerling <carlos.simmerling.gmail.com>
Date: Tue, 6 May 2008 18:07:35 -0400

Hi Bob,
Thanks for the comments. Yes, I knew that this was an issue in principle;
I just didn't realize it became the dominant factor in scaling even
past 64 CPUs.
I was wondering whether the BG is particularly slow at this calculation, and
I don't have data points from other machines to know.
Thanks to all for the reminder on the seeding; we're careful with that,
especially with things like replica exchange.
Thanks,
Carlos
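
For readers of the archive, here is a minimal Python sketch of the cost pattern discussed in this thread. It is not sander's actual code: the function name, the block decomposition of atoms over ranks, and the use of Python's random module are all assumptions made for illustration. Every rank seeds an identical generator and draws Gaussians for all atoms, keeping only the values for the atoms it owns, which keeps the streams in sync at the price of non-scaling work.

```python
import random


def synced_langevin_noise(natom, nranks, rank, seed):
    """Hypothetical illustration: draw noise for ALL atoms, keep only this rank's.

    Returns (kept, draws), where kept maps atom index -> (x, y, z) noise
    and draws is the number of Gaussian draws this rank performed.
    """
    rng = random.Random(seed)          # same seed on every rank (assumption)
    lo = rank * natom // nranks        # first atom owned by this rank
    hi = (rank + 1) * natom // nranks  # one past the last owned atom
    kept = {}
    draws = 0
    for atom in range(natom):
        # The draw happens whether or not this rank owns the atom;
        # that is what keeps every rank's generator in the same state.
        xyz = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        draws += 3
        if lo <= atom < hi:
            kept[atom] = xyz
    return kept, draws


if __name__ == "__main__":
    natom, nranks = 12, 4
    for rank in range(nranks):
        kept, draws = synced_langevin_noise(natom, nranks, rank, seed=42)
        print(f"rank {rank}: owns {len(kept)} atoms, performed {draws} draws")
```

In this sketch, merging the per-rank results reproduces the single-process stream exactly, but each rank still performs 3*natom draws, so that part of the step does not shrink as ranks are added.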

On Tue, May 6, 2008 at 5:42 PM, Robert Duke <rduke.email.unc.edu> wrote:
> This is a very well-known issue with ntt 3, caused by the nonscaling nature
> of the random number generator in use. Running separate RNGs in each
> process would potentially be theoretically unsound, so every processor runs
> through the whole sequence of random numbers. I am probably going to
> look into doing a proper parallel implementation of an RNG for 11 that will
> prevent this; in the meantime, there are big issues with being careful about
> how you seed the RNG for ntt 3 anyway. A paper will be coming out on this, I
> believe. I am, as usual, really, really surprised that no one has noticed
> that I have been saying things about this for about two, going on three,
> years (so this is not a nice feature of BG/L, but you have so many others
> you can enjoy!)
> Regards - Bob
>
> ----- Original Message ----- From: "Carlos Simmerling"
> <carlos.simmerling.gmail.com>
>
> To: <amber-developers.scripps.edu>
> Sent: Tuesday, May 06, 2008 5:27 PM
>
> Subject: amber-developers: Verlet update time and ntt=3 parallel scaling
>
>
>
>
> > Hi all,
> > I've noticed that when running explicit-water sander jobs on the Blue
> > Gene with ntt=3, the time is dominated by Verlet update time, which
> > tends to be >50% of the total time and, most importantly, is nearly
> > completely independent of the number of CPUs.
> > In contrast, the ntt=1 Verlet time is small (1%), so my ntt=1
> > simulations scale much, much better. Has anyone else noticed this?
> >
> > From looking at the code, my guess is that the cause is the gauss calls
> > in runmd.f, which are done for all atoms regardless of parallelism (to
> > keep the random number generators in sync). Preliminary testing on 16
> > nodes on my cluster seems to confirm this, though Verlet takes a much
> > smaller fraction there: with far fewer CPUs, the nonbonds dominate.
> > When I disable the extra gauss calls, the Verlet time drops back to
> > what it is for ntt=1.
> > I can't test that easily on the Blue Gene, since wait times are days
> > and there is no debug queue.
> >
> > Has anyone else noticed similar problems at high processor counts for
> > sander and ntt=3, or is this another nice feature of the BG?
> > Carlos
> >
> > Verlet is the dominating factor in overall sander scaling for ntt=3 at
> > 256 CPUs:
> >
> >
> > | Other                   0.73 ( 8.40% of Recip)
> > | Recip Ewald time        8.68 (49.97% of Ewald)
> > | Force Adjust            1.13 ( 6.49% of Ewald)
> > | Virial junk             1.34 ( 7.71% of Ewald)
> > | Start sycnronization    0.00 ( 0.01% of Ewald)
> > | Other                   0.02 ( 0.09% of Ewald)
> > | Ewald time             17.36 (90.19% of Nonbo)
> > | IPS excludes            0.00 ( 0.01% of Nonbo)
> > | Other                   0.01 ( 0.05% of Nonbo)
> > | Nonbond force          19.25 (76.12% of Force)
> > | Bond/Angle/Dihedral     0.13 ( 0.52% of Force)
> > | FRC Collect time        4.92 (19.45% of Force)
> > | Other                   0.99 ( 3.92% of Force)
> > | Force time             25.29 (40.39% of Runmd)
> > | Shake time              0.33 ( 0.53% of Runmd)
> > | Verlet update time     33.53 (53.55% of Runmd)
> > | CRD distribute time     3.45 ( 5.50% of Runmd)
> > | Other                   0.02 ( 0.02% of Runmd)
> > | Runmd Time             62.61 (95.78% of Total)
> > | Other                   2.76 ( 4.22% of Total)
> > | Total time             65.37 (100.0% of ALL )
> >
> >
>
>
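
Taking the 256-CPU timings quoted above at face value, the scaling ceiling implied by a fixed Verlet update time can be estimated with a few lines of Python. This is back-of-the-envelope arithmetic under two stated assumptions that go beyond the thread: the Verlet update time does not shrink at all with more CPUs (the post says only that it is "nearly completely independent"), and everything else speeds up perfectly. The function names are hypothetical.

```python
# Timings in seconds, copied from the 256-CPU profile quoted above.
TOTAL = 65.37    # "Total time"
VERLET = 33.53   # "Verlet update time"


def projected_time(factor, total=TOTAL, fixed=VERLET):
    """Wall time if only the non-Verlet part speeds up by `factor`.

    Assumption: the `fixed` part stays constant and the remainder
    scales ideally, which is optimistic for a real run.
    """
    return fixed + (total - fixed) / factor


def projected_speedup(factor, total=TOTAL, fixed=VERLET):
    """Overall speedup relative to the quoted 256-CPU run."""
    return total / projected_time(factor, total, fixed)


# Limit of the speedup as the scalable part goes to zero time.
MAX_SPEEDUP = TOTAL / VERLET

if __name__ == "__main__":
    for factor in (2, 4, 8):
        print(f"{factor}x more CPUs: {projected_time(factor):.2f} s, "
              f"speedup {projected_speedup(factor):.2f}")
    print(f"upper bound on speedup: {MAX_SPEEDUP:.2f}")
```

Under these assumptions the run could never get even twice as fast as the quoted 256-CPU timing, no matter how many CPUs are added, which is consistent with Verlet being the dominating factor in scaling.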



-- 
===================================================================
Carlos L. Simmerling, Ph.D.
Associate Professor                 Phone:  (631) 632-1336
Center for Structural Biology       Fax:    (631) 632-1555
CMM Bldg, Room G80
Stony Brook University              E-mail: carlos.simmerling.gmail.com
Stony Brook, NY 11794-5115          Web:    http://comp.chem.sunysb.edu
===================================================================
Received on Wed May 07 2008 - 06:07:50 PDT