Re: amber-developers: Verlet update time and ntt=3 parallel scaling from Robert Duke on 2008-05-06 (Amber Developers Archive May 2008)

From: Robert Duke <rduke.email.unc.edu>
Date: Tue, 6 May 2008 17:42:12 -0400

This is a very well-known issue with ntt=3, caused by the non-scaling nature
of the random number generator in use. Running separate RNGs in each process
could be theoretically unsound, so every processor runs through the full
sequence of random numbers. I will probably look into a proper parallel RNG
implementation for Amber 11 that avoids this; in the meantime, you need to be
careful about how you seed the RNG for ntt=3 anyway. I believe a paper on this
will be coming out. I am, as usual, really, really surprised that no one has
noticed I have been pointing this out for two, going on three, years. (This one
is not a "nice feature" of BG/L, but it has so many others you can enjoy!)
Regards - Bob
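The cost difference discussed in this thread can be sketched in a few lines. The Python below is not sander's Fortran; it is a minimal illustration under stated assumptions: the names `synced_gaussians`, `keyed_gaussian`, `keyed_uniform` and `blocks` are hypothetical, there is one Gaussian per atom instead of one per Cartesian component, and atoms are split into simple contiguous blocks. The first function mimics the scheme described above: each rank walks the whole serial stream, so its cost stays at `natom` draws no matter how many ranks there are. The second shows one possible kind of "proper parallel implementation" (an assumption; the message does not say which design was planned): a counter-based generator, where each number is a pure function of (seed, step, atom), so a rank computes only its own atoms and the values do not depend on the decomposition. The SplitMix64 mixing here is a toy; a production code would use a vetted counter-based generator such as Philox from Random123.

```python
import math
import random

MASK64 = (1 << 64) - 1


def synced_gaussians(seed, natom, my_atoms):
    """Hypothetical model of the thread's scheme: each rank replays the full
    serial stream so atom i always gets the same number, then keeps only the
    values for atoms it owns. Cost per rank: natom draws, whatever the rank count."""
    rng = random.Random(seed)
    mine = {}
    draws = 0
    for i in range(natom):
        g = rng.gauss(0.0, 1.0)  # drawn even if atom i belongs to another rank
        draws += 1
        if i in my_atoms:
            mine[i] = g
    return mine, draws


def splitmix64(x):
    """SplitMix64 finalizer: scrambles a 64-bit integer (toy mixer, not Philox)."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def keyed_uniform(seed, step, atom, k):
    """Uniform in [0, 1) computed directly from the key (seed, step, atom, k)."""
    h = splitmix64(seed)
    for field in (step, atom, k):
        h = splitmix64(h ^ field)
    return (h >> 11) * (1.0 / (1 << 53))  # top 53 bits -> double in [0, 1)


def keyed_gaussian(seed, step, atom):
    """Box-Muller transform of two keyed uniforms; no shared stream is needed."""
    u1 = 1.0 - keyed_uniform(seed, step, atom, 0)  # in (0, 1], safe for log
    u2 = keyed_uniform(seed, step, atom, 1)
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def blocks(natom, nranks):
    """Hypothetical contiguous atom decomposition, one block per rank."""
    return [range(r * natom // nranks, (r + 1) * natom // nranks)
            for r in range(nranks)]


if __name__ == "__main__":
    natom, seed, step = 1000, 12345, 0
    for nranks in (1, 4, 16):
        parts = blocks(natom, nranks)
        synced_cost = max(synced_gaussians(seed, natom, b)[1] for b in parts)
        # each rank evaluates only the atoms it owns
        keyed = {i: keyed_gaussian(seed, step, i) for b in parts for i in b}
        keyed_cost = max(len(b) for b in parts)
        print(f"{nranks:3d} ranks: synced {synced_cost} draws/rank, "
              f"keyed {keyed_cost} draws/rank ({len(keyed)} atoms total)")
```

With 16 ranks the synced scheme still costs 1000 draws per rank while the keyed one costs at most 63, which mirrors the non-scaling Verlet time reported below. The caveat about seeding still applies: the statistical quality of a counter-based generator, and how its seeds are chosen, has to be justified separately.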

----- Original Message -----
From: "Carlos Simmerling" <carlos.simmerling.gmail.com>
To: <amber-developers.scripps.edu>
Sent: Tuesday, May 06, 2008 5:27 PM
Subject: amber-developers: Verlet update time and ntt=3 parallel scaling

> Hi all,
> I've noticed that when running explicit-water sander jobs on the Blue Gene
> with ntt=3, the run time is dominated by the Verlet update time, which tends
> to be >50% of the total and, most importantly, is nearly completely
> independent of the number of CPUs. In contrast, with ntt=1 the Verlet time
> is small (1%), so my simulations scale much, much better. Has anyone else
> noticed this?
>
> From looking at the code, my guess is that the cause is the gauss calls in
> runmd.f, which are made for all atoms regardless of parallelism (to keep the
> random number generators in sync). Preliminary testing on 16 nodes of my
> cluster seems to confirm this, although there Verlet takes a much smaller
> fraction of the time: with far fewer CPUs, the nonbonded calculation
> dominates. When I disable the extra gauss calls, the Verlet time drops back
> to what it is for ntt=1. I can't easily test that on the Blue Gene, since
> wait times are days and there is no debug queue.
>
> Has anyone else noticed similar problems at high processor counts for
> sander with ntt=3, or is this another nice feature of the BG?
> Carlos
>
> Verlet is the dominant factor in overall sander scaling for ntt=3 at
> 256 CPUs:
>
> |             Other                 0.73 ( 8.40% of Recip)
> |           Recip Ewald time        8.68 (49.97% of Ewald)
> |           Force Adjust            1.13 ( 6.49% of Ewald)
> |           Virial junk             1.34 ( 7.71% of Ewald)
> |           Start sycnronization    0.00 ( 0.01% of Ewald)
> |           Other                   0.02 ( 0.09% of Ewald)
> |         Ewald time               17.36 (90.19% of Nonbo)
> |         IPS excludes              0.00 ( 0.01% of Nonbo)
> |         Other                     0.01 ( 0.05% of Nonbo)
> |       Nonbond force              19.25 (76.12% of Force)
> |       Bond/Angle/Dihedral         0.13 ( 0.52% of Force)
> |       FRC Collect time            4.92 (19.45% of Force)
> |       Other                       0.99 ( 3.92% of Force)
> |     Force time                   25.29 (40.39% of Runmd)
> |     Shake time                    0.33 ( 0.53% of Runmd)
> |     Verlet update time           33.53 (53.55% of Runmd)
> |     CRD distribute time           3.45 ( 5.50% of Runmd)
> |     Other                         0.02 ( 0.02% of Runmd)
> |   Runmd Time                     62.61 (95.78% of Total)
> |   Other                           2.76 ( 4.22% of Total)
> | Total time                       65.37 (100.0% of ALL )
>
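The 256-CPU profile above puts a hard ceiling on what more CPUs can buy. As a rough back-of-the-envelope sketch, assume (as Carlos reports) that the whole Verlet update time is fixed and that everything else scales perfectly, which is optimistic. Amdahl's law then bounds the speedup. The helper names `amdahl_bound` and `amdahl_speedup` are hypothetical.

```python
def amdahl_bound(total, fixed):
    """Upper bound on speedup when `fixed` seconds never shrink with more CPUs."""
    return total / fixed


def amdahl_speedup(total, fixed, factor):
    """Speedup if everything except `fixed` seconds runs `factor` times faster."""
    return total / (fixed + (total - fixed) / factor)


if __name__ == "__main__":
    total, verlet = 65.37, 33.53  # seconds, from the 256-CPU profile above
    print(f"non-scaling fraction: {verlet / total:.1%}")
    print(f"ceiling from more CPUs: {amdahl_bound(total, verlet):.2f}x")
    print(f"if everything else doubled in speed: "
          f"{amdahl_speedup(total, verlet, 2.0):.2f}x")
```

With these numbers the non-scaling fraction is about 51%, so even infinitely fast force evaluation could make the run at most about 1.95x faster, and doubling the speed of everything else would give only about 1.32x. With ntt=1, where Verlet is about 1% of the time, the same bound is far looser.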
Received on Wed May 07 2008 - 06:07:49 PDT