This is a very well-known issue with ntt=3, caused by the non-scaling nature
of the random number generator in use.  Running a separate RNG in each
process would be potentially theoretically unsound, so instead every
processor runs through the entire sequence of random numbers.  I will
probably look into a proper parallel RNG implementation for Amber 11 that
will prevent this; in the meantime, you need to be careful about how you
seed the RNG for ntt=3 anyway.  I believe a paper on this will be coming
out.  I am, as usual, really surprised that no one has noticed that I have
been saying things about this for two, going on three, years.  (None of
this is a "nice feature" of BG/L, but there are so many others you can
enjoy!)
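
To make the trade-off concrete, here is a minimal Python sketch. It is not the sander code: the function names, the block decomposition of atoms over ranks, and the string-based per-atom seeding are all assumptions made for illustration. The first function mimics the synchronized approach, where every rank walks the full stream and keeps only the values for its own atoms. The second gives each atom its own stream, which cuts the work per rank but raises exactly the statistical-soundness question mentioned above; it shows only the bookkeeping, not a vetted generator.

```python
import random


def kicks_synced(natom, nranks, seed=12345):
    """Synchronized stream: each rank draws for ALL atoms, keeps its own."""
    kicks = [None] * natom
    draws_per_rank = []
    for rank in range(nranks):
        rng = random.Random(seed)           # identical seed on every rank
        lo = rank * natom // nranks         # hypothetical block decomposition
        hi = (rank + 1) * natom // nranks
        draws = 0
        for atom in range(natom):           # loop is over all atoms, not lo..hi
            g = tuple(rng.gauss(0.0, 1.0) for _ in range(3))
            draws += 3
            if lo <= atom < hi:
                kicks[atom] = g             # only owned atoms use the values
        draws_per_rank.append(draws)
    return kicks, draws_per_rank


def kicks_per_atom(natom, nranks, seed=12345, step=0):
    """Per-atom streams (illustrative only): each rank draws for its atoms."""
    kicks = [None] * natom
    draws_per_rank = []
    for rank in range(nranks):
        lo = rank * natom // nranks
        hi = (rank + 1) * natom // nranks
        draws = 0
        for atom in range(lo, hi):
            # Assumption: a stream keyed on (seed, step, atom). Whether such
            # streams are statistically adequate is NOT established here.
            rng = random.Random(f"{seed}:{step}:{atom}")
            kicks[atom] = tuple(rng.gauss(0.0, 1.0) for _ in range(3))
            draws += 3
        draws_per_rank.append(draws)
    return kicks, draws_per_rank


if __name__ == "__main__":
    for nranks in (1, 4, 16):
        _, synced = kicks_synced(64, nranks)
        _, split = kicks_per_atom(64, nranks)
        print(nranks, "ranks: synced draws/rank =", synced[0],
              " per-atom draws/rank =", split[0])
```

In the synchronized version the number of draws per rank stays at 3*natom however many ranks there are, which is why that part of the step does not speed up with more CPUs. Both versions give the same values for any rank count, so results do not depend on the decomposition.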
Regards - Bob
----- Original Message ----- 
From: "Carlos Simmerling" <carlos.simmerling.gmail.com>
To: <amber-developers.scripps.edu>
Sent: Tuesday, May 06, 2008 5:27 PM
Subject: amber-developers: Verlet update time and ntt=3 parallel scaling
> Hi all,
> I've noticed while running explicit-water sander jobs on the Blue Gene
> with ntt=3 that the run time is nearly dominated by the Verlet update
> time, which tends to be >50% of the total time and, most importantly, is
> almost completely independent of the number of CPUs.
> In contrast, the ntt=1 Verlet time is small (1%), meaning that my
> simulations scale much, much better. Has anyone else noticed this?
>
> From looking at the code, my guess is that the cause is the gauss calls
> in runmd.f, which are done for all atoms regardless of parallelism (to
> keep the random number generators in sync). Preliminary testing on 16
> nodes on my cluster seems to confirm this, though Verlet takes a much
> smaller fraction there: with far fewer CPUs, the nonbonds dominate.
> When I disable the extra gauss calls, the Verlet time drops back to
> what it is for ntt=1.
> I can't easily test that on the Blue Gene, since wait times are days
> and there is no debug queue.
>
> Has anyone else noticed similar problems at high processor counts for
> sander and ntt=3, or is this another nice feature of the BG?
> Carlos
>
> Verlet is the dominating factor in overall sander scaling for ntt=3 at
> 256 CPUs:
>
>
> |                   Other                      0.73 ( 8.40% of Recip)
> |                Recip Ewald time           8.68 (49.97% of Ewald)
> |                Force Adjust               1.13 ( 6.49% of Ewald)
> |                Virial junk                1.34 ( 7.71% of Ewald)
> |                Start sycnronization       0.00 ( 0.01% of Ewald)
> |                Other                      0.02 ( 0.09% of Ewald)
> |             Ewald time                17.36 (90.19% of Nonbo)
> |             IPS excludes               0.00 ( 0.01% of Nonbo)
> |             Other                      0.01 ( 0.05% of Nonbo)
> |          Nonbond force             19.25 (76.12% of Force)
> |          Bond/Angle/Dihedral        0.13 ( 0.52% of Force)
> |          FRC Collect time           4.92 (19.45% of Force)
> |          Other                      0.99 ( 3.92% of Force)
> |       Force time                25.29 (40.39% of Runmd)
> |       Shake time                 0.33 ( 0.53% of Runmd)
> |       Verlet update time        33.53 (53.55% of Runmd)
> |       CRD distribute time        3.45 ( 5.50% of Runmd)
> |       Other                      0.02 ( 0.02% of Runmd)
> |    Runmd Time                62.61 (95.78% of Total)
> |    Other                      2.76 ( 4.22% of Total)
> | Total time                65.37 (100.0% of ALL  )
> 
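
As a rough back-of-the-envelope check on the timings quoted above, the short Python sketch below takes the 256-CPU numbers (Total 65.37, Verlet update 33.53) and asks how much faster the run could get with more CPUs. It rests on two assumptions that are not in the original post: the Verlet update time stays fixed, and everything else scales perfectly. The function name is made up for this example.

```python
TOTAL_256 = 65.37    # "Total time" from the quoted profile at 256 CPUs
VERLET_256 = 33.53   # "Verlet update time" from the same profile


def projected_total(cpu_factor):
    """Projected total time if the CPU count is multiplied by cpu_factor.

    Assumes the Verlet update does not scale at all and that all
    remaining time scales perfectly (both are idealizations).
    """
    return VERLET_256 + (TOTAL_256 - VERLET_256) / cpu_factor


if __name__ == "__main__":
    for factor in (1, 2, 4, 8):
        t = projected_total(factor)
        print(f"{256 * factor:5d} CPUs: {t:6.2f}  speedup {TOTAL_256 / t:4.2f}x")
    # Upper bound on further speedup as the CPU count grows without limit
    print(f"limit: {TOTAL_256 / VERLET_256:4.2f}x")
```

Under these assumptions the total can never drop below the Verlet time itself, so the speedup beyond 256 CPUs stays under 2x no matter how many processors are added.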
Received on Wed May 07 2008 - 06:07:49 PDT