Re: [AMBER-Developers] Consistent RNGs across serial and parallel

From: Jason Swails <jason.swails.gmail.com>
Date: Mon, 17 Oct 2011 01:56:58 -0400

On Sun, Oct 16, 2011 at 10:01 PM, <dcerutti.rci.rutgers.edu> wrote:

> Hi Devs,
>
> I've got a whole lot of mdgx working in parallel now--hoorah! I'm getting
> about 5x speedup on 8p Opteron, but that's without a load balancer so it
> can only go up. Basic serial speed is midway between sander and pmemd;
> should be ready for the release of AmberTools 2.0.
>
> Regarding random numbers--I know Bob did a non-trivial literature search
> to verify the sanity of the multi-threaded approach, assigning different
> seeds to each MPI thread and letting it go from there.


From what I can tell in pmemd, each thread is assigned the exact same seed
at the beginning of the simulation, and every processor draws from that same
stream. I only just recently added the ability for a single thread in pmemd
to contain multiple random generators simultaneously; prior to that each
thread had only a single generator (that was really only used for Langevin
dynamics, IIRC).

The way that pmemd ensured reproducibility between serial and parallel
versions was by launching every single thread with exactly the same random
seed, and making sure that every time one thread generated a random number,
all threads did, so exactly the same random numbers were used in the same
place on every thread whether the simulation was done in serial or
parallel. This obviously hurts performance in parallel, since not every
random number is needed on every thread. For this reason, Langevin dynamics
scaled poorly compared to NVE or Berendsen runs, since large numbers of
unused random numbers were generated for atoms not controlled by a given
thread.

The response to this was, at first, a preprocessor directive encapsulating
the filler random # generations so they could be removed (sacrificing
thread-count-independent reproducibility of results). Now pmemd employs the
convention that ig=-1 (seeding the random generator from the system clock)
causes unused random numbers not to be generated at all, since that approach
to seeding an RNG is inherently not reproducible anyway. (Ross's
contribution)
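
To make the distinction concrete, here's a minimal C sketch of the two modes
as I understand them (the function and variable names are mine, not pmemd's,
and the generator is just a placeholder):

#include <stdlib.h>

/* Hypothetical per-thread Langevin kick loop.  With reproducible != 0
 * (the ig != -1 case), every rank draws a number for every atom so the
 * streams stay in lock-step, but only applies the ones it owns; with
 * ig = -1 the filler draws are skipped entirely. */
void langevin_kicks(double *force, int natom, int first_owned, int last_owned,
                    int reproducible)
{
    for (int i = 0; i < natom; i++) {
        int owned = (i >= first_owned && i < last_owned);

        if (!owned && !reproducible)
            continue;                          /* ig = -1: skip filler draws */

        double r = (double) rand() / RAND_MAX; /* placeholder generator */

        if (owned)
            force[i] += r;   /* only atoms owned by this rank get the kick */
        /* otherwise the draw only advances the stream */
    }
}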

> Now that I'm at
> this point in the coding, though, I see a lot of benefit in being able to
> get exactly the same results from a parallel run as from a
> single-processor run, mostly in terms of debugging future releases. Even
> with valgrind properly configured, there are a lot of phantom MPI errors
> that can be hard to sort out, and I like code to be valgrind clean to
> ensure that some silent bug doesn't emerge later on. I'm thinking that
> the next best test would be the ability to run the code in serial and
> verify a valgrind-clean result, then run it in parallel and obtain the
> same numerical results to infer a clean parallel execution.
>

Agreed. But I think the approach taken in pmemd is a good one -- that is,
reproducibility is available, but optional when it's not a requirement (and
the switch between the two modes is simple enough that you can be confident
it's the only change you're really making).


>
> So, my idea: if we really wanted reproducibility across different parallel
> setups, given at least the same processor type, it might work to couple
> individual random number generators to each quantum of the work
> decomposition. With my sort of spatial domain decomposition I would
> attach one random number generator to each cell of the simulation box and
> track its state as part of the cell's meta-data. Whatever process
> controls the cell would roll the generator forward as needed to apply
> random numbers to atoms in the cell; system-wide random variables could be
> tied to the RNG of the first cell in the box. Even though cell
> populations of atoms would change throughout the simulation, the
> trajectory itself, and hence the order of random numbers that each RNG
> churns out, would be fixed from the time when the RNGs of all cells are
> seeded. This strategy would not only obviate the need to calculate filler
> random numbers,


But you can add something like pmemd has so that the generation of filler
RNs can be toggled (for reproducibility or for performance), and this should
be a simple enough change that you can trust the non-reproducible version's
results...


> but also (if the RNG had a known mechanism for advancing N
> turns) eliminate the need to calculate the N-turn advance. One number,
> one turn of an RNG. Some domain decompositions are very sophisticated,
> though, and I'm not sure if pmemd is dynamically adjusting the sizes of
> neighboring cells to do load-balancing. If that's the case, an atom might
> be the quantum of work.
>

So you would have a separate RNG state for each atom? I don't know what
generator you're using, but the sander/pmemd implementation of their RNG
requires 100 doubles and 2 ints to store the state -- that seems like a huge
cost in memory for arguably little return...
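
For what it's worth, here is roughly what attaching a Marsaglia-sized state
to each *cell* (rather than each atom) might look like in C -- the struct
names are illustrative, not mdgx's actual data structures:

/* Illustrative only: a Marsaglia-style generator state of roughly the
 * size quoted above, carried in each cell's metadata so whichever
 * process owns the cell can advance it. */
typedef struct {
    double u[97];       /* lagged-Fibonacci table                   */
    double c, cd, cm;   /* carry terms                              */
    int i97, j97;       /* table indices                            */
} rng_state;            /* 100 doubles + 2 ints, roughly 0.8 kB     */

typedef struct {
    int natom;          /* atoms currently resident in this cell    */
    int *atom_ids;      /* indices into the global atom arrays      */
    rng_state rng;      /* per-cell random number stream            */
} cell_metadata;

Per cell that overhead seems tolerable; per atom, as noted, it would be a lot
of memory for arguably little return.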


>
> Another issue that comes to mind is whether we, like GROMACS and DESMOND
> and NAMD, are going to start passing the RNG state as part of the restart
> file. Can anyone point to some rules about the format of an ascii restart
> file that might allow the inclusion of additional information yet maintain
> backwards compatibility?
>

I think that this has just become a more reasonable option to explore thanks
to Dan. I would argue against ever putting this kind of information in an
ASCII restart. For starters, ASCII restarts use %12.7F format, which
doesn't even come close to representing full double precision, and so you
won't get a 'true' restart, anyway. Second, there's no "good" way of
including the random number stream in the ASCII format. It already has a
list of the 3N coordinates followed by (optionally!) the 3N velocities,
followed by the box info (if present). Thus, it would be pretty easy to
confuse the values necessary to restore the random number state (100 doubles
and 2 ints for the pmemd/sander implementation of Marsaglia) with velocities
(especially since 102/3 == 34...). While this could potentially be worked
around, it's much, much easier to add variables and such seamlessly to
NetCDF files than it is to our ASCII format, so I can see this happening as
the NetCDF restart file format matures.
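
For illustration, here's a sketch of how an extra variable could be tacked
onto a NetCDF restart with the netcdf C library -- the variable and dimension
names here are invented, not any agreed-upon convention:

#include <netcdf.h>

/* Sketch: append a (hypothetically named) "rng_state" variable to an
 * existing NetCDF restart.  Readers that don't know about the variable
 * simply ignore it, which is the backwards-compatibility win over ASCII. */
int append_rng_state(const char *path, const double *state, size_t len)
{
    int ncid, dimid, varid, status;

    if ((status = nc_open(path, NC_WRITE, &ncid)) != NC_NOERR) return status;
    if ((status = nc_redef(ncid)) != NC_NOERR) return status;

    nc_def_dim(ncid, "rng_state_len", len, &dimid);
    nc_def_var(ncid, "rng_state", NC_DOUBLE, 1, &dimid, &varid);

    nc_enddef(ncid);
    nc_put_var_double(ncid, varid, state);
    return nc_close(ncid);
}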

> Comments welcome...
>

How about too many comments?

All the best,
Jason

-- 
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Sun Oct 16 2011 - 23:00:04 PDT