[AMBER-Developers] Consistent RNGs across serial and parallel

From: <dcerutti.rci.rutgers.edu>
Date: Sun, 16 Oct 2011 22:01:36 -0400 (EDT)

Hi Devs,

I've got a whole lot of mdgx working in parallel now--hoorah! I'm getting
about a 5x speedup on an 8-processor Opteron, but that's without a load balancer, so it
can only go up. Basic serial speed is midway between sander and pmemd;
should be ready for the release of AmberTools 2.0.

Regarding random numbers--I know Bob did a non-trivial literature search
to verify the sanity of the multi-threaded approach: assign a different
seed to each MPI process and let it go from there. Now that I'm at
this point in the coding, though, I see a lot of benefit in being able to
get exactly the same results from a parallel run as from a
single-processor run, mostly in terms of debugging future releases. Even
with valgrind properly configured, there are a lot of phantom MPI errors
that can be hard to sort out, and I like code to be valgrind clean to
ensure that some silent bug doesn't emerge later on. I'm thinking that
the next best test would be the ability to run the code in serial and
verify a valgrind-clean result, then run it in parallel and obtain the
same numerical results to infer a clean parallel execution.

So, my idea: if we really wanted reproducibility across different parallel
setups, given at least the same processor type, it might work to couple
individual random number generators to each quantum of the work
decomposition. With my sort of spatial domain decomposition I would
attach one random number generator to each cell of the simulation box and
track its state as part of the cell's meta-data. Whatever process
controls the cell would roll the generator forward as needed to apply
random numbers to atoms in the cell; system-wide random variables could be
tied to the RNG of the first cell in the box. Even though cell
populations of atoms would change throughout the simulation, the
trajectory itself, and hence the order of random numbers that each RNG
churns out, would be fixed from the time when the RNGs of all cells are
seeded. This strategy would obviate the need to calculate filler random
numbers and, even for an RNG with a known mechanism for advancing N
turns, the need to calculate that N-turn advance. One number,
one turn of an RNG. Some domain decompositions are very sophisticated,
though, and I'm not sure if pmemd is dynamically adjusting the sizes of
neighboring cells to do load-balancing. If that's the case, an atom might
be the quantum of work.

Another issue that comes to mind is whether we, like GROMACS, Desmond,
and NAMD, are going to start passing the RNG state as part of the restart
file. Can anyone point to some rules about the format of an ASCII restart
file that might allow the inclusion of additional information yet maintain
backwards compatibility?

Comments welcome...


AMBER-Developers mailing list
Received on Sun Oct 16 2011 - 19:30:03 PDT