Re: [AMBER-Developers] Consistent RNGs across serial and parallel

From: Duke, Robert E Jr <rduke.email.unc.edu>
Date: Mon, 17 Oct 2011 16:19:01 +0000

Hi Dave,
Just a quick note.

The other sources of irreproducibility don't alter the results so rapidly that you can't validate the software, so I don't buy that argument. Basically, the output files should be "visibly identical" for at least 100 steps, and often for up to 500. That gives you enough reproducibility to take simulations through points where you might have questions, starting from a common restart. One potentially touchy point, though, comes with slow modifications to run conditions, like load balancing. I did a lot of work on those sorts of issues by setting parameters to cause load balancing to occur more rapidly, thus testing all the code paths within 500 steps.
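
As a rough illustration of the "visibly identical" check, here is a minimal Python sketch that compares per-step values from two runs at printed precision and reports the first step where they differ. The function name, the "%.4f" format, and the sample numbers are all made up for the example; this is not how any AMBER test harness actually does it.

```python
def first_divergent_step(run_a, run_b, fmt="%.4f"):
    """Return the index of the first step whose printed values differ, or None."""
    for step, (a, b) in enumerate(zip(run_a, run_b)):
        # Compare the formatted strings, i.e. what a reader would see in the output file.
        if fmt % a != fmt % b:
            return step
    return None

# Hypothetical per-step energies from a reference run and a run under test.
ref = [-1234.56781, -1234.60012, -1234.70339]
new = [-1234.56781, -1234.60013, -1234.70951]

divergence = first_divergent_step(ref, new)
```

A validation run would then require that the result is None, or at least later than the 100 to 500 step window.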

I think saving the state should not be a big problem; obviously you just have to get all of it. I have for a long time considered this the right thing to do.

There is a reasonable amount of work on parallel PRNGs, of varying quality and soundness, no doubt. If memory serves, this is an important issue in the Monte Carlo (MC) physics community. I am still interested in the issues, but did not pursue the work further, mostly because my Amber-related funding went away.

Best Regards - Bob



________________________________________
From: dcerutti.rci.rutgers.edu [dcerutti.rci.rutgers.edu]
Sent: Monday, October 17, 2011 12:03 AM
To: AMBER Developers Mailing List
Subject: Re: [AMBER-Developers] Consistent RNGs across serial and parallel

Yes, I was a bit too fast with that RNG idea, but it could still be viable
if the decomposition is right and a suitable RNG is used. I've been using
ran2 in mdgx, whose state takes 36 integers, and as I said my decomposition
scheme is such that I would need one RNG per cell, which is about 1 per
50-100 atoms. One RNG per atom would be far too much; when the atoms
migrate between processes the RNGs would have to go with them!
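
To make the one-RNG-per-cell idea concrete, here is a minimal Python sketch. It assumes each cell owns its own generator seeded from a master seed and the cell index, so the random numbers a cell sees do not depend on which process owns it. The seeding formula, the round-robin cell assignment, and all names are illustrative assumptions; this uses Python's built-in generator, not ran2, and is not mdgx code.

```python
import random

def make_cell_rngs(master_seed, n_cells):
    # One generator per cell. The seed-mixing formula is an illustrative assumption.
    return [random.Random(master_seed * 1_000_003 + cell) for cell in range(n_cells)]

def kicks_for_cells(cell_ids, rngs, atoms_per_cell):
    # Draw one Gaussian "kick" per atom slot, always from the owning cell's generator.
    return {c: [rngs[c].gauss(0.0, 1.0) for _ in range(atoms_per_cell)]
            for c in cell_ids}

def run(n_cells, n_procs, master_seed=2011, atoms_per_cell=4):
    rngs = make_cell_rngs(master_seed, n_cells)
    result = {}
    for rank in range(n_procs):
        # Hypothetical round-robin assignment of cells to processes.
        owned = [c for c in range(n_cells) if c % n_procs == rank]
        result.update(kicks_for_cells(owned, rngs, atoms_per_cell))
    return result

serial = run(n_cells=8, n_procs=1)
parallel = run(n_cells=8, n_procs=3)
```

Because each cell only ever draws from its own stream, the serial and three-process runs produce the same numbers per cell. A real code would also have to fix the order in which atoms within a cell consume their numbers.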

Regarding reproducibility, there are still a lot of ways that trajectories
can diverge beyond the RNGs, and this has been another argument for not
worrying about them in pmemd. In mdgx, there are a couple of things that
will cause the trajectories to diverge based on non-associativity in the
arithmetic, but the ones I can identify are amenable to inexpensive
changes that would lead to reproducible results. I don't intend to do
this, but I also like to keep avenues open unless there's clear reason to
close them off.
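
For anyone who hasn't run into the non-associativity problem, a small Python sketch follows. The first part shows that floating-point addition depends on grouping; the second shows one commonly used remedy, accumulating in scaled integers so that the sum is independent of order. The fixed-point approach and the 2**40 scale are assumptions chosen for illustration, not necessarily the changes meant above.

```python
# Floating-point addition is not associative: grouping changes the result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

SCALE = 2 ** 40  # illustrative fixed-point scale, an assumption

def fixed_point_sum(values):
    # Round each term to an integer once, then add integers.
    # Integer addition is exact, so any summation order gives the same total.
    return sum(round(v * SCALE) for v in values) / SCALE

forces = [0.1, 0.2, 0.3, -1.7e-3, 42.5]
forward = fixed_point_sum(forces)
backward = fixed_point_sum(list(reversed(forces)))
```

The cost is the precision lost in the initial rounding of each term, which the scale factor controls.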

As for including the state in the restart, I was thinking something like

NATOM Timestamp
X1 Y1 Z1 X2 Y2 Z2
X3 Y3 Z3 ...
...
Xn Yn Zn
Vx1 Vy1 Vz1 Vx2 Vy2 Vz2
Vx3 Vy3 Vz3 ...
...
Vxn Vyn Vzn
Gx Gy Gz Ga Gb Gc
%
% RNG STATE
% S1 S2 S3 S4 S5 S6
% S7 S8 ...
% ...

I'd hope that AMBER programs can tolerate reading up through the box
dimensions and then stopping, even if there's some extra data beyond that
marked off with special characters. Even if the state is a double, you can
just encode it somehow, print it in ASCII format and splice it back
together. Cumbersome, but for only 100 doubles it's trivial. I just don't
know enough about how the various other programs that work with restarts
do it; it might also break non-AMBER programs, though, so maybe the ASCII
format is stuck as it is. Perhaps the best thing would be a supplementary
checkpoint file (.chk?) to contain this new data, which the program looks
for but doesn't flinch if it's unavailable.
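
A minimal Python sketch of the '%'-marked trailer idea: the RNG state is written as hexadecimal float strings, which round-trip doubles exactly through ASCII, and a reader that does not know about the trailer simply stops at the first '%' line. The function names and the six-values-per-line layout are assumptions for the example; whether existing restart readers would tolerate the extra lines is exactly the open question.

```python
def format_rng_trailer(state, per_line=6):
    # Encode each value with float.hex() so the ASCII form is bit-exact.
    lines = ["%", "% RNG STATE"]
    words = [float(x).hex() for x in state]
    for i in range(0, len(words), per_line):
        lines.append("% " + " ".join(words[i:i + per_line]))
    return lines

def parse_rng_trailer(lines):
    # Collect values from '%' lines that follow the "RNG STATE" marker.
    state, in_block = [], False
    for line in lines:
        if not line.startswith("%"):
            continue
        body = line[1:].strip()
        if body == "RNG STATE":
            in_block = True
        elif in_block and body:
            state.extend(float.fromhex(w) for w in body.split())
    return state

def legacy_lines(lines):
    # What a reader unaware of the trailer would keep: everything before the first '%'.
    kept = []
    for line in lines:
        if line.startswith("%"):
            break
        kept.append(line)
    return kept

# Hypothetical restart body (coordinates and box line) plus a made-up RNG state.
body = ["2 0.0", "1.0 2.0 3.0 4.0 5.0 6.0", "30.0 30.0 30.0 90.0 90.0 90.0"]
rng_state = [0.1, 1.0 / 3.0, 2.5e-300, 36.0, 7.0, 8.0, 9.0]
restart = body + format_rng_trailer(rng_state)
recovered = parse_rng_trailer(restart)
```

The same two functions would work unchanged on a separate .chk file, in which case a missing file just means an empty state and a fresh seed.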

Dave


_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Mon Oct 17 2011 - 09:30:05 PDT