Re: amber-developers: Verlet update time and ntt=3 parallel scaling from Robert Duke on 2008-05-07 (Amber Developers Archive May 2008)

From: Robert Duke <rduke.email.unc.edu>
Date: Wed, 7 May 2008 14:21:45 -0400

Well, at least in my reading so far on the subject, this is not the case.
There ARE correlations in any pseudo-random number generator, and you are
not guaranteed to get a proper distribution unless you have deeper
knowledge of the RNG algorithm itself and, essentially, start with one seed
and from there select "leapfrog" points in the deterministic sequence of the
RNG. That is at least what I have read so far, but I have not read widely;
this stuff is in the physics literature, and I am preoccupied at the moment
trying to move other mountains, so I am not all over it just yet. So, given
recent surprises with RNG issues, I feel fairly vindicated in not jumping on
the "simple" solution here of just generating a bunch of random seeds in
parallel. As I read more, I will know more, but I think what you propose
here is potentially a really, really bad idea, and I would recommend against
just trying it, because a lot of bad work can be done before the sensitive
test case turns up that starts making you wonder. The RNG we use (Marsaglia's)
is supposed to be pretty good, but I need to read more about it and put it
in proper context relative to current work on parallel RNGs.
I am at the point where I will sort of promise to come up with something,
given that I am still employed, but I won't put anything out unless I am
absolutely certain that it generates a good sequence of random numbers (and
I won't go any further than saying I will get RNG enhancements into pmemd
for Amber 11; hopefully I will actually have some usable patches before that,
but right now I am preoccupied).
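To make the "leapfrog" idea concrete: with one seed and P processors, processor r takes elements r, r+P, r+2P, ... of the single deterministic sequence, so no two processors ever share a number and together they reproduce the serial sequence exactly. The following is a minimal sketch under stated assumptions only. It uses a 64-bit linear congruential generator with Knuth's MMIX constants purely as a stand-in because its P-step jump is easy to write down. It is not Marsaglia's generator used in Amber, it is not a recommendation of LCG quality (LCGs have known weaknesses under leapfrogging), and the function names are hypothetical.

```python
# Leapfrog partitioning of ONE pseudo-random sequence across nranks processors.
# ASSUMPTION: a 64-bit LCG (Knuth's MMIX constants) is a stand-in generator;
# Amber's Marsaglia generator is NOT implemented here.

M = 2**64
A = 6364136223846793005
C = 1442695040888963407


def lcg_step(x, a=A, c=C, m=M):
    """Advance the sequence by one element: x_{n+1} = (a*x_n + c) mod m."""
    return (a * x + c) % m


def leapfrog_params(p, a=A, c=C, m=M):
    """Return (A_p, C_p) such that x_{n+p} = (A_p*x_n + C_p) mod m."""
    a_p, c_p = 1, 0  # identity map: zero steps
    for _ in range(p):
        # Compose one more single step onto the accumulated p-step map.
        a_p = (a * a_p) % m
        c_p = (a * c_p + c) % m
    return a_p, c_p


def leapfrog_stream(seed, rank, nranks, count):
    """Return elements rank, rank+nranks, rank+2*nranks, ... of the sequence
    that starts at x_0 = seed (hypothetical helper)."""
    if not 0 <= rank < nranks:
        raise ValueError("rank must be in [0, nranks)")
    x = seed
    for _ in range(rank):  # move to this rank's first element, x_rank
        x = lcg_step(x)
    a_p, c_p = leapfrog_params(nranks)
    out = []
    for _ in range(count):
        out.append(x)
        x = (a_p * x + c_p) % M  # jump ahead nranks elements in one step
    return out
```

Each rank's stream is exactly the slice `serial[rank::nranks]` of the single-processor sequence, so the union of all ranks' numbers is the serial sequence with no overlap. Only the starting offset and the jump multiplier depend on the generator, which is why this needs the "deeper knowledge of the RNG algorithm itself" mentioned above.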
Regards - Bob

----- Original Message -----
From: "Ross Walker" <ross.rosswalker.co.uk>
To: <amber-developers.scripps.edu>
Sent: Wednesday, May 07, 2008 1:53 PM
Subject: RE: amber-developers: Verlet update time and ntt=3 parallel scaling

> As far as I can tell, if our random number generator is any good (which I
> don't know whether we have properly checked), two sets of random numbers
> from different seeds should not have any correlation. Thus it should be
> equally correct (statistically) to do a Langevin run with each processor
> having its own random number stream, with simply a different seed for each
> MPI task. This should be equivalent to having a single random number
> stream shared between all processors where each processor makes sure it
> doesn't use the same portion of the stream as the other processors.
>
> Of course, the first option makes testing in parallel difficult, but then we
> only get about 300 steps or so of matching output anyway.
>
> So perhaps we should have two modes of operation (controlled in &cntrl,
> maybe).
>
> A testing mode in which it does exactly what we have now, and a production
> mode in which each MPI task uses its own random number stream. The question
> is how to set ig on each processor. One option would be for the master to
> use IG from &cntrl and then for each MPI task to add a successively bigger
> prime number to IG and use that (ig+3, +5, +7, +11, etc.). Another option
> would be for each processor to just add its task ID to ig, but this may not
> be safe, since it is possible that two random number streams for IG and
> IG+1 have some correlation, although I think this is purely hearsay and,
> again, I don't think it has been checked.
>
> These approaches would at least be reproducible on a given number of
> processors - for sander at least, perhaps not for PMEMD.
>
> Comments?
>
>
> /\
> \/
> |\oss Walker
>
> | Assistant Research Professor |
> | San Diego Supercomputer Center |
> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> | http://www.rosswalker.co.uk | PGP Key available on request |
>
> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
> be read every day, and should not be used for urgent or sensitive issues.
>
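The quoted proposal's second option, a single stream in which "each processor makes sure it doesn't use the same portion of the stream as other processors", can also be done by block splitting rather than leapfrogging: with a block length B, rank r uses elements rB through rB + B - 1 and jumps straight to its start. For a linear congruential generator that jump is an affine map, computable in O(log n) steps by repeated squaring. This is a minimal sketch assuming a stand-in 64-bit LCG (Knuth's MMIX constants, not Amber's Marsaglia generator); `jump_params`, `block_stream`, and the block length are hypothetical, and the sketch says nothing about whether the seed-offset schemes (ig+3, +5, ... or ig plus task ID) are safe.

```python
# Block splitting of ONE pseudo-random sequence: rank r owns elements
# [r*block, (r+1)*block). ASSUMPTION: a 64-bit LCG (Knuth's MMIX constants)
# stands in for the real generator; this is not Amber's Marsaglia RNG.

M = 2**64
A = 6364136223846793005
C = 1442695040888963407


def jump_params(n, a=A, c=C, m=M):
    """Return (A_n, C_n) with x_{k+n} = (A_n*x_k + C_n) mod m, in O(log n)."""
    a_n, c_n = 1, 0    # accumulated map, starts as the identity (0 steps)
    a_sq, c_sq = a, c  # map for 2**i steps, starting with a single step
    while n > 0:
        if n & 1:
            # Apply the current power-of-two map after the accumulated map.
            a_n, c_n = (a_sq * a_n) % m, (a_sq * c_n + c_sq) % m
        # Square: compose the 2**i-step map with itself to get 2**(i+1) steps.
        a_sq, c_sq = (a_sq * a_sq) % m, (a_sq * c_sq + c_sq) % m
        n >>= 1
    return a_n, c_n


def block_stream(seed, rank, block, count):
    """Return the first `count` elements of rank's private block
    (hypothetical helper); x_0 = seed."""
    if count > block:
        raise ValueError("rank would run into the next rank's block")
    a_j, c_j = jump_params(rank * block)
    x = (a_j * seed + c_j) % M  # x_{rank*block}
    out = []
    for _ in range(count):
        out.append(x)
        x = (A * x + C) % M  # ordinary single step inside the block
    return out
```

Each rank's numbers are exactly `serial[rank*block:(rank+1)*block]` of the single-processor sequence, and the blocks cannot overlap as long as no rank draws more than `block` numbers (the sketch raises an error instead). Compared with leapfrogging, each rank steps its generator normally and only pays for one jump at startup, at the cost of having to choose a block length large enough for the whole run.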
Received on Sun May 11 2008 - 06:07:14 PDT