Re: amber-developers: Verlet update time and ntt=3 parallel scaling from Scott Brozell on 2008-05-07 (Amber Developers Archive May 2008)

From: Scott Brozell <sbrozell.scripps.edu>
Date: Wed, 7 May 2008 14:51:07 -0700 (PDT)

Hi,

> >> As far as I can tell, if our random number generator is any good (which I
> >> don't know whether we have properly checked), two sets of random numbers
> >> from different seeds should not have any correlation. Thus it should be

In the sense that Ross probably meant, there may be no correlations,
but in another sense two such sequences are 100% correlated:
the element that follows any particular entry is the same for all seeds.
A real-world example is a linear congruential rng with well-chosen
m, a, and c. A toy example is the rng whose sequence is 4 0 1 3 2,
where the seed is the index into that sequence; the entry after 0 is 1
for all seeds.
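To make the toy example concrete, here is a small Python sketch. The language and the tiny LCG parameters (m=16, a=5, c=3, chosen only because they give a full period) are illustrative assumptions, not anything Amber uses. Streams started from different seeds are just different starting points on the same cycle, so the value that follows any given entry never depends on the seed.

```python
# Toy generator from the post: a fixed cycle; the seed is the starting index.
TOY_CYCLE = [4, 0, 1, 3, 2]


def toy_stream(seed, n):
    """Return n values starting at index `seed` of the fixed cycle."""
    return [TOY_CYCLE[(seed + i) % len(TOY_CYCLE)] for i in range(n)]


def lcg_stream(seed, n, m=16, a=5, c=3):
    """Outputs x_1..x_n of the LCG x_{k+1} = (a*x_k + c) mod m, with x_0 = seed.

    m=16, a=5, c=3 is a tiny full-period choice used only for illustration.
    """
    xs, x = [], seed % m
    for _ in range(n):
        x = (a * x + c) % m
        xs.append(x)
    return xs


def successor_map(stream):
    """Map each value to the value that immediately followed it in the stream."""
    return {cur: nxt for cur, nxt in zip(stream, stream[1:])}


if __name__ == "__main__":
    # Different seeds, identical "what comes next" table.
    print(successor_map(toy_stream(0, 10)) == successor_map(toy_stream(3, 10)))  # True
    print(successor_map(toy_stream(2, 10))[0])  # 1: the entry after 0 is always 1
    # For the full-period LCG, seed 7 just lands partway along the seed-0 cycle.
    print(lcg_stream(7, 10) == lcg_stream(0, 15)[5:])  # True
```

The same holds for any deterministic generator: the seed only chooses where on the sequence you start, not what comes after a given state.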

The classic treatise on rngs is Knuth, The Art of Computer Programming,
vol. 2 (the 3rd edition is from 1997), where the sad history (i.e., the
promulgation of non-random generators) can be read. I have forgotten most
of what I knew from the 2nd edition, but I strongly advise following a
careful course, as Bob has advocated.

Scott

On Wed, 7 May 2008, Andreas Svrcek-Seiler wrote:

> In an ideal rng there are no correlations, but such a thing does not
> exist. The statistical properties of any subsequence should be
> indistinguishable from physical white noise.
> Currently the best rng (as far as I know) is the Mersenne twister (MT).
>
> P.S.: It is hard for me to imagine that there's a significant
> difference between a "good" and a "bad" rng when it comes to MD (e.g., driving a
> Langevin heat bath). Are you all sure you could tell the difference
> between a "real" rng and, say, the first 1000000 digits of pi
> repeated over and over driving an MD run, just by looking at the results?
> (Nonetheless: in dubio mersenne twisto :-)
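One way to look at Andreas's question is a serial-correlation check. The Python sketch below is an illustration, not a test anyone in this thread ran. It uses CPython's `random` module, which is documented to use the Mersenne twister (MT19937), and compares it with a block of 1000 of its own values recycled over and over, as a stand-in for the repeated-digits-of-pi idea. At lag 1 both look uncorrelated; only a lag equal to the recycling period exposes the repetition, so whether a difference shows up depends on which statistic (and which property of the simulation) happens to be sensitive to it.

```python
import random


def serial_correlation(xs, lag):
    """Pearson correlation between xs[i] and xs[i + lag] (lag must be >= 1)."""
    a, b = xs[:-lag], xs[lag:]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(12345)  # CPython's random.Random is an MT19937 generator
mt = [rng.random() for _ in range(100_000)]

block = mt[:1000]        # 1000 "good" random values...
repeated = block * 100   # ...recycled over and over, like the pi-digits idea

print(round(serial_correlation(mt, 1), 3))           # close to 0
print(round(serial_correlation(repeated, 1), 3))     # also small
print(round(serial_correlation(repeated, 1000), 3))  # 1.0: the period gives it away
```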
>
> On Wed, 7 May 2008, Robert Duke wrote:
>
> > Well, at least in my reading so far on the subject, this is not the case.
> > There ARE correlations in any pseudo-random number generator, and you are not
> > guaranteed to get a proper distribution unless you have deeper knowledge
> > of the rng algorithm itself and essentially start with one seed and from
> > there select "leapfrog" points in the deterministic sequence of the rng.
> > That is at least what I have read so far, but I have not read widely; this
> > stuff is in the physics literature, and I am preoccupied at the moment trying
> > to move other mountains, so I am not all over it just yet. Given recent
> > surprises with rng issues, I feel fairly vindicated in not jumping on the
> > "simple" solution here of just generating a bunch of random seeds in
> > parallel. As I read more, I will know more, but I think what you propose
> > here is potentially a really, really bad idea, and I would recommend against
> > just trying it, because a lot of bad work can be done before the sensitive
> > test case turns up that starts making you wonder. The rng we use (Marsaglia's)
> > is supposed to be pretty good, but I need to read more about it and put it
> > in a proper context relative to current work on parallel rngs. I am at the
> > point where I will sort of promise to come up with something, given that I am
> > still employed, but I won't put anything out unless I am absolutely certain
> > that it generates a good sequence of random numbers. (I won't go any
> > further than saying I will get rng enhancements into pmemd for Amber 11;
> > hopefully I will actually have some usable patches before that, but right
> > now I am preoccupied.)
> > Regards - Bob
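The "leapfrog" scheme Bob mentions can be sketched for the simplest case, a linear congruential generator, where jumping ahead is easy: with P processors, rank r takes elements r, r+P, r+2P, ... of the single serial stream. The Python below is an illustration under that assumption only. Amber's generator is Marsaglia's, not an LCG, and the constants (a=1664525, c=1013904223, m=2^32, the values popularized by Numerical Recipes) are just a familiar example. Each leapfrogged substream is itself an LCG with multiplier a^P mod m, so its quality still has to be judged on its own.

```python
# Illustrative 32-bit LCG constants (the Numerical Recipes values); not Amber's generator.
A_LCG, C_LCG, M_LCG = 1664525, 1013904223, 2**32


def lcg_next(x, a=A_LCG, c=C_LCG, m=M_LCG):
    """One step of x -> (a*x + c) mod m."""
    return (a * x + c) % m


def jump_params(p, a=A_LCG, c=C_LCG, m=M_LCG):
    """Return (A, C) so that p LCG steps equal the single map x -> (A*x + C) mod m."""
    big_a, big_c = 1, 0  # start from the identity map
    for _ in range(p):
        # Compose one more step: a*(A*x + C) + c = (a*A)*x + (a*C + c).
        big_a, big_c = (a * big_a) % m, (a * big_c + c) % m
    return big_a, big_c


def serial_stream(seed, n):
    """Elements 0..n-1 of the single serial stream (element 0 is one step past seed)."""
    xs, x = [], seed
    for _ in range(n):
        x = lcg_next(x)
        xs.append(x)
    return xs


def leapfrog_stream(seed, rank, nprocs, n):
    """Elements rank, rank+nprocs, rank+2*nprocs, ... of the serial stream (n of them)."""
    x = seed
    for _ in range(rank + 1):  # walk to this rank's first element
        x = lcg_next(x)
    big_a, big_c = jump_params(nprocs)
    xs = []
    for _ in range(n):
        xs.append(x)
        x = (big_a * x + big_c) % M_LCG  # leap nprocs elements at once
    return xs


if __name__ == "__main__":
    seed, nprocs, per_rank = 71, 4, 5
    serial = serial_stream(seed, nprocs * per_rank)
    for rank in range(nprocs):
        # Each rank reproduces exactly its interleaved slice of the serial stream.
        print(rank, leapfrog_stream(seed, rank, nprocs, per_rank) == serial[rank::nprocs])
```

Because every rank draws from one deterministic sequence, the union of the ranks' draws is exactly the serial stream, which is what makes this approach reproducible and free of overlap by construction.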
> >
> > ----- Original Message ----- From: "Ross Walker" <ross.rosswalker.co.uk>
> > To: <amber-developers.scripps.edu>
> > Sent: Wednesday, May 07, 2008 1:53 PM
> > Subject: RE: amber-developers: Verlet update time and ntt=3 parallel scaling
> >
> >
> >> As far as I can tell, if our random number generator is any good (which I
> >> don't know whether we have properly checked), two sets of random numbers
> >> from different seeds should not have any correlation. Thus it should be
> >> equally correct (statistically) to do a Langevin run with each processor
> >> having its own random number stream, simply with a different seed for each
> >> MPI task. This should be equivalent to having a single random number
> >> stream shared between all processors, where each processor makes sure it
> >> doesn't use the same portion of the stream as the other processors.
> >>
> >> Of course the first option makes testing in parallel difficult, but then we
> >> only get about 300 steps or so of matching output anyway.
> >>
> >> So perhaps we should have two modes of operation (controlled in &cntrl,
> >> maybe).
> >>
> >> A testing mode that does exactly what we do now, and a production
> >> mode in which each task uses its own random number stream. The question
> >> is how to set ig on each processor. One option would be for the master to
> >> use ig from &cntrl and for each MPI task to add a successively larger prime
> >> number to ig and use that (ig+3, +5, +7, +11, etc.). Another option would
> >> be for each processor to just add its task ID to ig, but this may not be
> >> safe, since it is possible that the random number streams for ig and ig+1
> >> have some correlation, although I think this is purely hearsay and again
> >> I don't think it has been checked.
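Whether ig and ig+1 give related streams depends on the generator, and for the simplest case it can be checked exactly. The Python sketch below assumes a plain LCG (again the Numerical Recipes constants, not the Marsaglia generator Amber uses) and implements the two seeding schemes described above; `prime_offset_seeds` and `rank_offset_seeds` are hypothetical helper names. For an LCG, the streams started from s and s+k differ term by term by exactly k*a^n mod m, whatever s is, because the additive constant c cancels at every step. So for this generator adjacent seeds are rigidly related, and prime offsets only rescale that relation by k rather than removing it. Whether anything similar holds for the generator actually in use would have to be checked, not assumed.

```python
# Illustrative LCG (Numerical Recipes constants); an assumption, not Amber's generator.
A_LCG, C_LCG, M_LCG = 1664525, 1013904223, 2**32


def lcg_stream(seed, n):
    """Outputs x_1..x_n of x_{k+1} = (a*x_k + c) mod m starting from x_0 = seed."""
    xs, x = [], seed % M_LCG
    for _ in range(n):
        x = (A_LCG * x + C_LCG) % M_LCG
        xs.append(x)
    return xs


def odd_primes(count):
    """The first `count` odd primes: 3, 5, 7, 11, ..."""
    primes, n = [], 3
    while len(primes) < count:
        if all(n % p for p in range(3, int(n ** 0.5) + 1, 2)):
            primes.append(n)
        n += 2
    return primes


def prime_offset_seeds(ig, ntasks):
    """Hypothetical: the master keeps ig, task k > 0 uses ig + (k-th odd prime)."""
    return [ig] + [ig + p for p in odd_primes(ntasks - 1)]


def rank_offset_seeds(ig, ntasks):
    """Hypothetical: task k uses ig + k."""
    return [ig + k for k in range(ntasks)]


def seed_gap_differences(s1, s2, n):
    """Term-by-term differences (mod m) between the streams from seeds s1 and s2."""
    return [(y - x) % M_LCG for x, y in zip(lcg_stream(s1, n), lcg_stream(s2, n))]


if __name__ == "__main__":
    ig = 12345  # arbitrary example seed
    print(prime_offset_seeds(ig, 5))  # [12345, 12348, 12350, 12352, 12356]
    print(rank_offset_seeds(ig, 5))   # [12345, 12346, 12347, 12348, 12349]
    for k in (1, 3):
        predicted = [k * pow(A_LCG, n, M_LCG) % M_LCG for n in range(1, 6)]
        # The gap between the two streams is fixed by k alone, not by ig.
        print(k, seed_gap_differences(ig, ig + k, 5) == predicted)
```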
> >>
> >> These approaches would at least be reproducible on a given number of
> >> processors, for sander at least, though perhaps not for PMEMD.
Received on Sun May 11 2008 - 06:07:15 PDT