Re: amber-developers: Verlet update time and ntt=3 parallel scaling from Andreas Svrcek-Seiler on 2008-05-07 (Amber Developers Archive May 2008)

From: Andreas Svrcek-Seiler <svrci.tbi.univie.ac.at>
Date: Wed, 7 May 2008 21:35:32 +0200 (CEST)

Just $0.02 more:
An ideal rng would have no correlations, but such a thing does not
exist. What one can ask for is that the statistical properties of any
subsequence be indiscernible from physical white noise.
Currently the best rng (as far as I know) is the Mersenne twister (MT)
(see http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html).
It offers periods up to 2**216091-1 (if one needs that), and
as far as I remember any chosen seed puts you at some (effectively random,
uniformly distributed) point on a number stream whose length is the period.
Besides, even the version of the MT that is not architecture-optimized
is at least twice as fast as the (in)famous rand(), as I found when I
tested it a while ago.
If someone is really interested, he/she could try to contact MT's
"father", Makoto Matsumoto, for answers to specific technical questions.
He seems quite enthusiastic about people actually making use of his work.
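
To make the two seeding schemes from the quoted messages below concrete,
here is a minimal Python sketch. It leans on the fact that Python's standard
random module is itself a Mersenne Twister (MT19937, period 2**19937-1).
The function names, the list of prime offsets, the seed value and the rank
count are all illustrative assumptions, not Amber code, and nothing here
shows that either scheme is statistically safe.

```python
import random

# Hypothetical helpers for illustration; random.Random is an MT19937 generator.
PRIMES = [3, 5, 7, 11, 13, 17, 19, 23]  # offsets as in "ig+3,+5,+7,+11 etc."


def per_rank_seed(ig, rank):
    """Scheme 1: the master keeps ig, task k >= 1 adds the k-th prime offset.

    Raises IndexError for more ranks than PRIMES covers (a sketch, after all).
    """
    if rank == 0:
        return ig
    return ig + PRIMES[rank - 1]


def per_rank_stream(ig, rank, n):
    """n Gaussian deviates from a generator seeded separately for this rank."""
    rng = random.Random(per_rank_seed(ig, rank))
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def leapfrog_stream(ig, rank, nranks, n):
    """Scheme 2: one shared sequence; rank r uses elements r, r+P, r+2P, ...

    Naive version: every rank generates the whole sequence and throws away
    the values that belong to the other ranks.
    """
    rng = random.Random(ig)
    out = []
    for i in range(n * nranks):
        x = rng.gauss(0.0, 1.0)
        if i % nranks == rank:
            out.append(x)
    return out


if __name__ == "__main__":
    ig, nranks, n = 12345, 4, 3  # arbitrary example values
    for r in range(nranks):
        print(r, per_rank_seed(ig, r), leapfrog_stream(ig, r, nranks, n))
```

The leapfrog variant reproduces the serial stream exactly when the per-rank
pieces are interleaved again, which keeps parallel runs comparable to serial
ones. Its cost here is that each rank generates every number. The per-rank
seeds are cheap, but whether streams from nearby seeds are uncorrelated is
exactly the open question in this thread.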

Unfortunately I didn't understand a word of the paper about MT.

best,
Andreas

P.S.: It seems hard to imagine (for me) that there's a significant
difference between a "good" and a "bad" rng when it comes to MD (e.g. driving a
Langevin heat bath). Are you all sure you could tell the difference
between a "real" rng and, say, the first 1000000 digits of pi
repeated over and over, when either one drives an MD run and you look
only at the results?
(Nonetheless: in dubio mersenne twisto :-)
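
One way to try this thought experiment on a toy problem: the Python sketch
below drives a one-dimensional Langevin-type velocity update with two noise
sources and reports the average of v*v, which should come out near kT/m for
good noise. Everything in it is an assumption made for illustration: the
update rule, the parameters, the function names, and in particular the
"bad" source, which is a short table of Gaussian deviates repeated over and
over as a stand-in for the digits of pi. A one-particle average like this is
a very blunt test and says little about a real MD run.

```python
import itertools
import math
import random


def mean_v2(noise, nsteps, gamma_dt=0.1, kT_over_m=1.0):
    """Average of v*v for v <- c*v + sqrt((1 - c*c) * kT/m) * xi,
    with c = exp(-gamma*dt) and xi taken from the iterator `noise`."""
    c = math.exp(-gamma_dt)
    amp = math.sqrt((1.0 - c * c) * kT_over_m)
    v = 0.0
    acc = 0.0
    for _ in range(nsteps):
        v = c * v + amp * next(noise)
        acc += v * v
    return acc / nsteps


def mt_noise(seed):
    """Endless Gaussian deviates from Python's Mersenne Twister."""
    rng = random.Random(seed)
    while True:
        yield rng.gauss(0.0, 1.0)


def repeated_noise(seed, table_len):
    """A fixed table of table_len deviates, cycled forever (the "bad" rng)."""
    rng = random.Random(seed)
    table = [rng.gauss(0.0, 1.0) for _ in range(table_len)]
    return itertools.cycle(table)


if __name__ == "__main__":
    nsteps = 200000
    print("MT noise:      ", mean_v2(mt_noise(1), nsteps))
    print("repeated table:", mean_v2(repeated_noise(2, 1000), nsteps))
```

Changing table_len shows how short the repeated sequence has to be before
this particular average drifts away from kT/m. Other observables may be
more or less sensitive.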

             )))))
             (((((
            ( O O )
-------oOOO--(_)--OOOo-----------------------------------------------------
               o Wolfgang Andreas Svrcek-Seiler
               o (godzilla)
                        svrci.tbi.univie.ac.at
       .oooO Tel.:01-4277-52747
       ( ) Oooo.
-------\ (----( )--------------------------------------------------------
         \_) ) /
               (_/

On Wed, 7 May 2008, Robert Duke wrote:

> Well, at least in my reading so far on the subject, this is not the case.
> There ARE correlations in any pseudo random number generator, and you are not
> guaranteed to get a proper distribution unless you have deeper knowledge
> of the rng algorithm itself, and essentially start with one seed and from
> there select "leapfrog" points in the deterministic sequence of the rng.
> That is at least what I have read so far, but I have not read widely; this
> stuff is in the physics literature, and I am preoccupied at the moment trying
> to move other mountains, so I am not all over it just yet. So given recent
> surprises with rng issues, I feel fairly vindicated in not jumping on the
> "simple" solution here and just generating a bunch of random seeds in
> parallel. As I read more, I will know more, but I think what you propose
> here is potentially a really really bad idea, and I would recommend against
> just trying it because a lot of bad work can be done before the sensitive
> test case turns up that starts making you wonder. The rng we use is supposed
> to be pretty good, but I need to read more about it (Marsaglia's), and put it
> in a proper context relative to current work on parallel rng's. I am at the
> point where I will sort of promise to come up with something, given that I am
> still employed, but I won't put anything out unless I am absolutely certain
> that it generates a good sequence of random numbers (and I won't go any
> further than saying I will get rng enhancements into pmemd for 11; hopefully
> I will actually have some usable patches before that, but right now I am
> preoccupied).
> Regards - Bob
>
> ----- Original Message ----- From: "Ross Walker" <ross.rosswalker.co.uk>
> To: <amber-developers.scripps.edu>
> Sent: Wednesday, May 07, 2008 1:53 PM
> Subject: RE: amber-developers: Verlet update time and ntt=3 parallel scaling
>
>
>> As far as I can tell if our random number generator is any good - which I
>> don't know if we have properly checked or not - two sets of random numbers
>> from different seeds should not have any correlation. Thus it should be
>> equally correct (statistically) to do a Langevin run with each processor
>> having its own random number stream - with simply different seeds for each
>> mpi thread. This should be equivalent to having a single random number
>> stream shared between all processors where each processor makes sure it
>> doesn't use the same portion of the stream as other processors.
>>
>> Of course the first option makes testing in parallel difficult but then we
>> only get about 300 steps or so matching anyway.
>>
>> So perhaps we should have two modes of operation (controlled in &cntrl
>> maybe).
>>
>> A testing mode in which it does exactly what we have now and a production
>> mode in which each thread uses its own random number stream. The question is
>> how to set ig on each processor. One option would be for the master to use
>> IG from &cntrl and then each mpi task add a successively bigger prime number
>> to IG and use that (ig+3,+5,+7,+11 etc...). Another option would be for each
>> processor to just add its task ID to ig but this may not be safe since it is
>> possible that two random number streams for IG and IG+1 have some
>> correlation - although I think this is purely hearsay and again I don't
>> think it has been checked.
>>
>> These approaches would at least be reproducible on a given number of
>> processors - for sander at least, perhaps not for PMEMD.
>>
>> Comments?
>>
>>
>> /\
>> \/
>> |\oss Walker
>>
>> | Assistant Research Professor |
>> | San Diego Supercomputer Center |
>> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
>> | http://www.rosswalker.co.uk | PGP Key available on request |
>>
>> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
>> be read every day, and should not be used for urgent or sensitive issues.
Received on Sun May 11 2008 - 06:07:14 PDT