Re: [AMBER-Developers] Re: [AMBER] NTT=3 or NTT=1 from Robert Duke on 2010-05-12 (Amber Developers Archive May 2010)

From: Robert Duke <rduke.email.unc.edu>
Date: Wed, 12 May 2010 14:54:38 -0400

Well, without thinking about it, if you can carry the Marsaglia random
numbers through the exact same operations, and reproduce the previous
results, then you should be good...
- Bob
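Bob's check can be sketched as follows. If the batched, scale-on-return path consumes the same underlying uniform stream in the same order as the per-call path, the two give bit-for-bit identical results. This is a minimal Python sketch, not PMEMD code. Python's `random.Random` (itself MT19937) stands in for the Marsaglia generator, the basic textbook Box-Muller transform is assumed (not necessarily what PMEMD's gauss() or MKL use), and `box_muller_pair`, `unbatched` and `batched` are hypothetical names.

```python
import math
import random
from collections import deque


def box_muller_pair(uniform):
    """Turn two U[0, 1) draws into two independent N(0, 1) deviates."""
    u1 = 1.0 - uniform()  # map [0, 1) to (0, 1] so log(u1) is finite
    u2 = uniform()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def unbatched(seed, sds):
    """Reference path: make deviates only when asked, scale immediately."""
    rng = random.Random(seed)
    spare = deque()
    out = []
    for sd in sds:
        if not spare:
            spare.extend(box_muller_pair(rng.random))
        out.append(sd * spare.popleft())
    return out


def batched(seed, sds, pairs_per_batch=4):
    """Batched path: refill a pool of N(0, 1) deviates, scale on the way out.

    pairs_per_batch must be >= 1.
    """
    rng = random.Random(seed)
    pool = deque()
    out = []
    for sd in sds:
        if not pool:
            for _ in range(pairs_per_batch):
                pool.extend(box_muller_pair(rng.random))
        out.append(sd * pool.popleft())
    return out


sds = [0.5, 1.0, 2.0] * 7  # 21 requests, deliberately not a multiple of the batch
print(unbatched(2010, sds) == batched(2010, sds))  # True
```

Batching only changes when the uniforms are drawn, not which deviates are produced or how they are scaled. That is why the comparison has to keep the old generator; swapping in a different base RNG changes the sequence and removes this exact-match check.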
----- Original Message -----
From: "Tom Joseph" <ttjoseph.gmail.com>
To: "AMBER Developers Mailing List" <amber-developers.ambermd.org>
Sent: Wednesday, May 12, 2010 2:37 PM
Subject: Re: [AMBER-Developers] Re: [AMBER] NTT=3 or NTT=1

I just switched out the use of gauss() for the MKL function
vdrnggaussian() in PMEMD, using the Mersenne Twister 19937 base RNG
and the Box-Muller transform. So in theory this should work (assuming
MT19937 is as "good" as the Marsaglia RNG already implemented). For my
75k atom testcase on 64 Xeon cores the speed improves from 5.9 ns/day
to 7 ns/day. The temperature seems "OK". But - I don't know if I did
things correctly (definitely an RNG novice here), and I haven't done
any further validation.

To take advantage of vectorization, a batch of random numbers is
generated at once and handed out as requested until the pile is
exhausted. When that happens, more numbers are generated; because the
RNG is still synced, this happens at the same time on all nodes.
gauss() is often invoked with a requested standard deviation related
to the atom mass. So, to avoid having to keep a different bucket of
numbers for each standard deviation, all the numbers are generated
with standard deviation 1.0 and multiplied by the requested standard
deviation as they are returned.
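The buffering scheme described above can be sketched in Python. This is a serial illustration under stated assumptions, not the PMEMD change. `GaussianPool` and `batch_size` are hypothetical names. The batch refill stands in for one vectorized library call. In a parallel run, each rank would hold an identically seeded pool and consume it in lockstep, which is what keeps the refills on the same step everywhere.

```python
import random
import statistics
from collections import deque


class GaussianPool:
    """Serve N(0, sd**2) deviates from a pool of pre-generated N(0, 1) values.

    Hypothetical sketch: the pool is refilled in one batch (standing in for a
    single vectorized library call) only when it runs dry, and every value is
    multiplied by the caller's standard deviation as it is handed out.
    """

    def __init__(self, seed, batch_size=1024):
        self._rng = random.Random(seed)  # CPython's Random is MT19937-based
        self._batch_size = batch_size
        self._pool = deque()
        self.refills = 0

    def _refill(self):
        # All N(0, 1): one pool serves every requested standard deviation.
        self._pool.extend(self._rng.gauss(0.0, 1.0) for _ in range(self._batch_size))
        self.refills += 1

    def gauss(self, sd):
        """Return one deviate with mean 0 and standard deviation sd."""
        if not self._pool:
            self._refill()
        return sd * self._pool.popleft()


pool = GaussianPool(seed=71277, batch_size=1000)
draws = [pool.gauss(2.0) for _ in range(5000)]
print(pool.refills)  # 5: one refill per 1000 requests
print(statistics.fmean(draws), statistics.pstdev(draws))  # expected near 0.0 and 2.0
```

A single unit-variance pool keeps the bookkeeping trivial. The price is one extra multiply per request, which is negligible next to generating the deviate.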

I wonder if this additional multiplication is OK. Is there some other
subtlety I've missed? Does this seem like a reasonable approach?
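The multiplication asked about here relies on the location-scale property of the normal distribution. A linear transform of a Gaussian variable is Gaussian, with mean and variance scaled as follows:

```latex
Z \sim \mathcal{N}(0,1),\ \sigma > 0
\;\Longrightarrow\;
\sigma Z \sim \mathcal{N}(0,\sigma^{2}),
\qquad
\mathbb{E}[\sigma Z] = \sigma\,\mathbb{E}[Z] = 0,
\qquad
\operatorname{Var}(\sigma Z) = \sigma^{2}\operatorname{Var}(Z) = \sigma^{2}.
```

Scaling on return therefore introduces no bias by itself. The remaining questions are the quality of the underlying unit deviates and the independence of the streams.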

Thanks,
--Tom

2010/5/12 Robert Duke <rduke.email.unc.edu>:
> Yes, I agree with Ross, basically. I have a stack of RN papers about a
> foot high, and let's just say that there have been lots of instances
> where folks have been cavalier about RNG, only later to discover that
> some of their assumptions were invalid. That was why I was leaning
> toward picking up something widely adopted by the physics MC community,
> figuring that if anybody understands parallel RNG, they should. I do
> believe that the RNG we currently use is good, and it is also reputed
> to generate uncorrelated runs when seeded differently, so this should
> work, but it is not proven to work. As to testing, though, that really
> is my other big issue; that, and the noise that is going to be created
> by folks getting different simulation numbers for different processor
> counts, right at step 1, when this stuff is used. I understand that you
> test the parallel code with this feature off, and that gives you some
> level of confidence, but there is still going to be a lot of noise from
> the user community, and you won't know whether there is a bug, a bad
> build, they had this turned on and didn't know it, etc. Also, I don't
> think folks fully appreciate how easy it would be to have a bug in the
> parallel workload distribution code that would not be obvious. Earlier
> I had some FFT balancing routines that were only invoked after several
> thousand steps under certain pathological conditions; I intentionally
> ripped this stuff out when the block FFT code was done, not because it
> was not potentially useful, but because any bugs in it could be subtle
> and very hard to detect (so all bets are off on detecting problems
> after about 400 steps). I completely understand the desire to fix this
> parallel RNG performance problem; I just wish there was a better way
> that doesn't involve additional validity assumptions and doesn't create
> the potential for a lot of grief with/from/for the user base.
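The point about results changing with processor count at step 1 follows directly from giving each rank its own stream. Below is a minimal Python sketch. It assumes a hypothetical static decomposition in which atoms are split into contiguous blocks and rank r draws from its own stream seeded with ig + r. PMEMD's real workload distribution is more involved, and `first_step_kicks` is an illustrative name only.

```python
import random


def first_step_kicks(ig, n_atoms, n_ranks):
    """Return one N(0, 1) kick per atom for the first step.

    Hypothetical decomposition: contiguous blocks of atoms per rank, and
    rank r uses its own stream seeded with ig + r.
    """
    kicks = []
    block = -(-n_atoms // n_ranks)  # ceiling division
    for rank in range(n_ranks):
        rng = random.Random(ig + rank)
        start = rank * block
        stop = min(start + block, n_atoms)
        kicks.extend(rng.gauss(0.0, 1.0) for _ in range(start, stop))
    return kicks


one_rank = first_step_kicks(ig=71277, n_atoms=8, n_ranks=1)
two_ranks = first_step_kicks(ig=71277, n_atoms=8, n_ranks=2)
print(one_rank[:4] == two_ranks[:4])  # True: rank 0 uses the same stream either way
print(one_rank == two_ranks)          # False: atoms 4-7 now come from seed ig + 1
```

Rerunning with the same seed and rank count reproduces the kicks exactly. Changing the rank count changes which stream feeds which atoms, so trajectories diverge from the first step even when nothing is wrong.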
> Regards - Bob
> ----- Original Message ----- From: "Ross Walker" <ross.rosswalker.co.uk>
> To: "'AMBER Developers Mailing List'" <amber-developers.ambermd.org>
> Sent: Wednesday, May 12, 2010 12:23 PM
> Subject: RE: [AMBER-Developers] Re: [AMBER] NTT=3 or NTT=1
>
>
>> Hi Jason,
>>
>>> This is probably a rather naive approach, but what's wrong with running
>>> the tests without the switch, then triggering it for production runs
>>> after you know everything else works? Production runs are looking for
>>> reproducibility of ensemble properties rather than making sure the
>>> first 100 steps are numerically reproducible, anyway, so I don't really
>>> see the conflict... (obviously the switch will have to be off to
>>> validate changes, but that's easy enough to do)
>>
>> This is exactly what I planned to do and why it would be a ctrl namelist
>> option as a 'tuning' parameter. In fact, I planned to just enable it if
>> you set ig=-1. The part that everyone is missing here, though, is that
>> this (dealing with if statements, putting it in a namelist, having the
>> test cases run with the synchronization, etc.) is the EASY bit.
>>
>> The part that NEEDS to be done first is exactly what you state above:
>> production runs are looking for reproducibility of ensemble properties.
>> This should be tested BEFORE this option is made available. That is why
>> it is an undocumented ifdef right now. It is no good just saying the
>> random number generator works, blah blah blah. Someone should actually
>> test whether using a bunch of random number streams with ig=x, x+1,
>> x+2, ..., x+nthreads-1 works correctly and gives equivalent ensemble
>> properties. If these tests all work, then we can enable this as a real
>> documented option.
>>
>> This is why I say "caveat emptor". This approach has had NO testing
>> except for performance.
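A much weaker prerequisite to the ensemble test asked for above is to screen the raw streams seeded with ig = x, x+1, ..., x+nthreads-1 for per-stream moments and cross-stream correlation. Below is a minimal Python sketch. Python's `random.Random` stands in for the Marsaglia generator, so passing it says nothing about how well that generator's seeding separates consecutive seeds. `stream_screen` and `pearson` are hypothetical helpers.

```python
import random
import statistics


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def stream_screen(x, nthreads, n=20000):
    """Per-stream mean and sd, plus the worst |correlation| between streams.

    Thread t draws N(0, 1) values from a stream seeded with x + t. Streams are
    compared index by index, i.e. the numbers threads would use on the same step.
    """
    streams = []
    for t in range(nthreads):
        rng = random.Random(x + t)
        streams.append([rng.gauss(0.0, 1.0) for _ in range(n)])
    means = [statistics.fmean(s) for s in streams]
    sds = [statistics.pstdev(s) for s in streams]
    worst_corr = max(
        (abs(pearson(streams[i], streams[j]))
         for i in range(nthreads) for j in range(i + 1, nthreads)),
        default=0.0,
    )
    return means, sds, worst_corr


means, sds, worst_corr = stream_screen(x=71277, nthreads=4)
print(means, sds, worst_corr)  # means near 0, sds near 1, small correlation
```

Passing a screen like this is necessary but not sufficient. The real validation is still the one described above: comparing ensemble averages from full simulations with the option on and off.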
>>
>> All the best
>> Ross
>>
>> /\
>> \/
>> |\oss Walker
>>
>> | Assistant Research Professor |
>> | San Diego Supercomputer Center |
>> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
>> | http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
>>
>> _______________________________________________
>> AMBER-Developers mailing list
>> AMBER-Developers.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber-developers

Received on Wed May 12 2010 - 12:00:07 PDT