Re: [AMBER-Developers] Re: [AMBER] NTT=3 or NTT=1 from Robert Duke on 2010-05-12 (Amber Developers Archive May 2010)

From: Robert Duke <rduke.email.unc.edu>
Date: Wed, 12 May 2010 12:47:04 -0400

Yes, I agree with Ross, basically. I have a stack of RN papers about a foot
high, and let's just say that there have been lots of instances where folks
have been cavalier about RNGs, only to discover later that some of their
assumptions were invalid. That is why I was leaning toward picking up
something widely adopted by the physics MC community, figuring that if
anybody understands parallel RNG, they should. I do believe that the RNG we
currently use is good, and it is also reputed to generate uncorrelated runs
when seeded differently, so this should work, but it is not proven to work.

Testing, though, really is my other big issue: that, and the noise that is
going to be created by folks getting different simulation numbers for
different processor counts, right at step 1, when this stuff is used. I
understand that you test the parallel code with this feature off, and that
gives you some level of confidence, but there is still going to be a lot of
noise from the user community, and you won't know whether there is a bug, a
bad build, or they had this turned on and didn't know it, etc.

Also, I don't think folks fully appreciate how easy it would be to have a
bug in the parallel workload distribution code that would not be obvious.
Earlier I had some FFT balancing routines that were only invoked after
several thousand steps under certain pathological conditions. I
intentionally ripped this stuff out when the block FFT code was done, not
because it was not potentially useful, but because if there were bugs they
could be subtle and very hard to detect (so all bets are off on detecting
problems after about 400 steps).

I completely understand the desire to fix this parallel RNG performance
problem; I just wish there was a better way that doesn't involve additional
validity assumptions and doesn't create the potential for a lot of grief
with/from/for the user base.
Regards - Bob
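
To make the "different numbers for different processor counts, right at
step 1" point concrete, here is a minimal sketch. It assumes a contiguous
block decomposition of atoms over ranks and a per-rank stream seeded with
ig + rank (the seeding Ross describes in the quoted message). The function
name langevin_kicks, the atom counts, and the use of Python's built-in
generator in place of the AMBER RNG are all illustrative assumptions, not
the actual sander/pmemd code.

```python
import random

def langevin_kicks(natoms, nranks, ig):
    """Return one Gaussian random 'kick' per atom for a single step.

    Hypothetical layout: atoms are split into contiguous blocks, one per
    rank, and each rank draws from its own stream seeded with ig + rank.
    """
    kicks = [0.0] * natoms
    block = (natoms + nranks - 1) // nranks  # ceiling division
    for rank in range(nranks):
        rng = random.Random(ig + rank)  # independent stream for this rank
        first = rank * block
        last = min(first + block, natoms)
        for atom in range(first, last):
            kicks[atom] = rng.gauss(0.0, 1.0)
    return kicks

one = langevin_kicks(natoms=8, nranks=1, ig=1000)
two = langevin_kicks(natoms=8, nranks=2, ig=1000)

# Atoms whose step-1 random number is unchanged by the rank count.
same = [i for i in range(8) if one[i] == two[i]]
print("atoms with identical kicks on 1 vs 2 ranks:", same)
```

In this toy layout only the atoms owned by rank 0 keep their numbers when
the rank count changes; every atom handed to another rank draws from a
different stream, so the trajectories diverge from the first step even
though each individual run is still reproducible for a fixed rank count.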
----- Original Message -----
From: "Ross Walker" <ross.rosswalker.co.uk>
To: "'AMBER Developers Mailing List'" <amber-developers.ambermd.org>
Sent: Wednesday, May 12, 2010 12:23 PM
Subject: RE: [AMBER-Developers] Re: [AMBER] NTT=3 or NTT=1

> Hi Jason,
>
>> This is probably a rather naive approach, but what's wrong with running the
>> tests without the switch, then trigger it for production runs after you know
>> everything else works. Production runs are looking for reproducibility of
>> ensemble properties rather than making sure the first 100 steps are
>> numerically reproducible, anyway, so I don't really see the conflict...
>> (obviously the switch will have to be off to validate changes, but that's
>> easy enough to do)
>
> This is exactly what I planned to do and why it would be a cntrl namelist
> option as a 'tuning' parameter. In fact I planned to just enable it if you
> set ig=-1. The part that everyone is missing here, though, is that this
> (dealing with if statements, putting it in a namelist, having the test cases
> run with the synchronization, etc.) is the EASY bit.
>
> The part that NEEDS to be done first is exactly what you state above: that
> production runs are looking for reproducibility of ensemble properties. This
> should be tested BEFORE this option is made available. Hence why it is an
> undocumented ifdef right now. It is no good just saying the random number
> generator works blah blah blah. Someone should actually test if using a
> bunch of random number streams with ig=x, x+1, x+2 to x+nthreads-1 works
> correctly and gives equivalent ensemble properties. If these tests all
> work then we can enable this as a real documented option.
>
> This is why I say "caveat emptor". This approach has had NO testing except
> for performance.
>
> All the best
> Ross
>
> /\
> \/
> |\oss Walker
>
> | Assistant Research Professor |
> | San Diego Supercomputer Center |
> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> | http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
>
> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
> be read every day, and should not be used for urgent or sensitive issues.
>
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers
>
>
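
A first, cheap check on the ig=x, x+1, ..., x+nthreads-1 seeding described
above could look something like the sketch below: generate one Gaussian
stream per seed, then compare the per-stream mean and variance and the
largest pairwise cross-correlation. The seed value, stream length, and
helper names are assumptions for illustration, and Python's built-in
generator stands in for the AMBER RNG. Passing this says nothing yet about
ensemble properties of an actual MD run, which is the test that is really
being asked for.

```python
import random
import statistics

def stream(seed, n):
    """n standard-normal samples from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

ig, nthreads, n = 1000, 4, 20000  # illustrative values only
streams = [stream(ig + k, n) for k in range(nthreads)]

# Each stream on its own should look like N(0, 1).
means = [statistics.fmean(s) for s in streams]
variances = [statistics.pvariance(s) for s in streams]
for k in range(nthreads):
    print(f"ig={ig + k}: mean={means[k]:+.4f} var={variances[k]:.4f}")

# Streams from adjacent seeds should not be correlated with each other.
worst = max(abs(pearson(streams[i], streams[j]))
            for i in range(nthreads) for j in range(i + 1, nthreads))
print(f"largest |cross-correlation| between streams: {worst:.4f}")
```

The same loop structure would apply to a real comparison: swap the Gaussian
streams for an observable (temperature, energy, RMSD) sampled from runs
with the option on and off, and compare distributions rather than
individual steps.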

Received on Wed May 12 2010 - 10:00:08 PDT