Re: [AMBER-Developers] Re: [AMBER] NTT=3 or NTT=1 from Jason Swails on 2010-05-12 (Amber Developers Archive May 2010)

From: Jason Swails <jason.swails.gmail.com>
Date: Wed, 12 May 2010 12:12:50 -0400

On Wed, May 12, 2010 at 11:37 AM, Robert Duke <rduke.email.unc.edu> wrote:

> I was not a fan of doing this due to the irreproducibility - run different
> #'s of cpu's and you get different random # sequences (and the problem is
> even worse than that due to load balancing on pmemd - no hope of being able
> to use different generated sequences on different processors reproducibly).
> Adrian Roitberg is the champion of the particular Marsaglia PRNG we use; it
> apparently is good, does not require prime seeding to get good
> characteristics in the RN sequence, etc., and can be seeded by any series of
> different numbers. I did a bunch of reading on good parallel PRNGs, and
> all I remember right now is that a "best solution" is not clear-cut, and I
> was not sure what the open source issues with some of the stuff were. I
> really really wanted a solution that would prevent the parallel bottleneck
> but also be reproducible to make parallel testing possible. This problem
> apparently accounts for a huge amount of computation in the physics
> community - MC simulations. The only reason I did not work on it more was
> local work priorities - I think it is important. The best solution would
> entail the ability to generate the same sequence deterministically on all
> nodes, assigning to atoms from the sequence in a deterministic fashion, but
> without the overhead of having to generate parts of the sequence you don't
> use. Ultimately, I am expecting RNG chips to become widely available, but
> that is not going to allow for reproducibility either. I am very concerned
> about testing for this sort of code.
>
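Bob's "best solution" (every node can reach any position in one global sequence without generating the parts it does not use) is what skip-ahead on a linear congruential generator provides. The sketch below is a minimal illustration under stated assumptions: it uses a 64-bit LCG with Knuth's MMIX constants as a stand-in, not the Marsaglia generator mentioned above, and a hypothetical contiguous block ownership of atoms with 3 numbers per atom. `lcg_jump` advances the state by k steps in O(log k) multiplications, so each rank can start directly at its own offset.

```python
MASK64 = (1 << 64) - 1
A = 6364136223846793005  # Knuth's MMIX multiplier (stand-in generator, not AMBER's)
C = 1442695040888963407  # Knuth's MMIX increment


def lcg_next(x):
    """One step of x -> (A*x + C) mod 2**64."""
    return (A * x + C) & MASK64


def lcg_jump(x, k):
    """Advance state x by k steps in O(log k) using repeated squaring."""
    acc_mult, acc_plus = 1, 0
    cur_mult, cur_plus = A, C
    while k > 0:
        if k & 1:
            acc_mult = (acc_mult * cur_mult) & MASK64
            acc_plus = (acc_plus * cur_mult + cur_plus) & MASK64
        cur_plus = ((cur_mult + 1) * cur_plus) & MASK64
        cur_mult = (cur_mult * cur_mult) & MASK64
        k >>= 1
    return (acc_mult * x + acc_plus) & MASK64


def uniforms_from(seed, start, count):
    """Numbers start .. start+count-1 of the global sequence, as floats in [0, 1)."""
    x = lcg_jump(seed, start)
    out = []
    for _ in range(count):
        x = lcg_next(x)
        out.append((x >> 11) / float(1 << 53))  # keep top 53 bits; low LCG bits are weak
    return out


# Hypothetical block decomposition: the rank owning atoms lo..hi-1 needs
# numbers 3*lo .. 3*hi-1 of the global sequence (3 per atom).
natom, seed = 10, 2010
serial = uniforms_from(seed, 0, 3 * natom)
parallel = []
for lo, hi in [(0, 3), (3, 7), (7, 10)]:
    parallel += uniforms_from(seed, 3 * lo, 3 * (hi - lo))
print(parallel == serial)  # True: same numbers however the atoms are split
```

With round-robin ownership a rank would jump once per owned atom instead, which still costs only O(log natom) per jump.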

This is probably a rather naive approach, but what's wrong with running the
tests without the switch, then triggering it for production runs after you
know everything else works? Production runs are looking for reproducibility
of ensemble properties rather than making sure the first 100 steps are
numerically reproducible, anyway, so I don't really see the conflict.
(Obviously the switch will have to be off to validate changes, but that's
easy enough to do.)
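The distinction Jason draws can be sketched in code: with per-rank streams switched on, results are no longer bitwise reproducible across rank counts, but ensemble statistics should still agree. This is an illustration under assumptions: Python's `random.Random` (a Mersenne Twister) stands in for AMBER's generator, seeding rank r with `ig + r` follows the scheme Ross describes further down, and `per_rank_deviates` and `ensemble_ok` are hypothetical helpers. A real validation would compare thermostat-relevant observables (for example the average temperature) rather than raw deviates.

```python
import random
import statistics


def per_rank_deviates(n, nranks, ig):
    """'Switch on': rank r seeds with ig + r and draws only its share of n deviates."""
    vals = []
    for rank in range(nranks):
        rng = random.Random(ig + rank)
        vals.extend(rng.gauss(0.0, 1.0) for _ in range(rank, n, nranks))
    return vals


def ensemble_ok(vals, tol=0.05):
    """Loose ensemble check: sample mean near 0 and variance near 1."""
    return (abs(statistics.fmean(vals)) < tol
            and abs(statistics.pvariance(vals) - 1.0) < tol)


one_rank = per_rank_deviates(30000, 1, ig=12345)
eight_ranks = per_rank_deviates(30000, 8, ig=12345)
print(one_rank == eight_ranks)  # False: not bitwise reproducible
print(ensemble_ok(one_rank), ensemble_ok(eight_ranks))
```

Exact comparisons would then be reserved for runs with the switch off, and tolerance-based comparisons for runs with it on.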

All the best,
Jason

> - Bob
> ----- Original Message ----- From: "Ross Walker" <ross.rosswalker.co.uk>
>
> To: "'AMBER Mailing List'" <amber.ambermd.org>
> Sent: Wednesday, May 12, 2010 12:19 AM
> Subject: RE: [AMBER] NTT=3 or NTT=1
>
>
>
>>> This occurs in a loop over all atoms. If the atom belongs to a specific
>>> processor, then that processor needs to shell out 3 random numbers,
>>> presumably to provide a random hit in each Cartesian direction (though I
>>> may be misunderstanding something here). If it does not own that atom, it
>>> simply has to create 3 unused random numbers for the sake of remaining in
>>> sync with the rest of the threads. This seems like it can add up to a LOT
>>> of extra calls to gauss for some threads in a highly multithreaded
>>> situation.
>>>
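The loop described above can be sketched as follows, assuming a hypothetical round-robin ownership rule (`i % nranks == rank`) and Python's `random.Random` standing in for AMBER's `gauss` routine. Every rank replays one shared stream and draws 3 numbers per atom, so the kicks do not depend on the rank count, but the number of draws on each rank stays at 3*natom however many ranks there are.

```python
import random


def shared_stream_kicks(natom, rank, nranks, ig):
    """Every rank replays the whole stream and keeps only the atoms it owns."""
    rng = random.Random(ig)  # identical seed on every rank keeps them in sync
    kicks, draws = {}, 0
    for i in range(natom):
        v = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        draws += 3
        if i % nranks == rank:  # hypothetical ownership rule
            kicks[i] = v        # draws for non-owned atoms are simply discarded
    return kicks, draws


natom, ig = 1000, 12345
for nranks in (1, 4, 256):
    _, draws = shared_stream_kicks(natom, 0, nranks, ig)
    print(nranks, draws)  # 3000 draws on rank 0 every time
```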
>>
>> Exactly. It is a serial block of code. ALL threads execute the loop natom
>> times, meaning the work per thread is constant regardless of the number of
>> threads used. Hence if this loop takes 0.1% of the total time in serial,
>> then when you run on 2 threads it takes 0.2% of the total time (assuming
>> everything else scales perfectly). By the time you get to 256 threads it is
>> taking roughly 20-25% of the total time. This is of course very bad since
>> it means the scaling gets worse as you go to higher processor counts.
>> Avoiding this synchronization, by only making the call for atoms that are
>> owned by this processor, means the time spent calling gauss decreases as a
>> function of the number of threads. This is absolutely required for any
>> program to scale well in parallel.
>>
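The arithmetic above follows from treating the replicated loop as a fixed cost that never shrinks while the rest of the step divides by the thread count. A minimal sketch, assuming (as the text does) that everything outside the loop scales perfectly; the function names are illustrative only. It reproduces the 0.2% figure at 2 threads and gives about 20% at 256 threads (the simpler 0.1% times 256 estimate gives 25.6%), while the owned-atoms-only variant stays at 0.1% for any thread count.

```python
def replicated_loop_share(f, nthreads):
    """Share of wall time spent in a loop that every thread executes in full.

    f is the loop's share of the serial run time (0.001 means 0.1%).
    """
    loop = f                       # replicated: same cost on every thread
    rest = (1.0 - f) / nthreads    # assumed to scale perfectly
    return loop / (loop + rest)


def owned_only_share(f, nthreads):
    """Same, but each thread handles only its own atoms, so the loop divides too."""
    loop = f / nthreads
    rest = (1.0 - f) / nthreads
    return loop / (loop + rest)    # equals f for any nthreads


for n in (1, 2, 16, 256):
    print(n, f"{100 * replicated_loop_share(0.001, n):.1f}%",
          f"{100 * owned_only_share(0.001, n):.1f}%")
# 1 0.1% 0.1%
# 2 0.2% 0.1%
# 16 1.6% 0.1%
# 256 20.4% 0.1%
```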
>> The downside is you lose the reproducibility, which is why it was coded
>> this way in the first place. It also assumes that the random number
>> streams are truly independent, since each thread now gets ig+thread_count
>> as its random seed. That is, for our random number generator, is the
>> following true:
>>
>> x[ig](i=1 to natom) is completely uncorrelated with x[ig+y](i=1 to natom)
>> for all values of y where ig is the seed used on the master thread?
>>
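Ross's question can only be settled by analysis of the generator, but a quick empirical screen can at least catch gross problems. The sketch below is hypothetical: it uses Python's `random.Random` as a stand-in (AMBER's Marsaglia generator is not reproduced here) and computes the Pearson correlation between the first 3*natom numbers of the stream seeded with `ig` and of streams seeded with `ig + y`. A near-zero correlation is necessary but not sufficient for independence; dedicated test batteries such as TestU01 probe far more structure.

```python
import math
import random
import statistics


def stream(seed, n):
    """First n uniforms from a generator seeded with `seed` (stand-in PRNG)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def worst_seed_offset_corr(ig, offsets, natom):
    """Largest |r| between x[ig](1..3*natom) and x[ig+y](1..3*natom) over offsets y."""
    base = stream(ig, 3 * natom)
    return max(abs(pearson(base, stream(ig + y, 3 * natom))) for y in offsets)


# For independent streams |r| should be of order 1/sqrt(3*natom), about 0.006 here.
print(worst_seed_offset_corr(12345, range(1, 9), natom=10000))
```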
>> I have NO idea if this is true, which is why the 'feature' is
>> undocumented. If it is true then all is good. If someone can prove this
>> analytically for the random number generator we use then all is good.
>>
>> If someone wants to replace our random number generator with a real
>> parallel random number generator then even better!
>>
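One family of parallel generators with both properties asked for in this thread is counter-based generation: each random number is a pure function of (seed, step, atom, component), so any rank can compute exactly the numbers for its own atoms with no shared state and no discarded draws. Libraries such as Random123 (Philox, Threefry) are built on this idea. The sketch below is only a toy under stated assumptions: it uses the SplitMix64 mixing function as the hash, which has not been vetted for this purpose the way Philox has, plus the Box-Muller transform for Gaussians; the function names and the round-robin ownership are hypothetical.

```python
import math

MASK64 = (1 << 64) - 1


def splitmix64(x):
    """SplitMix64 finalizer: a stateless 64-bit mixing function."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def uniform(seed, step, atom, slot):
    """Uniform in (0, 1] that depends only on its four integer arguments."""
    key = splitmix64(seed)
    for word in (step, atom, slot):
        key = splitmix64(key ^ word)
    return ((key >> 11) + 1) / float(1 << 53)  # (0, 1], so log() below is defined


def gauss3(seed, step, atom):
    """Three N(0, 1) deviates for one atom via the Box-Muller cosine branch."""
    out = []
    for k in range(3):
        u1 = uniform(seed, step, atom, 2 * k)
        u2 = uniform(seed, step, atom, 2 * k + 1)
        out.append(math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2))
    return tuple(out)


def kicks_for_rank(seed, step, natom, rank, nranks):
    """A rank evaluates only its own atoms; nothing is generated and thrown away."""
    return {i: gauss3(seed, step, i) for i in range(rank, natom, nranks)}


serial = kicks_for_rank(2010, 1, 12, 0, 1)
merged = {}
for r in range(5):
    merged.update(kicks_for_rank(2010, 1, 12, r, 5))
print(merged == serial)  # True: independent of the number of ranks
```

Passing the MD step number as part of the key keeps successive steps on different numbers without carrying any generator state between steps.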
>> All the best
>> Ross
>>
>> /\
>> \/
>> |\oss Walker
>>
>> | Assistant Research Professor |
>> | San Diego Supercomputer Center |
>> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
>> | http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
>>
>> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
>> be read every day, and should not be used for urgent or sensitive issues.
>>
>> _______________________________________________
>> AMBER mailing list
>> AMBER.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber
>>
>
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers
>

-- 
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Graduate Student
352-392-4032

Received on Wed May 12 2010 - 09:30:07 PDT