Re: [AMBER-Developers] On the precision of SPFP in pmemd.cuda from B. Lachele Foley on 2017-06-05 (Amber Developers Archive Jun 2017)

From: B. Lachele Foley <lfoley.ccrc.uga.edu>
Date: Mon, 5 Jun 2017 09:39:18 +0000

Why did you perturb your coords to <=1e-6 A? Have you tried perturbing them by <=( (1e-6) * xi ) where xi is x, y or z?

:-) Lachele

Dr. B. Lachele Foley
Associate Research Scientist
Complex Carbohydrate Research Center
The University of Georgia
Athens, GA USA
lfoley.uga.edu
http://glycam.org
________________________________
From: David Cerutti <dscerutti.gmail.com>
Sent: Saturday, June 3, 2017 3:20:34 PM
To: AMBER Developers Mailing List
Subject: Re: [AMBER-Developers] On the precision of SPFP in pmemd.cuda

Where is the wiki, so I can post this (there's some follow-up to write,
too)?

As far as the extra memory, __shared__ is wide open now that I've converted
everything to use __shfl() (that's one thread reading another thread's
value, to make the most use of registers). There is about 44kb out of a
maximum 48kb available. There is no danger of having 32-bit integers
overflow anymore--there are no 32-bit integers in the process.

I've tried a number of things to improve the way the sparse tiles get
handled, but aside from one tweak which I feel makes the code a little
easier to parse and has negligible effect on speed, there is really no way
to beat what we've got right now. New technology and software (Volta, CUDA
9) will offer us some features to do what I've not been able to with
cleverness, but it will also require us to be very careful about the
accumulation because it's breaking the warp synchronization that the
current code relies on.

Dave

On Sat, Jun 3, 2017 at 12:45 AM, B. Lachele Foley <lfoley.ccrc.uga.edu>
wrote:

> Was this ever put on the wiki? I'm not sure how much I can contribute,
> but I'm curious about things. For example, what would you do with the
> newly freed shared memory? (and if you said that, sorry but I got lost
> somewhere) Also, these recommendations - are they something I should
> share with labmates as actual recommendations or as possible ones?
>
>
> :-) Lachele
>
> Dr. B. Lachele Foley
> Associate Research Scientist
> Complex Carbohydrate Research Center
> The University of Georgia
> Athens, GA USA
> lfoley.uga.edu
> http://glycam.org
> ________________________________
> From: David Cerutti <dscerutti.gmail.com>
> Sent: Wednesday, May 17, 2017 1:08:31 AM
> To: AMBER Developers Mailing List
> Subject: Re: [AMBER-Developers] On the precision of SPFP in pmemd.cuda
>
> I'd be happy to post this on a wiki. Ideally, we can get a rich wiki
> environment incorporated into the new website, which I still need to put
> the capstone on.
>
> I was considering minimization last night--the code *already* clamps forces
> at 10000 kcal/mol-A for minimization, no matter the mode (it just takes
> place at different stages of the accumulation), so for minimization modes
> in SPFP I've lowered that barrier to 1800 kcal/mol-A (unlikely that two
> very large forces will align so tightly that they could combine to break
> 2048 kcal/mol-A). Also there are notes on the current GPU pages warning
> people not to minimize with SPFP.
>
> The "official" internal coordinate representation of pmemd.cuda is double
> precision, and iwrap is inconsequential. What happens, rather, is that
> when the pair list gets built, the atoms get re-imaged and placed into the
> hash table as floating point numbers. Long story short, each cell in the
> hash table has an origin, and the atom locations are stored, it seems, by
> their Cartesian displacements from that origin. Each cell in the hash
> table seems to be at least the cutoff plus non-bonded buffer zone, so think
> 9A + 2A or 10A + 2A in the JAC case above, and the cells have to span the
> unit cell. What that means is that the floating point numbers have to
> cover a range of [0, cutoff) initially, and (0-buffer, cutoff+buffer) as
> the pairlist gets older, and that means numbers of up to 10-12. Plug those
> into this handy website:
>
> https://www.h-schmidt.net/FloatConverter/IEEE754.html
>
> and one can see that, for numbers between 8 and 16, the mantissa's least
> significant bits correspond to displacements of 1 part in 1048576, or about
> a millionth of an Angstrom. We may be able to rework the cell grid and
> store atom locations relative to the CENTER of the cell, and pick up
> another bit worth of precision in most cases (have the numbers span a range
> from -7.5 to +7.5, for example--the last significant bits will correspond
> to displacements of 1 / 2097152 Angstroms). Also there may be cases,
> perhaps in octahedral unit cells, where the a few of the atoms' cell
> coordinates have to get bigger than 16, and would thus lose a bit of
> precision (literally).
>
> For those applications with aggressive swapping, it might be feasible to
> keep the original form of the kernel in the code and apply that for the
> first few steps after a swap move, to give extremely large forces time to
> dissipate before switching back to the streamlined version.
>
> One way or another, in 2015 D. E. Shaw made their code run 30% faster than
> ours (after accounting for their multiple timesteps and so forth), breaking
> away in what was once a tight race. I think I know how to get that
> performance in pmemd.cuda, and more __shared__ memory is one thing that
> will help.
>
> Dave
>
>
> On Tue, May 16, 2017 at 11:11 PM, Jason Swails <jason.swails.gmail.com>
> wrote:
>
> > Hi Dave,
> >
> > This is a lot of cool information and a nicely complete investigation of
> > this -- it was an enjoyable and informative read. Maybe I could
> encourage
> > you to start up a Wiki page with some of your investigations so your
> > investigative work survives a bit longer past the end of this discussion.
> >
> > On Tue, May 16, 2017 at 10:10 PM, David Cerutti <dscerutti.gmail.com>
> > wrote:
> >
> > > (In SP modes pmemd.cuda
> > > can represent positions to a precision of 1/1048576 A, a number which
> > won't
> > > change unless the cutoff gets needlessly large or unadvisably small.)
> >
> >
> > What does the internal representation of the coordinates look like for a
> > PME simulation in pmemd.cuda when iwrap is set to 0 (i.e., no wrapping is
> > done)? As coordinates grow, this precision you pointed out degrades (the
> > difference between adjacent real numbers that can be represented with
> > fixed-size floating point numbers grows as the absolute values of the
> > numbers themselves grow). Obviously with a nicely packed box (like how
> > Amber simulations usually start or restarting from a simulation using
> > iwrap=1), the precision won't vary much across 64 A. But if coordinates
> > are allowed to grow without bound, I suspect this precision could quickly
> > become the largest source of error.
> >
> > Probably the safest way to run an iwrap=0 simulation is to maintain the
> > "compact" representation of coordinates internally (so as to avoid losing
> > precision when computing energies and/or forces) as well as a set of
> > translations for each atom so that the "unwrapped" representation can be
> > returned for printing in the corresponding output files. I'm pretty sure
> > this is what OpenMM does, but I have no clue about pmemd.cuda.
> >
> > 
> > > representation. In this format I can handle forces up to 2000
> > kcal/mol-A.
> > >
> > > I'd conclude this by saying "I don't think that any simulation that's
> > > stable at all is going to break that," but it sounds too ominous and
> will
> > > no doubt invite comments of "famous last words..." If really necessary,
> > I'm
> > > pretty confident that I can push to 19 bits of precision past the
> decimal
> > > (force truncation is then getting us slightly more error than the
> > > coordinate rounding, but still way way below any of the other sources),
> > and
> > > let the accumulators take up to 4000 kcal/mol-A forces. About one in a
> > > million individual forces gets above 64 kcal/mol-A, and it becomes
> > > exponentially less likely to get forces of larger and larger sizes the
> > > higher up you go.
> > >
> >
> > An obvious counterexample is minimization, although it's not all that
> bad
> > to make people minimize on the CPU. Another perhaps less obvious
> > application that may result in overflowing forces/energies come from
> hybrid
> > MD/MC methodologies (e.g., H-REMD) with aggressive move sets that can
> > result in a few forces and/or energies becoming extremely large. The
> > obvious thing to happen here is to just reject that move and soldier on,
> > but if the simulation crashes or becomes corrupted , that may prove
> > limiting to some of these kinds of applications.
> >
> > Just some thoughts.
> >
> > All the best,
> > Jason
> >
> > --
> > Jason M. Swails
> > _______________________________________________
> > AMBER-Developers mailing list
> > AMBER-Developers.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber-developers
> >
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers
>
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Mon Jun 05 2017 - 03:00:03 PDT