Re: [AMBER-Developers] On the precision of SPFP in pmemd.cuda

From: B. Lachele Foley <>
Date: Sat, 3 Jun 2017 04:45:17 +0000

Was this ever put on the wiki? I'm not sure how much I can contribute, but I'm curious about things. For example, what would you do with the newly freed shared memory? (and if you said that, sorry but I got lost somewhere) Also, these recommendations - are they something I should share with labmates as actual recommendations or as possible ones?

:-) Lachele

Dr. B. Lachele Foley
Associate Research Scientist
Complex Carbohydrate Research Center
The University of Georgia
Athens, GA USA
From: David Cerutti <>
Sent: Wednesday, May 17, 2017 1:08:31 AM
To: AMBER Developers Mailing List
Subject: Re: [AMBER-Developers] On the precision of SPFP in pmemd.cuda

I'd be happy to post this on a wiki. Ideally, we can get a rich wiki
environment incorporated into the new website, which I still need to put
the capstone on.

I was considering minimization last night--the code *already* clamps forces
at 10000 kcal/mol-A for minimization, no matter the mode (it just takes
place at different stages of the accumulation), so for minimization modes
in SPFP I've lowered that barrier to 1800 kcal/mol-A (unlikely that two
very large forces will align so tightly that they could combine to break
2048 kcal/mol-A). Also there are notes on the current GPU pages warning
people not to minimize with SPFP.

The "official" internal coordinate representation of pmemd.cuda is double
precision, and iwrap is inconsequential. What happens, rather, is that
when the pair list gets built, the atoms get re-imaged and placed into the
hash table as floating point numbers. Long story short, each cell in the
hash table has an origin, and the atom locations are stored, it seems, by
their Cartesian displacements from that origin. Each cell in the hash
table seems to be at least the cutoff plus non-bonded buffer zone, so think
9A + 2A or 10A + 2A in the JAC case above, and the cells have to span the
unit cell. What that means is that the floating point numbers have to
cover a range of [0, cutoff) initially, and (0-buffer, cutoff+buffer) as
the pairlist gets older, and that means numbers of up to 10-12. Plug those
into this handy website:

and one can see that, for numbers between 8 and 16, the mantissa's least
significant bits correspond to displacements of 1 part in 1048576, or about
a millionth of an Angstrom. We may be able to rework the cell grid and
store atom locations relative to the CENTER of the cell, and pick up
another bit worth of precision in most cases (have the numbers span a range
from -7.5 to +7.5, for example--the last significant bits will correspond
to displacements of 1 / 2097152 Angstroms). Also there may be cases,
perhaps in octahedral unit cells, where the a few of the atoms' cell
coordinates have to get bigger than 16, and would thus lose a bit of
precision (literally).

For those applications with aggressive swapping, it might be feasible to
keep the original form of the kernel in the code and apply that for the
first few steps after a swap move, to give extremely large forces time to
dissipate before switching back to the streamlined version.

One way or another, in 2015 D. E. Shaw made their code run 30% faster than
ours (after accounting for their multiple timesteps and so forth), breaking
away in what was once a tight race. I think I know how to get that
performance in pmemd.cuda, and more __shared__ memory is one thing that
will help.


On Tue, May 16, 2017 at 11:11 PM, Jason Swails <>

> Hi Dave,
> This is a lot of cool information and a nicely complete investigation of
> this -- it was an enjoyable and informative read. Maybe I could encourage
> you to start up a Wiki page with some of your investigations so your
> investigative work survives a bit longer past the end of this discussion.
> On Tue, May 16, 2017 at 10:10 PM, David Cerutti <>
> wrote:
> > (In SP modes pmemd.cuda
> > can represent positions to a precision of 1/1048576 A, a number which
> won't
> > change unless the cutoff gets needlessly large or unadvisably small.)
> ​What does the internal representation of the coordinates look like for a
> PME simulation in pmemd.cuda when iwrap is set to 0 (i.e., no wrapping is
> done)? As coordinates grow, this precision you pointed out degrades (the
> difference between adjacent real numbers that can be represented with
> fixed-size floating point numbers grows as the absolute values of the
> numbers themselves grow). Obviously with a nicely packed box (like how
> Amber simulations usually start or restarting from a simulation using
> iwrap=1), the precision won't vary much across 64 A. But if coordinates
> are allowed to grow without bound, I suspect this precision could quickly
> become the largest source of error.
> Probably the safest way to run an iwrap=0 simulation is to maintain the
> "compact" representation of coordinates internally (so as to avoid losing
> precision when computing energies and/or forces) as well as a set of
> translations for each atom so that the "unwrapped" representation can be
> returned for printing in the corresponding output files. I'm pretty sure
> this is what OpenMM does, but I have no clue about pmemd.cuda.
> ​
> > representation. In this format I can handle forces up to 2000
> kcal/mol-A.
> >
> > I'd conclude this by saying "I don't think that any simulation that's
> > stable at all is going to break that," but it sounds too ominous and will
> > no doubt invite comments of "famous last words..." If really necessary,
> I'm
> > pretty confident that I can push to 19 bits of precision past the decimal
> > (force truncation is then getting us slightly more error than the
> > coordinate rounding, but still way way below any of the other sources),
> and
> > let the accumulators take up to 4000 kcal/mol-A forces. About one in a
> > million individual forces gets above 64 kcal/mol-A, and it becomes
> > exponentially less likely to get forces of larger and larger sizes the
> > higher up you go.
> >
> ​An obvious counterexample is minimization, although it's not all that bad
> to make people minimize on the CPU. Another perhaps less obvious
> application that may result in overflowing forces/energies come from hybrid
> MD/MC methodologies (e.g., H-REMD) with aggressive move sets that can
> result in a few forces and/or energies becoming extremely large. The
> obvious thing to happen here is to just reject that move and soldier on,
> but if the simulation crashes or becomes corrupted , that may prove
> limiting to some of these kinds of applications.
> Just some thoughts.
> All the best,
> Jason
> --
> Jason M. Swails
> _______________________________________________
> AMBER-Developers mailing list
AMBER-Developers mailing list
AMBER-Developers mailing list
Received on Fri Jun 02 2017 - 22:00:03 PDT
Custom Search