[AMBER-Developers] On the precision of SPFP in pmemd.cuda

From: David Cerutti <dscerutti.gmail.com>
Date: Tue, 16 May 2017 22:10:05 -0400

There's been a spirited discussion among members of Castle Case and Ross
Walker's merry men on the subject of precision in SPFP mode. This is not a
discussion of how much precision we absolutely need--that's for GROMACS and
D. E. Shaw--rather, it's a discussion of how much precision we currently
have, what we can get away with before making that situation any worse, and
what we should be steering our users to take.

Here are some useful metrics that I've deduced. These numbers are for the
accumulated force on each atom, averaged over all atoms, in the JAC
benchmark. It runs in a 64A box, 9A cutoff, dsum_tol = 1.0e-5, 64 point
cubic mesh. I used the mdgx program to run the PME calculations in full
double precision and print the forces. I modeled a reduction to 32-bit
coordinates by perturbing atoms by up to 1.0e-6 A in the inpcrd file
(precision to 1.0e-7) and recomputing the numbers. (In SP modes pmemd.cuda
can represent positions to a precision of 1/1048576 A, a number which won't
change unless the cutoff gets needlessly large or unadvisably small.) The
error due to the coordinate representation is a sum of non-bonded forces
(the bonded interactions are done in double precision even in SPFP mode).
All results are in kcal/mol-A.

Overall error due to 32-bit coords : 0.0000095
Error in reciprocal space, default settings : 0.0032053
Error in direct space electrostatics, : 0.0031423
Error due to 9A LJ truncation : 0.0008458

There's been a motion to increase our default settings to dsum_tol 1.0e-6.
This reduces direct space error by a factor of roughly 10, but it increases
reciprocal space error unless something is done to improve the precision
there (i.e. more grid density, or a longer direct space cutoff, as higher
interpolation orders aren't in the cards for now). I think that the
coordinate representation is a good metric to bear in mind: 1.0e-5
kcal/mol-A... how many times greater are our other sources of error?

If we go down all the way to dsum_tol = 1.0e-6 and match that in the
reciprocal space part, we'll be matching the precision of the LJ part for a
~11A cutoff. I'd propose we urge people to use dsum_tol = 4.0e-6 with a
10A cutoff. This will give us the following errors (understand, the direct
space elec/LJ errors will be consistent across different simulations, and
the reciprocal space error in the JAC benchmark is close to a maximum
because the 62A cubic box produces grid spacings very near 1A, the maximum
that Amber's default grid sizer will accept).

Error due to dsumtol = 4.0e-6 : 0.0013631 kcal/mol-A
Error due to 10A LJ truncation : 0.0005210 kcal/mol-A
Error in reciprocal space : 0.0012215 kcal/mol-A

Another thing that we can do with an understanding of the coordinate
representation is tailor the representation of forces as we accumulate them
in the non-bonded direct space sum. It would be very helpful to those of
us programming pmemd.cuda if we could keep things in a 32-bit
representation until we're ready to commit. I'm building a version of
pmemd.cuda that accumulates up to 32 direct space forces (all results for
one thread in one tile) with 20 bits of precision after the decimal, then
recasts the 32-bit accumulators to 64-bit and scales the numbers by another
2^20 to put them in the same units as the other forces. It boosts
performance a few percent and saves a lot of __shared__ memory which can be
put to other uses. I've estimated the error due to truncating individual
forces at 20 bits after the decimal (0.000001 kcal/mol-A) and accumulating
~180 of them per atom (about what happens with a 10A cutoff): 7.2e-6
kcal/mol-A error, or about the same as the error due to the coordinate
representation. In this format I can handle forces up to 2000 kcal/mol-A.

I'd conclude this by saying "I don't think that any simulation that's
stable at all is going to break that," but it sounds too ominous and will
no doubt invite comments of "famous last words..." If really necessary, I'm
pretty confident that I can push to 19 bits of precision past the decimal
(force truncation is then getting us slightly more error than the
coordinate rounding, but still way way below any of the other sources), and
let the accumulators take up to 4000 kcal/mol-A forces. About one in a
million individual forces gets above 64 kcal/mol-A, and it becomes
exponentially less likely to get forces of larger and larger sizes the
higher up you go.

If anyone else has thoughts, please share.

Dave
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Tue May 16 2017 - 19:30:03 PDT
Custom Search