[AMBER-Developers] pmemd.cuda SpeedBoost branch

From: David Cerutti <dscerutti.gmail.com>
Date: Sat, 20 May 2017 09:57:07 -0400

Dear Amber Devs,


I’ve checked in a new branch, “SpeedBoost,” which contains new tweaks to
pmemd.cuda intended to help us stay on top of the academic world and get
back on pace with D. E. Shaw’s GPU code. Something the Shaw group did in 2015
helped their code run about 30% faster than ours, even after accounting for
the multiple timesteps they use in the reciprocal-space part. I think I see
ways to get that much more out of pmemd.cuda, and the tweaks I’ve made so
far give a speedup of 10-14%.


The first tweak I’ve made is to use an adaptively indexed spline to compute
the electrostatic direct-space derivatives. Because I can calculate one
number directly, rather than relying on a series of further adds, multiplies,
and exp() computations, I can tweak the last significant bits of the spline
coefficients to get better results out of 32-bit floats. I’ve been able to
exceed the current precision in the electrostatic derivatives by a factor
of 5 or 6, so the splines are both faster and more accurate.
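
The email does not spell out the indexing scheme, so the sketch below assumes one common choice: the bin index is read straight from the bits of the float r², namely its exponent plus the top 6 mantissa bits. That gives 64 bins per power of two (fine bins at short range, coarse ones at long range) and needs no sqrt. Each bin holds a cubic fitted in double precision to the Ewald direct-space factor F(r)/r = [erfc(ar)/r + (2a/sqrt(pi)) exp(-a²r²)]/r² and stored as float32. The last-bit coefficient tuning described above is not attempted here. ALPHA, MANTISSA_BITS, the 1-81 Å² range and every function name are hypothetical, and Python with struct-based float32 rounding stands in for CUDA.

```python
import math
import struct

ALPHA = 0.3         # hypothetical Ewald coefficient (1/Angstrom), illustrative only
MANTISSA_BITS = 6   # hypothetical: 2**6 = 64 bins per power of two in r^2
SHIFT = 23 - MANTISSA_BITS


def f32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float_bits(x):
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_float(u):
    return struct.unpack("<f", struct.pack("<I", u))[0]


def force_over_r(r2):
    """Double-precision reference: Ewald direct-space F(r)/r for unit charges."""
    r = math.sqrt(r2)
    return (math.erfc(ALPHA * r) / r
            + 2.0 * ALPHA / math.sqrt(math.pi) * math.exp(-ALPHA * ALPHA * r2)) / r2


def fit_cubic(lo, h):
    """Cubic through 4 equally spaced samples on [lo, lo + h], returned as
    float32 monomial coefficients in the local variable t = r2 - lo."""
    x = [0.0, 1.0 / 3.0, 2.0 / 3.0, 1.0]
    d = [force_over_r(lo + xi * h) for xi in x]
    for j in range(1, 4):                      # Newton divided differences
        for i in range(3, j - 1, -1):
            d[i] = (d[i] - d[i - 1]) / (x[i] - x[i - j])
    c = [d[3], 0.0, 0.0, 0.0]                  # expand Newton form in s = t / h
    for k in range(2, -1, -1):
        for j in range(3, 0, -1):
            c[j] = c[j - 1] - x[k] * c[j]
        c[0] = d[k] - x[k] * c[0]
    return [f32(c[j] / h ** j) for j in range(4)]  # rescale from s to t


class AdaptiveSpline:
    def __init__(self, r2_min, r2_max):
        self.first_key = float_bits(r2_min) >> SHIFT
        last_key = float_bits(r2_max) >> SHIFT
        self.coef = []
        for k in range(self.first_key, last_key + 1):
            lo = bits_float(k << SHIFT)
            hi = bits_float((k + 1) << SHIFT)  # key + 1 rolls over into the exponent
            self.coef.append(fit_cubic(lo, hi - lo))

    def __call__(self, r2):
        """Evaluate F(r)/r in float32; r2 must lie in [r2_min, r2_max]."""
        r2 = f32(r2)
        k = float_bits(r2) >> SHIFT            # bin = exponent + top mantissa bits
        c = self.coef[k - self.first_key]
        t = f32(r2 - bits_float(k << SHIFT))   # exact (Sterbenz lemma)
        acc = c[3]
        for cj in (c[2], c[1], c[0]):          # Horner's rule, float32 at every step
            acc = f32(f32(acc * t) + cj)
        return acc


spline = AdaptiveSpline(1.0, 81.0)  # r from 1 to 9 Angstrom (hypothetical range)
worst = max(abs(spline(r2) - force_over_r(r2)) / force_over_r(r2)
            for r2 in (f32(1.0 + 80.0 * i / 20000) for i in range(20001)))
print(f"bins: {len(spline.coef)}, max relative error: {worst:.2e}")
```

Indexing by the raw float bits is what makes the table "adaptive" in this sketch: bin widths scale with r², so relative accuracy stays roughly uniform across the cutoff range without any division or logarithm at lookup time.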


The other major thing I’ve done so far is to change the way force, energy,
and virial accumulations are handled in the SPFP direct-space routines. Rather
than immediately up-casting each floating-point force or energy
contribution to a long long int, I accumulate up to 16 force contributions
from a single thread in 32-bit floats, then upcast the sum before sending
it to whatever accumulator is needed. This removes a lot of float-to-int
conversions and saves a little more run time. Determinism is preserved,
because the order in which each number is accumulated on its own warp is
fixed, and integer conversions still take place before pooling the
results of multiple warps. There is no danger of overflowing a 32-bit integer
format with this approach: it goes directly from 32-bit float to 64-bit int.
However, if there is a very big force, it will wipe out the last few
significant bits of the 15 other forces summed with it, not just its own
contribution. I’ve estimated the loss of precision involved, and it is
beneath the level of error I referenced in an earlier message about having
our coordinates in 32-bit precision.
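
A host-side sketch of the two accumulation strategies, assuming a fixed-point scale of 2**40 and a round-to-nearest float-to-int conversion. Neither detail is taken from the SPFP source, and the helper names are hypothetical. Float32 arithmetic is emulated with struct. It shows the two properties claimed above: integer chunk sums pool to the same total in any order, and one large force in a chunk absorbs the low bits of the small forces summed after it.

```python
import random
import struct

FORCE_SCALE = 2.0 ** 40  # hypothetical fixed-point scale for this sketch
CHUNK = 16               # up to 16 contributions per thread before up-casting


def f32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_fixed(x):
    """Float -> 64-bit fixed-point integer, rounding to nearest."""
    return int(round(x * FORCE_SCALE))


def accumulate_each(contribs):
    """Baseline: convert every float32 contribution to fixed point on its own."""
    return sum(to_fixed(f32(c)) for c in contribs)


def chunk_fixed(contribs):
    """Batched: sum up to CHUNK contributions in float32, always in the same
    order, then convert each partial sum to fixed point once."""
    out = []
    for start in range(0, len(contribs), CHUNK):
        partial = 0.0
        for c in contribs[start:start + CHUNK]:
            partial = f32(partial + f32(c))
        out.append(to_fixed(partial))
    return out


rng = random.Random(2017)
forces = [rng.uniform(-10.0, 10.0) for _ in range(1024)]
drift = abs(sum(chunk_fixed(forces)) - accumulate_each(forces)) / FORCE_SCALE
print(f"batched vs per-contribution difference: {drift:.2e} force units")

big = [1000.0] + [1e-6] * 15  # one very large force followed by 15 tiny ones
print(sum(chunk_fixed(big)) == to_fixed(1000.0))  # tiny forces absorbed
```

Because the partial sums are converted to integers before pooling, the final total depends only on the fixed order of adds within each chunk, not on the order in which chunks (warps) are combined.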


By itself, the change in force accumulation probably wouldn’t be worth any
added uncertainty, but it sets up for more powerful changes to the way
warps process the pair interactions. I am confident that the changes I
have made are safe, but they need to be thoroughly tested. Please check
out the branch and give it a try; you should see a decent speedup now, and
I hope to check in further improvements shortly.


Dave
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Sat May 20 2017 - 07:00:04 PDT