Re: [AMBER-Developers] Oops, I introduced a bug into pmemd.cuda_SPFP (master branch), but how shall I resolve it?

From: Daniel Roe <>
Date: Thu, 26 Oct 2017 08:49:41 -0400

On Wed, Oct 25, 2017 at 8:58 PM, Ross Walker <> wrote:
> However, that does not mean one can avoid such testing when modifying the code and certainly one can't just rely on our test suite. You have to sweat the blood doing the validation I'm afraid. I never found a good fix for SPFP with dacdif. The test case format really just doesn't lend itself well to that situation. In the end after any major modification to the GPU code I would always run the full test suite with SPFP and DPFP with dacdif set to save all the output and then I would go through and manually compare things by hand myself. In particular every couple of weeks I would do a full comparison of the latest SPFP output against the CPU output by hand to make sure everything looked good as well as repeating the long timescale validation tests I used in the GPU papers to check for energy convergence. Once satisfied everything was good I would create a gold set of save files for each of the GPU models I had in my machine (at least one from each major generation, typically I would test on between 4 to 6 different models) and then, because of the deterministic nature of GPU code I could rerun the test suite and get perfect comparisons against that gold standard. For the vast majority of work the answers wouldn't change so I could just blast it through my gold standard test set. If something did change I knew it was either a bug or a rounding difference from changing orders in the code etc, or modifying the way the random number generator worked. I would then conduct a careful by hand comparison again.

Could you upload these gold standards to the GIT master branch? They
can be excluded from the Amber tarball but I think it would be great
for devs to have access to them. This way we can at least compare
against a GPU generation that's closer to something you have. For
example, it looks like the majority of the current SPFP test outputs
were generated with a GeForce GTX TITAN X. If I run the tests with a
K20m the largest diff I get is 5.86e-01 kcal/mol absolute error,
whereas if I run with a TITAN Xp the largest diff is 3.09e-01, and I
end up with fewer diffs. So that's one way to cut down on the false

> Ultimately for every hour of coding work I did I probably spent upwards of 2 hours doing validation. Validation is unfortunately not sexy, slows down coding and ultimately sucks but this is arguably the price we pay for developing scientific software that thousands of people will use and rely on the results from. Given our scientific reputations are on the line for the code we write it is very important to take the time to do this validation.

Agreed, validation is extremely important. Adding CI testing with
several generations of GPUs can only improve our QC.
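
As a rough sketch of what comparing a run against several per-GPU gold
standards could look like: the snippet below is not dacdif and not part
of the Amber test suite. The function names (max_abs_diff, closest_gold)
and the idea of keying gold outputs by GPU model name are assumptions
made purely for illustration. It pulls the numbers out of each line of
two outputs, reports the largest absolute difference, and picks the gold
set that differs least from the test run.

```python
import re

# Matches plain and scientific-notation numbers, e.g. -1234.5678 or 5.86e-01.
# Hypothetical helper pattern; real test outputs may need stricter parsing.
_NUM = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")


def max_abs_diff(test_text, gold_text):
    """Largest absolute difference between numbers found at the same
    position on the same line of two outputs."""
    worst = 0.0
    for t_line, g_line in zip(test_text.splitlines(), gold_text.splitlines()):
        t_nums = [float(x) for x in _NUM.findall(t_line)]
        g_nums = [float(x) for x in _NUM.findall(g_line)]
        for t, g in zip(t_nums, g_nums):
            worst = max(worst, abs(t - g))
    return worst


def closest_gold(test_text, gold_by_gpu):
    """Return (gpu_name, diffs), where gpu_name is the gold set with the
    smallest largest-diff and diffs maps every GPU name to its largest diff."""
    diffs = {gpu: max_abs_diff(test_text, gold)
             for gpu, gold in gold_by_gpu.items()}
    best = min(diffs, key=diffs.get)
    return best, diffs
```

With one gold set per GPU generation, a CI job could compare each run
against the gold set for the matching (or nearest) hardware and flag only
the diffs that exceed what that generation normally produces.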


> My 0.02 btc,
> All the best
> Ross
>> On Oct 25, 2017, at 18:50, Jason Swails <> wrote:
>> On Wed, Oct 25, 2017 at 6:48 PM, Hai Nguyen <> wrote:
>>> Just a side question: if 99% of the time will be SPFP, why do we make the
>>> DPFP as default for testing?
>> Different kind of testing. DPFP can be compared directly to CPU results.
>> No other precision really can. But that should certainly not be in
>> replacement of SPFP testing, which I think was your point.
>> --
>> Jason M. Swails
>> _______________________________________________
>> AMBER-Developers mailing list

Daniel R. Roe
Laboratory of Computational Biology
National Institutes of Health, NHLBI
5635 Fishers Ln, Rm T900
Rockville MD, 20852
AMBER-Developers mailing list
Received on Thu Oct 26 2017 - 06:00:03 PDT