Re: [AMBER-Developers] Oops, I introduced a bug into pmemd.cuda_SPFP (master branch), but how shall I resolve it? from Ross Walker on 2017-10-25 (Amber Developers Archive Oct 2017)

From: Ross Walker <ross.rosswalker.co.uk>
Date: Wed, 25 Oct 2017 20:58:29 -0400

The DPFP testing was added to avoid confusion for end users. The chances that a bad installation or a rare compiler bug would affect SPFP and not DPFP are sufficiently small that I judged it reasonable to assume that, from an end-user perspective, if DPFP passed then their installation was good. This way we avoid lots of messages from concerned users.

However, that does not mean one can avoid such testing when modifying the code, and certainly one can't just rely on our test suite. You have to sweat blood doing the validation, I'm afraid. I never found a good fix for SPFP with dacdif. The test case format really just doesn't lend itself well to that situation.

In the end, after any major modification to the GPU code I would always run the full test suite with SPFP and DPFP, with dacdif set to save all the output, and then I would go through and compare things by hand. In particular, every couple of weeks I would do a full comparison of the latest SPFP output against the CPU output by hand to make sure everything looked good, as well as repeating the long-timescale validation tests I used in the GPU papers to check for energy convergence.

Once satisfied everything was good, I would create a gold set of save files for each of the GPU models I had in my machine (at least one from each major generation; typically I would test on 4 to 6 different models). Then, because the GPU code is deterministic, I could rerun the test suite and get perfect comparisons against that gold standard. For the vast majority of work the answers wouldn't change, so I could just blast it through my gold standard test set. If something did change, I knew it was either a bug or a rounding difference from changing the order of operations in the code etc., or from modifying the way the random number generator worked. I would then conduct a careful by-hand comparison again.
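For anyone wanting to automate the gold standard step, here is a minimal sketch of a byte-for-byte comparison, not the scripts I actually used. The directory layout (one gold directory per GPU model, mirrored by a fresh run directory) and the function names compare_to_gold and needs_manual_review are hypothetical, and it assumes the saved outputs have already had timing lines and timestamps stripped, since those change on every run.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compare_to_gold(gold_dir: Path, run_dir: Path) -> dict:
    """Compare each file under gold_dir with the same relative path in run_dir.

    Returns a dict mapping relative path -> "match", "differs", or "missing".
    Hypothetical layout: gold_dir holds the saved outputs for one GPU model,
    run_dir holds a fresh run of the same tests on that same model.
    """
    report = {}
    for gold_file in sorted(p for p in gold_dir.rglob("*") if p.is_file()):
        rel = gold_file.relative_to(gold_dir)
        candidate = run_dir / rel
        if not candidate.is_file():
            status = "missing"
        elif file_digest(candidate) == file_digest(gold_file):
            status = "match"
        else:
            status = "differs"
        report[rel.as_posix()] = status
    return report


def needs_manual_review(report: dict) -> list:
    """List the outputs that are not bitwise identical to the gold set."""
    return sorted(name for name, status in report.items() if status != "match")
```

Anything returned by needs_manual_review is only a flag for the by-hand comparison. An exact match check cannot tell a bug from a harmless rounding change, so it does not replace that step.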

Ultimately, for every hour of coding work I did I probably spent upwards of 2 hours doing validation. Validation is unfortunately not sexy, slows down coding and ultimately sucks, but this is arguably the price we pay for developing scientific software that thousands of people will use and whose results they will rely on. Given that our scientific reputations are on the line for the code we write, it is very important to take the time to do this validation.

My 0.02 btc,

All the best
Ross

> On Oct 25, 2017, at 18:50, Jason Swails <jason.swails.gmail.com> wrote:
>
> On Wed, Oct 25, 2017 at 6:48 PM, Hai Nguyen <nhai.qn.gmail.com> wrote:
>>
>>
>> Just a side question: if 99% of the time will be SPFP, why do we make the
>> DPFP as default for testing?
>>
>
> Different kind of testing. DPFP can be compared directly to CPU results.
> No other precision really can. But that should certainly not be in
> replacement of SPFP testing, which I think was your point.
>
> --
> Jason M. Swails
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers

Received on Wed Oct 25 2017 - 18:00:01 PDT