> On Oct 26, 2017, at 08:49, Daniel Roe <daniel.r.roe.gmail.com> wrote:
>
> On Wed, Oct 25, 2017 at 8:58 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
>>
>> However, that does not mean one can avoid such testing when modifying the code, and certainly one can't just rely on our test suite. You have to sweat blood doing the validation, I'm afraid. I never found a good fix for SPFP with dacdif. The test case format really just doesn't lend itself well to that situation. In the end, after any major modification to the GPU code, I would always run the full test suite with SPFP and DPFP with dacdif set to save all the output, and then I would go through and compare things by hand. In particular, every couple of weeks I would do a full by-hand comparison of the latest SPFP output against the CPU output to make sure everything looked good, as well as repeating the long timescale validation tests I used in the GPU papers to check for energy convergence.
>>
>> Once satisfied everything was good, I would create a gold set of save files for each of the GPU models I had in my machine (at least one from each major generation; typically I would test on between 4 and 6 different models) and then, because of the deterministic nature of the GPU code, I could rerun the test suite and get perfect comparisons against that gold standard. For the vast majority of work the answers wouldn't change, so I could just blast it through my gold standard test set. If something did change, I knew it was either a bug or a rounding difference from changing the order of operations in the code, or from modifying the way the random number generator worked. I would then conduct a careful by-hand comparison again.
>
> Could you upload these gold standards to the GIT master branch? They
> can be excluded from the Amber tarball but I think it would be great
> for devs to have access to them. This way we can at least compare
> against a GPU generation that's closer to something you have. For
> example, it looks like the majority of the current SPFP test outputs
> were generated with a GeForce GTX TITAN X. If I run the tests with a
> K20m the largest diff I get is 5.86e-01 kcal/mol absolute error,
> whereas if I run with a TITAN Xp the largest diff is 3.09e-01, and I
> end up with fewer diffs. So that's one way to cut down on the false
> positives.
>
I don't have them anymore since leaving SDSC, unfortunately. In addition, they wouldn't be of any use these days anyway: they are out of date and the code has changed too much, so the output won't match anymore even if one uses the exact same hardware. Given the unchecked divergence, it's probably now necessary to repeat all of the original validation work I did when developing the SPDP and later SPFP precision models. That way one can make sure the code is still working correctly and within the expected tolerances.
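
For anyone rebuilding a gold set, the comparison step discussed in this thread could be sketched roughly as below. This is only an illustration, not what dacdif does: the function names, the sample "Etot"/"EKtot" text, and the tolerance values are all made up for the example, and a real check would need to understand the actual output format.

```python
import re

# Matches plain and scientific-notation numbers such as -1234.5678 or 3.09e-01.
NUMBER = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")


def max_abs_diff(reference_text, candidate_text):
    """Largest absolute difference between corresponding numeric fields.

    Hypothetical helper: it pairs numbers purely by position, so it raises
    ValueError when the counts differ (a structural change, not rounding).
    """
    ref = [float(m) for m in NUMBER.findall(reference_text)]
    new = [float(m) for m in NUMBER.findall(candidate_text)]
    if len(ref) != len(new):
        raise ValueError("outputs contain different numbers of numeric fields")
    return max((abs(a - b) for a, b in zip(ref, new)), default=0.0)


def classify(reference_text, candidate_text, tolerance):
    """Sort a comparison into the three cases described in the thread.

    'tolerance' is an assumption supplied by the caller; no particular
    value is implied to be correct for any precision model.
    """
    worst = max_abs_diff(reference_text, candidate_text)
    if worst == 0.0:
        return "identical"            # same hardware, deterministic rerun
    if worst <= tolerance:
        return "within tolerance"     # e.g. a different GPU generation
    return "needs manual review"      # possible bug or reordered arithmetic


# Made-up sample text, only loosely shaped like an energy record.
gold = "Etot = -1234.5678 EKtot = 250.0000"
rerun = "Etot = -1234.5678 EKtot = 250.0000"
other_gpu = "Etot = -1234.2588 EKtot = 250.0000"

print(classify(gold, rerun, tolerance=0.5))
print(classify(gold, other_gpu, tolerance=0.5))
print(classify(gold, other_gpu, tolerance=0.1))
```

Against a gold set generated on the same hardware, anything other than "identical" would be the signal to go back and do the by-hand comparison.
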
All the best
Ross
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Thu Oct 26 2017 - 17:30:02 PDT