[AMBER-Developers] weird behavior for pmemd.cuda on Volta cards

From: David A Case <david.case.rutgers.edu>
Date: Fri, 8 Dec 2017 09:11:15 -0500

Hi folks:

The few developers that have Volta cards have reported markedly different
speedups vs. Pascal for different benchmarks.

I think these may be related to the following observation: jobs seem to slow
down the longer they run. You can check this on the JAC_production_NVE_4fs
benchmark: make nstlim ten times larger and re-run (you can increase ntwx
and ntwr if you like; it doesn't seem to make much difference).
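
A minimal sketch of making that change programmatically is below. It assumes the mdin is a plain &cntrl namelist with integer entries such as nstlim=250000. The MDIN text and the helper scale_param are hypothetical illustrations, not the actual benchmark input or an AMBER tool.

```python
import re

# Hypothetical excerpt of an mdin &cntrl namelist; the real
# JAC_production_NVE_4fs input may contain different settings.
MDIN = """ short md, nve ensemble
 &cntrl
   ntx=7, irest=1,
   nstlim=250000, dt=0.004,
   ntpr=1000, ntwx=1000, ntwr=10000,
 /
"""

def scale_param(mdin_text: str, name: str, factor: int) -> str:
    """Multiply the first integer namelist entry `name=<int>` by factor."""
    pattern = re.compile(rf"\b({name}\s*=\s*)(\d+)", re.IGNORECASE)
    if not pattern.search(mdin_text):
        raise ValueError(f"{name} not found in mdin")
    return pattern.sub(
        lambda m: f"{m.group(1)}{int(m.group(2)) * factor}", mdin_text, count=1
    )

longer = scale_param(MDIN, "nstlim", 10)   # 10x longer run
longer = scale_param(longer, "ntwx", 10)   # optional: write coordinates less often
print(longer)
```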

For me, the default run (250000 steps) clocks in at 923 ns/day (total time is
95 sec.). This is in line with what others are getting.

The 10x longer run returns 824 ns/day; if I also increase ntwx by a factor of
10, I get up to 847 ns/day (total time of 0.28 hours).

A 100x run kind of plateaus at 830 ns/day (total time of 2.9 hours).
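
As a rough consistency check on these numbers, ns/day can be converted back into wall time. This sketch assumes a 4 fs timestep (as the benchmark name suggests) and that ns/day is based on the timed MD loop only, so it should land slightly below the total job times quoted above.

```python
SECONDS_PER_DAY = 86_400

def simulated_ns(nstlim: int, dt_fs: float = 4.0) -> float:
    """Simulated time in ns for nstlim steps of dt_fs femtoseconds."""
    return nstlim * dt_fs * 1e-6

def wall_seconds(nstlim: int, ns_per_day: float, dt_fs: float = 4.0) -> float:
    """Wall-clock seconds implied by running nstlim steps at ns_per_day."""
    return simulated_ns(nstlim, dt_fs) / ns_per_day * SECONDS_PER_DAY

# (steps, ns/day) for the 1x, 10x (with larger ntwx) and 100x JAC runs.
for steps, rate in [(250_000, 923), (2_500_000, 847), (25_000_000, 830)]:
    secs = wall_seconds(steps, rate)
    print(f"{steps:>10} steps @ {rate} ns/day -> {secs:8.0f} s ({secs / 3600:.2f} h)")
```

These come out at roughly 94 s, 0.28 h and 2.89 h, close to the 95 sec, 0.28 hour and 2.9 hour totals reported, so the slowdown is not an artifact of how ns/day is computed.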

For larger systems, the difference between the "short run" timings (which
I suspect are typical of the official benchmarks) and "real" production
runs can be larger. For a 391,000-atom system, 10000 steps (82 seconds)
runs at 51 ns/day, whereas 50000 steps (450 sec.) runs at 40 ns/day,
and 100000 steps (900 seconds) runs at 39 ns/day. These are jobs with
ntwx=ntwr=100000, so there is no dumping of coordinates to disk, etc.
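
To put "can be larger" in numbers, the relative drop in ns/day can be computed directly from the figures in this post (no other assumptions):

```python
def slowdown_pct(short_rate: float, long_rate: float) -> float:
    """Percent drop in ns/day going from a short run to a longer run."""
    return (short_rate - long_rate) / short_rate * 100.0

jac = slowdown_pct(923, 830)       # JAC: 250k steps vs. the 100x run
big_50k = slowdown_pct(51, 40)     # 391k atoms: 10k vs. 50k steps
big_100k = slowdown_pct(51, 39)    # 391k atoms: 10k vs. 100k steps
print(f"JAC: {jac:.1f}%  391k atoms: {big_50k:.1f}% / {big_100k:.1f}%")
```

That is about 10% for JAC versus roughly 22 to 24% for the 391,000-atom system.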

So:

1. Be careful with benchmarks: the official JAC benchmark, at 250000 steps,
is not long enough for this platform. (!?) The same is probably true for other
benchmarks.

2. If we can figure out what is causing the slowdown, we might see a way to
get performance improvements in legacy mode.
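
One way to act on point 1 is to run a benchmark at increasing lengths and only trust the ns/day once consecutive lengths agree. A minimal sketch follows; the function first_plateau and the 3% tolerance are hypothetical choices, and the JAC data use the 847 ns/day (larger ntwx) figure for the 10x run.

```python
def first_plateau(runs, rel_tol=0.03):
    """Return the shortest nstlim whose ns/day is within rel_tol of the
    next longer run, or None if no consecutive pair agrees.

    runs: list of (nstlim, ns_per_day) sorted by increasing nstlim.
    """
    for (steps, rate), (_, next_rate) in zip(runs, runs[1:]):
        if abs(rate - next_rate) / next_rate <= rel_tol:
            return steps
    return None

jac_runs = [(250_000, 923), (2_500_000, 847), (25_000_000, 830)]
print(first_plateau(jac_runs))
```

With these numbers the 250000-step run differs from the 10x run by about 9%, while the 10x and 100x runs agree within about 2%, so the sketch picks 2500000 steps as the shortest length worth benchmarking.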

...dac


_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Fri Dec 08 2017 - 06:30:03 PST