Re: [AMBER-Developers] weird behavior for pmemd.cuda on Volta cards

From: Ross Walker <>
Date: Sat, 9 Dec 2017 12:17:57 -0500

Note that a lot of this performance drop-off is due to the card clocking down, normally because it hits its temperature limit. This behavior first appeared with Kepler and is super annoying, but it generally wasn't that drastic. I think it has gotten worse as the cards have become more powerful, possibly because the underlying die area has grown (more heat?).
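A quick way to confirm the card is clocking down is to poll the SM clock over the course of a run and watch for it dropping well below its initial peak. A minimal sketch (the nvidia-smi query fields are real; the helper functions and the sample readings below are made up for illustration):

```python
# Sketch: detect GPU clock-down from periodic nvidia-smi samples.
# Assumes CSV rows produced by something like:
#   nvidia-smi --query-gpu=temperature.gpu,clocks.sm --format=csv,noheader,nounits -l 5
# The sample data below is invented for illustration only.

def parse_sample(line):
    """Parse one 'temp, sm_clock' CSV row into (temp_C, clock_MHz)."""
    temp, clock = (int(x.strip()) for x in line.split(","))
    return temp, clock

def is_throttling(samples, clock_drop_mhz=100):
    """Flag a run as throttled if the latest SM clock sits well below its peak."""
    clocks = [parse_sample(s)[1] for s in samples]
    return max(clocks) - clocks[-1] >= clock_drop_mhz

# Invented samples showing a card heating up and clocking down over a run:
samples = ["45, 1530", "78, 1530", "84, 1455", "87, 1380"]
print(is_throttling(samples))  # True: the clock fell 150 MHz from its peak
```

The 100 MHz threshold is an arbitrary choice for the sketch; on a well-cooled card the clock should stay essentially flat for the whole run.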

It's worse if you don't have good cooling. It's best to put these cards in cases that have ducted front-to-back cooling. You also likely want to go into the BIOS and set all the fans to maximum. Finally, make sure the inlet air is cool. Tests we did with Pascal when it first came out in the SDSC data center showed about a 3 to 5% boost in the performance plateau from cooling the machine room from a 'yucky' 88F down to 68F. That's something people running machine rooms often overlook when trying to optimize power efficiency.

The other key when benchmarking is to make sure the card is at production temperature before running the benchmarks. For what's on the AMBER website I typically run a validation burn-in for a few hours first and then fire up the benchmarks. This gives you a more reliable number, since the entire node is up to production temperature.
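Since ns/day follows directly from the simulated time per step and the wall-clock time, you can sanity-check whether a short "cold card" benchmark is inflating the number. A sketch using the standard formula (the JAC figures come from Dave's message below; the recomputed values don't match his exactly because his totals include setup time):

```python
def ns_per_day(nstlim, dt_ps, wall_s):
    """ns/day = simulated nanoseconds divided by elapsed days."""
    simulated_ns = nstlim * dt_ps / 1000.0
    return simulated_ns * 86400.0 / wall_s

# JAC_production_NVE_4fs: 250,000 steps at 4 fs (0.004 ps) in ~95 s wall time
short = ns_per_day(250_000, 0.004, 95.0)
# 10x longer run: 2.5M steps in ~0.28 h (~1008 s), once the card is hot
long = ns_per_day(2_500_000, 0.004, 0.28 * 3600)
print(round(short), round(long))  # ~909 and ~857; compare the reported 923 and 824-847
```

The gap between the short and long runs is the throttling plateau: the short run finishes before the card reaches steady-state temperature.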

Of course it could also be due to something else but my instinct says it is temperature related.

This is also why I haven't posted benchmark numbers for Volta on the AMBER webpage yet. I am waiting until I have time to benchmark it properly.

All the best

> On Dec 8, 2017, at 12:17, David Cerutti <> wrote:
> OK that then confirms some odd things I had been seeing. With systems
> larger than JAC, and probably longer overall run times, I was also seeing
> dramatic performance decreases, to the point where our Volta was giving the
> performance of a GP100. It's good to know, then, that our Volta in the
> Case lab is not unique (uniquely broken).
> Dave
> On Fri, Dec 8, 2017 at 9:11 AM, David A Case <> wrote:
>> Hi folks:
>> The few developers that have Volta cards have reported markedly different
>> speedups vs. Pascal for different benchmarks.
>> I think these may be related to the following observation: jobs seem to
>> slow down the longer they run. You can check this on the
>> JAC_production_NVE_4fs benchmark: make nstlim ten times larger and re-run
>> (you can also increase ntwx and ntwr if you like--it doesn't seem to make
>> much difference).
>> For me, the default run (250000 steps) clocks in at 923 ns/day (total
>> time is 95 sec.). This is in line with what others are getting.
>> The 10x longer run returns 824 ns/day; if I also increase ntwx by a
>> factor of 10, I get up to 847 ns/day (total time of 0.28 hours).
>> A 100x run kind of plateaus at 830 ns/day (total time of 2.9 hours).
>> For larger systems, the difference between the "short run" timings (which
>> I suspect are typical of the official benchmarks) and "real" production
>> runs can be larger. For a 391,000-atom system, 10000 steps (82 seconds)
>> runs at 51 ns/day, whereas 50000 steps (450 sec.) runs at 40 ns/day,
>> and 100000 steps (900 seconds) is at 39 ns/day. These are jobs with
>> ntwx=ntwr=100000, so there is no dumping of coordinates to disk, etc.
>> So:
>> 1. Be careful with benchmarks: the official JAC benchmark, at 250000
>> steps, is not long enough for this platform. (!?) The same is probably
>> true for other benchmarks.
>> 2. If we can figure out what is causing the slowdown, we might see a way
>> to get performance improvements in legacy mode.
>> ...dac
>> _______________________________________________
>> AMBER-Developers mailing list

Received on Sat Dec 09 2017 - 09:30:03 PST