Re: [AMBER-Developers] weird behavior for pmemd.cuda on Volta cards

From: Ross Walker <ross.rosswalker.co.uk>
Date: Sat, 9 Dec 2017 12:19:29 -0500

Hi Adrian,

What case do you have and how is it cooled? Is this an Exxact 2U box? If so, then it likely has ducted cooling with the fans already set to maximum, which would explain things.

All the best
Ross

> On Dec 8, 2017, at 12:33, Adrian Roitberg <roitberg.ufl.edu> wrote:
>
> Hi
>
> We have not been able to reproduce this.
>
> Delaram in my group just finished a test on our system.
>
> I attach the output from the following script:
>
> nvidia-smi
>
> run jac regular
>
> nvidia-smi
>
> run jac long
>
> nvidia-smi
>
> run jac regular
>
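> (As a rough sketch, the wrapper might look something like the following; the
> pmemd.cuda invocation and the input/output file names are illustrative, not
> necessarily the exact ones used here.)
>
>    #!/bin/bash
>    # Append the current GPU state to the log between runs
>    log_gpu () { nvidia-smi >> test-bench.txt; }
>
>    run_jac () {   # $1 = mdin file (regular or 10x-nstlim "long" version)
>        $AMBERHOME/bin/pmemd.cuda -O -i "$1" -o "$1.out" \
>            -p JAC.prmtop -c JAC.inpcrd
>        grep "ns/day" "$1.out" | tail -1 >> test-bench.txt
>    }
>
>    log_gpu
>    run_jac mdin.jac        # regular-length run
>    log_gpu
>    run_jac mdin.jac.long   # long (10x nstlim) run
>    log_gpu
>    run_jac mdin.jac        # regular-length run again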
>
> As you can see, the timings were actually a little better for the long run.
>
> [0] 1 x GPU: Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
> | ns/day = 927.94 seconds/ns = 93.11
> [0] 1 x GPU: Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
> | ns/day = 934.01 seconds/ns = 92.50
> [0] 1 x GPU: Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
> | ns/day = 929.69 seconds/ns = 92.93
>
> Dave, my guess is that maybe the GPU temperature is getting too high?
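>
> (One way to check, as a sketch: sample the temperature and SM clock while the
> long run is going. The query fields below are standard nvidia-smi ones; the
> 5-second interval and the output file name are arbitrary.)
>
>    # Sample GPU temperature, SM clock and power draw every 5 s during the run
>    nvidia-smi --query-gpu=timestamp,temperature.gpu,clocks.sm,power.draw \
>        --format=csv -l 5 > gpu_monitor.csv &
>    # "nvidia-smi -q -d PERFORMANCE" also lists the active clock-throttle reasons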
>
> Adrian
>
>
> On 12/8/17 12:17 PM, David Cerutti wrote:
>> OK, that then confirms some odd things I had been seeing. With systems
>> larger than JAC, and probably longer overall run times, I was also seeing
>> dramatic performance decreases, to the point where our Volta was giving the
>> performance of a GP100. It's good to know, then, that our Volta in the Case
>> lab is not unique (uniquely broken).
>>
>> Dave
>>
>>
>> On Fri, Dec 8, 2017 at 9:11 AM, David A Case <david.case.rutgers.edu> wrote:
>>
>>> Hi folks:
>>>
>>> The few developers who have Volta cards have reported markedly different
>>> speedups vs. Pascal for different benchmarks.
>>>
>>> I think these may be related to the following observation: jobs seem to slow
>>> down the longer they run. You can check this on the JAC_production_NVE_4fs
>>> benchmark: make nstlim ten times larger and re-run (you can increase ntwx
>>> and ntwr if you like; it doesn't seem to make much difference).
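>>>
>>> (A minimal sketch of that check, assuming the stock benchmark input is a
>>> file called mdin inside the JAC_production_NVE_4fs directory; the actual
>>> file names in the benchmark suite may differ:)
>>>
>>>    cd JAC_production_NVE_4fs
>>>    # Multiply nstlim by 10 (ntwx/ntwr can be scaled the same way if desired)
>>>    sed 's/nstlim *= *250000/nstlim=2500000/' mdin > mdin.10x
>>>    $AMBERHOME/bin/pmemd.cuda -O -i mdin.10x -o mdout.10x -p prmtop -c inpcrd
>>>    grep "ns/day" mdout.10x | tail -1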
>>>
>>> For me, the default run (250000 steps) clocks at 923 ns/day (total time is
>>> 95 sec.). This is in line with what others are getting.
>>>
>>> The 10x longer run returns 824 ns/day; if I also increase ntwx by a factor
>>> of 10, I get up to 847 ns/day (total time of 0.28 hours).
>>>
>>> A 100x run kind of plateaus at 830 ns/day (total time of 2.9 hours).
>>>
>>> For larger systems, the difference between the "short run" timings (which
>>> I suspect are typical of the official benchmarks) and "real" production
>>> runs can be larger. For a 391,000-atom system, 10000 steps (82 seconds)
>>> runs at 51 ns/day, whereas 50000 steps (450 sec.) runs at 40 ns/day,
>>> and 100000 steps (900 seconds) is at 39 ns/day. These are jobs with
>>> ntwx=ntwr=100000, so there is no dumping of coordinates to disk, etc.
>>>
>>> So:
>>>
>>> 1. Be careful with benchmarks: the official JAC benchmark, at 250000 steps,
>>> is not long enough for this platform. (!?) The same is probably true for
>>> other benchmarks.
>>>
>>> 2. If we can figure out what is causing the slowdown, we might see a way to
>>> get performance improvements in legacy mode.
>>>
>>> ...dac
>>>
>>>
>
> --
> Dr. Adrian E. Roitberg
> University of Florida Research Foundation Professor
> Department of Chemistry
> University of Florida
> roitberg.ufl.edu
> 352-392-6972
>
> <test-bench.txt>


_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Sat Dec 09 2017 - 09:30:04 PST