Just a quick amendment: while option 1 is simple, many users do not control
the queuing system on which they run, so it may not be a possibility for
them. Thus, those with influence should continue to prod NVIDIA where
possible.
Cheers,
Brent
On Tue, Nov 11, 2014 at 10:18 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
>
> Yes, this is a LONG-standing bug in the NVIDIA drivers - I think there is
> an actual NVIDIA bug filed for it, but I can't recall the ID right now.
> Essentially it is a flaw in the way they implement unified memory -
> although it's really six of one and half a dozen of the other, since it is
> a hack for something that I think is problematic in the Linux kernel
> itself. Ultimately they have to allocate huge amounts of virtual memory -
> however, it is instantly paged out, so it is not an issue in practice.
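>
> To see this for yourself, compare VmSize (virtual) against VmRSS
> (resident) for a running CUDA process. A minimal, untested sketch using
> the standard Linux /proc interface (pass it the pid of pmemd.cuda or
> whatever CUDA code is running):
>
>     import sys
>
>     def mem_fields(pid):
>         """Return VmSize and VmRSS (in kB) from /proc/<pid>/status."""
>         fields = {}
>         with open("/proc/%s/status" % pid) as f:
>             for line in f:
>                 if line.startswith(("VmSize:", "VmRSS:")):
>                     key, value = line.split(":")
>                     fields[key] = int(value.split()[0])  # value is "<n> kB"
>         return fields
>
>     if __name__ == "__main__":
>         mem = mem_fields(sys.argv[1])
>         print("VmSize: %d kB  VmRSS: %d kB" % (mem["VmSize"], mem["VmRSS"]))
>
> You should see a VmSize tens of GB larger than VmRSS for any CUDA process.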
>
> So while this is an issue, it is also a non-issue from almost all
> perspectives EXCEPT queuing systems that enforce limits on an
> application's virtual memory rather than its resident memory. It will
> happen with all CUDA codes. The solutions are:
>
> 1) Don't enforce virtual memory limits - probably the easiest solution.
>
> 2) Figure out a way for the queuing system to look at resident memory
> rather than virtual memory (see the cgroup sketch below) - I'm not sure if
> Slurm can do this, but if it can't, someone should probably file a bug
> report with Slurm and find a way to cross-link it with the NVIDIA driver
> people.
>
> I'd go for 1 since it is simple.
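>
> If anyone does want to pursue 2), the relevant point is that the cgroup
> memory controller limits resident memory, not virtual memory (unlike
> ulimit -v). A rough, untested sketch of what a per-job wrapper could do -
> assuming a cgroup v1 memory controller mounted at /sys/fs/cgroup/memory
> and sufficient privileges; the group name here is made up:
>
>     import os
>
>     def run_with_rss_limit(cmd, limit_bytes, group="cuda_job"):
>         """Cap resident memory via a cgroup, then exec the command.
>         Virtual memory is untouched, so the huge CUDA address-space
>         reservation does not trip the limit."""
>         cg = "/sys/fs/cgroup/memory/%s" % group
>         os.makedirs(cg, exist_ok=True)
>         with open(os.path.join(cg, "memory.limit_in_bytes"), "w") as f:
>             f.write(str(limit_bytes))
>         with open(os.path.join(cg, "tasks"), "w") as f:
>             f.write(str(os.getpid()))  # children inherit the cgroup
>         os.execvp(cmd[0], cmd)
>
>     # e.g. run_with_rss_limit(["pmemd.cuda", "-O"], 8 * 1024**3)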
>
> All the best
> Ross
>
>
> On 11/7/14, 11:28 AM, "Jason Swails" <jason.swails.gmail.com> wrote:
>
> >On Fri, 2014-11-07 at 11:37 -0700, Thomas Cheatham wrote:
> >> Anybody have some ideas about this? Basically, "cgroups" are a way to
> >> create a virtual container in which you can restrict the memory
> >> available to a sub-process, etc. (for example, to partition a node
> >> into two independent halves). Thanks! --tom
> >
> >Just to add a little to Scott's comment: this seems to be an issue with
> >the CUDA runtime in general. I ran a quick test on my machine where I
> >started up CUDA-enabled VMD with a single PDB file and another small
> >simulation with OpenMM's CUDA platform.
> >
> >Both programs consumed around 37 GB of virtual memory (though not real
> >memory) on my desktop, which has 16 GB of RAM total. When pmemd.cuda
> >runs on my machine, it consumes the same amount of virtual memory. I
> >would try some of the CUDA SDK codes to confirm the issue in those, too,
> >but they don't run long enough to actually watch the memory usage by hand.
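> >
> >A small polling wrapper would get around the "too short to watch"
> >problem - a rough sketch (untested) that records the peak VmSize of a
> >child process via /proc:
> >
> >    import subprocess, time
> >
> >    def peak_vmsize(cmd, interval=0.01):
> >        """Poll /proc/<pid>/status while cmd runs; return peak VmSize (kB)."""
> >        proc = subprocess.Popen(cmd)
> >        peak = 0
> >        while proc.poll() is None:
> >            try:
> >                with open("/proc/%d/status" % proc.pid) as f:
> >                    for line in f:
> >                        if line.startswith("VmSize:"):
> >                            peak = max(peak, int(line.split()[1]))
> >            except IOError:  # process exited between poll() and open()
> >                pass
> >            time.sleep(interval)
> >        return peak
> >
> >    # e.g. print(peak_vmsize(["./deviceQuery"]))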
> >
> >So it's definitely not just Amber -- it's every other CUDA-enabled
> >program I tried running on my machine too. I know this has been
> >discussed in a few threads in the past, but I couldn't seem to find them
> >in the archives easily. (Not sure if it was amber-dev or amber, to be
> >honest).
> >
> >This was all done with NVIDIA driver 340.32 and CUDA 5.5 on my machine
> >(although I've observed it with every other driver version I've used,
> >too, which is quite a few).
> >
> >All the best,
> >Jason
> >
> >--
> >Jason M. Swails
> >Postdoctoral Researcher
> >BioMaPS, Rutgers University
> >
> >
--
_______________________________________________
Brent P. Krueger.....................phone: 616 395 7629
Professor................................fax: 616 395 7118
Hope College..........................Schaap Hall 2120
Department of Chemistry
Holland, MI 49423
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers