Update: I found the bugzilla entry (and workaround) for this:
http://bugzilla.ambermd.org/show_bug.cgi?id=292
I should have checked there before going to the dev list.
On 12 December 2016 at 11:23, Josh Berryman <the.real.josh.berryman.gmail.com> wrote:
> Hi, I have something like this bug, running nvcc 8.0.44, and a fresh
> checkout of the code. Card is [Tesla M2070] (rev a3).
>
> This is a thermalisation run with position restraints:
>
> heat using langevin
> &cntrl
>  iwrap = 1,
>  ntx = 5, ioutfm = 1,
>  irest = 1,
>  ntb = 1,
>  cut = 8,
>  ntr = 0,
>  ntc = 2,
>  ntf = 2,
>  ntt = 3, temp0=300.0, gamma_ln=0.001,
>  nstlim = 500000, dt = 0.002,
>  ntpr = 1000, ntwx = 50000, ntwr = 500000,
>  ntr = 1, restraint_wt=50, restraintmask=':1,64,65,128',
> /
>
> error message is:
>
> Error: an illegal memory access was encountered launching kernel
> kClearForces
> cudaFree GpuBuffer::Deallocate failed an illegal memory access was
> encountered
>
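> One way to localize an "illegal memory access" like this (sketched here
> with placeholder file names, not the actual inputs from this run) is to
> rerun the same command under cuda-memcheck, or with blocking kernel
> launches so the failure is reported at the offending kernel:
>
>   cuda-memcheck $AMBERHOME/bin/pmemd.cuda -O -i heat.in -o heat.out -p prmtop -c inpcrd -ref inpcrd
>   CUDA_LAUNCH_BLOCKING=1 $AMBERHOME/bin/pmemd.cuda -O -i heat.in -o heat.out -p prmtop -c inpcrd -ref inpcrd
>
> Both slow the run down, so a short nstlim is enough to reproduce the crash.
>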
> Turning off the restraints allows the code to run, but I need them: it is
> ultimately supposed to be a pulling simulation, and I gotta pull.
>
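> For reference, the variant that does run is the same namelist with the
> restraint lines dropped (a minimal sketch, not a separately tested file):
>
> &cntrl
>  iwrap = 1, ntx = 5, ioutfm = 1, irest = 1,
>  ntb = 1, cut = 8,
>  ntc = 2, ntf = 2,
>  ntt = 3, temp0=300.0, gamma_ln=0.001,
>  nstlim = 500000, dt = 0.002,
>  ntpr = 1000, ntwx = 50000, ntwr = 500000,
>  ntr = 0,
> /
>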
> The same input file works fine with pmemd.MPI.
>
> I doubt that this is actually the same error as Martin's, as his outfile
> (linked above) includes actual steps of MD and doesn't include restraints.
> I'm posting on this thread anyway since it is the same error message.
>
> Josh
>
>
> On 24 August 2015 at 17:53, Ross Walker <ross.rosswalker.co.uk> wrote:
>
>> Hi Parker,
>>
>> I would have said that you had a failing GPU, but the fact that you see it
>> on multiple GPU types is troubling. That said, I've been seeing several
>> issues (all very hard to tie down) with several versions of the 346 and 340
>> driver trees, so you might want to try switching to a different driver tree.
>>
>> First, I'd stick with CUDA 6.5. CUDA 7 breaks all the device selection
>> when process-exclusive mode is enabled, so I'd punt on that until at least
>> CUDA 7.5 (which 'supposedly' should fix it).
>>
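>> In case it helps while on CUDA 7, a minimal sketch of checking the compute
>> mode and pinning one card by hand (device index 0 is just an example):
>>
>>   nvidia-smi -q -d COMPUTE                 # show current compute mode per GPU
>>   sudo nvidia-smi -c DEFAULT               # or EXCLUSIVE_PROCESS, per site policy
>>   CUDA_VISIBLE_DEVICES=0 pmemd.cuda -O -i mdin -o mdout -p prmtop -c inpcrd
>>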
>> I'd try 346.89 from here:
>> http://www.nvidia.com/download/driverResults.aspx/88814/en-us
>> and see if that fixes it. If not, then the next step is to see if this
>> might be a bug in the AMBER code, for which I'll need copies of your input
>> files.
>>
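>> To confirm which driver tree actually ends up loaded after switching, a
>> quick check with standard NVIDIA tooling (nothing AMBER-specific) is:
>>
>>   nvidia-smi | head -n 3                   # driver version is printed in the header
>>   cat /proc/driver/nvidia/version
>>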
>> All the best
>> Ross
>>
>> > On Aug 22, 2015, at 1:58 PM, Parker de Waal <Parker.deWaal.vai.org> wrote:
>> >
>> > Hi AMBER developers,
>> >
>> > Just recently I’ve started to encounter the following error on both my
>> local machine (GTX 780) and in-house HPC cluster (K80s) while using
>> pmemd.cuda (with all patches applied) on both CUDA 6.5 (Driver Version:
>> 340.76) and CUDA 7 (Driver Version: 346.59):
>> >
>> > Error: an illegal memory access was encountered launching kernel
>> kClearForces
>> > cudaFree GpuBuffer::Deallocate failed an illegal memory access was
>> encountered
>> >
>> > The crash behavior has only happened thus far during production
>> simulations (a sample outfile can be found here:
>> https://gist.github.com/anonymous/ba6cf66147b7a68dc167)
>> >
>> > From the metrics I’ve looked at, the system is at equilibrium.
>> Additionally, in the past I’ve been able to run this same simulation
>> protocol for ~1 us without issue.
>> >
>> > Has anyone, besides myself last year, encountered an issue such as this?
>> >
>> > If any additional information is required, please let me know!
>> >
>> > Best,
>> > Parker