Re: [AMBER-Developers] [AMBER] Error: an illegal memory access was encountered launching kernel kClearForces

From: Josh Berryman <the.real.josh.berryman.gmail.com>
Date: Mon, 12 Dec 2016 11:23:26 +0100

Hi, I have something like this bug, running nvcc 8.0.44, and a fresh
checkout of the code. Card is [Tesla M2070] (rev a3).

This is a thermalisation run with position restraints:

heat using langevin
&cntrl
  iwrap = 1,
  ntx = 5, ioutfm = 1,
  irest = 1,
  ntb = 1,
  cut = 8,
  ntr = 0,
  ntc = 2,
  ntf = 2,
  ntt = 3, temp0=300.0, gamma_ln=0.001,
  nstlim = 500000, dt = 0.002,
  ntpr = 1000, ntwx = 50000, ntwr = 500000,
  ntr = 1, restraint_wt=50,
  restraintmask=':1,64,65,128',
/
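
One thing worth checking in the input above is that ntr is assigned twice in &cntrl (ntr = 0 first, ntr = 1 later). Fortran namelist reads normally keep the last value given, so restraints should end up switched on, but that is an assumption about how the input is read, not something verified here. A minimal Python sketch for spotting repeated assignments follows; the helper name namelist_assignments is hypothetical and not part of AMBER, and the regex only handles simple "name = value" pairs and single-quoted strings.

```python
import re
from collections import defaultdict

# The &cntrl block from this post, with wrapped values rejoined.
MDIN = """\
heat using langevin
&cntrl
  iwrap = 1,
  ntx = 5, ioutfm = 1,
  irest = 1,
  ntb = 1,
  cut = 8,
  ntr = 0,
  ntc = 2,
  ntf = 2,
  ntt = 3, temp0=300.0, gamma_ln=0.001,
  nstlim = 500000, dt = 0.002,
  ntpr = 1000, ntwx = 50000, ntwr = 500000,
  ntr = 1, restraint_wt=50,
  restraintmask=':1,64,65,128',
/
"""


def namelist_assignments(text, group="cntrl"):
    """Hypothetical helper: map each variable in one namelist group
    to the list of values assigned to it, in file order."""
    # Grab everything between "&group" and the terminating "/" line.
    match = re.search(r"&%s\b(.*?)^\s*/\s*$" % re.escape(group),
                      text, re.S | re.M | re.I)
    if match is None:
        raise ValueError("namelist group &%s not found" % group)
    found = defaultdict(list)
    # A value is either a single-quoted string or a run of characters
    # up to the next comma or whitespace.
    for name, value in re.findall(r"(\w+)\s*=\s*('[^']*'|[^,\s]+)",
                                  match.group(1)):
        found[name.lower()].append(value)
    return dict(found)


assignments = namelist_assignments(MDIN)
# Keep only variables that were assigned more than once.
duplicates = {k: v for k, v in assignments.items() if len(v) > 1}
print(duplicates)
```

For this input the only repeated variable reported is ntr, with values 0 and 1 in that order. Removing the stray ntr = 0 line would make the intent unambiguous when comparing pmemd.cuda against pmemd.MPI.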

The error message is:

Error: an illegal memory access was encountered launching kernel kClearForces
cudaFree GpuBuffer::Deallocate failed an illegal memory access was encountered

Turning off the restraints allows the code to run, but I need them: it is
ultimately supposed to be a pulling simulation, and I gotta pull.

The same input file works fine with pmemd.MPI.

I doubt that this is actually the same error as Martin, as his outfile
(link above) includes actual steps of MD, and doesn't include restraints.
Posting on this thread anyway as it is the same error message.

Josh


On 24 August 2015 at 17:53, Ross Walker <ross.rosswalker.co.uk> wrote:

> Hi Parker,
>
> I would have said that you had a failing GPU but the fact you see it on
> multiple GPU types is troubling. That said I've been seeing several issues
> (all very hard to tie down) with several versions of the 346 and 340 driver
> trees. So you might want to try switching to a different driver tree.
>
> First I'd stick with CUDA 6.5. CUDA 7 breaks all the device selection when
> process exclusive mode is enabled so I'd punt on that until CUDA 7.5 at
> least (which 'supposedly' should fix it).
>
> I'd try 346.89 from here:
> http://www.nvidia.com/download/driverResults.aspx/88814/en-us
>
> and see if that fixes it. If not then the next step is to see if this
> might be a bug in the AMBER code for which I'll need copies of your input
> files.
>
> All the best
> Ross
>
> > On Aug 22, 2015, at 1:58 PM, Parker de Waal <Parker.deWaal.vai.org> wrote:
> >
> > Hi AMBER developers,
> >
> > Just recently I’ve started to encounter the following error on both my
> > local machine (GTX780) and in-house HPC cluster (K80s) while using
> > pmemd.cuda (with all patches applied) on both CUDA 6.5 (Driver Version:
> > 340.76) and 7 (Driver Version: 346.59):
> >
> > Error: an illegal memory access was encountered launching kernel kClearForces
> > cudaFree GpuBuffer::Deallocate failed an illegal memory access was encountered
> >
> > The crash has so far only happened during production simulations
> > (a sample outfile can be found here:
> > https://gist.github.com/anonymous/ba6cf66147b7a68dc167)
> >
> > From the metrics I’ve looked at, the system is at equilibrium.
> > Additionally, in the past I’ve been able to launch this same simulation
> > protocol for times of ~1 us without issue.
> >
> > Has anyone, besides myself last year, encountered an issue such as this?
> >
> > If any additional information is required please let me know!
> >
> > Best,
> > Parker
> >
> >
> > _______________________________________________
> > AMBER mailing list
> > AMBER.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber
>
>
>
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Mon Dec 12 2016 - 02:30:02 PST