Re: [AMBER-Developers] gpu-tachyon branch causing error on GPU

From: Scott Le Grand <varelse2005.gmail.com>
Date: Thu, 17 Oct 2013 09:56:07 -0700

Hey Bill,
Sync to TOT and try again. Think I fixed it. Thanks for the clear repro!

Scott



On Wed, Oct 16, 2013 at 6:53 PM, Bill Miller III <brmilleriii.gmail.com> wrote:

> Hi,
>
> I have been trying to run some MD using mass partitioning with the
> gpu-tachyon branch of the Amber git tree. I did a 'git pull' this morning
> and compiled pmemd(.MPI) and pmemd.cuda(.MPI). When I attempt to run the
> simulation with the following input:
>
> Explicit solvent molecular dynamics constant pressure 25 ns MD
> &cntrl
> imin=0, irest=1, ntx=5,
> ntpr=25000, ntwx=25000, ntwr=-50000, nstlim=12500000,
> dt=0.004, ntt=3, tempi=335,
> temp0=335, gamma_ln=1.0, ig=-1,
> ntp=1, ntc=2, ntf=2, cut=9,
> ntb=2, iwrap=1, ioutfm=1,
> /
>
> with pmemd.cuda, I get the following error:
>
> Nonbond cells need to be recalculated, restart simulation from previous checkpoint
> with a higher value for skinnb.
>
> To correct this error, I increased the value of skinnb to 3.0 in the mdin
> file (by adding an &ewald namelist). That error goes away, but I get a
> different error:
>
> cudaMemcpy GpuBuffer::Download failed unspecified launch failure
>
> and the simulation dies after only a few steps (~6 steps). The
> temperature, pressure, and energies increase substantially after the first
> step. I have shown the energies below for the first three steps:
>
> NSTEP = 1  TIME(PS) = 455500.004  TEMP(K) = 376.45  PRESS = -171.4
> Etot   =   -176203.3645  EKtot   =     61495.2500  EPtot     =   -237698.6145
> BOND   =      2194.3161  ANGLE   =      5722.4810  DIHED     =      6928.1889
> 1-4 NB =      2392.0336  1-4 EEL =     19312.4710  VDWAALS   =     23856.6106
> EELEC  =   -298104.7157  EHBOND  =         0.0000  RESTRAINT =         0.0000
> EKCMT  =     23351.5596  VIRIAL  =     26376.5669  VOLUME    =    817600.0187
>                                                    Density   =         0.9977
>
> ------------------------------------------------------------------------------
>
>
> NSTEP = 2  TIME(PS) = 455500.008  TEMP(K) = NaN  PRESS = -52155.0
> Etot   =            NaN  EKtot   =            NaN  EPtot     = 601744736.3088
> BOND   =      2395.5598  ANGLE   =      7383.8134  DIHED     =      7021.7748
> 1-4 NB =      2444.0931  1-4 EEL =     19301.1203  VDWAALS   = 601920908.4454
> EELEC  =   -214718.4980  EHBOND  =         0.0000  RESTRAINT =         0.0000
> EKCMT  =     23537.8979  VIRIAL  =    944202.1688  VOLUME    =    817574.8784
>                                                    Density   =         0.9977
>
> ------------------------------------------------------------------------------
>
>
> NSTEP = 3  TIME(PS) = 455500.012  TEMP(K) = NaN  PRESS = 622386.7
> Etot   =            NaN  EKtot   =            NaN  EPtot     = **************
> BOND   = **************  ANGLE   =    713224.4557  DIHED     =     20586.1594
> 1-4 NB = **************  1-4 EEL =     12529.4645  VDWAALS   = **************
> EELEC  =   -275698.1771  EHBOND  =         0.0000  RESTRAINT =         0.0000
> EKCMT  =  12582912.0000  VIRIAL  =   1698479.3771  VOLUME    =    809967.6470
>                                                    Density   =         1.0071
>
> ------------------------------------------------------------------------------
>
> And when I visualize the trajectory, the atoms fly apart after the
> initial frame.
>
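> For reference, here is a sketch of the input with the &ewald namelist
> added. skinnb=3.0 is the only change from the input above; the exact
> layout of the &ewald block shown here is illustrative:
>
> Explicit solvent molecular dynamics constant pressure 25 ns MD
> &cntrl
> imin=0, irest=1, ntx=5,
> ntpr=25000, ntwx=25000, ntwr=-50000, nstlim=12500000,
> dt=0.004, ntt=3, tempi=335,
> temp0=335, gamma_ln=1.0, ig=-1,
> ntp=1, ntc=2, ntf=2, cut=9,
> ntb=2, iwrap=1, ioutfm=1,
> /
> &ewald
> skinnb=3.0,
> /
>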
> However, when I run the same inputs (with and without increasing skinnb)
> with the CPU code (pmemd.MPI), I do not get any of these errors and the
> simulation appears to run smoothly (the energies and dynamics appear
> normal).
>
> The last commit for the gpu-tachyon branch before I compiled was commit
> f04c58935955826a24e5a534b9ea3446a44fbb87.
>
> I previously got this simulation to work using pmemd.cuda built from the
> gpu-tachyon branch back in August. The last commit in that build
> was
>
> commit a0a4f71de7595c70fb0014d46f412fdfb767a134
> Author: scott legrand <slegrand.amber.(none)>
> Date: Wed Aug 14 16:37:20 2013 -0700
>
> Fix for Bug 210
>
> It appears as if something was recently added to the gpu-tachyon branch
> that broke pmemd.cuda's ability to run this simulation.
>
> I have placed the prmtop and restart file used to run this simulation on
> Dropbox, if anyone wants to try to reproduce the errors.
>
> prmtop: https://www.dropbox.com/s/aqvgskaojbgbhwv/test.prmtop
> restart: https://www.dropbox.com/s/f2tpv8pxovy6usz/test.rst7
>
> I am running these tests on a Linux workstation running Red Hat 6.4, with
> 12 2.10 GHz Intel Xeon processors and a GTX-780 GPU. I am using NVIDIA
> driver version 325.15.
>
> Let me know if you have any questions about any of the details.
>
> Thanks,
> Bill
>
> --
> Bill Miller III
> Post-doc
> University of Richmond
> 417-549-0952
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers
>
Received on Thu Oct 17 2013 - 10:00:02 PDT