[AMBER-Developers] gpu-tachyon branch causing error on GPU

From: Bill Miller III <brmilleriii.gmail.com>
Date: Wed, 16 Oct 2013 21:53:54 -0400

Hi,

I have been trying to run some MD with hydrogen mass repartitioning using the
gpu-tachyon branch of the Amber git tree. I did a 'git pull' this morning
and compiled pmemd(.MPI) and pmemd.cuda(.MPI). When I attempt to run the
simulation with the following input:

Explicit solvent molecular dynamics constant pressure 25 ns MD
 &cntrl
   imin=0, irest=1, ntx=5,
   ntpr=25000, ntwx=25000, ntwr=-50000, nstlim=12500000,
   dt=0.004, ntt=3, tempi=335,
   temp0=335, gamma_ln=1.0, ig=-1,
   ntp=1, ntc=2, ntf=2, cut=9,
   ntb=2, iwrap=1, ioutfm=1,
 /

with pmemd.cuda, I get the following error:

Nonbond cells need to be recalculated, restart simulation from previous
checkpoint with a higher value for skinnb.
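
For reference, the pmemd.cuda invocation was along the usual lines (the file
names here are representative placeholders, not the exact ones I used):

 pmemd.cuda -O -i mdin -p test.prmtop -c test.rst7 -o mdout -r restrt -x mdcrd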

To correct this error, I added an &ewald namelist to the mdin file to
increase skinnb to 3.0:
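
 &ewald
   skinnb=3.0,
 /

That addition makes the first error go away, but then I get a different error: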

cudaMemcpy GpuBuffer::Download failed unspecified launch failure

and the simulation dies after only a few steps (~6 steps). The
temperature, pressure, and energies increase substantially after the first
step. The energies for the first three steps are shown below:

 NSTEP =        1   TIME(PS) =  455500.004  TEMP(K) =   376.45  PRESS =   -171.4
 Etot   =   -176203.3645  EKtot   =     61495.2500  EPtot      =   -237698.6145
 BOND   =      2194.3161  ANGLE   =      5722.4810  DIHED      =      6928.1889
 1-4 NB =      2392.0336  1-4 EEL =     19312.4710  VDWAALS    =     23856.6106
 EELEC  =   -298104.7157  EHBOND  =         0.0000  RESTRAINT  =         0.0000
 EKCMT  =     23351.5596  VIRIAL  =     26376.5669  VOLUME     =    817600.0187
                                                    Density    =         0.9977
 ------------------------------------------------------------------------------


 NSTEP =        2   TIME(PS) =  455500.008  TEMP(K) =      NaN  PRESS = -52155.0
 Etot   =            NaN  EKtot   =            NaN  EPtot      = 601744736.3088
 BOND   =      2395.5598  ANGLE   =      7383.8134  DIHED      =      7021.7748
 1-4 NB =      2444.0931  1-4 EEL =     19301.1203  VDWAALS    = 601920908.4454
 EELEC  =   -214718.4980  EHBOND  =         0.0000  RESTRAINT  =         0.0000
 EKCMT  =     23537.8979  VIRIAL  =    944202.1688  VOLUME     =    817574.8784
                                                    Density    =         0.9977
 ------------------------------------------------------------------------------


 NSTEP =        3   TIME(PS) =  455500.012  TEMP(K) =      NaN  PRESS = 622386.7
 Etot   =            NaN  EKtot   =            NaN  EPtot      = **************
 BOND   = **************  ANGLE   =    713224.4557  DIHED      =     20586.1594
 1-4 NB = **************  1-4 EEL =     12529.4645  VDWAALS    = **************
 EELEC  =   -275698.1771  EHBOND  =         0.0000  RESTRAINT  =         0.0000
 EKCMT  =  12582912.0000  VIRIAL  =   1698479.3771  VOLUME     =    809967.6470
                                                    Density    =         1.0071
 ------------------------------------------------------------------------------

And when I visualize the trajectory, the atoms fly apart after the
initial frame.

However, when I run the same inputs (with and without the increased skinnb)
through the CPU code (pmemd.MPI), I get none of these errors and the
simulation runs smoothly (the energies and dynamics look normal).
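
For comparison, the CPU runs used an invocation along these lines (again with
placeholder file names; -np 12 matches the 12 cores on the workstation
described below):

 mpirun -np 12 pmemd.MPI -O -i mdin -p test.prmtop -c test.rst7 -o mdout -r restrt -x mdcrd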

The last commit for the gpu-tachyon branch before I compiled was commit
f04c58935955826a24e5a534b9ea3446a44fbb87.

I previously got this simulation to work using pmemd.cuda from the
gpu-tachyon branch back in August. The last commit in that build was:

commit a0a4f71de7595c70fb0014d46f412fdfb767a134
Author: scott legrand <slegrand.amber.(none)>
Date: Wed Aug 14 16:37:20 2013 -0700

    Fix for Bug 210

It appears as if something was recently added to the gpu-tachyon branch
that broke pmemd.cuda's ability to run this simulation.
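
If it helps narrow this down, the offending change could presumably be located
by bisecting between the working August commit and the current head, along
these lines (each bisection step would need a rebuild of pmemd.cuda and a
short test run of this system):

 git bisect start
 git bisect bad  f04c58935955826a24e5a534b9ea3446a44fbb87   # current head, fails
 git bisect good a0a4f71de7595c70fb0014d46f412fdfb767a134   # August commit, works
 # rebuild pmemd.cuda, run a short test, then mark each revision with
 # 'git bisect good' or 'git bisect bad' until git reports the culprit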

I have placed the prmtop and restart file used to run this simulation on
Dropbox, if anyone wants to try to reproduce the errors.

prmtop: https://www.dropbox.com/s/aqvgskaojbgbhwv/test.prmtop
restart: https://www.dropbox.com/s/f2tpv8pxovy6usz/test.rst7
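
If the Dropbox links don't download directly in a browser, appending ?dl=1 to
the URLs should force a direct download, e.g.:

 wget -O test.prmtop 'https://www.dropbox.com/s/aqvgskaojbgbhwv/test.prmtop?dl=1'
 wget -O test.rst7   'https://www.dropbox.com/s/f2tpv8pxovy6usz/test.rst7?dl=1'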

I am running these tests on a Linux workstation with Red Hat 6.4, twelve
2.10 GHz Intel Xeon cores, and a GTX 780 GPU. I am using NVIDIA driver
version 325.15.

Let me know if you have any questions about any of the details.

Thanks,
Bill

-- 
Bill Miller III
Post-doc
University of Richmond
417-549-0952
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Wed Oct 16 2013 - 19:00:02 PDT