Re: [AMBER-Developers] Parallel Test failures with CUDA 5.5

From: Daniel Roe <daniel.r.roe.gmail.com>
Date: Wed, 5 Feb 2014 21:46:09 -0700

Hi,

On Wed, Feb 5, 2014 at 7:08 PM, Ross Walker <ross.rosswalker.co.uk> wrote:

> So it looks like the problem is NOT cuda5.5 related but rather a bug in
> the parallel GPU code that may be showing up on 2 GPUs elsewhere or
> differently with different MPIs.
>
> Dan, what are your specs for the problems you are seeing?
>

This is running on Blue Waters, 2 XK nodes (Tesla K20X cards). It could just
be something weird with their installed CUDA 5.5 libraries (wouldn't be the
first time I've had issues with their libs). I will try to test this on
some of our local GPUs tomorrow; I would do it now, but my internet
connection has been going in and out at my house tonight and it's tough to
write scripts when the terminal keeps disconnecting...
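
For triaging the .dif files on another machine, a rough sketch like the one
below could flag the fields that went to NaN or overflowed to asterisks. The
function names and the regex are my own assumptions (they are not part of the
AMBER test suite), and it assumes each field and its value sit on the same
line, which is not true of diffs that got wrapped in email.

```python
import re

# A value that is NaN, or a run of asterisks (Fortran prints asterisks when
# a number does not fit its output field). Assumed from the mdout excerpt,
# not taken from any AMBER format specification.
BAD_VALUE = re.compile(r"=\s*(NaN|\*+)")


def field_name(prefix):
    """Return the field label that ends the text before an '=' sign."""
    tokens = prefix.split()
    if not tokens:
        return ""
    # Labels such as "1-4 NB" and "1-4 EEL" are two tokens long.
    if len(tokens) >= 2 and tokens[-2] == "1-4":
        return "1-4 " + tokens[-1]
    return tokens[-1]


def find_bad_fields(lines):
    """Return (line_number, field, value) for every NaN or overflowed field."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for match in BAD_VALUE.finditer(line):
            name = field_name(line[:match.start()])
            hits.append((lineno, name, match.group(1)))
    return hits
```

Calling find_bad_fields(open("mdout.tip4pew_box_npt.dif")) would list the
offending fields with their line numbers, which makes it easier to see at
which step the run first goes bad.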

One question: are all the GPUs you are testing in the same box? If so,
maybe the problem only shows up when the GPUs actually have to communicate
across a network device?
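
To make the comparison concrete, here is a small sketch of how one might
check whether a run stays inside one box. It assumes each MPI rank reports a
(rank, hostname) pair by some means, for example by printing the result of
local_report() at startup; the helper names and the hostnames in the usage
note are hypothetical.

```python
import socket
from collections import defaultdict


def local_report(rank):
    """Return the (rank, hostname) pair this process would report."""
    return (rank, socket.gethostname())


def ranks_by_host(rank_hosts):
    """Group ranks by the hostname each one reported."""
    groups = defaultdict(list)
    for rank, host in rank_hosts:
        groups[host].append(rank)
    return {host: sorted(ranks) for host, ranks in groups.items()}


def spans_multiple_hosts(rank_hosts):
    """True when the ranks are spread over more than one host."""
    return len(ranks_by_host(rank_hosts)) > 1
```

With reports like [(0, "nodeA"), (1, "nodeB")], spans_multiple_hosts returns
True, which would correspond to the two-node case; two ranks on the same
host give False.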

I'll let you know what I find tomorrow. Take care,

-Dan


>
> All the best
> Ross
>
>
> On 2/5/14, 2:33 PM, "Daniel Roe" <daniel.r.roe.gmail.com> wrote:
>
> >Hi All,
> >
> >Has anyone seen really egregious test failures using
> >pmemd.cuda.MPI/cuda5.5 compiled from the GIT tree (updated today)? I'm
> >getting some insane differences and '***' in energy fields (see below for
> >an example, full test diffs attached). I do not see this problem with
> >pmemd.cuda/cuda5.5 or pmemd.cuda.MPI/cuda5.0 (those diffs are attached as
> >well and seem OK). This was compiled using GNU 4.8.2 compilers.
> >
> >Not sure if this means anything, but most of the failures seem to be with
> >PME; the only GB stuff that fails is AMD-related.
> >
> >Any ideas?
> >
> >-Dan
> >
> >---------------------------------------
> >possible FAILURE: check mdout.tip4pew_box_npt.dif
> >/mnt/b/projects/sciteam/jn6/GIT/amber-gnu/test/cuda/tip4pew
> >96c96
> >< NSTEP = 1 TIME(PS) = 0.002 TEMP(K) = 122.92 PRESS = 42.6
> >> NSTEP = 1 TIME(PS) = 0.002 TEMP(K) = 128.19 PRESS = 43.5
> ><snip>
> >426c426
> >< NSTEP = 40 TIME(PS) = 0.080 TEMP(K) = 38.69 PRESS = 659.4
> >> NSTEP = 40 TIME(PS) = 0.080 TEMP(K) = NaN PRESS = NaN
> >427c427
> >< Etot = 18.6535 EKtot = 231.6979 EPtot = 240.1483
> >> Etot = NaN EKtot = NaN EPtot = NaN
> >428c428
> >< BOND = 0.6316 ANGLE = 1.2182 DIHED = 0.3663
> >> BOND = ************** ANGLE = 361.5186 DIHED = 5.4026
> >429c429
> >< 1-4 NB = 0.8032 1-4 EEL = 1.3688 VDWAALS = 100.3454
> >> 1-4 NB = ************** 1-4 EEL = ************** VDWAALS = NaN
> >430c430
> >< EELEC = 222.4484 EHBOND = 0. RESTRAINT = 0.
> >> EELEC = NaN EHBOND = 0. RESTRAINT = 0.
> >431c431
> >< EKCMT = 131.0089 VIRIAL = 699.4621 VOLUME = 192.3578
> >> EKCMT = 1278.0524 VIRIAL = NaN VOLUME = NaN
> >432c432
> >< Density = 0.0030
> >> Density = NaN
> >### Maximum absolute error in matching lines = 2.38e+04 at line 385 field 3
> >### Maximum relative error in matching lines = 1.55e+01 at line 257 field 3
> >
> >--
> >-------------------------
> >Daniel R. Roe, PhD
> >Department of Medicinal Chemistry
> >University of Utah
> >30 South 2000 East, Room 201
> >Salt Lake City, UT 84112-5820
> >http://home.chpc.utah.edu/~cheatham/
> >(801) 587-9652
> >(801) 585-6208 (Fax)
> >_______________________________________________
> >AMBER-Developers mailing list
> >AMBER-Developers.ambermd.org
> >http://lists.ambermd.org/mailman/listinfo/amber-developers



-- 
-------------------------
Daniel R. Roe, PhD
Department of Medicinal Chemistry
University of Utah
30 South 2000 East, Room 201
Salt Lake City, UT 84112-5820
http://home.chpc.utah.edu/~cheatham/
(801) 587-9652
(801) 585-6208 (Fax)
Received on Wed Feb 05 2014 - 21:00:03 PST