Re: [AMBER-Developers] dreaded unspecified launch failures

From: Jason Swails <jason.swails.gmail.com>
Date: Fri, 22 Mar 2013 14:39:39 -0400

This was changed back in August of 2011 (git blame is a wonderful tool ;)).

Did one of your recent fixes for NTP use PI for the first time on the CPU?
That would explain why it only just started failing, maybe...
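For reference, here is a minimal host-side sketch of the pre-August-2011 definitions Scott quotes below, compiled as plain C++. It assumes PMEDouble and PMEFloat are simply double and float. The helper ewald_recip_exponent_factor is hypothetical and not taken from the pmemd source. It only illustrates the kind of CPU-side Ewald arithmetic that goes quietly wrong when PI reads as 0 on the host.

```cpp
// Assumption: pmemd's PMEDouble/PMEFloat are taken to be double/float here.
typedef double PMEDouble;
typedef float  PMEFloat;

#define PI_VAL 3.1415926535897932384626433832795

// Plain static const (no __constant__): the value is a compile-time
// constant in the host object file, so CPU code always sees 3.14159...
// rather than whatever host code reads for a __constant__ symbol
// (0 in the failure reported in this thread).
static const PMEDouble PI   = (PMEDouble) PI_VAL;
static const PMEFloat  PI_F = (PMEFloat)  PI_VAL;

// Hypothetical CPU-side helper (not from the AMBER source): the factor
// pi^2 / beta^2 in the reciprocal-space Ewald term exp(-pi^2 m^2 / beta^2).
// If PI silently read as 0, this would return 0 and every Gaussian
// damping factor exp(0) would collapse to 1.
PMEDouble ewald_recip_exponent_factor(PMEDouble ew_coeff)
{
    return (PI * PI) / (ew_coeff * ew_coeff);
}

// If a device-side copy is wanted in a .cu file, it would need a distinct
// (hypothetical) name such as cPI, declared __constant__ and filled from the
// host value with cudaMemcpyToSymbol(cPI, &PI, sizeof(PI)), so that host
// code can never pick up the device symbol by accident.
```

Calling ewald_recip_exponent_factor(1.0) returns PI * PI, roughly 9.87. With the `__constant__` version read from host code, PI comes out as 0 (as Scott reports below), and this factor would be 0 as well.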

On Fri, Mar 22, 2013 at 2:20 PM, Scott Le Grand <varelse2005.gmail.com> wrote:

> And here's the problem:
> static __constant__ const PMEDouble PI = (PMEDouble) PI_VAL;
> static __constant__ const PMEFloat PI_F = (PMEFloat) PI_VAL;
>
>
> This makes these values unusable from CPU code.
>
> Why were they changed from "const" to "__constant__"? This is not of my
> doing.
>
> #define PI_VAL 3.1415926535897932384626433832795
> static const PMEDouble PI = (PMEDouble) PI_VAL;
> static const PMEFloat PI_F = (PMEFloat) PI_VAL;
>
> On Fri, Mar 22, 2013 at 11:15 AM, Scott Le Grand <varelse2005.gmail.com> wrote:
>
> > The constant PI is zero somehow in the main tree. This is some sort of
> > bizarro linker issue. It is not that way in mine. All else follows from
> > this.
> >
> > On Fri, Mar 22, 2013 at 10:56 AM, Scott Le Grand <varelse2005.gmail.com> wrote:
> >
> >> I see where it's dying - the Ewald sum in NTP - I'm bewildered as to why
> >> right now...
> >>
> >>
> >>
> >> On Fri, Mar 22, 2013 at 9:39 AM, Scott Le Grand <varelse2005.gmail.com> wrote:
> >>
> >>> I see it too, but my work tree doesn't repro.
> >>>
> >>> Will try to figure out what's different.
> >>>
> >>>
> >>>
> >>> On Fri, Mar 22, 2013 at 9:32 AM, David A Case <case.biomaps.rutgers.edu> wrote:
> >>>
> >>>> 1. I'm getting the dreaded launch failures on recent cuda builds with
> >>>> the test suite. This is with the latest git repo, configured with
> >>>> "-cuda gnu" and:
> >>>>
> >>>> casegroup1% nvcc --version
> >>>> nvcc: NVIDIA (R) Cuda compiler driver
> >>>> Copyright (c) 2005-2012 NVIDIA Corporation
> >>>> Built on Thu_Apr__5_00:24:31_PDT_2012
> >>>> Cuda compilation tools, release 4.2, V0.2.1221
> >>>>
> >>>> There were about 8 launch failures, the first in the large_solute_count
> >>>> or dhfr directories, generally with ntb=2 set (so I'm thinking it is
> >>>> likely to be a problem with the new barostat codes, since all these
> >>>> errors occur in constant pressure runs).
> >>>>
> >>>> 2. If I rewind to commit 511ef9f0c8227e706c1be from March 8, all these
> >>>> go away. (There are 10 minor diffs, which I'm assuming have something
> >>>> to do with using a different GPU than the saved test results.)
> >>>>
> >>>> 3. Cruise control is not a lot of help here. For one thing, the
> >>>> "Test_cuda_parallel_gnu-4.4.6" build is actually not testing the cuda
> >>>> code at all, but appears to be running general Amber tests (not cuda
> >>>> tests).
> >>>>
> >>>> The "test_cuda_serial_gnu-4.4.6" build just says "352 tests experienced
> >>>> errors!", but the history doesn't (seem to?) go back far enough to
> >>>> figure out when things broke. There is a graph of testing times (which
> >>>> might be helpful), but the x-axis has no labels on it.
> >>>>
> >>>> The "test_cuda_parallel_intel-11.1.069" build doesn't help, since all
> >>>> the tests just say "no CUDA-capable device is detected".
> >>>>
> >>>> 4. Is anyone else seeing problems with cuda and the latest git? I can
> >>>> give more details than I did in part 1, above, but my guess is that
> >>>> everyone is likely to be seeing the same thing, by compiling with gnu
> >>>> and running the tests. But if it works for others, then I will know
> >>>> that my case is special, and I can spend more time trying to figure
> >>>> out the problem.
> >>>>
> >>>> 5. I'm wondering if we just have way too many things going on in
> >>>> cruise control(?) If lots of things are broken, maybe it gets too hard
> >>>> to fix and people get discouraged. Should we consider scaling back to
> >>>> a few tests, then slowly adding more as we get things to work?
> >>>>
> >>>> ...thx...dac
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> AMBER-Developers mailing list
> >>>> AMBER-Developers.ambermd.org
> >>>> http://lists.ambermd.org/mailman/listinfo/amber-developers
> >>>>
> >>>
> >>>
> >>
> >
>



-- 
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032
Received on Fri Mar 22 2013 - 12:00:03 PDT