Re: [AMBER-Developers] dreaded unspecified launch failures

From: Scott Le Grand <varelse2005.gmail.com>
Date: Fri, 22 Mar 2013 11:20:08 -0700

And here's the problem:
static __constant__ const PMEDouble PI = (PMEDouble) PI_VAL;
static __constant__ const PMEFloat PI_F = (PMEFloat) PI_VAL;


This makes these values unusable from CPU code.

Why were they changed from "const" to "__constant__"? This is not of my
doing.

#define PI_VAL 3.1415926535897932384626433832795
static const PMEDouble PI = (PMEDouble) PI_VAL;
static const PMEFloat PI_F = (PMEFloat) PI_VAL;

On Fri, Mar 22, 2013 at 11:15 AM, Scott Le Grand <varelse2005.gmail.com> wrote:

> The constant PI is zero somehow in the main tree. This is some sort of
> bizarro linker issue. It is not that way in mine. All else follows from
> this.
>
>
>
>
> On Fri, Mar 22, 2013 at 10:56 AM, Scott Le Grand <varelse2005.gmail.com> wrote:
>
>> I see where it's dying - the Ewald sum in NTP - I'm bewildered as to why
>> right now...
>>
>>
>>
>> On Fri, Mar 22, 2013 at 9:39 AM, Scott Le Grand <varelse2005.gmail.com> wrote:
>>
>>> I see it too, but my work tree doesn't repro.
>>>
>>> Will try to figure out what's different.
>>>
>>>
>>>
>>> On Fri, Mar 22, 2013 at 9:32 AM, David A Case <case.biomaps.rutgers.edu> wrote:
>>>
>>>> 1. I'm getting the dreaded launch failures on recent cuda builds, with
>>>> the test suite. This is with the latest git repo, configured with
>>>> "-cuda gnu" and:
>>>>
>>>> casegroup1% nvcc --version
>>>> nvcc: NVIDIA (R) Cuda compiler driver
>>>> Copyright (c) 2005-2012 NVIDIA Corporation
>>>> Built on Thu_Apr__5_00:24:31_PDT_2012
>>>> Cuda compilation tools, release 4.2, V0.2.1221
>>>>
>>>> There were about 8 launch failures, the first in the large_solute_count
>>>> or dhfr directories, with ntb=2 generally set (so I'm thinking it is
>>>> likely to be a problem with the new barostat code, since all these
>>>> errors occur in constant pressure runs.)
>>>>
>>>> 2. If I rewind to commit 511ef9f0c8227e706c1be from March 8, all these
>>>> go away. (There are 10 minor diffs, which I'm assuming have something
>>>> to do with using a different GPU than the saved test results.)
>>>>
>>>> 3. Cruise control is not a lot of help here. For one thing, the
>>>> "Test_cuda_parallel_gnu-4.4.6" build is actually not testing the cuda
>>>> code at all, but appears to be running general Amber tests (not cuda
>>>> tests).
>>>>
>>>> The "test_cuda_serial_gnu-4.4.6" build just says "352 tests experienced
>>>> errors!", but the history doesn't (seem to?) go back far enough to
>>>> figure
>>>> out when things broke. There is a graph of testing times (which might
>>>> be
>>>> helpful), but the x-axis has no labels on it.
>>>>
>>>> The "test_cuda_parallel_intel-11.1.069" build doesn't help, since all
>>>> the
>>>> tests just say "no CUDA-capable device is detected"
>>>>
>>>> 4. Is anyone else seeing problems with cuda and the latest git? I can
>>>> give more details than I did in part 1, above, but my guess is that
>>>> everyone is likely to be seeing the same thing, by compiling with gnu
>>>> and running the tests. But if it works for others, then I will know
>>>> that my case is special, and I can spend more time trying to figure
>>>> out the problem.
>>>>
>>>> 5. I'm wondering if we just have way too many things going on in
>>>> cruise control(?) If lots of things are broken, maybe it gets too hard
>>>> to fix and people get discouraged. Should we consider scaling back to
>>>> a few tests, then slowly adding more as we get things to work?
>>>>
>>>> ...thx...dac
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> AMBER-Developers mailing list
>>>> AMBER-Developers.ambermd.org
>>>> http://lists.ambermd.org/mailman/listinfo/amber-developers
>>>>
>>>
>>>
>>
>
Received on Fri Mar 22 2013 - 11:30:04 PDT