Re: [AMBER-Developers] dreaded unspecified launch failures

From: Scott Le Grand <varelse2005.gmail.com>
Date: Fri, 22 Mar 2013 10:56:57 -0700

I see where it's dying - the Ewald sum in NTP - I'm bewildered as to why
right now...


On Fri, Mar 22, 2013 at 9:39 AM, Scott Le Grand <varelse2005.gmail.com> wrote:

> I see it too, but my work tree doesn't repro.
>
> Will try to figure out what's different.
>
>
>
> On Fri, Mar 22, 2013 at 9:32 AM, David A Case <case.biomaps.rutgers.edu> wrote:
>
>> 1. I'm getting the dreaded launch failures on recent cuda builds, with the
>> test suite. This is with the latest git repo, configured with "-cuda gnu"
>> and:
>>
>> casegroup1% nvcc --version
>> nvcc: NVIDIA (R) Cuda compiler driver
>> Copyright (c) 2005-2012 NVIDIA Corporation
>> Built on Thu_Apr__5_00:24:31_PDT_2012
>> Cuda compilation tools, release 4.2, V0.2.1221
>>
>> There were about 8 launch failures, the first in the large_solute_count
>> or dhfr directories, with ntb=2 generally set (so I'm thinking it is likely
>> to be a problem with the new barostat codes, since all these errors occur
>> in constant pressure runs.)
>>
>> 2. If I rewind to commit 511ef9f0c8227e706c1be from March 8, all these go
>> away. (There are 10 minor diffs, which I'm assuming have something to do
>> with using a different GPU than the one used for the saved test results.)
>>
>> 3. Cruise control is not a lot of help here. For one thing, the
>> "Test_cuda_parallel_gnu-4.4.6" build is actually not testing the cuda code
>> at all, but appears to be running general Amber tests (not cuda tests).
>>
>> The "test_cuda_serial_gnu-4.4.6" build just says "352 tests experienced
>> errors!", but the history doesn't (seem to?) go back far enough to figure
>> out when things broke. There is a graph of testing times (which might be
>> helpful), but the x-axis has no labels on it.
>>
>> The "test_cuda_parallel_intel-11.1.069" build doesn't help, since all the
>> tests just say "no CUDA-capable device is detected".
>>
>> 4. Is anyone else seeing problems with cuda and the latest git? I can give
>> more details than I did in part 1, above, but my guess is that everyone is
>> likely to be seeing the same thing, by compiling with gnu and running the
>> tests. But if it works for others, then I will know that my case is
>> special, and I can spend more time trying to figure out the problem.
>>
>> 5. I'm wondering if we just have way too many things going on in cruise
>> control(?) If lots of things are broken, maybe it gets too hard to fix and
>> people get discouraged. Should we consider scaling back to a few tests,
>> then slowly adding more as we get things to work?
>>
>> ...thx...dac
>>
>>
>>
>> _______________________________________________
>> AMBER-Developers mailing list
>> AMBER-Developers.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber-developers
>>
>
>
Received on Fri Mar 22 2013 - 11:00:02 PDT