Re: [AMBER-Developers] dreaded unspecified launch failures from Scott Le Grand on 2013-03-22 (Amber Developers Archive Mar 2013)

From: Scott Le Grand <varelse2005.gmail.com>
Date: Fri, 22 Mar 2013 09:39:51 -0700

I see it too, but my work tree doesn't repro.

Will try to figure out what's different.

On Fri, Mar 22, 2013 at 9:32 AM, David A Case <case.biomaps.rutgers.edu> wrote:

> 1. I'm getting the dreaded launch failures on recent cuda builds, with the
> test suite. This is with the latest git repo, configured with "-cuda gnu"
> and:
>
> casegroup1% nvcc --version
> nvcc: NVIDIA (R) Cuda compiler driver
> Copyright (c) 2005-2012 NVIDIA Corporation
> Built on Thu_Apr__5_00:24:31_PDT_2012
> Cuda compilation tools, release 4.2, V0.2.1221
>
> There were about 8 launch failures, the first in the large_solute_count
> or dhfr directories, with ntb=2 generally set (so I'm thinking it is likely
> to be a problem with the new barostat codes, since all these errors occur in
> constant pressure runs.)
>
> 2. If I rewind to commit 511ef9f0c8227e706c1be from March 8, all these go
> away. (There are 10 minor diffs, which I'm assuming have something to do
> with using a different GPU from the one used to generate the saved test
> results.)
>
> 3. Cruise control is not a lot of help here. For one thing, the
> "Test_cuda_parallel_gnu-4.4.6" build is actually not testing the cuda code
> at all, but appears to be running general Amber tests (not cuda tests).
>
> The "test_cuda_serial_gnu-4.4.6" build just says "352 tests experienced
> errors!", but the history doesn't (seem to?) go back far enough to figure
> out when things broke. There is a graph of testing times (which might be
> helpful), but the x-axis has no labels on it.
>
> The "test_cuda_parallel_intel-11.1.069" build doesn't help, since all the
> tests just say "no CUDA-capable device is detected"
>
> 4. Is anyone else seeing problems with cuda and the latest git? I can give
> more details than I did in part 1, above, but my guess is that everyone who
> compiles with gnu and runs the tests is likely to be seeing the same thing.
> But if it works for others, then I will know that my case is special, and I
> can spend more time trying to figure out the problem.
>
> 5. I'm wondering if we just have way too many things going on in cruise
> control(?) If lots of things are broken, maybe it gets too hard to fix and
> people get discouraged. Should we consider scaling back to a few tests,
> then slowly adding more as we get things to work?
>
> ...thx...dac
>
>
>
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers
>
Received on Fri Mar 22 2013 - 10:00:04 PDT