Re: [AMBER-Developers] Problems with pmemd.cuda.MPI

From: Daniel Roe <daniel.r.roe.gmail.com>
Date: Thu, 13 Mar 2014 14:41:25 -0600

I can confirm this also happens on our local cluster running Tesla M2090s,
gcc 4.4.7, mvapich 1.9, cuda 5.5.

-Dan


On Thu, Mar 13, 2014 at 1:57 PM, Daniel Roe <daniel.r.roe.gmail.com> wrote:

> Here are the test output files corresponding to the test .dif files. Let
> me know if you need more data. I'll test on our local cluster in the
> meantime.
>
> -Dan
>
>
> On Thu, Mar 13, 2014 at 1:12 PM, Daniel Roe <daniel.r.roe.gmail.com> wrote:
>
>> OK, I'll get them to you ASAP. Unfortunately stampede appears to have
>> gone (or is going) offline. I'll try and reproduce the test failures on one
>> of our local clusters.
>>
>> -Dan
>>
>>
>> On Thu, Mar 13, 2014 at 11:57 AM, Scott Le Grand <varelse2005.gmail.com> wrote:
>>
>>> I need your raw mdout files for the failed tests. The comparisons on
>>> their own are mostly useless.
>>>
>>>
>>>
>>> On Thu, Mar 13, 2014 at 10:52 AM, Daniel Roe <daniel.r.roe.gmail.com>
>>> wrote:
>>>
>>> > The reason I initially used intel compilers is they are the default on
>>> > stampede (and other HPC centers as well). However, the issue exists with
>>> > GNU compilers as well, version 4.4.6, mvapich 1.9, cuda 5.0 (diffs and log
>>> > attached). The differences are similar (though not 100% exactly) to those
>>> > seen with the intel compilers.
>>> >
>>> > Let me know if you need any more information.
>>> >
>>> > -Dan
>>> >
>>> >
>>> > On Thu, Mar 13, 2014 at 10:39 AM, Scott Le Grand <varelse2005.gmail.com> wrote:
>>> >
>>> > > Why do you guys bother with Intel's compilers for the CUDA edition? I
>>> > > can't even get my hands on them without paying the big bucks so there's
>>> > > zero incentive for me to debug Intel compiler issues other than saying
>>> > > don't use them. That said, if it's broken with gcc, then it's
>>> > > interesting.
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > On Thu, Mar 13, 2014 at 9:24 AM, Daniel Roe <daniel.r.roe.gmail.com>
>>> > > wrote:
>>> > >
>>> > > > Hi All,
>>> > > >
>>> > > > First, this time I promise I am using an up-to-date GIT tree:
>>> > > >
>>> > > > commit 8f81fff98e4d3095d0c66070eb3375f6a72708c0
>>> > > > Merge: 4de7ced 72c975d
>>> > > > Author: Pawel Janowski <pjanowsk.eden.rutgers.edu>
>>> > > > Date: Thu Mar 13 09:55:44 2014 -0400
>>> > > >
>>> > > > I am running on stampede (Tesla K20m) using intel compilers 13.1.0, cuda
>>> > > > 5.0, and mvapich 1.9. For pmemd.cuda.MPI the PME tests go haywire (absolute
>>> > > > error as high as 1.31e+06!!). No segfaults though. Diffs and log attached.
>>> > > > Going back to commit d8024087a4d8c4c1e801192839df57d760bcadd2 (Wed Feb 5
>>> > > > 22:17:48 2014 -0500) fixes the problems, although I have not systematically
>>> > > > gone back to see at what commit the code breaks. The problems happen even
>>> > > > if I run on just 1 thread. GB seems OK, and pmemd.cuda seems OK as well.
>>> > > >
>>> > > > Has anyone else seen issues like these? I will try using GNU compilers next
>>> > > > to see if the issue happens with them as well.
>>> > > >
>>> > > > -Dan
>>> > > >
>>> > > > --
>>> > > > -------------------------
>>> > > > Daniel R. Roe, PhD
>>> > > > Department of Medicinal Chemistry
>>> > > > University of Utah
>>> > > > 30 South 2000 East, Room 201
>>> > > > Salt Lake City, UT 84112-5820
>>> > > > http://home.chpc.utah.edu/~cheatham/
>>> > > > (801) 587-9652
>>> > > > (801) 585-6208 (Fax)
>>> > > >
>>> > > > _______________________________________________
>>> > > > AMBER-Developers mailing list
>>> > > > AMBER-Developers.ambermd.org
>>> > > > http://lists.ambermd.org/mailman/listinfo/amber-developers
>>> > > >
>>> > > >
>>> > >
>>> >
>>> >
>>> >
>>> >
>>> >
>>>
>>
>>
>>
>>
>
>
>
>



-- 
-------------------------
Daniel R. Roe, PhD
Department of Medicinal Chemistry
University of Utah
30 South 2000 East, Room 201
Salt Lake City, UT 84112-5820
http://home.chpc.utah.edu/~cheatham/
(801) 587-9652
(801) 585-6208 (Fax)
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Thu Mar 13 2014 - 14:00:02 PDT