Re: [AMBER-Developers] near-final testing

From: Tyler Luchko <tluchko.rci.rutgers.edu>
Date: Tue, 13 Apr 2010 19:10:50 -0400

On 2010-04-13, at 1:16 PM, Tyler Luchko wrote:

>
> On 2010-04-13, at 11:51 AM, Jason Swails wrote:
>
>> On Tue, Apr 13, 2010 at 11:44 AM, Mark Williamson <mjw.sdsc.edu> wrote:
>>> case wrote:
>>>>
>>>> On Tue, Apr 13, 2010, Jason Swails wrote:
>>>>>>
>>>>>> 2. Jason: For Mac OSX10.6, gnu 4.43: what is the nature of the failure
>>>>>> in parallel sander.RISM.MPI? What is your stack size (that causes the
>>>>>> parallel pmemd.MPI jobs to fail)? What do you mean by "memory overlap"?
>>>>>
>>>>> The errors I was getting are the same as the ones I reported for the
>>>>> Ubuntu build on the wiki (I posted sample error messages there for
>>>>> both the RISM and pmemd errors). I'm beginning to think it may be a
>>>>> gnu 4.4-related problem (since that is the only commonality that my
>>>>> systems share, even though they are different 4.4's).
>>>>
>>>> Is it possibly an MPI problem? What version of MPI are you using on the
>>>> Mac (I see mpich2 on ubuntu). Have you ever tried openmpi using the
>>>> configure script we provide?
>>>>
>>>
>>> Ok, Ross and I have noticed this one too:
>>>
>>> ./Run.gbrna
>>> if ( ! 1 ) set TESTsander = ../../exe/sander
>>> if ( ! 1 ) then
>>> set numprocs=`echo $DO_PARALLEL | awk -f ../numprocs.awk `
>>> echo mpirun -np 2
>>> awk -f ../numprocs.awk
>>> if ( 2 > 19 ) then
>>> if ( 0 ) then
>>> endif
>>> endif
>>> cat
>>> set output = mdout.gbrna
>>> mpirun -np 2 ../../exe/pmemd.MPI -O -i gbin -c md4.x -o mdout.gbrna
>>> Assertion failed in file helper_fns.c at line 335: 0
>>> memcpy argument memory ranges overlap, dst_=0xe812e0 src_=0xe812e0 len_=7680
>>>
>>> internal ABORT - process 0
>>> rank 0 in job 387 caffeine.sdsc.edu_43871 caused collective abort of all
>>> ranks
>>> exit status of rank 0: killed by signal 9
>>> goto error
>>> echo ./Run.gbrna: Program error
>>> ./Run.gbrna: Program error
>>> exit ( 1 )
>>>
>>> /server-home/netbin/mpi/mpich2-1.2.1p1-ifort-10.1.018/
>>>
>>>
>>> We think it is something to do with the latest version of mpich2; we are
>>> actively investigating. Jason, what version of mpich2 are you using?
>>
>> 1.2.1 -- I hadn't thought to check the MPI implementations, as I
>> generally always used OpenMPI on my Mac, but for reasons of
>> convenience I used MPICH2 when I recently reconfigured my system. I'm
>> working on verifying everything works with OpenMPI right now on
>> Ubuntu, and if that works I'll move it over to my Mac.
>>
>
> On CentOS with GCC 4.1.2 and MPICH 1.2.1p1, two of the sander.RISM.MPI tests fail and two pass. With OpenMPI, all sander.RISM.MPI tests pass on CentOS with the GNU, PGI, and Intel compilers and on Mac 10.6 with GNU 4.4.
>
> I will look into the RISM/MPICH issues this afternoon.
>

The attached patch should fix the problem. I'm surprised that this only shows up with MPICH. Basically, I was passing the integer literal 0 instead of the double-precision literal 0d0 to a BLAS subroutine.
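
For anyone wondering why a 0 versus 0d0 mix-up can be build-dependent, here is a minimal sketch in Python rather than Fortran. It is an illustration under stated assumptions, not the actual RISM code: it assumes a little-endian machine, a 4-byte default integer, and a BLAS routine called through an implicit interface, so the callee reads 8 bytes at the argument's address as a double. The helper name read_as_double and the "neighbour" bytes are hypothetical; what really sits next to the literal depends on the compiler and the libraries linked in.

```python
import struct

def read_as_double(int_literal, neighbour_bytes):
    """Hypothetical model of a callee that expects a double but is handed
    the address of a 4-byte integer: it reads 8 bytes, so the upper half
    comes from whatever happens to follow the integer in memory."""
    # 4-byte little-endian integer followed by 4 unrelated bytes
    memory = struct.pack("<i", int_literal) + neighbour_bytes
    # Reinterpret the first 8 bytes as a little-endian IEEE 754 double
    return struct.unpack("<d", memory[:8])[0]

# Neighbouring bytes happen to be zero: the callee still sees 0.0
lucky = read_as_double(0, b"\x00\x00\x00\x00")

# Neighbouring bytes happen to hold 0x3FF00000: the callee sees 1.0
unlucky = read_as_double(0, struct.pack("<i", 0x3FF00000))

print(lucky, unlucky)
```

If the neighbouring bytes are zero, the bug is invisible. That would be one plausible reason for the failure to appear with one MPI build and not another, though I have not confirmed that this is what happens here.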

BTW, if anyone is willing to test RISM in NAB with MPICH, the library to link to is libfmpich.a:

export XTRA_FLIBS=-lfmpich

and then

./configure -mpi -rismmpi compiler

Tyler


_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers

Received on Tue Apr 13 2010 - 16:30:02 PDT