[AMBER-Developers] Issues with MPICH-1.2.1p1

From: Ross Walker <ross.rosswalker.co.uk>
Date: Tue, 13 Apr 2010 16:12:01 -0700

Hi All,

Mark and I have been looking at the problems occurring with MPICH2 1.2.1p1
that produce the following error in PMEMD:

Assertion failed in file helper_fns.c at line 335: 0
memcpy argument memory ranges overlap, dst_=0x6e51a4 src_=0x6e51a0 len_=100

I believe a similar issue may be affecting MPI RISM. In my opinion, this
looks like an overzealous interpretation of the MPI-1 standard by the
MPICH2 authors. Note that earlier versions, such as mpich2-1.0.7, work fine.

The problem occurs in the call to mpi_gatherv in PMEMD's gb_parallel.fpp,
which uses the same array as both the send and the receive buffer. Looking
at the standard, this should be perfectly reasonable, and indeed it works with
EVERY other MPI I have ever tried. However, we probably want to address this
ASAP, before users start complaining. Attached is an example program that
reproduces the problem.

Specifically, one has the following three options:

1) ! Send each task's chunk of send_array to rec_array on the master
   call mpi_gatherv(send_array(my_array_offset), my_array_count, MPI_INTEGER, &
                    rec_array, rec_counts, rec_offsets, MPI_INTEGER, 0, &
                    mpi_comm_world, ierr)

This works with every MPI implementation.

2) ! What this 'should' be according to the MPICH2 people:
   if (mytaskid == 0) then
     call mpi_gatherv(MPI_IN_PLACE, my_array_count, MPI_INTEGER, send_array, &
                      rec_counts, rec_offsets, MPI_INTEGER, 0, mpi_comm_world, ierr)
   else
     call mpi_gatherv(send_array(my_array_offset), my_array_count, MPI_INTEGER, &
                      send_array, rec_counts, rec_offsets, MPI_INTEGER, 0, &
                      mpi_comm_world, ierr)
   end if

Note that this works ONLY with MPI-2 implementations, since MPI_IN_PLACE was
introduced in MPI-2.

3) ! Send each task's chunk of send_array to send_array on the master
   call mpi_gatherv(send_array(my_array_offset), my_array_count, MPI_INTEGER, &
                    send_array, rec_counts, rec_offsets, MPI_INTEGER, 0, &
                    mpi_comm_world, ierr)

This is what we use right now; it works with all previous MPICH2 releases
plus every other MPI implementation I have tried.

Suggestions for how we want to address this? We could add a -DMPI2 flag,
which means we would have to update all the configure rules to either
detect this or rely on the user to specify it.

Or we rewrite this section of PMEMD (plus other places that may cause
issues, such as the non-power-of-2 CPU code in sander, and RISM?) to use
separate buffers and do a copy afterwards.

Or we just tell people not to use mpich2 v1.2.1p1 (and probably later
versions).

All the best
Ross

/\
\/
|\oss Walker

| Assistant Research Professor |
| San Diego Supercomputer Center |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
| http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.

_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers

Received on Tue Apr 13 2010 - 16:30:03 PDT