Re: [AMBER-Developers] MPI Communicator creation

From: Jason Swails <jason.swails.gmail.com>
Date: Sun, 15 Jan 2012 23:08:22 -0500

You're going to kick yourself ;). Here's your problematic code:

  if (world_rank == 1) {
    MPI_Recv(myints, world_size, MPI_INT, 2, 475, world_comm, &req[0]);
  }
  if (world_rank == 3) {
    MPI_Send(myints, world_size, MPI_INT, 0, 475, world_comm);
  }

If you change that to

  if (world_rank == 0) {
    MPI_Recv(myints, world_size, MPI_INT, 2, 475, world_comm, &req[0]);
  }
  if (world_rank == 2) {
    MPI_Send(myints, world_size, MPI_INT, 0, 475, world_comm);
  }


which it should be, everything works as you'd expect it to.

Good luck,
Jason

On Sun, Jan 15, 2012 at 9:55 PM, <dcerutti.rci.rutgers.edu> wrote:

> Well, shucks. Looks like I still don't understand this. At least, I can
> better define the problem, but I'm still not sure why I'm getting it.
> I've devised a scheme within mdgx to keep messages from getting crossed in
> TI and (in the future) replica exchange. The message tag numbering seems
> to be able to support what I had, which means that the tags are getting
> assigned properly between senders and receivers, and they're all unique
> with room for as many as 200,000,000 CPUs. But when I try to fold in
> unique communicators for each cell grid (each of the end points in TI, or
> each replica, gets its own cell grid) the code stalls.
>
> The attached toy code illustrates the problem. It's an enhanced version
> of the example program at
> http://mpi.deino.net/mpi_functions/MPI_Comm_create.html. The output is
> the following:
>
> Created com -2080374777 on process 0
> Created com -2080374780 on process 1
> Created com -2080374780 on process 2
> Created com -2080374780 on process 3
>
> (This shows the initial contents of junk integer arrays on each process)
>
> myints on 0: [ 0 0 0 0];
> myints on 1: [ 1 1 1 1];
> myints on 2: [ 2 2 2 2];
> myints on 3: [ 3 3 3 3];
>
> (This is where the message from process 3 -> 1 gets passed)
>
> myints on 0: [ 0 0 0 0];
> myints on 1: [ 3 3 3 3];
> myints on 2: [ 2 2 2 2];
> myints on 3: [ 3 3 3 3];
>
> (This is where the code stalls)
>
> So, the message from process 3 gets passed to 1 successfully, because they
> both agree on the handle to the new communicator world_comm. But, when I
> try to pass a message from process 2 to 0 over world_comm, those two
> processes don't agree on the handle so it never gets through. What I
> don't understand is that I've passed the same inputs to all processes and
> created world_comm at the same time. Why do three (in fact, all but the
> master process) agree on the handle to world_comm whereas the master
> process itself does not?
>
> If anyone can explain that to me, I think I'm off and running with
> communicators.
>
> Dave
>
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers
>
>


-- 
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032
Received on Sun Jan 15 2012 - 20:30:03 PST