Hi Dave,
I'm just learning my way around MPI myself, so please don't take this advice as expert. But I hope it's better than the blind leading the blind into a ditch.
On 14/01/2012, at 2:21 PM, dcerutti.rci.rutgers.edu wrote:
> Hello, looking for a developer who can help me understand what an MPI communicator really is, once I create it. I thought that I had this down, after going through the various LLNL and Argonne tutorials.
>
> 1.) Create a new group (of threads) by first referencing the group associated with MPI_COMM_WORLD, and then including a subset of the processes in MPI_COMM_WORLD in a new group. Let's call the reference to the group of MPI_COMM_WORLD gworld and the new group cpugrp.
>
> 2.) Create a new communicator associated with cpugrp. Let's call the new communicator newcomm.
>
> That leads to the following code:
>
> MPI_Comm_group(MPI_COMM_WORLD, &gworld);
> MPI_Group_incl(gworld, ncpu, CPUlist, &cpugrp);
> MPI_Comm_create(MPI_COMM_WORLD, cpugrp, &newcomm);
>
> where ncpu is the number of threads that I've decided need to be part of the new thread group and CPUlist contains the ranks of those threads within MPI_COMM_WORLD.
>
> Now, my confusion comes about because I want to define a new communicator for each process to include precisely those other processes that it sends information to during the direct space nonbonded force calculation. It would seem that if I have the above code run on each of, say, four threads, I should get four new communicators.
From the MPI book I'm consulting, it seems you'd actually get one communicator.
> That's not what I think I'm seeing. If I have each thread print out the (integer) handle to the communicator that it has created, its thread ID, and the list of threads that it has included in the new communicator, I get the following:
>
> Newcomm -2080374780 on process 0 with IDs [ 0 1];
> Newcomm -2080374782 on process 1 with IDs [ 1 2 3 0];
> Newcomm -2080374782 on process 2 with IDs [ 2 3 1];
> Newcomm -2080374782 on process 3 with IDs [ 3 0 1 2];
>
> Notice, there are only two unique handles there. Why is that? What really confuses me is what happens to the other processes that may be included in each communicator. If process 2 creates a communicator X that includes processes 2, 3, and 1, do processes 1 and 3 then know that they are part of X? I'm broadcasting the handles to each communicator by MPI_Allgather() so that every process is carrying an array of handles to the various communicators--is it OK to communicate them as MPI_INT?--but I'm not sure what happens because each communicator is more than just its integer handle. Ultimately, I want to be passing messages around, for example having process 3 send to process 1 over communicator X. The big reason I'm doing this is to restrict the talking that has to be done "all in one room" to prevent messages from being misidentified or running out of tag numbers that I can guarantee to be unique.
Again, the book I'm reading suggests that MPI_Comm_create must be called with the same arguments on all processes in the original communicator (MPI_COMM_WORLD in your example), even those that aren't going to be part of the new communicator (newcomm in your example). Furthermore, whatever new communicators the processes create, they must all create them in the same order, so this is verboten:
(on process 1)
MPI_Comm_create(args…,&newcomm1);
MPI_Comm_create(args…,&newcomm2);
(on process 2)
MPI_Comm_create(args…,&newcomm2);
MPI_Comm_create(args…,&newcomm1);
I'm also not sure whether you would need the handles for any purpose. If you have run MPI_Comm_create on all the relevant processes, as seems to be required, each of those processes should be able to refer to the communicator directly, as far as I can make out.
Let me know how you get on, though; I'm interested in learning some of this stuff myself!
B.
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Sat Jan 14 2012 - 20:30:02 PST