On Wed, 12 Nov 2008, David A. Case wrote:
> On Wed, Nov 12, 2008, Robert Duke wrote:
>
> > Multiple processor minimization is very standard; if it does not work, it
> > is a bug!!!
>
> True, but note that Ilyas' problem is with minimization coupled to
> thermodynamic integration, i.e. minimization on a "mixed" potential surface.
> So this would not arise in pmemd (although I am still hopeful that someday
> I can convince you that putting TI into pmemd is a worthy goal!)
>
> Volodymyr has pointed out updates that may fix Ilyas' problem. I'm hoping
> that these do the trick.
Volodymyr,
I did not understand your comment; does this test case pass in the amber9
build you are using, or not? On my machine (using mpich2) and on another
cluster (using openmpi), I am still getting the error message.
I first thought that it might be related to mpich2, but I don't have any
problem using 4 or 8 CPUs in pmemd and sander.MPI. When I have icfe
set in the input file and try to do a minimization with more than 2 CPUs, I
get the error. There is no error if it is a production run.
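Roughly, the setup that triggers it looks like the following (file names and
values here are placeholders, not the real files from the test case):
---------- min.in (TI minimization input, sketch) ----------
minimization with icfe
 &cntrl
   imin = 1, maxcyc = 500, ncyc = 100,
   ntpr = 50, cut = 9.0,
   icfe = 1, clambda = 0.5,
 /
-------------------
---------- groupfile (sketch) ----------
-O -i min.in -o min1.out -p prm1.top -c crd1.rst -r min1.rst
-O -i min.in -o min2.out -p prm2.top -c crd2.rst -r min2.rst
-------------------
and the run command is essentially
mpirun -np 4 $AMBERHOME/exe/sander.MPI -ng 2 -groupfile groupfile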
I don't mind using 2 CPUs for the minimization, because it is pretty
fast, but I have been getting this type of error message for a long time.
I have attached the test case to this email; I am wondering whether I am
the only one getting this error. The error message is as follows. Thanks.
---------- error message ----------------
[yildirim.malatya02 ~/test]# runsim
Running multisander version of sander amber9
Total processors = 4
Number of groups = 2
Looping over processors:
WorldRank is the global PE rank
NodeID is the local PE rank in current group
Group = 0
WorldRank = 0
NodeID = 0
WorldRank = 1
NodeID = 1
WorldRank = 3
NodeID = 1
Group = 1
WorldRank = 2
NodeID = 0
[cli_1]: [cli_3]: aborting job:
Fatal error in MPI_Bcast: Message truncated, error stack:
MPI_Bcast(784).........................: MPI_Bcast(buf=0xe63888, count=1,
MPI_INTEGER, root=0, comm=0x84000000) failed
MPIR_Bcast(198)........................:
MPIDI_CH3U_Post_data_receive_found(163): Message from rank 0 and tag 2
truncated; 144408 bytes received but buffer size is 4
aborting job:
Fatal error in MPI_Bcast: Message truncated, error stack:
MPI_Bcast(784).........................: MPI_Bcast(buf=0xe63888, count=1,
MPI_INTEGER, root=0, comm=0x84000000) failed
MPIR_Bcast(198)........................:
MPIDI_CH3U_Post_data_receive_found(163): Message from rank 0 and tag 2
truncated; 144408 bytes received but buffer size is 4
rank 3 in job 3 malatya02_48011 caused collective abort of all ranks
exit status of rank 3: killed by signal 9
error: label not found.
[yildirim.malatya02 ~/test]#
-------------------
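In case it is useful for whoever looks at this: the traceback itself is a
generic count mismatch in a broadcast. The little C program below is an
illustration only, not sander code; the 4-PE/2-group layout, the block
assignment of PEs to groups, and the 36102-integer (144408-byte) message
size are just taken from the log above. Under MPICH2 it aborts with the
same "Message truncated" stack:
---------- bcast_mismatch.c (illustration only) ----------
/* Mimic the 4-PE / 2-group multisander layout, then mismatch the
 * MPI_Bcast counts the way the traceback suggests: the group master
 * sends 36102 integers while the other PEs post a 1-integer buffer. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int world_rank, world_size, node_id;
    MPI_Comm group_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Block assignment, matching the NodeIDs printed above:
     * PEs 0,1 -> group 0 and PEs 2,3 -> group 1 (run with -np 4). */
    int ngroups = 2;
    int group = world_rank / (world_size / ngroups);
    MPI_Comm_split(MPI_COMM_WORLD, group, world_rank, &group_comm);
    MPI_Comm_rank(group_comm, &node_id);
    printf("WorldRank = %d  Group = %d  NodeID = %d\n",
           world_rank, group, node_id);

    /* The mismatch: 144408 bytes sent, 4-byte receive buffer posted.
     * (sander is Fortran, hence MPI_INTEGER in the real traceback.) */
    int nsend = 36102;                      /* 36102 * 4 = 144408 bytes */
    int *buf = malloc(nsend * sizeof(int));
    if (node_id == 0)
        MPI_Bcast(buf, nsend, MPI_INT, 0, group_comm); /* master: full array */
    else
        MPI_Bcast(buf, 1, MPI_INT, 0, group_comm);     /* slaves: count = 1 */

    free(buf);
    MPI_Comm_free(&group_comm);
    MPI_Finalize();
    return 0;
}
-------------------
Compiled with mpicc and run with mpirun -np 4, the non-master PEs die with
the same "Message truncated ... 144408 bytes received but buffer size is 4"
abort. So my guess (only a guess) is that with icfe on, some array size
computed during minimization setup is broadcast to the other PEs before
they know how big it is.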
--
Ilyas Yildirim, Ph.D.
Hutchison Hall B#10 - Department of Chemistry
University of Rochester
585-275-6766 (office)
http://www.pas.rochester.edu/~yildirim/