Re: amber-developers: Extra Points Calculation

From: Ilyas Yildirim <yildirim.pas.rochester.edu>
Date: Wed, 12 Nov 2008 18:04:26 -0500 (EST)

On Wed, 12 Nov 2008, David A. Case wrote:

> On Wed, Nov 12, 2008, Robert Duke wrote:
>
> > Multiple processor minimization is very standard; if it does not work, it
> > is a bug!!!
>
> True, but note that Ilyas' problem is with minimization coupled to
> thermodynamic integration, i.e. minimization on a "mixed" potential surface.
> So this would not arise in pmemd (although I am still hopeful that sometime
> I can convince you that putting TI into pmemd is a worthy goal!)
>
> Volodymyr has pointed out updates that may fix Ilyas' problem. I'm hoping
> that this does the trick.

Volodymyr,

I did not understand your comment; does this test case pass with the amber9
you are using, or not? On my machine (using mpich2) and on another cluster
(using openmpi), I am still getting the error message.

I first thought it might be related to mpich2, but I don't have any
problem using 4 or 8 CPUs with pmemd and sander.MPI. When icfe is
set in the input file and I try to do minimization with more than 2 CPUs, I
get the error. There is no error if it is a production run.
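
For reference, the setup looks roughly like the following (the numbers here
are only representative placeholders, not the attached test case; the
relevant pieces are imin=1 for the minimization and icfe/clambda for TI):

---------- representative input (placeholder values) ----------

 min.in (one copy per group, referenced from the groupfile):
  &cntrl
    imin = 1, maxcyc = 500, ncyc = 100,
    ntb = 1, cut = 9.0,
    icfe = 1, clambda = 0.5,
  /

 run command (2 groups over 4 CPUs), something like:
  mpirun -np 4 sander.MPI -ng 2 -groupfile groupfile

-------------------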

I don't mind using 2 CPUs for the minimization, since it is pretty
fast, but I have been getting this type of error message for a long time.
I have attached the test case to this email; I am wondering whether I am
the only one who gets this error. The error I am getting is as follows. Thanks.

---------- error message ----------------

[yildirim.malatya02 ~/test]# runsim

  Running multisander version of sander amber9
     Total processors = 4
     Number of groups = 2

     Looping over processors:
        WorldRank is the global PE rank
        NodeID is the local PE rank in current group

        Group = 0
        WorldRank = 0
        NodeID = 0

        WorldRank = 1
        NodeID = 1

        WorldRank = 3
        NodeID = 1

        Group = 1
        WorldRank = 2
        NodeID = 0

[cli_1]: [cli_3]: aborting job:
Fatal error in MPI_Bcast: Message truncated, error stack:
MPI_Bcast(784).........................: MPI_Bcast(buf=0xe63888, count=1,
MPI_INTEGER, root=0, comm=0x84000000) failed
MPIR_Bcast(198)........................:
MPIDI_CH3U_Post_data_receive_found(163): Message from rank 0 and tag 2
truncated; 144408 bytes received but buffer size is 4
aborting job:
Fatal error in MPI_Bcast: Message truncated, error stack:
MPI_Bcast(784).........................: MPI_Bcast(buf=0xe63888, count=1,
MPI_INTEGER, root=0, comm=0x84000000) failed
MPIR_Bcast(198)........................:
MPIDI_CH3U_Post_data_receive_found(163): Message from rank 0 and tag 2
truncated; 144408 bytes received but buffer size is 4
rank 3 in job 3 malatya02_48011 caused collective abort of all ranks
  exit status of rank 3: killed by signal 9
error: label not found.
[yildirim.malatya02 ~/test]#
-------------------
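
In case it helps track this down: the traceback shows the aborting ranks
posting an MPI_Bcast receive of a single MPI_INTEGER (4 bytes) while rank 0
broadcasts a 144408-byte message, so the ranks appear to be sitting in
different broadcast calls when icfe is on during minimization. The toy
program below has nothing to do with the sander source; it is just my
illustration of that failure mode, and it produces the same class of
"Message truncated ... buffer size is 4" abort under mpich2 when run on
2 or more processors:

---------- toy mismatched-broadcast example (illustration only) ----------

program bcast_mismatch
  implicit none
  include 'mpif.h'
  integer :: ierr, rank, ibuf(1)
  double precision :: big(18051)   ! 18051 * 8 bytes = 144408 bytes
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  if (rank == 0) then
     ! root broadcasts a large array ...
     big = 0.d0
     call MPI_Bcast(big, 18051, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
  else
     ! ... while every other rank posts a 4-byte buffer for what it thinks
     ! is the same broadcast; mpich2 reports this as a truncated message
     call MPI_Bcast(ibuf, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
  end if
  call MPI_Finalize(ierr)
end program bcast_mismatch

-------------------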

-- 
  Ilyas Yildirim, Ph.D.
  ---------------------------------------------------------------
  = Hutchison Hall B#10          - Department of Chemistry      =
  =                              - University of Rochester      =
  = 585-275-6766 (office)        -                              =
  = http://www.pas.rochester.edu/~yildirim/                     =
  ---------------------------------------------------------------
