Re: amber-developers: 'MPI_BCAST : Message truncated' error

From: Ilyas Yildirim <yildirim.pas.rochester.edu>
Date: Tue, 6 Mar 2007 20:11:33 -0500 (EST)

Dear Scott,

I can run multisander in a production run without a problem (with 2/4/8...
CPUs). The problem happens only when multisander is used in a minimization
(when the Thermodynamic Integration approach is followed). I also tried to
run it on a TeraGrid server, and there the job never started calculating the
energies (I don't know what the technical term is for this situation).

When a single structure (no TI approach) is minimized with 4 CPUs (using
multisander), I don't get any error messages. I suspect that the TI
approach may not let us use more than 2 CPUs in a minimization. I
can send the files to any of you to take a look at them. Thanks.
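
For reference, the rank layout printed in the multisander banner quoted below
can be reproduced with a small sketch. It assumes that the processors are
split into equal, contiguous blocks, one block per group; this matches the
quoted output for -np 4 -ng 2 but is only an assumption about how sander
assigns ranks. The function name group_layout is hypothetical.

```python
def group_layout(total_procs, num_groups):
    """Return (world_rank, group, node_id) tuples.

    Assumption: ranks are divided into equal contiguous blocks, one per
    group. This is a sketch that mirrors the quoted banner, not sander's code.
    """
    if num_groups <= 0 or total_procs <= 0 or total_procs % num_groups != 0:
        raise ValueError("total_procs must be a positive multiple of num_groups")
    per_group = total_procs // num_groups
    return [(rank, rank // per_group, rank % per_group)
            for rank in range(total_procs)]


if __name__ == "__main__":
    # Same settings as the failing run: 4 processors, 2 groups.
    for world_rank, group, node_id in group_layout(4, 2):
        print(f"Group = {group}  WorldRank = {world_rank}  NodeID = {node_id}")
```

Under this layout, 2 CPUs with 2 groups gives one process per group, while
4 CPUs gives two per group. In the quoted log the ranks that report the
truncated broadcast are 1 and 3, that is, the NodeID = 1 rank of each group.
That may be a coincidence, but it is consistent with the problem appearing
only once a group has more than one process.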

Best,

On Tue, 6 Mar 2007, Scott Brozell wrote:

> Hi,
>
> This smells like a multinode MPI issue.
> Here are the usual questions:
> What is your MPI implementation?
> Have you tested your MPI implementation?
> Have you tested your MPI implementation for multinode usage?
> In particular, internode connectivity via rsh/ssh?
> And proper ring setup for MPICH2?
> Try 2 total processors, each on a different node ...
>
> Scott
>
> On Tue, 6 Mar 2007, Ilyas Yildirim wrote:
>
> > This is a system where I am following the Thermodynamic Integration
> > approach. I have to use multisander. With 2 CPUs, everything is fine,
> > though.
> >
> > On Tue, 6 Mar 2007, Carlos Simmerling wrote:
> >
> > > does it work when you are not using multisander?
> > >
> > > On 3/6/07, Ilyas Yildirim <yildirim.pas.rochester.edu> wrote:
> > > > Dear All,
> > > >
> > > > Using sander.MPI in a minimization with 2 CPUs works fine, but if I try to
> > > > use 4/8/... CPUs, it gives me the following error:
> > > >
> > > > ---------------------------------------------------------------------
> > > > arde00:/home/yildirim/test/l_0.2>runmin &
> > > > [1] 2428
> > > > arde00:/home/yildirim/test/l_0.2>/bin/rm: No match.
> > > > mpirun -stdin /dev/null -np 4 -nolocal -machinefile /tmp/tmp.mpi.2434
> > > > /home/yildirim/amber9/exe/sander.MPI -ng 2 -groupfile
> > > > /home/yildirim/test/l_0.2/groups_min1; rm -f /tmp/tmp.mpi.2434
> > > > running on arde11:1 arde12:1 arde13:2
> > > >
> > > > Running multisander version of sander amber9
> > > > Total processors = 4
> > > > Number of groups = 2
> > > >
> > > > Looping over processors:
> > > > WorldRank is the global PE rank
> > > > NodeID is the local PE rank in current group
> > > >
> > > > Group = 0
> > > > WorldRank = 0
> > > > NodeID = 0
> > > >
> > > > WorldRank = 1
> > > > NodeID = 1
> > > >
> > > > Group = 1
> > > > WorldRank = 2
> > > > NodeID = 0
> > > >
> > > > WorldRank = 3
> > > > NodeID = 1
> > > >
> > > > p3_19669: p4_error: : 14
> > > > 3 - MPI_BCAST : Message truncated
> > > > [3] Aborting program !
> > > > [3] Aborting program!
> > > > p1_7187: p4_error: : 14
> > > > 1 - MPI_BCAST : Message truncated
> > > > [1] Aborting program !
> > > > [1] Aborting program!
> > > > rm_l_3_19670: (2.024163) net_send: could not write to fd=5, errno = 32
> > > > rm_l_1_7188: (2.869610) net_send: could not write to fd=5, errno = 32
> > > > p2_12533: p4_error: net_recv read: probable EOF on socket: 1
> > > > rm_l_2_12534: (2.259215) net_send: could not write to fd=5, errno = 32
> > > > p1_7187: (2.871182) net_send: could not write to fd=5, errno = 32
> > > > p2_12533: (6.264409) net_send: could not write to fd=5, errno = 32
> > > > mpirun -stdin /dev/null -np 4 -nolocal -machinefile /tmp/tmp.mpi.2595
> > > > /home/yildirim/amber9/exe/sander.MPI -ng 2 -groupfile
> > > > /home/yildirim/test/l_0.2/groups_min2; rm -f /tmp/tmp.mpi.2595
> > > > ---------------------------------------------------------------------
> > > >
> > > > For the MD runs, I don't see any problems (they run with 4/8/... CPUs). The
> > > > system is an 8-mer solvated with water. I was wondering if this is normal
> > > > for AMBER9, or if I am missing something. Thanks.
>
>

-- 
  Ilyas Yildirim
  ---------------------------------------------------------------
  - Department of Chemistry      -				-
  - University of Rochester      -				-
  - Hutchison Hall, # B10        -				-
  - Rochester, NY 14627-0216     - Ph.:(585) 275 67 66 (Office)	-
  - http://www.pas.rochester.edu/~yildirim/			-
  ---------------------------------------------------------------
Received on Wed Mar 07 2007 - 06:07:47 PST