Re: [AMBER-Developers] problem in running pmemd.MPI

From: InSuk Joung <i.joung.gmail.com>
Date: Thu, 25 Oct 2012 15:20:13 -0400

Jason came by and spotted the code that causes the problem.

(idb) backtrace
#0 0x00002ab837157966 in MPIDI_CH3I_SMP_pull_header () in
/opt/mvapich2/intel/ib/lib/libmpich.so.3.3
#1 0x00002ab8371572a5 in MPIDI_CH3I_SMP_read_progress () in
/opt/mvapich2/intel/ib/lib/libmpich.so.3.3
#2 0x00002ab837150bfe in MPIDI_CH3I_Progress () in
/opt/mvapich2/intel/ib/lib/libmpich.so.3.3
#3 0x00002ab83724825d in PMPI_Waitany () in
/opt/mvapich2/intel/ib/lib/libmpich.so.3.3
#4 0x00002ab8372483aa in mpi_waitany_ () in
/opt/mvapich2/intel/ib/lib/libmpich.so.3.3
#5 0x000000000044cacc in PARALLEL_MOD::pvt_get_img_frc_distribution
(atm_cnt=7132, off_tbl=(...), recv_taskmap=(...), send_taskmap=(...),
send_atm_lst=(...), send_atm_cnts=(...), recv_atm_lsts=(...),
recv_atm_cnts=(...), owned_atm_cnts=(...)) at
/home/isjoung/AMBER/src/pmemd/src/parallel.F90:932
#6 0x000000000044c27e in PARALLEL_MOD::get_img_frc_distribution
(atm_cnt=7132) at /home/isjoung/AMBER/src/pmemd/src/parallel.F90:742
#7 0x00000000004fe7b5 in PME_FORCE_MOD::pme_force (atm_cnt=7132,
crd=(...), frc=(...), img_atm_map=(...), atm_img_map=(...),
my_atm_lst=(...), new_list=.TRUE., need_pot_enes=.FALSE.,
need_virials=.TRUE., pot_ene= (...), virial=(...), ekcmt=(...),
pme_err_est=0) at /home/isjoung/AMBER/src/pmemd/src/pme_force.F90:814
#8 0x0000000000547b59 in RUNMD_MOD::runmd (atm_cnt=7132, crd=(...),
mass=(...), frc=(...), vel=(...), last_vel=(...), my_atm_lst=(...),
local_remd_method=0, local_numexchg=1) at
/home/isjoung/AMBER/src/pmemd/src/runmd.F90:828
#9 0x00000000005b4c70 in pmemd () at
/home/isjoung/AMBER/src/pmemd/src/pmemd.F90:366
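
The hang appears to be in the mpi_waitany call at parallel.F90:932 (frames #3-#5
above). I am not reproducing the actual pmemd source here; the sketch below is a
purely hypothetical illustration (the subroutine and argument names are invented)
of the kind of irecv/waitany drain loop that a stack like this implies, and of
where a rank would block forever if the message it is waiting for never arrives:

! Hypothetical sketch only -- not the pmemd source; drain_atm_lists and
! its arguments are invented names.
subroutine drain_atm_lists(n_peers, recv_reqs, recv_cnts)
  use mpi
  implicit none
  integer, intent(in)    :: n_peers
  integer, intent(inout) :: recv_reqs(n_peers)  ! requests from earlier mpi_irecv calls
  integer, intent(out)   :: recv_cnts(n_peers)  ! element counts actually received
  integer :: status(MPI_STATUS_SIZE), idx, ierr, i

  do i = 1, n_peers
    ! Block until any one outstanding receive completes.  If a peer never
    ! sends the expected message (or the MPI layer drops or mispairs it),
    ! the rank sits here forever -- consistent with the observed hang.
    call mpi_waitany(n_peers, recv_reqs, idx, status, ierr)

    ! A message may legally be shorter than the receive that was posted;
    ! mpi_get_count reports how many elements actually arrived.
    call mpi_get_count(status, MPI_INTEGER, recv_cnts(idx), ierr)
  end do
end subroutine drain_atm_lists

In a loop of that shape, mpi_waitany simply never returns if the matching send
never arrives, which would fit pmemd.MPI stalling right after the "Rndv Receiver
is expecting ... Bytes" warnings from MVAPICH2 quoted below.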


On Thu, Oct 25, 2012 at 2:43 PM, InSuk Joung <i.joung.gmail.com> wrote:

> It is the dev master branch, checked out a few days ago.
> The same system runs without any problem on my local machine, where the
> compiler was Intel 11.1 with MPICH2 1.4.1p1, so the issue is probably
> related to the MPI library.
> It does not appear to be a coordinate problem: with sander.MPI it works
> fine, but whenever I switch back to pmemd.MPI the same error occurs.
> It is also interesting that a very similar system differing only slightly
> in size (a little more or less water/ions) works fine. I suspect it is a
> bad combination of this specific prmtop file and the MPI library.
>
> I will try openmpi.
>
> On Thu, Oct 25, 2012 at 1:19 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
>
>> Hi InSuk,
>>
>> A couple of questions. Is this just the dev branch (up to date) or do you
>> see the same problem with Amber 12?
>>
>> Can you try with Open MPI and see if the problem is still there?
>>
>> module unload mvapich2_ib
>> module load openmpi_ib
>>
>> Put this in your .bashrc, log out and back in, and then recompile.
>>
>> All the best
>> Ross
>>
>> /\
>> \/
>> |\oss Walker
>>
>> ---------------------------------------------------------
>> | Assistant Research Professor |
>> | San Diego Supercomputer Center |
>> | Adjunct Assistant Professor |
>> | Dept. of Chemistry and Biochemistry |
>> | University of California San Diego |
>> | NVIDIA Fellow |
>> | http://www.rosswalker.co.uk | http://www.wmd-lab.org |
>> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
>> ---------------------------------------------------------
>>
>> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
>> be read every day, and should not be used for urgent or sensitive issues.
>>
>>
>> On 10/25/12 8:46 AM, "InSuk Joung" <i.joung.gmail.com> wrote:
>>
>> >I installed Amber from the master branch on Gordon (UCSD), and I have a
>> >problem running pmemd.MPI. Here is my mdout file.
>> >
>> > -------------------------------------------------------
>> > Amber 12 SANDER 2012
>> > -------------------------------------------------------
>> >
>> >| PMEMD implementation of SANDER, Release 12
>> >
>> >| Run on 10/25/2012 at 08:04:36
>> >
>> > [-O]verwriting output
>> >
>> >File Assignments:
>> >| MDIN: md.in
>> >| MDOUT: model.eq.out
>> >| INPCRD: model.min.rst
>> >| PARM: model.prmtop
>> >| RESTRT: model.eq.rst
>> >| REFC: model.min.rst
>> >| MDVEL: mdvel
>> >| MDEN: mden
>> >| MDCRD: mdcrd
>> >| MDINFO: mdinfo
>> >|LOGFILE: logfile
>> >
>> >
>> > Here is the input file:
>> >
>> >eq1
>> > &cntrl
>> > irest=0, ntx=1,
>> > ntpr=2500, ntwr=25000, ntwx=0, ntwe=0,
>> > nscm=1250, temp0=298, tempi=0,
>> > ntf=2, ntc=2,
>> > ntb=2, ntt=3, gamma_ln=5.0, ntp=2, tautp=1.0, taup=1.0,
>> > nstlim=25000, dt=0.002,
>> > cut=9.0,
>> > iwrap=1,
>> > ntr=1, restraint_wt=50,
>> > restraintmask=':1-600',
>> > &end
>> >
>> >
>> >
>> >
>> >
>> >| Conditional Compilation Defines Used:
>> >| DIRFRC_COMTRANS
>> >| DIRFRC_EFS
>> >| DIRFRC_NOVEC
>> >| MPI
>> >| PUBFFT
>> >| FFTLOADBAL_2PROC
>> >| BINTRAJ
>> >| MKL
>> >
>> >| Largest sphere to fit in unit cell has radius = 14.100
>> >
>> >| New format PARM file being parsed.
>> >| Version = 1.000 Date = 10/25/12 Time = 08:03:47
>> >
>> >| Note: 1-4 EEL scale factors are being read from the topology file.
>> >
>> >| Note: 1-4 VDW scale factors are being read from the topology file.
>> >| Duplicated 0 dihedrals
>> >
>> >| Duplicated 0 dihedrals
>> >
>> >--------------------------------------------------------------------------------
>> > 1. RESOURCE USE:
>> >--------------------------------------------------------------------------------
>> >
>> > getting new box info from bottom of inpcrd
>> >
>> > NATOM = 7132 NTYPES = 4 NBONH = 6000 MBONA = 0
>> > NTHETH = 0 MTHETA = 0 NPHIH = 0 MPHIA = 0
>> > NHPARM = 0 NPARM = 0 NNB = 9132 NRES = 3132
>> > NBONA = 0 NTHETA = 0 NPHIA = 0 NUMBND = 2
>> > NUMANG = 0 NPTRA = 0 NATYP = 4 NPHB = 1
>> > IFBOX = 1 NMXRS = 3 IFCAP = 0 NEXTRA = 0
>> > NCOPY = 0
>> >
>> >| Coordinate Index Table dimensions: 13 5 5
>> >| Direct force subcell size = 5.6620 5.6400 5.6400
>> >
>> > BOX TYPE: RECTILINEAR
>> >
>> >--------------------------------------------------------------------------------
>> > 2. CONTROL DATA FOR THE RUN
>> >--------------------------------------------------------------------------------
>> >
>> >default_name
>> >
>> >
>> >General flags:
>> > imin = 0, nmropt = 0
>> >
>> >Nature and format of input:
>> > ntx = 1, irest = 0, ntrx = 1
>> >
>> >Nature and format of output:
>> > ntxo = 1, ntpr = 2500, ntrx = 1, ntwr = 25000
>> > iwrap = 1, ntwx = 0, ntwv = 0, ntwe = 0
>> > ioutfm = 0, ntwprt = 0, idecomp = 0, rbornstat= 0
>> >
>> >Potential function:
>> > ntf = 2, ntb = 2, igb = 0, nsnb = 25
>> > ipol = 0, gbsa = 0, iesp = 0
>> > dielc = 1.00000, cut = 9.00000, intdiel = 1.00000
>> >
>> >Frozen or restrained atoms:
>> > ibelly = 0, ntr = 1
>> > restraint_wt = 50.00000
>> >
>> >Molecular dynamics:
>> > nstlim = 25000, nscm = 0, nrespa = 1
>> > t = 0.00000, dt = 0.00200, vlimit = 20.00000
>> >
>> >Langevin dynamics temperature regulation:
>> > ig = 71277
>> > temp0 = 298.00000, tempi = 0.00000, gamma_ln= 5.00000
>> >
>> >Pressure regulation:
>> > ntp = 2
>> > pres0 = 1.00000, comp = 44.60000, taup = 1.00000
>> >
>> >SHAKE:
>> > ntc = 2, jfastw = 0
>> > tol = 0.00001
>> >
>> >| Intermolecular bonds treatment:
>> >| no_intermolecular_bonds = 1
>> >
>> >| Energy averages sample interval:
>> >| ene_avg_sampling = 2500
>> >
>> >Ewald parameters:
>> > verbose = 0, ew_type = 0, nbflag = 1, use_pme = 1
>> > vdwmeth = 1, eedmeth = 1, netfrc = 1
>> > Box X = 73.606 Box Y = 28.200 Box Z = 28.200
>> > Alpha = 90.000 Beta = 90.000 Gamma = 90.000
>> > NFFT1 = 80 NFFT2 = 30 NFFT3 = 30
>> > Cutoff= 9.000 Tol =0.100E-04
>> > Ewald Coefficient = 0.30768
>> > Interpolation order = 4
>> >
>> >| PMEMD ewald parallel performance parameters:
>> >| block_fft = 1
>> >| fft_blk_y_divisor = 2
>> >| excl_recip = 0
>> >| excl_master = 0
>> >| atm_redist_freq = 320
>> >
>> > LOADING THE CONSTRAINED ATOMS AS GROUPS
>> >
>> >
>> > 5. REFERENCE ATOM COORDINATES
>> >
>> > default_name
>> >
>> > Mask :1-600; matches 600 atoms
>> >
>> >--------------------------------------------------------------------------------
>> > 3. ATOMIC COORDINATES AND VELOCITIES
>> >--------------------------------------------------------------------------------
>> >
>> >default_name
>> >
>> > begin time read from input coords = 0.000 ps
>> >
>> >
>> > Number of triangulated 3-point waters found: 2000
>> >
>> > Sum of charges from parm topology file = 0.00000000
>> > Forcing neutrality...
>> >
>> >| Dynamic Memory, Types Used:
>> >| Reals 539386
>> >| Integers 329122
>> >
>> >| Nonbonded Pairs Initial Allocation: 269656
>> >
>> >| Running AMBER/MPI version on 16 nodes
>> >
>> >
>> >--------------------------------------------------------------------------------
>> > 4. RESULTS
>> >--------------------------------------------------------------------------------
>> >
>> > ---------------------------------------------------
>> > APPROXIMATING switch and d/dx switch using CUBIC SPLINE INTERPOLATION
>> > using 5000.0 points per unit in tabled values
>> > TESTING RELATIVE ERROR over r ranging from 0.0 to cutoff
>> >| CHECK switch(x): max rel err = 0.2738E-14 at 2.422500
>> >| CHECK d/dx switch(x): max rel err = 0.8314E-11 at 2.736960
>> > ---------------------------------------------------
>> >|---------------------------------------------------
>> >| APPROXIMATING direct energy using CUBIC SPLINE INTERPOLATION
>> >| with 50.0 points per unit in tabled values
>> >| Relative Error Limit not exceeded for r .gt. 2.39
>> >| APPROXIMATING direct force using CUBIC SPLINE INTERPOLATION
>> >| with 50.0 points per unit in tabled values
>> >| Relative Error Limit not exceeded for r .gt. 2.84
>> >|---------------------------------------------------
>> >
>> >Warning! Rndv Receiver is expecting 72000 Bytes But, is receiving 54000 Bytes
>> >Warning! Rndv Receiver is expecting 75600 Bytes But, is receiving 54000 Bytes
>> >Warning! Rndv Receiver is expecting 72000 Bytes But, is receiving 54000 Bytes
>> >Warning! Rndv Receiver is expecting 75600 Bytes But, is receiving 54000 Bytes
>> >
>> >After the warnings, it seems that pmemd.MPI hangs.
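
For context on the warning above: in MPI it is legal for an incoming message to
be shorter than the receive posted for it; the receiver then queries the true
length with mpi_get_count, so the byte counts alone are not necessarily fatal.
MVAPICH2's rendezvous path warns when the two sides disagree, which could mean
pmemd posts oversized receives or could mean messages are pairing with the wrong
receives. As a purely illustrative sketch (recv_exact and its arguments are
invented names, not pmemd code), one way to make the posted and incoming sizes
agree by construction is to probe first:

! Hypothetical sketch only -- not pmemd code.
subroutine recv_exact(src, tag, buf, n)
  use mpi
  implicit none
  integer, intent(in)               :: src, tag
  integer, allocatable, intent(out) :: buf(:)
  integer, intent(out)              :: n
  integer :: status(MPI_STATUS_SIZE), ierr

  ! Wait until a matching message is available without receiving it yet,
  ! then ask how many integers the sender actually sent.
  call mpi_probe(src, tag, MPI_COMM_WORLD, status, ierr)
  call mpi_get_count(status, MPI_INTEGER, n, ierr)

  ! Size the buffer from the real count, so the posted receive always
  ! matches the incoming message exactly.
  allocate(buf(n))
  call mpi_recv(buf, n, MPI_INTEGER, src, tag, MPI_COMM_WORLD, status, ierr)
end subroutine recv_exact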
>> >
>> >With sander.MPI, I don't have any problem.
>> >
>> >Make test.parallel.pmemd passes ok.
>> >
>> >The compiler was Intel 12.1.1.256 Build 20111011, and the MPI library was
>> >mvapich2_ib 1.8a1p1.
>> >I see the same problem on Ranger (TACC) with Intel 12.1 and MVAPICH2 1.8.
>> >
>> >--
>> >Best,
>> >InSuk Joung
>>
>>
>>
>>
>
>
>
> --
> Best,
> InSuk Joung
>



-- 
Best,
InSuk Joung
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Thu Oct 25 2012 - 12:30:04 PDT