[AMBER-Developers] Parallel Test Failures

From: Daniel Roe <daniel.r.roe.gmail.com>
Date: Sat, 20 Mar 2010 14:37:27 -0400

Just wanted to point out some (mostly innocuous) parallel test failures I'm
currently getting.

Parallel Build Details:
CVS as of 2010-03-20 11:23 AM EST
./configure -mpi gnu
mpich2 1.2.1
gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46)
GNU Fortran (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46)

Most of the errors seem to be output-format related; in particular, EAMBER is
now being printed where it wasn't before. The ti_decomp test also fails, but I
think people are working on that. The
evb/malon_dbonds_umb_GAFF_MORSIFY_nmpimd_full test will fail if the user sets
TESTsander to a non-LES sander. I don't think there is a good way to check for
this, and a user who looks at the output file will see that it's not really a
failure. All of the ncsu tests fail, but this is a well-known bug with this
version of the GNU compilers. Is that in the documentation somewhere? I'm not
sure where it would go: somewhere in section 1.3 of the Amber manual, I guess,
or at least in the KNOWN_PROBLEMS file.

Here are the serious errors:

On 2, 4, and 8 processors:
==============================================================
export TESTsander='../../exe/pmemd.MPI'; cd gb_rna && ./Run.gbrna
Assertion failed in file helper_fns.c at line 337: 0
memcpy argument memory ranges overlap, dst_=0xd21f6c0 src_=0xd21f6c0
len_=3768

internal ABORT - process 0
rank 0 in job 682 case1_45027 caused collective abort of all ranks
  exit status of rank 0: return code 1
  ./Run.gbrna: Program error
make: *** [test.parallel.pmemd] Error 1

The file helper_fns.c is part of the MPICH2 distribution, but I haven't yet
tracked down where in PMEMD the offending call originates.

On 8 processors:
==============================================================
cd qmmm2/xcrd_build_test/ && ./Run.ortho_qmewald0

 * NB pairs 145 185645 exceeds capacity ( 185750) 3
     SIZE OF NONBOND LIST = 185750
 SANDER BOMB in subroutine nonbond_list
 Non bond list overflow!
 check MAXPR in locmem.f
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3
rank 3 in job 800 case1_45027 caused collective abort of all ranks
  exit status of rank 3: return code 1
  ./Run.ortho_qmewald0: Program error
make[1]: *** [test.sander.QMMM] Error 1
make[1]: Leaving directory `/u1/opt/Amber/CVS/amber11/test'
make: *** [test.sander.QMMM.MPI] Error 2

Should this test simply be skipped when running on more than 4 processors?

-Dan


-- 
-------------------------
Daniel R. Roe
Postdoctoral Associate
SAS - Chemistry & Chemical Biology
610 Taylor Road
Piscataway, NJ   08854
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Sat Mar 20 2010 - 12:00:02 PDT