amber-developers: Amber test case updates

From: Ross Walker <ross.rosswalker.co.uk>
Date: Wed, 9 May 2007 12:33:02 -0700

Hi All,

I have overhauled a lot of the test cases in amber 10 to make them more
robust when testing in parallel. This includes putting in more checks for
when nproc is too high, which means you can now run the tests with 32 cpus
and the 10 or so test cases that would normally fail are simply skipped.

I have also tried to make things more robust so that the tests run under IBM
AIX as well as on other systems.

I have centralized the numprocs.awk script in the main test directory, and
all test case scripts should now reference this single copy.
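
For a test case that sits two levels below the main test directory the
reference looks like this (assuming DO_PARALLEL is something like
"mpirun -np 8"; adjust the relative path to wherever the Run script lives):

--------------
#!/bin/csh -f
# Work out the number of mpi threads from the DO_PARALLEL command line
# using the shared numprocs.awk in the main test directory.
set numprocs=`echo $DO_PARALLEL | awk -f ../../numprocs.awk `
echo " Running with $numprocs mpi threads."
--------------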

I have also put in a number of guards for the REM, PIMD and NEB test cases
so they only run if there are enough mpi threads and the number of mpi
threads is an exact multiple of the number of groups. For example, the NEB
test case run scripts look like the following; I suggest we all use this
format when adding future tests so that we at least stay consistent:

--------------
#!/bin/csh -f

set sander = "../../../exe/sander.MPI"
if( $?TESTsander ) then
   set sander = $TESTsander
endif

if( ! $?DO_PARALLEL ) then
  echo " NEB can only be run in parallel. "
  echo " This test case requires a minimum of 8 mpi threads to run."
  echo " set env var DO_PARALLEL"
  echo " Not running test, exiting....."
  exit(0)
else
  set numprocs=`echo $DO_PARALLEL | awk -f ../../numprocs.awk `
  if ( $numprocs == 8 || $numprocs == 16 || $numprocs == 24 ) then
      goto runtest
  else if ( $?MP_PROCS ) then
      if ( $MP_PROCS == 8 || $MP_PROCS == 16 || $MP_PROCS == 24 ) then
        goto runtest
      endif
  endif
endif

echo " This test case requires a least 8 mpi threads."
echo " The number of mpi threads must also be a multiple of 8 and not more
than 24."
echo " Not running test, exiting....."
exit(0)

runtest:
cat > mdin <<EOF
Alanine NEB initial MD with small K
 &cntrl
  imin = 0, irest = 0,
  ntc=1, ntf=1,
  ntpr=1, ntwx=500,
  ntb = 0, cut = 999.0, rgbmax=999.0,
  igb = 1, saltcon=0.2,
  nstlim = 25, nscm=0,
  dt = 0.0005,
  ntt = 3, gamma_ln=1000.0,
  tempi=0, temp0=300, ig=42,
  ineb = 1,skmin = 10,skmax = 10,
  nmropt=1
 /
 &wt type='TEMP0', istep1=0,istep2=35000,
   value1=0.0, value2=300.0
 /
 &wt type='END'
 /
EOF

touch dummy
$DO_PARALLEL $sander -ng 8 -groupfile groupfile.in > neb_classical.sander.out < dummy || goto error
../../dacdif -t 1 neb_classical.out.save neb_classical.out

/bin/rm -f mdin *.inf *.mdcrd dummy *.rst neb_classical.sander.out

exit(0)

error:
echo "program error."
exit(1)
--------------

Note: if anybody knows how to test in csh whether numprocs is an exact
multiple of 4 and less than 32, please let me know so we can avoid the
expanded if line in most of the tests.
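
For what it is worth, csh's @ arithmetic does have a modulo operator, so
something along these lines might do the job (untested, and the 4 / 32
limits are just the values mentioned above):

--------------
# Sketch only: accept any thread count that is an exact multiple of 4
# and less than 32 (a separate minimum-thread check is still needed).
@ remainder = $numprocs % 4
if ( $remainder == 0 && $numprocs < 32 ) then
    goto runtest
endif
--------------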

CURRENT TEST CASE PROBLEMS:

There are a number of problems with the test cases in parallel at the moment
that I am hoping some people can shed light on so we can get them cleared
up.

1) Run.dip (rdc)
This fails in parallel with an "unable to open file" error message, but the
necessary files seem to exist, so I am at a loss to explain it.

2) pb_ivcap1
This test case segfaults in parallel. Last line of output is:
" Atoms are partitioned into two regions 6268 45 with a buffer of 0.000"

3) pb_ivcap5
This test case has been commented out in the Makefile with no explanation -
Holger made the change on 2006/05/01, but there does not seem to be any
comment in the cvs log as to why it is commented out.

4) PIMD/part_nmpimd_ntp; ./Run.nmpimd
This test case seems to give a large number of non-trivial differences. I
have not tried it in serial to check whether the problem is specific to
running in parallel.

5) There is no test.sander.PIMD.MPI target, so I removed the reference to it
from the Makefile.

6) full_nmpimd_water
This test case fails with some non-trivial differences in the temperature.
But only in the temperature.

7) PIMD/full_pimd_ntp_water
pimd_ntp.out.save is missing.

8) PIMD/full_pimd_nhc_water/
pimd_nhc.in is missing.

9) In addition, all the PIMD/full* test cases fail when nproc > ngroups. For
example, when nproc = 8 I get:

Fatal error in MPI_Reduce: Invalid communicator, error stack:
MPI_Reduce(843): MPI_Reduce(sbuf=0x121e290, rbuf=0x3523f70, count=28,
MPI_DOUBLE_PRECISION, MPI_SUM, root=0, MPI_COMM_NULL) failed
MPI_Reduce(714): Null communicator

I assumed that pimd is designed to run when nproc is a multiple of ngroup, so
that if you have 8 processors and 4 groups, each group is processed in
parallel across 2 of the processors. If this is not the case then these test
cases should be updated to quit when nproc is not set correctly (see the
sketch below).
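
If that is indeed the intended behaviour, a guard along these lines in the
PIMD Run scripts would make them quit cleanly instead of dying in MPI_Reduce
(the ngroup value and the relative path to numprocs.awk are placeholders
that need adjusting per test):

--------------
# Sketch of a guard: quit cleanly unless the number of mpi threads is
# an exact multiple of the number of groups.
set ngroup = 4
set numprocs=`echo $DO_PARALLEL | awk -f ../../numprocs.awk `
@ remainder = $numprocs % $ngroup
if ( $remainder != 0 ) then
    echo " The number of mpi threads must be a multiple of $ngroup."
    echo " Not running test, exiting....."
    exit(0)
endif
--------------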

Ps. I didn't check / update the evb test cases since I didn't know what is
supposed to be working and what isn't, so Kim, you will need to do this.

All the best
Ross

/\
\/
|\oss Walker

| HPC Consultant and Staff Scientist |
| San Diego Supercomputer Center |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
| http://www.rosswalker.co.uk | PGP Key available on request |

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.