[AMBER-Developers] Current state of CVS tree. (GNU Compilers)

From: Ross Walker <ross.rosswalker.co.uk>
Date: Thu, 4 Mar 2010 12:30:25 -0800

Hi All,

I thought I would post an overview of the current state of the CVS tree, as
seen on my RHEL4 machine (which incidentally builds AMBER 10 fine and as of
a month or so ago built AMBER 11 fine). I am hoping some volunteers will try
to go through and address some of these issues.

1) Attempt 1 - Using GNU compilers - Serial
-------------------------------------------

cvs co amber11
export AMBERHOME=~/cvs_checkouts/amber11

gcc -v
gcc version 3.4.6 20060404 (Red Hat 3.4.6-11)

gfortran -v
gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)

./configure gnu
make -f Makefile_at

...
gcc -o rdparm main.o rdparm.o dispatch.o help.o utility.o second.o io.o
trajectory.o netcdf_ptraj.o parallel_ptraj.o evec.o torsion.o mask.o rms.o
display.o interface.o energy.o experimental.o ptraj.o actions.o analyze.o
thermo.o pubfft.o cluster.o clusterLib.o
/home/rcw/cvs_checkouts/amber11/lib/libpdb.a
/home/rcw/cvs_checkouts/amber11/lib/arpack.a
-L/opt/intel/mkl/10.1.1.019//lib/em64t -Wl,--start-group
/opt/intel/mkl/10.1.1.019//lib/em64t/libmkl_intel_lp64.a
/opt/intel/mkl/10.1.1.019//lib/em64t/libmkl_sequential.a
/opt/intel/mkl/10.1.1.019//lib/em64t/libmkl_core.a -Wl,--end-group -lpthread
-lgfortran ../netcdf/lib/libnetcdf.a -lm
/usr/bin/ld: cannot find -lgfortran
collect2: ld returned 1 exit status

locate libgfortran
/usr/lib64/libgfortran.so.1
...
/usr/lib/libgfortran.so.1
...
etc

Removing -lgfortran gives:
thermo.o(.text+0x395): In function `thermo_':
: undefined reference to `_gfortran_st_write'
...
As well as lots of arpack.a issues:

/home/rcw/cvs_checkouts/amber11/lib/arpack.a(dsgets.o)(.text+0x5d): In
function `dsgets_':
: undefined reference to `_gfortran_compare_string'

These are presumably related.
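One possible explanation, not confirmed in this thread, is that the locate
output above lists only the versioned runtime (libgfortran.so.1), while ld
resolves -lgfortran by looking for libgfortran.so or libgfortran.a in its
search directories. A minimal sketch of a check for that follows. The helper
name has_dev_link is hypothetical, and the two directories are simply the ones
locate reported.

```shell
# has_dev_link DIR LIB: succeed if DIR holds libLIB.so or libLIB.a,
# the two file names ld looks for when given -lLIB.
has_dev_link() {
    [ -e "$1/lib$2.so" ] || [ -e "$1/lib$2.a" ]
}

# Directories taken from the locate output in this message.
for dir in /usr/lib64 /usr/lib; do
    if has_dev_link "$dir" gfortran; then
        echo "$dir: linkable libgfortran found"
    else
        echo "$dir: no libgfortran.so or libgfortran.a"
    fi
done

# If gfortran is installed, ask the driver where it finds its own runtime.
# It echoes the bare file name back when the library is not on its path.
if command -v gfortran >/dev/null 2>&1; then
    gfortran -print-file-name=libgfortran.so
fi
```

If the library only turns up in a directory private to gfortran 4.1.2, a gcc
3.4.6 link line would not search there, which would be consistent with the
error above. Linking rdparm with gfortran as the driver might sidestep this,
but that is untested here.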

So I gave up on AMBERTools and went on to building AMBER, knowing of course
that mm_pbsa won't compile, since it now needs nab and so requires AMBERTools
to have been built properly first. One used to be able to build AMBER
standalone.

make

This works. make -j8 also works.

cd ../test
make

This gives non-benign failures for the following tests (the TEST failures
file is attached):

possible FAILURE: check mdout.dif
/home/rcw/cvs_checkouts/amber11/test/ncsu/abmd_ANALYSIS
possible FAILURE: check monitor.txt.dif
/home/rcw/cvs_checkouts/amber11/test/ncsu/abmd_ANALYSIS
possible FAILURE: check mdout.dif
/home/rcw/cvs_checkouts/amber11/test/ncsu/abmd_FLOODING
possible FAILURE: check monitor.txt.dif
/home/rcw/cvs_checkouts/amber11/test/ncsu/abmd_FLOODING
possible FAILURE: check umbrella.ncdump.dif
/home/rcw/cvs_checkouts/amber11/test/ncsu/abmd_FLOODING
possible FAILURE: check mdout.dif
/home/rcw/cvs_checkouts/amber11/test/ncsu/abmd_UMBRELLA
possible FAILURE: check mdout.dif
/home/rcw/cvs_checkouts/amber11/test/ncsu/abmd_UMBRELLA
possible FAILURE: check monitor.txt.dif
/home/rcw/cvs_checkouts/amber11/test/ncsu/abmd_UMBRELLA
possible FAILURE: check mdout.dif
/home/rcw/cvs_checkouts/amber11/test/ncsu/smd
possible FAILURE: check smd.txt.dif
/home/rcw/cvs_checkouts/amber11/test/ncsu/smd
possible FAILURE: check mdout.dif
/home/rcw/cvs_checkouts/amber11/test/ncsu/pmd
possible FAILURE: check pmd.txt.dif
/home/rcw/cvs_checkouts/amber11/test/ncsu/pmd
possible FAILURE: check mdout.dif
/home/rcw/cvs_checkouts/amber11/test/ncsu/smd2
possible FAILURE: check work.txt.dif
/home/rcw/cvs_checkouts/amber11/test/ncsu/smd2

All of these have the error:
> ** NCSU-Error ** : expected list value for key 'i', got '<EMPTY>' instead

The rest are just minor differences: things printed in the output that have
not been updated for AMBER 11, roundoff, etc.
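To see at a glance which test directories account for the failures, the list
can be tallied per directory. The sketch below assumes the failures file has
the layout shown in this message, with each "possible FAILURE" line followed
directly by the test directory. The function name summarize_failures is
hypothetical.

```shell
# summarize_failures FILE: count "possible FAILURE" entries per test directory.
# Assumes the line after each "possible FAILURE:" line is the directory path.
summarize_failures() {
    awk '
        /^possible FAILURE:/ { pending = 1; next }   # directory expected next
        pending { count[$0]++; pending = 0 }
        END { for (dir in count) printf "%d %s\n", count[dir], dir }
    ' "$1" | sort -k2
}
```

It would be called with the path of the failures file as its only argument
and prints one "count directory" line per test directory, sorted by path.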

2) Attempt 1 - Using GNU compilers - Parallel
---------------------------------------------

cd $AMBERHOME/src
./configure -mpi gnu

This fails because it needs yacc from a successful AMBERTools build, even
though that is NOT needed for building AMBER in parallel. Hacking out the if
statement and proceeding:

which mpif90
/server-home/netbin/mpi/mpich2-1.0.7-gfortran-4.1.2/bin/mpif90

make -f Makefile_at clean
make clean
make parallel (This works, -j8 also works after I updated the depend file)
cd ../test
make clean
export DO_PARALLEL='mpirun -np 2'
make test.parallel

This gives the following non-benign errors:

cd dhfr && ./Run.dhfr.noboxinfo
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0[cli_0]: aborting
job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
rank 0 in job 18 caffeine.sdsc.edu_59236 caused collective abort of all
ranks
  exit status of rank 0: return code 1
diffing mdout.dhfr.noboxinfo.save with mdout.dhfr.noboxinfo
PASSED

This is the correct error, since the test just checks that the code prints an
error about the box info being missing from the inpcrd file and then quits,
but the abort messages may confuse the user.

possible FAILURE: check trajene.out.dif
/home/rcw/cvs_checkouts/amber11/test/trajene_box
< 1 -9.4141E+3 1.7335E+1 7.4542E+1 N 7
> 1 -9.4141E+3 1.7335E+1 7.4541E+1 N 7
< VDWAALS = 1371.5986 EEL = -11615.0228 HBOND = 0.
> VDWAALS = 1371.7742 EEL = -11615.1955 HBOND = 0.
< minimization completed, ENE=-0.94141269E+4 RMS= 0.173345E+2
> minimization completed, ENE=-0.94141240E+4 RMS= 0.173348E+2

This is benign, I think, but we print so many digits in the output files that
differences like this will always cause problems.
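One way to separate roundoff from real differences is to compare numeric
fields with a relative tolerance rather than character by character. This is
only a sketch and not how the AMBER test scripts compare output. The function
name fields_close and the 1e-4 tolerance used below are assumptions, and the
leading "<" and ">" diff markers need to be stripped before calling it.

```shell
# fields_close LINE_A LINE_B RELTOL: succeed when both lines have the same
# number of whitespace-separated fields, non-numeric fields match exactly,
# and numeric fields agree to within the relative tolerance RELTOL.
fields_close() {
    awk -v a="$1" -v b="$2" -v tol="$3" '
        function isnum(s) {
            return s ~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/
        }
        function abs(x) { return (x < 0) ? -x : x }
        BEGIN {
            na = split(a, fa, " "); nb = split(b, fb, " ")
            if (na != nb) exit 1
            for (i = 1; i <= na; i++) {
                if (isnum(fa[i]) && isnum(fb[i])) {
                    x = fa[i] + 0; y = fb[i] + 0
                    scale = (abs(x) > abs(y)) ? abs(x) : abs(y)
                    if (abs(x - y) > tol * scale) exit 1
                } else if (fa[i] != fb[i]) {
                    exit 1
                }
            }
            exit 0
        }'
}
```

With a tolerance of 1e-4, the trajene_box step line above (7.4542E+1 against
7.4541E+1) compares as equal, whereas a pair such as -8.9414E+3 against
-8.9463E+3 does not.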

possible FAILURE: check out.0.dif
/home/rcw/cvs_checkouts/amber11/test/softcore/min
210c210
< 100 -8.9414E+3 1.6490E+1 4.6223E+1 O 40
> 100 -8.9463E+3 1.6490E+1 4.6223E+1 O 40
< VDWAALS = 937.0834 EEL = -9878.4642 HBOND = 0.
> VDWAALS = 932.1686 EEL = -9878.4642 HBOND = 0.
< DV/DL = -0.6676
< SC_VDW = -0.3007 SC_EEL = 0. SC_DERIV= -10.4972
> SC_VDW = -0.3007 SC_EEL = 0. SC_DERIV= 0.

This is uncomfortably different. The same goes for:

possible FAILURE: check out.1.dif
/home/rcw/cvs_checkouts/amber11/test/softcore/min

possible FAILURE: check ti_decomp_1.out.dif
/home/rcw/cvs_checkouts/amber11/test/ti_decomp
Lots of issues here.

As before NONE of the NCSU test cases work.

export DO_PARALLEL='mpirun -np 4'
make test.parallel.4proc

Just about EVERY SINGLE test case fails here. See the attached
TEST_FAILURES.diff.4cpu_gnu

export DO_PARALLEL='mpirun -np 8'
make test.parallel.8proc

Both NEB tests fail - See attached TEST_FAILURES.diff.8cpu_gnu

export DO_PARALLEL='mpirun -np 32'
make test.parallel.32proc

This test fails.

So is anybody actually regularly running the parallel tests?

I'll provide an overview of the situation with the Intel compiler shortly. I
would encourage people to start looking at what they may have done to break
some of the parallel test cases, in particular the softcore tests, and at why
none of the NCSU tests work.

3) Attempt 1 - Using GNU compilers - CUDA
-----------------------------------------

make clean
make -f Makefile_at clean
./configure -cuda gnu
make -j8 cuda (This works)
cd ../test
make test.serial.cuda

These all pass with the exception of dhfr_min which is a known problem.

All the best
Ross

/\
\/
|\oss Walker

| Assistant Research Professor |
| San Diego Supercomputer Center |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
| http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.





_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers



Received on Thu Mar 04 2010 - 13:00:03 PST