Re: [AMBER-Developers] [AMBER] CUDA install fails due to undefined netcdf definitions

From: Jason Swails <jason.swails.gmail.com>
Date: Tue, 11 Jun 2013 22:49:24 -0400

On Tue, Jun 11, 2013 at 9:27 PM, David A Case <case.biomaps.rutgers.edu> wrote:

> {moving this discussion to amber-developers}
>
>
> On Tue, Jun 11, 2013, Jason Swails wrote:
> >
> > This actually makes a lot of sense now. It was definitely pulling the
> > libnetcdf.a file from /usr/lib
>
> If I understand it, this sounds like incorrect behavior. Amber should not be
> using /usr/lib/libnetcdf.a unless the user explicitly asks for it with
> the --with-netcdf flag.
>

It is certainly not what we _wanted_ to happen (so it is incorrect in that
sense), but I'm pretty sure it's what _was_ happening (it's the only thing
that explains the errors and the fix).

> The sander Makefile says:
>
> .... -L/home/case/amber12/lib -lnetcdf
>
> but I'm thinking this might not be correct -- is the "-L" directory searched
> before or after places like /usr/lib? (I'm afraid that the "-L" directory
> comes at the end of LDFLAGS, not at the beginning.)
>

It could come at the end, or the linker could be looking to link .so files
first (and it may find those in /usr/lib before looking for the .a files in
$AMBERHOME/lib). I'm not sure.
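For reference, the GNU ld manual is fairly specific here: every -L directory is searched, in command-line order, before the default directories, regardless of whether the -L appears before or after the -l; and on ELF systems each directory is checked for lib<name>.so before lib<name>.a. The Python sketch below is a toy model of that rule, not ld itself; the default directory list and the helper name `resolve_library` are assumptions. It also illustrates one possible way a system library could still win: a compiler wrapper that injects its own system -L ahead of Amber's, as the mpif77/mpif90 wrappers in the P.S. do with -L/usr/lib64.

```python
from pathlib import Path

# Assumed default search path; the real one is toolchain-specific
# (for GNU ld, `ld --verbose | grep SEARCH_DIR` prints it).
DEFAULT_DIRS = ("/usr/local/lib", "/lib", "/usr/lib")


def resolve_library(name, link_args, exists, default_dirs=DEFAULT_DIRS, static=False):
    """Toy model of how GNU ld resolves -l<name> (not ld itself).

    link_args: the link line as a list, with -L in joined form (-Ldir).
    exists:    predicate telling whether a file path exists.
    """
    # Every -L directory is searched, in command-line order, before the
    # default directories, no matter where it sits relative to -l<name>.
    search = [arg[2:] for arg in link_args if arg.startswith("-L")]
    search += list(default_dirs)
    # Within one directory the shared library is preferred over the archive,
    # unless static linking was requested.
    candidates = [f"lib{name}.a"] if static else [f"lib{name}.so", f"lib{name}.a"]
    for directory in search:
        for candidate in candidates:
            path = str(Path(directory) / candidate)
            if exists(path):
                return path
    return None


# A wrapper-injected -L/usr/lib64 ahead of Amber's -L picks the system .so.
files = {"/usr/lib64/libnetcdf.so", "/home/case/amber12/lib/libnetcdf.a"}
line = ["-L/usr/lib64", "-L/home/case/amber12/lib", "-lnetcdf"]
print(resolve_library("netcdf", line, files.__contains__))  # /usr/lib64/libnetcdf.so
```

In this model, a trailing -L pointing at $AMBERHOME/lib would still beat /usr/lib, so the position of -L within LDFLAGS alone would not explain the behavior; an earlier system -L would.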


>
> > More recent versions of NetCDF
> > actually bundle the Fortran and C symbols in separate libraries.
>
> This is the reason why it seems generally safer to build our own netcdf
> libraries than trying to guess or assume what might be in some system
> library. (Same argument as for MPI libraries.)
>

NetCDF seems fairly stable to me (but I may be mistaken). The
libnetcdf.a/libnetcdff.a separation has been around for some time, and is
correctly identified and handled by the --with-netcdf option in configure.
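To make the libnetcdf/libnetcdff split concrete, here is a minimal Python sketch of the kind of check involved. It is a hypothetical helper, not the code configure actually runs: if the Fortran API lives in a separate libnetcdff, it must be linked in addition to (and, for static linking, before) libnetcdf; older installs only need -lnetcdf.

```python
from pathlib import Path


def netcdf_link_flags(prefix):
    """Hypothetical helper: Fortran link flags for a NetCDF install at prefix.

    Newer NetCDF releases ship the Fortran API in libnetcdff and the C
    library in libnetcdf; older releases put both in libnetcdf.
    """
    libdir = Path(prefix) / "lib"
    flags = [f"-L{libdir}"]
    has_fortran_lib = any(
        (libdir / f"libnetcdff{ext}").exists() for ext in (".so", ".a")
    )
    if has_fortran_lib:
        # libnetcdff depends on libnetcdf, so it goes first on the link line.
        flags.append("-lnetcdff")
    flags.append("-lnetcdf")
    return flags
```

For a split install under /usr this would yield ['-L/usr/lib', '-lnetcdff', '-lnetcdf']; linking only -lnetcdf against such an install is one way to end up with undefined Fortran NetCDF symbols.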

In any case, the presence of the --with-netcdf flag is unrelated to this
particular issue. Since the OP didn't use it, it's as though it doesn't
exist. (But the fact that it exists offered a potential workaround that did
not require uninstalling the system NetCDF.)

> I suppose this might be wrong, and that we should figure out the package
> names for various linuxes and Macs and cygwin, and just require that users
> install those packages. But I'd like to keep the size of the list of
> packages needed for Amber down to ones that really are standard almost
> everywhere--not sure if netcdf yet falls into that category.
>

I don't think there's any need to require users to install their own
NetCDF. I think the way we do things is good (and the less 'root' access we
require users to have, the better everyone's life will be, IMO). This is
especially true given that we use the netcdf.mod file, which is specific
not only to the compiler brand but also to its version (this compatibility
is also checked by --with-netcdf inside configure).

> And what we have now is in some ways the worst (at least the most complex)
> of all options--trying to support *both* the locally-compiled netcdf and a
> system-built one. And I've not yet been convinced by arguments about why
> it's important to have a --with-netcdf flag.
>

The emphasis is placed on supporting the locally-compiled NetCDF, but
supporting system NetCDFs is quite easy--in fact, no changes have been
necessary to maintain compatibility with external NetCDF installations
since I added the option in the first place a year or two ago (and it still
works).

Since we don't stress or advertise the --with-netcdf flag anywhere, nor
include it in any of the instructions, I don't see much disadvantage to
keeping the flag around (although I'd certainly be fine with eliminating it
if continuing to support it required actual work).

I have a few reasons for considering --with-netcdf worth keeping. First,
I've actually run into a system where the NetCDF build failed, and using
--with-netcdf allowed me to bypass the failed NetCDF build and finish the
Amber installation. The error message and solution are shown in the P.S.

Second, on Blue Waters I build the CPU code with PGI and pmemd.cuda with
GNU, since those are the only compiler choices that produce executables
passing all of the tests. The only way to support both installations in the
same tree simultaneously was to use the system NetCDF for at least one of
the two. A corner case to be sure, but still one where I found the option
useful.

Finally, our bundled NetCDF is becoming outdated. By default, NetCDF files
created by the Python-NetCDF bindings do not work with cpptraj since the
bundled NetCDF is too old to recognize the new format. You need to use a
special compatibility flag to the Dataset constructor to create a NetCDF
file that cpptraj understands when built with Amber's NetCDF3. This
problem goes away when linking cpptraj with the same NetCDF used to build
the Python bindings. (Is this an argument for updating our NetCDF?)
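The format mismatch can be seen directly in the file header. Per the NetCDF format documentation, classic files start with the bytes "CDF\x01", 64-bit-offset files with "CDF\x02" (and CDF-5 files from newer libraries with "CDF\x05"), while NetCDF-4 files are HDF5 files and start with the HDF5 signature, which a NetCDF 3.x library cannot read. The Python sketch below (a hypothetical helper) classifies a file from its first 8 bytes; it assumes the HDF5 signature sits at offset 0, which is the usual case but not guaranteed for files with an HDF5 user block. The compatibility flag mentioned above is presumably the Dataset constructor's `format` argument in the netCDF4 Python module (e.g. a NETCDF3_* format).

```python
def netcdf_format(path):
    """Guess a NetCDF file's on-disk format from its first 8 bytes."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head[:3] == b"CDF":
        # The fourth byte is the version of the classic-family format.
        versions = {
            b"\x01": "classic",
            b"\x02": "64-bit offset",
            b"\x05": "64-bit data (CDF-5)",
        }
        return versions.get(head[3:4], "unknown CDF variant")
    if head == b"\x89HDF\r\n\x1a\n":
        # NetCDF-4 files are HDF5 files; NetCDF 3.x cannot read them.
        return "netCDF-4/HDF5"
    return "not NetCDF"
```

Running this on a file produced by the Python bindings with default settings versus one written by Amber would show whether the two sides disagree on format.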

Given NetCDF's general superiority to ASCII, I would not be surprised to
see more Amber output go the NetCDF route, with the standard data files
printed and read by cpptraj also in NetCDF format. In that case, supporting
external NetCDF libraries may be important if Amber is not the only program
whose NetCDF files cpptraj needs to read.

All the best,
Jason

P.S. The NetCDF build failure:

My environment (not my doing):
[swails . UF HPC /scratch/hpc/swails/amber12] $ echo $CC $CXX $F90 $F77
mpicc mpicxx mpif90 mpif77

[swails . UF HPC /scratch/hpc/swails/amber12] $ mpif77 -show
ifort -I/usr/mpi/intel/openmpi-1.6/include -L/usr/lib64 -Wl,-rpath -Wl,/usr/mpi/intel/openmpi-1.6/lib64 -L/usr/mpi/intel/openmpi-1.6/lib64 -lmpi_f77 -lmpi -lrdmacm -libverbs -lrt -lnsl -lutil -ltorque -ldl -lm -Wl,--export-dynamic -lrt -lnsl -lutil

[swails . UF HPC /scratch/hpc/swails/amber12] $ mpif90 -show
ifort -I/usr/mpi/intel/openmpi-1.6/include -I/usr/mpi/intel/openmpi-1.6/lib64 -L/usr/lib64 -Wl,-rpath -Wl,/usr/mpi/intel/openmpi-1.6/lib64 -L/usr/mpi/intel/openmpi-1.6/lib64 -lmpi_f90 -lmpi_f77 -lmpi -lrdmacm -libverbs -lrt -lnsl -lutil -ltorque -ldl -lm -Wl,--export-dynamic -lrt -lnsl -lutil


NetCDF build error:

make[4]: Entering directory `/scratch/hpc/swails/amber12/AmberTools/src/netcdf/src/f90'
/bin/sh ../libtool --mode=compile ifort -I../libsrc -I. -shared-intel -ip -O3 -xHost -c -o typeSizes.lo typeSizes.f90
libtool: compile: unable to infer tagged configuration
libtool: compile: specify a tag with `--tag'
make[4]: *** [typeSizes.lo] Error 1
make[4]: Leaving directory `/scratch/hpc/swails/amber12/AmberTools/src/netcdf/src/f90'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory `/scratch/hpc/swails/amber12/AmberTools/src/netcdf/src'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/scratch/hpc/swails/amber12/AmberTools/src/netcdf/src'
make[1]: *** [/scratch/hpc/swails/amber12/include/netcdf.mod] Error 2
make[1]: Leaving directory `/scratch/hpc/swails/amber12/AmberTools/src'
make: *** [install] Error 2

The following lines in my .bashrc ultimately fixed this (I don't even
remember how I discovered this solution, and the error message was useless
to me):
# Get rid of pesky compiler vars.
unset F90 F77 CC CXX
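Unsetting the variables in .bashrc affects every later session. A narrower alternative is to clear them only for the build process itself. The Python sketch below does that; the helper names are hypothetical, and it simply mirrors the `unset` line above rather than diagnosing why libtool could not infer the tag.

```python
import os
import subprocess

# The compiler variables that, in the setup described above, broke the
# bundled NetCDF build.
COMPILER_VARS = ("F90", "F77", "CC", "CXX")


def env_without_compiler_vars(base=None):
    """Return a copy of the environment with the compiler variables removed."""
    env = dict(os.environ if base is None else base)
    for var in COMPILER_VARS:
        env.pop(var, None)
    return env


def run_clean(cmd, cwd=None):
    """Run a build command without those variables; the parent shell's
    environment is left untouched. Raises CalledProcessError on failure."""
    return subprocess.run(cmd, cwd=cwd, env=env_without_compiler_vars(), check=True)
```

For example, `run_clean(["./configure", "intel"], cwd=amberhome)`, where `amberhome` is a placeholder for your $AMBERHOME.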

-- 
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Tue Jun 11 2013 - 20:00:02 PDT