Re: [AMBER-Developers] Parallel make failing

From: Jason Swails <jason.swails.gmail.com>
Date: Thu, 26 May 2016 12:53:43 -0400

On Thu, May 26, 2016 at 12:31 PM, Ray Luo <rluo.uci.edu> wrote:

> Okay, I've just confirmed again that I can
>
> make -j 32 install
>
> if I turned off python building. I don't know too much how the python
> part is built ... Please take a look of the dependence relations
> related to python building.
>

This diagnosis doesn't make sense. make does not control any of the
Python building -- it fires off a single command (along the lines of python
setup.py install) and lets distutils handle building the packages and
modules serially. There are no intra-package dependencies in Python
packages (and only pytraj depends on another Amber file being available at
compile-time, and that is correctly handled).

Moreover, the error message that Dan reported indicated that one of the
PBSA module files was missing, so I'm pretty sure the missing dependency is
in PBSA. Because this is a race condition, you won't always be able to
observe it reliably, and unless you start each build from a clean state,
you can't draw any conclusions from watching subsequent builds succeed.

Specifically, if you did the following:

./configure gnu
make -j32 install
# remove Python build from Makefile
make -j32 install

And saw that the second install worked, the reason is almost certainly
that the first make -j32 install, although it failed, actually fulfilled
the missing dependency. When you have 32 jobs running at once and one
dies, the other 31 finish what they are doing before make quits. So if one
of the PBSA modules is missing a dependency, there's a good chance that one
of the other 31 jobs is building that dependency at that very moment. Once
those jobs finish, subsequent builds work just fine because the dependency
that is not specified correctly has already been built anyway. I actually
used to exploit this when parallel builds broke in the past by just running
"make -j# install" twice.

And even if removing the Python builds from the Makefiles fixes parallel
make on your machine starting from a clean state, it's still possible that
it just changed the order and timing of the jobs enough to hide the crash
without fixing the underlying problem.
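
One way to gather evidence about a race like this is to repeat the build
several times, always from a clean state, and count the failures. The
following is only a sketch: count_clean_build_failures is a hypothetical
helper, not something in the Amber tree, and the clean and build commands
passed to it are whatever applies on your machine.

```shell
# Sketch only: count_clean_build_failures is a hypothetical helper,
# not part of the Amber build system.
# Usage: count_clean_build_failures RUNS CLEAN_CMD BUILD_CMD [LOG_DIR]
count_clean_build_failures() {
    runs=$1
    clean_cmd=$2
    build_cmd=$3
    log_dir=${4:-/tmp}
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # Start every attempt from a clean state so that leftovers from an
        # earlier, partially successful build cannot mask the race.
        sh -c "$clean_cmd" >/dev/null 2>&1
        # Keep one log per attempt so a failing run can be inspected later.
        if ! sh -c "$build_cmd" >"$log_dir/build_$i.log" 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    # Print the number of failed attempts.
    echo "$failures"
}
```

For example, count_clean_build_failures 10 "make clean" "make -j32 install"
would run ten attempts, assuming "make clean" really does restore a clean
tree (substitute a fresh checkout or whatever does). A count of zero is
still not proof that the race is gone; it only means it did not trigger in
those runs.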

That, coupled with the fact that the error message involves building PBSA
modules for sander -- a process that in no way includes or even references
Python -- means that there is no way this problem can be related to the
Python packages. Based on past experience, the most likely fix is to modify
the makedepend script in pbsa, adding another entry to one of the hash
tables that keep track of which F90 files have different code paths for
different libraries (e.g., via #ifdef LIBSANDER or something).
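
To find which entry is missing, it can help to list the modules a given
F90 file actually uses and compare that against the dependencies generated
for it. The sketch below assumes each USE statement starts its own line;
list_used_modules is a hypothetical helper, not part of makedepend.

```shell
# Sketch only: list_used_modules is a hypothetical helper.
# Prints the lower-cased, de-duplicated names of all modules that a Fortran
# source file pulls in with USE statements. Preprocessor conditionals are
# ignored on purpose, so modules used only under an #ifdef are listed too.
list_used_modules() {
    sed -n 's/^[[:space:]]*[Uu][Ss][Ee][[:space:]]\{1,\}\([A-Za-z_][A-Za-z0-9_]*\).*/\1/p' "$1" |
        tr '[:upper:]' '[:lower:]' |
        sort -u
}
```

Any module that shows up here but has no matching prerequisite in the
generated dependency list for one of the build variants is a candidate for
the missing entry.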

All the best,
Jason

-- 
Jason M. Swails
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Thu May 26 2016 - 10:00:03 PDT