Re: [AMBER-Developers] Current parallel test failures in the tree

From: Daniel Roe <daniel.r.roe.gmail.com>
Date: Thu, 25 Mar 2010 13:09:12 -0400

Hi All,

I found a few bugs in the NEB code, including the one that was causing the
test cases to fail. The end beads in NEB exit before doing any of the
tangent calculation, since they do not need it. However, the neb_force
array was not being zeroed on these beads before their early exit, so the
problem only showed up with compilers that do not automatically zero
everything. There was also a smaller but significant bug where the variable
ncopy was being used instead of neb_nbead, which led to incorrect energies
being used in the tangent calculation. This is what caused the issues Mark
was seeing with the NEB_NRG_ALL array: ncopy is not used in NEB and is 0.
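
To make the two failure modes concrete, here is a minimal Python sketch of the pattern. It is not the sander Fortran code: the function names neb_spring_forces, bead_tangent and distance are hypothetical, only the spring part of the NEB force is computed, and the energy-based tangent choice is assumed to follow the improved-tangent scheme of Henkelman and Jonsson (not checked against the actual NEB routine). It shows neb_force being zeroed before the end beads return early, and the loop and energy indexing being driven by neb_nbead.

```python
from typing import List

Vec = List[float]


def unit(v: Vec) -> Vec:
    """Scale v to unit length. Coincident images give a zero vector and a
    ZeroDivisionError here; this sketch does not guard against that."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]


def distance(a: Vec, b: Vec) -> float:
    return sum((y - x) ** 2 for x, y in zip(a, b)) ** 0.5


def bead_tangent(i: int, coords: List[Vec], energies: List[float]) -> Vec:
    """Energy-based tangent for interior bead i (assumed improved-tangent scheme)."""
    tau_plus = [b - a for a, b in zip(coords[i], coords[i + 1])]
    tau_minus = [b - a for a, b in zip(coords[i - 1], coords[i])]
    e_prev, e_here, e_next = energies[i - 1], energies[i], energies[i + 1]
    if e_next > e_here > e_prev:      # energy rising: point toward next bead
        return unit(tau_plus)
    if e_next < e_here < e_prev:      # energy falling: point toward previous bead
        return unit(tau_minus)
    # Local extremum: blend both directions, weighted by the energy differences.
    d_max = max(abs(e_next - e_here), abs(e_prev - e_here))
    d_min = min(abs(e_next - e_here), abs(e_prev - e_here))
    w_plus, w_minus = (d_max, d_min) if e_next > e_prev else (d_min, d_max)
    return unit([w_plus * p + w_minus * m for p, m in zip(tau_plus, tau_minus)])


def neb_spring_forces(coords: List[Vec], energies: List[float],
                      neb_nbead: int, k: float = 1.0) -> List[Vec]:
    """Spring force along the tangent for every bead (hypothetical driver)."""
    # The bead count must be neb_nbead; a variable that is 0 in NEB runs
    # (as ncopy was) would make the loop and energy indexing wrong.
    if not (len(coords) == len(energies) == neb_nbead):
        raise ValueError("bead count does not match coords/energies")
    forces = []
    for i in range(neb_nbead):
        neb_force = [0.0] * len(coords[i])   # zero BEFORE any early exit
        if i == 0 or i == neb_nbead - 1:
            forces.append(neb_force)         # end beads skip the tangent
            continue
        tau = bead_tangent(i, coords, energies)
        stretch = distance(coords[i], coords[i + 1]) - distance(coords[i - 1], coords[i])
        neb_force = [k * stretch * t for t in tau]
        forces.append(neb_force)
    return forces
```

If the zeroing line is moved below the early exit, the end beads have no defined neb_force at all; in Fortran that is exactly the case where the result depends on whether the compiler happens to zero memory.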

The attached patch fixes these bugs. It also updates the NEB test run
scripts to use at most 12 processors, to check the output from all beads,
and to use an absolute error criterion instead of truncating floating-point
digits with the dacdif script (truncation sometimes converted 0.000 to 0.
in weird and inconsistent ways, giving a fake Fail).
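
The idea behind an absolute error criterion is that values are compared numerically rather than as truncated text, so "-0.000" and "0." count as equal. A minimal Python sketch of that idea follows; it is not how dacdif is implemented, and the names lines_match and numbers and the 1e-3 default tolerance are assumptions for illustration.

```python
import re

# A number token: optional sign, digits with optional decimal part (so "0."
# and "0.000" both match), optional exponent (Fortran "D" exponents allowed).
NUM = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eEdD][-+]?\d+)?")


def numbers(line: str) -> list:
    """Extract every numeric token on a line as a float."""
    return [float(t.replace("D", "E").replace("d", "e")) for t in NUM.findall(line)]


def lines_match(saved: str, test: str, abs_tol: float = 1e-3) -> bool:
    """Compare two output lines: text must agree exactly, numbers to abs_tol."""
    if NUM.sub("#", saved).split() != NUM.sub("#", test).split():
        return False
    a, b = numbers(saved), numbers(test)
    return len(a) == len(b) and all(abs(x - y) <= abs_tol for x, y in zip(a, b))


print(lines_match("Etot = -0.000  EKtot = 1.2345", "Etot = 0.  EKtot = 1.2349"))
```

A plain text diff of those two lines reports a difference, even though every number agrees within the tolerance.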

Let me know if the patch doesn't work for you. It should be applied from
$AMBERHOME.

-Dan

On Wed, Mar 24, 2010 at 5:56 PM, Ross Walker <ross.rosswalker.co.uk> wrote:

> > > possible FAILURE: check neb_gb_partial_01.out.dif
> > > /home/rcw/cvs_checkouts/amber11/test/neb-testcases/neb_gb_partial
> > >> vlimit exceeded for step 1; vmax = **********
> > > etc
> > > etc
>
>
> > Did you get my email (cc'd to Dan and Carlos) about these ones
> > specifically? I concluded that it was a compiler bug (ifort 10.1.018).
> > What version of ifort did you use?
>
> I don't buy this. 10.1.018 is a version we have been recommending for
> years because of its stability, so I would be surprised to find an error
> now. More likely this test case has VERY bad initial structures, since
> naively setting up end points and allowing an interpolation will lead to
> vlimit errors in NEB. Chances are there is a division by a very small
> number occurring here, because the images are on top of each other, and
> some compilers are more sensitive than others to rounding this to zero. It
> would probably be better to use an actual restart from an equilibrated set
> of coordinates as the test case; this would likely give a more reliable
> regression test. I'm cc'ing this email to Christina, who I hope can put
> together a more realistic test case.
>
> > > The ti_decomp_1 also has issues:
> > > possible FAILURE: check ti_decomp_1.out.dif
> > > /home/rcw/cvs_checkouts/amber11/test/ti_decomp
>
> > > This is way beyond rounding errors, even with values printed to far
> > > too much precision for a test case in the output. Who is responsible
> > > for ti_decomp these days? They should probably look at this ASAP.
> >
> > I think this fail has been present for a long while and was discussed
> > before on here the other week; Dave has cc'd Holger about it.
>
> We need volunteers to fix this ASAP, which is why I posted it.
>
> All the best
> Ross
>
> /\
> \/
> |\oss Walker
>
> | Assistant Research Professor |
> | San Diego Supercomputer Center |
> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> | http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
>
> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
> be read every day, and should not be used for urgent or sensitive issues.
>
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers
>



-- 
-------------------------
Daniel R. Roe
Postdoctoral Associate
SAS - Chemistry & Chemical Biology
610 Taylor Road
Piscataway, NJ   08854




Received on Thu Mar 25 2010 - 10:30:02 PDT