Re: [AMBER-Developers] Zero VDW radii causing havoc.

From: Scott Le Grand <varelse2005.gmail.com>
Date: Sat, 17 Aug 2013 10:04:58 -0700

So I have root caused why the nans appear: Fast Shake gets overwhelmed.

At the point of failure, feed these numbers into fast shake:

            x0(1, atm_1) = 7.28912097579671414138;
            x0(1, atm_2) = 7.72543530415168433478;
            x0(1, atm_3) = 6.44526356757315177504;

            x0(2, atm_1) = 0.59867993601737978793;
            x0(2, atm_2) = 28.86782004149711866603 - 29.0969602342;
            x0(2, atm_3) = 0.53986248962842253718;

            x0(3, atm_1) = -14.70521127008194284258;
            x0(3, atm_2) = -14.90664483562995812349;
            x0(3, atm_3) = -15.15318116034974238460;

            x1(1, atm_1) = 7.37472831962579089549;
            x1(1, atm_2) = 11.58546676297256716737;
            x1(1, atm_3) = 6.48373270122986600228;

            x1(2, atm_1) = 0.50367624590281323549;
            x1(2, atm_2) = 25.98966495033111101520 - 29.0969602342;
            x1(2, atm_3) = 0.44969881379113374464;

            x1(3, atm_1) = -14.74963003569916608626;
            x1(3, atm_2) = -15.53454241815419933914;
            x1(3, atm_3) = -15.08733880938599369870;

And what will happen is that

al2be2 = 1.599492
gama = -1.454111

Feed this into line 1491 of shake.F90:

    sinthe = (alpa*gama - beta * sqrt(al2be2 - gama * gama)) / al2be2

and sqrt(al2be2 - gama * gama) becomes sqrt(-0.514948) and that's a NaN.
All else follows from this...

As to why you haven't seen this elsewhere, some possibilities:

1. You actually have, simulations blow up all the time. How often have you
root caused the exact moment of failure rather than just run it again when
it's something this rare?
2. Perhaps using vlimit is a sufficient band-aid to prevent this most of
the time.
3. The bug exists in every way in the FORTRAN code, but its probability of
occurrence is higher when single-precision is in the mix in some subtle
way. Because no matter how you slice this, the FORTRAN code has a whole
bunch of sqrt operations here that are unguarded from when their operand
goes negative.
4. I'm missing something in my shake code.
5. pmemd.cuda is hosed somehow.

I personally would lay even odds on all of these. The big mystery for me
is why this *seems* to happen more with small systems. It makes me wonder
if there is some bizarro boundary effect going on in PBC having to do with
the relative surface area of the boundary of the box relative to its volume.

Ideas?




On Sat, Aug 17, 2013 at 9:12 AM, Carlos Simmerling <
carlos.simmerling.gmail.com> wrote:

> I've never seen this in 20 years, except in free energy calculations where
> the oxygen is being made smaller. In my opinion its probably a symptom, not
> the problem, unless the simulations are really different from anything I've
> ever done.
> On Aug 16, 2013 10:39 PM, "Ross Walker" <ross.rosswalker.co.uk> wrote:
>
> > So that suggests to me that this could indeed be a bug related to having
> a
> > small water box - has anyone seen this happen with the CPU code? A repro
> > there would be very telling.
> >
> > Also, this is NPT yes? As in it is not some NVT water box at some
> > artificially high compression?
> >
> > All the best
> > Ross
> >
> >
> >
> >
> > On 8/16/13 7:33 PM, "B. Lachele Foley" <lfoley.ccrc.uga.edu> wrote:
> >
> > >I can only add the reminder that our main workaround has been to
> increase
> > >the size of the water box. That is, it disproportionately seems to
> > >affect smaller systems. The locals who run simulations of really large
> > >systems never seem to see this. Sure, they can't run as long, but there
> > >are many more interactions in each step. I suspect most lipid
> > >simulations are also much larger than some of the systems we run. I
> > >can't promise that this information is useful, but there it is.
> > >
> > >FWIW, we're going to re-run some simulations that failed, but with our
> HO
> > >vdW terms on the HW, just to see what happens.
> > >
> > >We haven't seen it in CPU. I realize that doesn't mean it couldn't
> > >happen there.
> > >
> > >:-) Lachele
> > >
> > >Dr. B. Lachele Foley
> > >Complex Carbohydrate Research Center
> > >The University of Georgia
> > >Athens, GA USA
> > >lfoley.uga.edu
> > >http://glycam.ccrc.uga.edu
> > >
> > >________________________________________
> > >From: Ross Walker [ross.rosswalker.co.uk]
> > >Sent: Friday, August 16, 2013 10:02 PM
> > >To: AMBER Developers Mailing List
> > >Subject: Re: [AMBER-Developers] Zero VDW radii causing havoc.
> > >
> > >Scott, I'm still not sure I am convinced on this being the issue. We
> have
> > >many many simulations done with TIP3P water - probably hundreds of
> > >milliseconds in aggregate, a lot with boosted potentials and other types
> > >of enhanced sampling and haven't seen this issue. Sure we've seen
> problems
> > >with zero VDW hydroxyl hydrogens before - but those are always when they
> > >are constrained to be close to each other or with massive attraction
> such
> > >as in some sugars and phosphates. I don't think I've ever seen problems
> > >with TIP3P before since the bonds are rigid and the H's are, I believe,
> > >inside the VDW sphere of the oxygen. Hence I would never expect to see
> > >this for two waters, for a water and a strained hydroxyl perhaps but
> even
> > >then unlikely.
> > >
> > >Are you certain this is a real problem and not a strange artifact, and
> or
> > >subtle bug? - Do we have any repro's for the CPU code?
> > >
> > >The only place I'd expect there could be problems with TIP3P is if it is
> > >run without shake. So something must be unique about the examples here
> > >that are showing the repro that doesn't normally occur in protein and/or
> > >dna simulations.
> > >
> > >Ps. we've never seen a problem with the lipid12 force field as far as I
> > >know and we have over 100 microsecs of simulation on these systems.
> > >
> > >Hence I am still a little suspicious...
> > >
> > >All the best
> > >Ross
> > >
> > >
> > >
> > >On 8/16/13 6:20 PM, "Scott Le Grand" <varelse2005.gmail.com> wrote:
> > >
> > >>Yep, that's why I moved the bug over to pmemd and away from
> pmemd.cuda...
> > >>
> > >>
> > >>
> > >>
> > >>On Fri, Aug 16, 2013 at 6:17 PM, B. Lachele Foley
> > >><lfoley.ccrc.uga.edu>wrote:
> > >>
> > >>> We only had one, the HO (Ho) term. It's been made non-zero which at
> > >>>least
> > >>> fixed the issue in the highly charged, perfectly aligned situation
> > >>>where we
> > >>> first saw it. BTW, we (specifically Matt) first observed it on CPU.
> > >>>
> > >>> Can changing the LJ A term like that potentially mess up other things
> > >>>like
> > >>> hydrogen bonding?
> > >>>
> > >>> :-) Lachele
> > >>>
> > >>> Dr. B. Lachele Foley
> > >>> Complex Carbohydrate Research Center
> > >>> The University of Georgia
> > >>> Athens, GA USA
> > >>> lfoley.uga.edu
> > >>> http://glycam.ccrc.uga.edu
> > >>>
> > >>> ________________________________________
> > >>> From: Scott Le Grand [varelse2005.gmail.com]
> > >>> Sent: Friday, August 16, 2013 9:11 PM
> > >>> To: AMBER Developers Mailing List
> > >>> Subject: Re: [AMBER-Developers] Zero VDW radii causing havoc.
> > >>>
> > >>> There are a heck of a lot more TIP3P waters running around your test
> > >>>case.
> > >>> Anything with an effective zero VDW radius is a problem.
> > >>>
> > >>> 20 years ago, I used to see these when I did global conformational
> > >>>search
> > >>> of proteins. And when it happened, energies would plummet into a
> deep
> > >>> abyss and hose the search process. The only thing different with MD
> is
> > >>> that the finite timestep messes up a water badly enough that the fast
> > >>>Shake
> > >>> recovers it into a bizarre position that now hits a bunch of other
> > >>>atoms
> > >>> and then all you-know-what breaks loose and we're off to NaNdyland...
> > >>>
> > >>> The more things change...
> > >>>
> > >>>
> > >>> On Fri, Aug 16, 2013 at 6:06 PM, B. Lachele Foley <
> lfoley.ccrc.uga.edu
> > >>> >wrote:
> > >>>
> > >>> > Jason said: "The problem-case hydrogens in the relevant
> > >>>carbohydrates
> > >>> > (and lipid?) force fields should be modified if they represent
> > >>>problems
> > >>> in
> > >>> > normal simulation. "
> > >>> >
> > >>> > FYI: Scott Le Grand, in the bug report Ross mentioned, just said
> it
> > >>>was
> > >>> > two TIP3P waters doing it.
> > >>> >
> > >>> > :-) Lachele
> > >>> >
> > >>> > Dr. B. Lachele Foley
> > >>> > Complex Carbohydrate Research Center
> > >>> > The University of Georgia
> > >>> > Athens, GA USA
> > >>> > lfoley.uga.edu
> > >>> > http://glycam.ccrc.uga.edu
> > >>> >
> > >>> > ________________________________________
> > >>> > From: Jason Swails [jason.swails.gmail.com]
> > >>> > Sent: Friday, August 16, 2013 3:28 PM
> > >>> > To: AMBER Developers Mailing List
> > >>> > Subject: Re: [AMBER-Developers] Zero VDW radii causing havoc.
> > >>> >
> > >>> > On Aug 16, 2013, at 11:39 AM, David A Case <
> case.biomaps.rutgers.edu
> > >
> > >>> > wrote:
> > >>> >
> > >>> > > On Fri, Aug 16, 2013, Ross Walker wrote:
> > >>> > >
> > >>> > >> So I am proposing that we have code in rdparm that detects zero
> > >>>VDW
> > >>> > radii,
> > >>> > >> prints a warning message about it and then adjusts these radii
> in
> > >>>the
> > >>> > code
> > >>> > >> to 0.0000001.
> > >>> > >
> > >>> > > I don't understand how/why this fixes any problem. First, you
> > >>>would
> > >>> have
> > >>> > > to get to distances comparable to the radius for such a term to
> > >>>have
> > >>> any
> > >>> > > effect. You would have already very much messed things up by
> that
> > >>> point.
> > >>> > > Second, atoms like OH have a zero epsilon as well as a zero
> radius,
> > >>>so
> > >>> > > just changing the radius would have no effect.
> > >>> >
> > >>> > I think Ross meant changing the A and B coefficients to 1E-6 or so,
> > >>>not
> > >>> > the pre-combined values.
> > >>> >
> > >>> > I'm with Dave on this though. I've run many microseconds of net
> > >>> simulation
> > >>> > and I've never seen a problem with zero vdW terms in HO atom
> types. I
> > >>> > realize there are some cases where this bug hits you, but as I
> recall
> > >>>it
> > >>> is
> > >>> > a strange case with carbohydrates. I'm wary about just making these
> > >>> changes
> > >>> > under the assumption it will have only remedial effects. The
> > >>>problem-case
> > >>> > hydrogens in the relevant carbohydrates (and lipid?) force fields
> > >>>should
> > >>> be
> > >>> > modified if they represent problems in normal simulation.
> > >>> >
> > >>> > I can also quite easily write a quick parmed action to implement
> > >>>these
> > >>> > changes without having to hard-code this new convention immediately
> > >>>into
> > >>> > every md engine we have.
> > >>> >
> > >>> > Also, I think that the amber codes should implement the Amber force
> > >>>field
> > >>> > exactly as it is defined in the prmtop. Implicitly changing
> > >>>parameters in
> > >>> > the MD codes is a bad idea, IMO. (How hard would it have been to
> > >>>validate
> > >>> > chamber if CHARMM took that approach?) So if we decide to implement
> > >>>this
> > >>> > change, it should be done in LEaP, not rdparm (but again, I think
> the
> > >>>LJ
> > >>> > terms should be derivable straight from the combining rules).
> > >>> >
> > >>> > > Note that Istvan's lmodprmtop hack directly changes the LJ A term
> > >>>from
> > >>> > > zero to 1000. This has the advantage of not assuming any
> > >>>particular
> > >>> > mixing
> > >>> > > rules, i.e. it still works when NBFIX like changes have been
> used.
> > >>> >
> > >>> > As an aside, should lmodprmtop functionality be incorporated into
> > >>>parmed
> > >>> > and/or cpptraj if it does simple things? (I'm not sure what else it
> > >>> does...)
> > >>> >
> > >>> > All the best,
> > >>> > Jason
> > >>> >
> > >>> > _______________________________________________
> > >>> > AMBER-Developers mailing list
> > >>> > AMBER-Developers.ambermd.org
> > >>> > http://lists.ambermd.org/mailman/listinfo/amber-developers
> > >>> >
> > >>> >
> > >>> >
> > >>> > _______________________________________________
> > >>> > AMBER-Developers mailing list
> > >>> > AMBER-Developers.ambermd.org
> > >>> > http://lists.ambermd.org/mailman/listinfo/amber-developers
> > >>> >
> > >>> _______________________________________________
> > >>> AMBER-Developers mailing list
> > >>> AMBER-Developers.ambermd.org
> > >>> http://lists.ambermd.org/mailman/listinfo/amber-developers
> > >>>
> > >>>
> > >>>
> > >>> _______________________________________________
> > >>> AMBER-Developers mailing list
> > >>> AMBER-Developers.ambermd.org
> > >>> http://lists.ambermd.org/mailman/listinfo/amber-developers
> > >>>
> > >>_______________________________________________
> > >>AMBER-Developers mailing list
> > >>AMBER-Developers.ambermd.org
> > >>http://lists.ambermd.org/mailman/listinfo/amber-developers
> > >
> > >
> > >
> > >_______________________________________________
> > >AMBER-Developers mailing list
> > >AMBER-Developers.ambermd.org
> > >http://lists.ambermd.org/mailman/listinfo/amber-developers
> > >
> > >
> > >
> > >_______________________________________________
> > >AMBER-Developers mailing list
> > >AMBER-Developers.ambermd.org
> > >http://lists.ambermd.org/mailman/listinfo/amber-developers
> >
> >
> >
> > _______________________________________________
> > AMBER-Developers mailing list
> > AMBER-Developers.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber-developers
> >
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers
>
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Sat Aug 17 2013 - 10:30:02 PDT
Custom Search