Re: [AMBER-Developers] Zero VDW radii causing havoc.

From: Scott Le Grand <varelse2005.gmail.com>
Date: Sat, 17 Aug 2013 10:30:30 -0700

PS one more possibility - netfrc matters for small systems running NTP but
somewhere along the line its impact is lost in the noise of larger
systems...




On Sat, Aug 17, 2013 at 10:04 AM, Scott Le Grand <varelse2005.gmail.com>wrote:

> So I have root caused why the nans appear: Fast Shake gets overwhelmed.
>
> At the point of failure, feed these numbers into fast shake:
>
> x0(1, atm_1) = 7.28912097579671414138;
> x0(1, atm_2) = 7.72543530415168433478;
> x0(1, atm_3) = 6.44526356757315177504;
>
> x0(2, atm_1) = 0.59867993601737978793;
> x0(2, atm_2) = 28.86782004149711866603 - 29.0969602342;
> x0(2, atm_3) = 0.53986248962842253718;
>
> x0(3, atm_1) = -14.70521127008194284258;
> x0(3, atm_2) = -14.90664483562995812349;
> x0(3, atm_3) = -15.15318116034974238460;
>
> x1(1, atm_1) = 7.37472831962579089549;
> x1(1, atm_2) = 11.58546676297256716737;
> x1(1, atm_3) = 6.48373270122986600228;
>
> x1(2, atm_1) = 0.50367624590281323549;
> x1(2, atm_2) = 25.98966495033111101520 - 29.0969602342;
> x1(2, atm_3) = 0.44969881379113374464;
>
> x1(3, atm_1) = -14.74963003569916608626;
> x1(3, atm_2) = -15.53454241815419933914;
> x1(3, atm_3) = -15.08733880938599369870;
>
> And what will happen is that
>
> al2be2 = 1.599492
> gama = -1.454111
>
> Feed this into line 1491 of shake.F90:
>
> sinthe = (alpa*gama - beta * sqrt(al2be2 - gama * gama)) / al2be2
>
> and sqrt(al2be2 - gama * gama) becomes sqrt(-0.514948) and that's a NaN.
> All else follows from this...
>
> As to why you haven't seen this elsewhere, some possibilities:
>
> 1. You actually have, simulations blow up all the time. How often have
> you root caused the exact moment of failure rather than just run it again
> when it's something this rare?
> 2. Perhaps using vlimit is a sufficient band-aid to prevent this most of
> the time.
> 3. The bug exists in every way in the FORTRAN code, but its probability of
> occurrence is higher when single-precision is in the mix in some subtle
> way. Because no matter how you slice this, the FORTRAN code has a whole
> bunch of sqrt operations here that are unguarded from when their operand
> goes negative.
> 4. I'm missing something in my shake code.
> 5. pmemd.cuda is hosed somehow.
>
> I personally would lay even odds on all of these. The big mystery for me
> is why this *seems* to happen more with small systems. It makes me wonder
> if there is some bizarro boundary effect going on in PBC having to do with
> the relative surface area of the boundary of the box relative to its volume.
>
> Ideas?
>
>
>
>
> On Sat, Aug 17, 2013 at 9:12 AM, Carlos Simmerling <
> carlos.simmerling.gmail.com> wrote:
>
>> I've never seen this in 20 years, except in free energy calculations where
>> the oxygen is being made smaller. In my opinion its probably a symptom,
>> not
>> the problem, unless the simulations are really different from anything
>> I've
>> ever done.
>> On Aug 16, 2013 10:39 PM, "Ross Walker" <ross.rosswalker.co.uk> wrote:
>>
>> > So that suggests to me that this could indeed be a bug related to
>> having a
>> > small water box - has anyone seen this happen with the CPU code? A repro
>> > there would be very telling.
>> >
>> > Also, this is NPT yes? As in it is not some NVT water box at some
>> > artificially high compression?
>> >
>> > All the best
>> > Ross
>> >
>> >
>> >
>> >
>> > On 8/16/13 7:33 PM, "B. Lachele Foley" <lfoley.ccrc.uga.edu> wrote:
>> >
>> > >I can only add the reminder that our main workaround has been to
>> increase
>> > >the size of the water box. That is, it disproportionately seems to
>> > >affect smaller systems. The locals who run simulations of really large
>> > >systems never seem to see this. Sure, they can't run as long, but
>> there
>> > >are many more interactions in each step. I suspect most lipid
>> > >simulations are also much larger than some of the systems we run. I
>> > >can't promise that this information is useful, but there it is.
>> > >
>> > >FWIW, we're going to re-run some simulations that failed, but with our
>> HO
>> > >vdW terms on the HW, just to see what happens.
>> > >
>> > >We haven't seen it in CPU. I realize that doesn't mean it couldn't
>> > >happen there.
>> > >
>> > >:-) Lachele
>> > >
>> > >Dr. B. Lachele Foley
>> > >Complex Carbohydrate Research Center
>> > >The University of Georgia
>> > >Athens, GA USA
>> > >lfoley.uga.edu
>> > >http://glycam.ccrc.uga.edu
>> > >
>> > >________________________________________
>> > >From: Ross Walker [ross.rosswalker.co.uk]
>> > >Sent: Friday, August 16, 2013 10:02 PM
>> > >To: AMBER Developers Mailing List
>> > >Subject: Re: [AMBER-Developers] Zero VDW radii causing havoc.
>> > >
>> > >Scott, I'm still not sure I am convinced on this being the issue. We
>> have
>> > >many many simulations done with TIP3P water - probably hundreds of
>> > >milliseconds in aggregate, a lot with boosted potentials and other
>> types
>> > >of enhanced sampling and haven't seen this issue. Sure we've seen
>> problems
>> > >with zero VDW hydroxyl hydrogens before - but those are always when
>> they
>> > >are constrained to be close to each other or with massive attraction
>> such
>> > >as in some sugars and phosphates. I don't think I've ever seen problems
>> > >with TIP3P before since the bonds are rigid and the H's are, I believe,
>> > >inside the VDW sphere of the oxygen. Hence I would never expect to see
>> > >this for two waters, for a water and a strained hydroxyl perhaps but
>> even
>> > >then unlikely.
>> > >
>> > >Are you certain this is a real problem and not a strange artifact, and
>> or
>> > >subtle bug? - Do we have any repro's for the CPU code?
>> > >
>> > >The only place I'd expect there could be problems with TIP3P is if it
>> is
>> > >run without shake. So something must be unique about the examples here
>> > >that are showing the repro that doesn't normally occur in protein
>> and/or
>> > >dna simulations.
>> > >
>> > >Ps. we've never seen a problem with the lipid12 force field as far as I
>> > >know and we have over 100 microsecs of simulation on these systems.
>> > >
>> > >Hence I am still a little suspicious...
>> > >
>> > >All the best
>> > >Ross
>> > >
>> > >
>> > >
>> > >On 8/16/13 6:20 PM, "Scott Le Grand" <varelse2005.gmail.com> wrote:
>> > >
>> > >>Yep, that's why I moved the bug over to pmemd and away from
>> pmemd.cuda...
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>On Fri, Aug 16, 2013 at 6:17 PM, B. Lachele Foley
>> > >><lfoley.ccrc.uga.edu>wrote:
>> > >>
>> > >>> We only had one, the HO (Ho) term. It's been made non-zero which at
>> > >>>least
>> > >>> fixed the issue in the highly charged, perfectly aligned situation
>> > >>>where we
>> > >>> first saw it. BTW, we (specifically Matt) first observed it on CPU.
>> > >>>
>> > >>> Can changing the LJ A term like that potentially mess up other
>> things
>> > >>>like
>> > >>> hydrogen bonding?
>> > >>>
>> > >>> :-) Lachele
>> > >>>
>> > >>> Dr. B. Lachele Foley
>> > >>> Complex Carbohydrate Research Center
>> > >>> The University of Georgia
>> > >>> Athens, GA USA
>> > >>> lfoley.uga.edu
>> > >>> http://glycam.ccrc.uga.edu
>> > >>>
>> > >>> ________________________________________
>> > >>> From: Scott Le Grand [varelse2005.gmail.com]
>> > >>> Sent: Friday, August 16, 2013 9:11 PM
>> > >>> To: AMBER Developers Mailing List
>> > >>> Subject: Re: [AMBER-Developers] Zero VDW radii causing havoc.
>> > >>>
>> > >>> There are a heck of a lot more TIP3P waters running around your test
>> > >>>case.
>> > >>> Anything with an effective zero VDW radius is a problem.
>> > >>>
>> > >>> 20 years ago, I used to see these when I did global conformational
>> > >>>search
>> > >>> of proteins. And when it happened, energies would plummet into a
>> deep
>> > >>> abyss and hose the search process. The only thing different with
>> MD is
>> > >>> that the finite timestep messes up a water badly enough that the
>> fast
>> > >>>Shake
>> > >>> recovers it into a bizarre position that now hits a bunch of other
>> > >>>atoms
>> > >>> and then all you-know-what breaks loose and we're off to
>> NaNdyland...
>> > >>>
>> > >>> The more things change...
>> > >>>
>> > >>>
>> > >>> On Fri, Aug 16, 2013 at 6:06 PM, B. Lachele Foley <
>> lfoley.ccrc.uga.edu
>> > >>> >wrote:
>> > >>>
>> > >>> > Jason said: "The problem-case hydrogens in the relevant
>> > >>>carbohydrates
>> > >>> > (and lipid?) force fields should be modified if they represent
>> > >>>problems
>> > >>> in
>> > >>> > normal simulation. "
>> > >>> >
>> > >>> > FYI: Scott Le Grand, in the bug report Ross mentioned, just said
>> it
>> > >>>was
>> > >>> > two TIP3P waters doing it.
>> > >>> >
>> > >>> > :-) Lachele
>> > >>> >
>> > >>> > Dr. B. Lachele Foley
>> > >>> > Complex Carbohydrate Research Center
>> > >>> > The University of Georgia
>> > >>> > Athens, GA USA
>> > >>> > lfoley.uga.edu
>> > >>> > http://glycam.ccrc.uga.edu
>> > >>> >
>> > >>> > ________________________________________
>> > >>> > From: Jason Swails [jason.swails.gmail.com]
>> > >>> > Sent: Friday, August 16, 2013 3:28 PM
>> > >>> > To: AMBER Developers Mailing List
>> > >>> > Subject: Re: [AMBER-Developers] Zero VDW radii causing havoc.
>> > >>> >
>> > >>> > On Aug 16, 2013, at 11:39 AM, David A Case <
>> case.biomaps.rutgers.edu
>> > >
>> > >>> > wrote:
>> > >>> >
>> > >>> > > On Fri, Aug 16, 2013, Ross Walker wrote:
>> > >>> > >
>> > >>> > >> So I am proposing that we have code in rdparm that detects zero
>> > >>>VDW
>> > >>> > radii,
>> > >>> > >> prints a warning message about it and then adjusts these radii
>> in
>> > >>>the
>> > >>> > code
>> > >>> > >> to 0.0000001.
>> > >>> > >
>> > >>> > > I don't understand how/why this fixes any problem. First, you
>> > >>>would
>> > >>> have
>> > >>> > > to get to distances comparable to the radius for such a term to
>> > >>>have
>> > >>> any
>> > >>> > > effect. You would have already very much messed things up by
>> that
>> > >>> point.
>> > >>> > > Second, atoms like OH have a zero epsilon as well as a zero
>> radius,
>> > >>>so
>> > >>> > > just changing the radius would have no effect.
>> > >>> >
>> > >>> > I think Ross meant changing the A and B coefficients to 1E-6 or
>> so,
>> > >>>not
>> > >>> > the pre-combined values.
>> > >>> >
>> > >>> > I'm with Dave on this though. I've run many microseconds of net
>> > >>> simulation
>> > >>> > and I've never seen a problem with zero vdW terms in HO atom
>> types. I
>> > >>> > realize there are some cases where this bug hits you, but as I
>> recall
>> > >>>it
>> > >>> is
>> > >>> > a strange case with carbohydrates. I'm wary about just making
>> these
>> > >>> changes
>> > >>> > under the assumption it will have only remedial effects. The
>> > >>>problem-case
>> > >>> > hydrogens in the relevant carbohydrates (and lipid?) force fields
>> > >>>should
>> > >>> be
>> > >>> > modified if they represent problems in normal simulation.
>> > >>> >
>> > >>> > I can also quite easily write a quick parmed action to implement
>> > >>>these
>> > >>> > changes without having to hard-code this new convention
>> immediately
>> > >>>into
>> > >>> > every md engine we have.
>> > >>> >
>> > >>> > Also, I think that the amber codes should implement the Amber
>> force
>> > >>>field
>> > >>> > exactly as it is defined in the prmtop. Implicitly changing
>> > >>>parameters in
>> > >>> > the MD codes is a bad idea, IMO. (How hard would it have been to
>> > >>>validate
>> > >>> > chamber if CHARMM took that approach?) So if we decide to
>> implement
>> > >>>this
>> > >>> > change, it should be done in LEaP, not rdparm (but again, I think
>> the
>> > >>>LJ
>> > >>> > terms should be derivable straight from the combining rules).
>> > >>> >
>> > >>> > > Note that Istvan's lmodprmtop hack directly changes the LJ A
>> term
>> > >>>from
>> > >>> > > zero to 1000. This has the advantage of not assuming any
>> > >>>particular
>> > >>> > mixing
>> > >>> > > rules, i.e. it still works when NBFIX like changes have been
>> used.
>> > >>> >
>> > >>> > As an aside, should lmodprmtop functionality be incorporated into
>> > >>>parmed
>> > >>> > and/or cpptraj if it does simple things? (I'm not sure what else
>> it
>> > >>> does...)
>> > >>> >
>> > >>> > All the best,
>> > >>> > Jason
>> > >>> >
>> > >>> > _______________________________________________
>> > >>> > AMBER-Developers mailing list
>> > >>> > AMBER-Developers.ambermd.org
>> > >>> > http://lists.ambermd.org/mailman/listinfo/amber-developers
>> > >>> >
>> > >>> >
>> > >>> >
>> > >>> > _______________________________________________
>> > >>> > AMBER-Developers mailing list
>> > >>> > AMBER-Developers.ambermd.org
>> > >>> > http://lists.ambermd.org/mailman/listinfo/amber-developers
>> > >>> >
>> > >>> _______________________________________________
>> > >>> AMBER-Developers mailing list
>> > >>> AMBER-Developers.ambermd.org
>> > >>> http://lists.ambermd.org/mailman/listinfo/amber-developers
>> > >>>
>> > >>>
>> > >>>
>> > >>> _______________________________________________
>> > >>> AMBER-Developers mailing list
>> > >>> AMBER-Developers.ambermd.org
>> > >>> http://lists.ambermd.org/mailman/listinfo/amber-developers
>> > >>>
>> > >>_______________________________________________
>> > >>AMBER-Developers mailing list
>> > >>AMBER-Developers.ambermd.org
>> > >>http://lists.ambermd.org/mailman/listinfo/amber-developers
>> > >
>> > >
>> > >
>> > >_______________________________________________
>> > >AMBER-Developers mailing list
>> > >AMBER-Developers.ambermd.org
>> > >http://lists.ambermd.org/mailman/listinfo/amber-developers
>> > >
>> > >
>> > >
>> > >_______________________________________________
>> > >AMBER-Developers mailing list
>> > >AMBER-Developers.ambermd.org
>> > >http://lists.ambermd.org/mailman/listinfo/amber-developers
>> >
>> >
>> >
>> > _______________________________________________
>> > AMBER-Developers mailing list
>> > AMBER-Developers.ambermd.org
>> > http://lists.ambermd.org/mailman/listinfo/amber-developers
>> >
>> _______________________________________________
>> AMBER-Developers mailing list
>> AMBER-Developers.ambermd.org
>> http://lists.ambermd.org/mailman/listinfo/amber-developers
>>
>
>
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Sat Aug 17 2013 - 11:00:03 PDT
Custom Search