Re: [AMBER-Developers] gpu_update_box

From: David Cerutti <dscerutti.gmail.com>
Date: Sat, 19 Mar 2022 12:18:46 -0400

So I gave some more thought to your problem, and I think you need to try
the following to ensure that your method is safe. Consider a truncated
octahedral unit cell, which is a very symmetric solid: a = b = c = L, alpha
= beta = gamma = 109.4712206 degrees (the tetrahedral angle). The number
of subdivisions of this solid should be the same in all directions. If you
use the method above, you will get the same cut_factor along all three
dimensions, which may or may not be what you want. Based on what I see in
cit.F90, this seems to be what happens. Again, I know the Hessian norms
approach, not sure whether scaling the minimum subdivision cell length by
the inverse sines of the other two angles will be a perfect solution but it
certainly pushes the result in the right direction, and if it turns out to
be overkill it won't be by much. As of 2016, cut_factor was copied over to
the simulationConst global object but unused. If you can bring it to bear,
you will sweep up the vast, vast majority and probably all of a problem
that occasionally became apparent.

Dave


On Fri, Mar 18, 2022 at 4:09 PM Scott Le Grand <varelse2005.gmail.com>
wrote:

> So... I think recalculating the cut_factors in gpu_pressure_scale addresses
> the issue entirely, no? This exploits the upper diagonal nature of ucell.
> Nice catch overall.
>
> On Fri, Mar 18, 2022 at 12:58 PM Scott Le Grand <varelse2005.gmail.com>
> wrote:
>
> > OK so I think I will just update a, b, and c here for now as this is part
> > of the fixed point stuff. Then dive into it more rigorously later.
> >
> > On Fri, Mar 18, 2022 at 12:53 PM David Cerutti <dscerutti.gmail.com>
> > wrote:
> >
> >> Cut factor is intended to account for that, but from what I recall (and
> >> I'm
> >> struggling to recall things I did 3-4 years ago) it's an imperfect fix.
> >> The general case is solved by computing the Hessian norms as I'm doing,
> >> determining those unit cell face-to-face spans, and then determining the
> >> number of domain subdivisions than will fit. Realize also that, unlike
> >> pmemd's CPU code, which dynamically reallocates the pair list
> >> subdivision as the box changes volume, pmemd.cuda is confined to a
> single
> >> decomposition and if that dips below the allowable limits then the
> program
> >> shuts down.
> >>
> >> But this does jog my memory a bit--the problem wasn't with the way you
> >> were
> >> combining the cutoff and unit cell fractional to real space
> transformation
> >> matrix per se, but rather the fact that the cutoff plus the pair list
> >> margin needed to be combined before applying this analysis. I think the
> >> issue was that, before pmemd.cuda realized that the simulation cell had
> >> indeed shrunk too much and hit the all-stop and boosted the user's
> >> positive
> >> energy with that error message, it could dip below the safe margin
> defined
> >> by something akin to cut_factor. That could leave you in situations
> where
> >> the box was really too small for the cutoff + pair list margin but the
> >> code
> >> didn't yet realize it. A more general, precise fix was what I think was
> >> needed. I just fell back on what I knew--you can find HessianNorms back
> >> in
> >> mdgx. But it may be possible that one can assess the columns as you
> say,
> >> so long as it is with the combined cutoff and pair list margin, to
> >> determine the subdivision count. I know that a bug got fixed by the
> >> change
> >> I submitted, although one would have to delve back into the logs to find
> >> out exactly what. Being a bug that depends on certain combinations of
> >> pair
> >> list margins and box dimensions, it's very much like the tricky
> precision
> >> issue and will be hard to know if things are safe if we simply roll
> back.
> >> But follow the reasoning and things should be all right.
> >>
> >> Dave
> >>
> >>
> >> On Fri, Mar 18, 2022 at 3:34 PM Scott Le Grand <varelse2005.gmail.com>
> >> wrote:
> >>
> >> > Also, isn't cut_factor (pbc.F90) supposed to account for this? Looks
> >> like
> >> > the ucell matrix remains upper diagonal under all circumstance?
> >> >
> >> > 114 if (is_orthog .ne. 0) then
> >> > 115 ucell(:, :) = 0.d0
> >> > 116 ucell(1, 1) = a
> >> > 117 ucell(2, 2) = b
> >> > 118 ucell(3, 3) = c
> >> > 119 cut_factor(:) = 1.d0
> >> > 120 else
> >> > 121 factor = PI / 180.d0
> >> > 122 ucell(1, 1) = a
> >> > 123 ucell(2, 1) = 0.d0
> >> > 124 ucell(3, 1) = 0.d0
> >> > 125 ucell(1, 2) = b * cos(factor * gamma)
> >> > 126 ucell(2, 2) = b * sin(factor * gamma)
> >> > 127 ucell(3, 2) = 0.d0
> >> > 128 ucell(1, 3) = c * cos(factor * beta)
> >> > 129 ucell(2, 3) = (b * c * cos(factor * alpha) - ucell(1, 3)
> *
> >> &
> >> > 130 ucell(1, 2))/ucell(2, 2)
> >> > 131 ucell(3, 3) = sqrt(c * c - ucell(1, 3) * ucell(1, 3) -
> >> ucell(2,
> >> > 3) * &
> >> > 132 ucell(2, 3))
> >> > 133
> >> > 134 ! Cut factors are used to correct for "spherical cutoff
> >> > protrusion" into
> >> > 135 ! adjacent unit cells. The problem is that the point at
> >> which
> >> > a cutoff
> >> > 136 ! sphere is tangent to a unit cell side is not the contact
> >> > point for
> >> > 137 ! projection of an orthogonal x, y, or z vector in a
> >> > nonorthogonal unit
> >> > 138 ! cell. We thus have to increase the cutoff a bit to
> allow
> >> for
> >> > the longer
> >> > 139 ! distance for the orthogonal projection.
> >> > 140
> >> > 141 cut_factor(1) = 1.d0 / (sin(factor * beta) * sin(factor *
> >> > gamma))
> >> > 142 cut_factor(2) = 1.d0 / (sin(factor * alpha) * sin(factor *
> >> > gamma))
> >> > 143 cut_factor(3) = 1.d0 / (sin(factor * alpha) * sin(factor *
> >> > beta))
> >> > 144 end if
> >> >
> >> >
> >> >
> >> > On Fri, Mar 18, 2022 at 12:21 PM Scott Le Grand <
> varelse2005.gmail.com>
> >> > wrote:
> >> >
> >> > > Finally, the fix is surprisingly atomic modulo working the fixed
> point
> >> > > math into cellOfffset.
> >> > >
> >> > > On Fri, Mar 18, 2022 at 12:20 PM Scott Le Grand <
> >> varelse2005.gmail.com>
> >> > > wrote:
> >> > >
> >> > >> Also the fix is in(tm), but I need to address gpu_update_box and I
> >> need
> >> > >> to know if ucell matrices are still upper diagonal because if not
> >> that
> >> > will
> >> > >> require some further work as I had to update the cell offsets to be
> >> > >> precalculated to make fixed point consistent and properly bounded
> so
> >> > there
> >> > >> can't ever be an overflow.
> >> > >>
> >> > >>
> >> > >>
> >> > >> On Fri, Mar 18, 2022 at 12:17 PM Scott Le Grand <
> >> varelse2005.gmail.com>
> >> > >> wrote:
> >> > >>
> >> > >>> First up, thanks for replying! Appreciated!
> >> > >>>
> >> > >>> But... For triclinic cells, I never saw a ucell matrix full
> >> occupied.
> >> > >>> u(2,1), u(3,1), and u(3,2) are always zero, yes?
> >> > >>>
> >> > >>> If so, then the column space defines a, b, and c by the respective
> >> > >>> column lengths that *are* the distances between those separating
> >> > planes.
> >> > >>> Check it out for yourself. Can you point me to a case that
> violates
> >> > this?
> >> > >>> If so, we have to fix the whole shebang, not just this routine.
> >> > Because I
> >> > >>> agree with you if we have fully occupied ucell matrices, otherwise
> >> no.
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>> On Fri, Mar 18, 2022 at 12:10 PM David Cerutti <
> dscerutti.gmail.com
> >> >
> >> > >>> wrote:
> >> > >>>
> >> > >>>> What's going on here is that not all unit cells are
> >> orthorhombic--such
> >> > >>>> would imply that the box angles alpha, beta, and gamma are always
> >> 90
> >> > >>>> degrees. The general unit cell is triclinic, where alpha, beta,
> >> and
> >> > >>>> gamma
> >> > >>>> can take on any values (with some degree of excluded territory,
> as
> >> > >>>> certain
> >> > >>>> combinations would fail to make a shape that can exist). The
> code
> >> you
> >> > >>>> are
> >> > >>>> seeing are seeing was created in response to another aspect of
> the
> >> > >>>> original
> >> > >>>> pmemd.cuda pair list that, if I recall correctly, was missing
> >> > >>>> interactions
> >> > >>>> from time to time due to a miscommunication between the way the
> >> pair
> >> > >>>> list
> >> > >>>> was generated versus the way the box had changed. I don't recall
> >> if
> >> > it
> >> > >>>> was
> >> > >>>> creating energy drift or causing an outright crash. It was also
> >> > around
> >> > >>>> the
> >> > >>>> time I added the -AllowSmallBox flag: note that the code will not
> >> > allow
> >> > >>>> users to run simulations that are less than three pair list
> >> > subdivisions
> >> > >>>> along any dimension without an explicit instruction to do so, as
> I
> >> > >>>> wasn't
> >> > >>>> sure that the way the particles get re-imaged at the beginning of
> >> each
> >> > >>>> time
> >> > >>>> was safe if there are only two subdivisions.
> >> > >>>>
> >> > >>>> But the HessianNorms function is calculating the actual thickness
> >> of
> >> > >>>> each
> >> > >>>> subdomain cell, as defined by the distance separating the planes
> of
> >> > >>>> opposing faces. This is integral to the reason that, unless you
> do
> >> > some
> >> > >>>> sophisticated stuff like the honeycomb idea I had, a truncated
> >> > >>>> octahedron
> >> > >>>> is not as efficient as the 22% reduction in overall particle
> count
> >> > would
> >> > >>>> imply: the total volume of each cell needs to expand as it tilts,
> >> and
> >> > >>>> this
> >> > >>>> implies more tiles to cover the same volume. It's still
> >> advantageous
> >> > to
> >> > >>>> use a truncated octahedron, as the Hilbert Space curve does
> pretty
> >> > well
> >> > >>>> to
> >> > >>>> localize what is needed and of course the cutoff shell of any
> given
> >> > >>>> particle remains spherical.
> >> > >>>>
> >> > >>>> What's the ETA on the energy drift fix? Is this an atomic
> change,
> >> or
> >> > >>>> does
> >> > >>>> it also come with other refactoring / rollback of various
> precision
> >> > >>>> trims?
> >> > >>>>
> >> > >>>> Dave
> >> > >>>>
> >> > >>>>
> >> > >>>> On Fri, Mar 18, 2022 at 2:14 PM Scott Le Grand <
> >> varelse2005.gmail.com
> >> > >
> >> > >>>> wrote:
> >> > >>>>
> >> > >>>> > Let's look at this, here's the new code:
> >> > >>>> > HessianNorms(invU, cdepth);
> >> > >>>> > double skin = cdepth[0]/gpu->sim.xcells - gpu->sim.cut;
> >> > >>>> > double yskin = cdepth[1]/gpu->sim.ycells - gpu->sim.cut;
> >> > >>>> > double zskin = cdepth[2]/gpu->sim.zcells - gpu->sim.cut;
> >> > >>>> >
> >> > >>>> > The array cdepth is filed with the new a, b, and c by means of
> an
> >> > >>>> external
> >> > >>>> > call. But as far as I know, all unit cells are orthorhombic, so
> >> > we've
> >> > >>>> > always been able to use:
> >> > >>>> > // Recalculate a, b, c
> >> > >>>> > double a =
> >> > >>>> > pNTPData->ucell[0];
> >> > >>>> > double b =
> >> > >>>> > sqrt(pNTPData->ucell[1] * pNTPData->ucell[1] +
> >> > >>>> >
> >> > >>>> > pNTPData->ucell[4] * pNTPData->ucell[4]);
> >> > >>>> > double c =
> >> > >>>> > sqrt(pNTPData->ucell[2] * pNTPData->ucell[2] +
> >> > >>>> >
> >> > >>>> > pNTPData->ucell[5] * pNTPData->ucell[5] +
> >> > >>>> >
> >> > >>>> > pNTPData->ucell[8] * pNTPData->ucell[8]);
> >> > >>>> >
> >> > >>>> > because of the constraints that imposes on the column space of
> >> the
> >> > >>>> unit
> >> > >>>> > cell matrix.
> >> > >>>> >
> >> > >>>> > So either the first code clause is overkill and repeated work
> or
> >> we
> >> > >>>> have to
> >> > >>>> > revisit the previous orthorhombic assumption. Which is it?
> >> > >>>> >
> >> > >>>> > But even then, why would one change the unit cell matrix
> without
> >> > >>>> updating
> >> > >>>> > the a, b, and c derived from it? And yes, I'm impatient w/r to
> >> > fixing
> >> > >>>> bugs.
> >> > >>>> > Guilty.
> >> > >>>> >
> >> > >>>> >
> >> > >>>> > On Fri, Mar 18, 2022 at 10:42 AM Scott Le Grand <
> >> > >>>> varelse2005.gmail.com>
> >> > >>>> > wrote:
> >> > >>>> >
> >> > >>>> > > If the unit cell changes, wouldn't a, b, and c change as well
> >> > since
> >> > >>>> they
> >> > >>>> > > are derived from said unit cell?
> >> > >>>> > >
> >> > >>>> > > As for the speed of response. If it's important to anyone,
> >> they'll
> >> > >>>> > > respond. If not, well they can make their principled stand
> >> against
> >> > >>>> me and
> >> > >>>> > > my impatience later. Otherwise, I acknowledge your point but
> >> also
> >> > my
> >> > >>>> > > okeydokey here, nobody is debugging this thing. It's an easy
> >> fix.
> >> > >>>> And I
> >> > >>>> > > need to do that fix to make it exploit the fixed point local
> >> > >>>> coordinate
> >> > >>>> > > system, but I could leave the seemingly sloppy incorrect
> >> behavior
> >> > >>>> here.
> >> > >>>> > > Driveby contributions to PMEMD like this however are just not
> >> good
> >> > >>>> IMO
> >> > >>>> > and
> >> > >>>> > > why I threw my hands up in frustration after trying to
> refactor
> >> > for
> >> > >>>> > > performance in 2020. It's a Sisyphusian effort and the
> tsunami
> >> of
> >> > >>>> > > unattended warning messages and dead code in the thing alone
> is
> >> > >>>> enough to
> >> > >>>> > > make one run screaming into the night IMO. It was a single
> >> > >>>> unitialized
> >> > >>>> > > variable that broke PMEMD for Silicon Therapeutics in 2020.
> But
> >> > >>>> also, why
> >> > >>>> > > is there is so much dead code?
> >> > >>>> > >
> >> > >>>> > > The author of this code added a new and completely
> superfluous
> >> > >>>> method of
> >> > >>>> > > recalculating a, b,and c without exploiting the properties of
> >> the
> >> > >>>> > > orthorhombic coordinate system to do it more simply as it is
> >> done
> >> > in
> >> > >>>> > > gpu_pressure_scale, which itself does the same thing
> >> internally as
> >> > >>>> > > gpu_update_box if there's an MC barostat. And I hope you can
> >> > >>>> forgive me
> >> > >>>> > for
> >> > >>>> > > being annoyed at the needless repetition of functionality
> here
> >> w/o
> >> > >>>> any
> >> > >>>> > > notes as to why this was necessary.
> >> > >>>> > >
> >> > >>>> > > On Fri, Mar 18, 2022 at 10:22 AM David A Case <
> >> > >>>> david.case.rutgers.edu>
> >> > >>>> > > wrote:
> >> > >>>> > >
> >> > >>>> > >> On Fri, Mar 18, 2022, Scott Le Grand wrote:
> >> > >>>> > >>
> >> > >>>> > >> >Okeydokey, nobody cares. Therefore it's a bug and I'm going
> >> to
> >> > >>>> fix it.
> >> > >>>> > >> Test
> >> > >>>> > >> >will be updated to reflect this. What SM revision should I
> >> use
> >> > >>>> for the
> >> > >>>> > >> test
> >> > >>>> > >> >reference files?
> >> > >>>> > >>
> >> > >>>> > >> ??!? How fast do you expect people to respond? I looked at
> >> this
> >> > >>>> and
> >> > >>>> > >> didn't
> >> > >>>> > >> see any obvious error -- the two routines are called at
> >> different
> >> > >>>> > places:
> >> > >>>> > >> gpu_pressure_scale only when ntp>0 and barostat=1, whereas
> >> > >>>> > gpu_update_box
> >> > >>>> > >> is only called when re-weighting trajectories.
> >> > >>>> > >>
> >> > >>>> > >> But I don't know the code here. Clearly, you must be
> >> convinced
> >> > >>>> that
> >> > >>>> > a,b,c
> >> > >>>> > >> need to be updated when reweight is > 0. I'll try to look
> at
> >> > what
> >> > >>>> > >> behavior
> >> > >>>> > >> changes when you make the change. But I'm travelling to the
> >> ACS
> >> > >>>> meeting
> >> > >>>> > >> tomorrow, so you may disappointed if you require immediate
> >> > >>>> feedback.
> >> > >>>> > >>
> >> > >>>> > >> ....dac
> >> > >>>> > >>
> >> > >>>> > >>
> >> > >>>> > >> _______________________________________________
> >> > >>>> > >> AMBER-Developers mailing list
> >> > >>>> > >> AMBER-Developers.ambermd.org
> >> > >>>> > >> http://lists.ambermd.org/mailman/listinfo/amber-developers
> >> > >>>> > >>
> >> > >>>> > >
> >> > >>>> > _______________________________________________
> >> > >>>> > AMBER-Developers mailing list
> >> > >>>> > AMBER-Developers.ambermd.org
> >> > >>>> > http://lists.ambermd.org/mailman/listinfo/amber-developers
> >> > >>>> >
> >> > >>>> _______________________________________________
> >> > >>>> AMBER-Developers mailing list
> >> > >>>> AMBER-Developers.ambermd.org
> >> > >>>> http://lists.ambermd.org/mailman/listinfo/amber-developers
> >> > >>>>
> >> > >>>
> >> > _______________________________________________
> >> > AMBER-Developers mailing list
> >> > AMBER-Developers.ambermd.org
> >> > http://lists.ambermd.org/mailman/listinfo/amber-developers
> >> >
> >> _______________________________________________
> >> AMBER-Developers mailing list
> >> AMBER-Developers.ambermd.org
> >> http://lists.ambermd.org/mailman/listinfo/amber-developers
> >>
> >
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers
>
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Sat Mar 19 2022 - 09:30:02 PDT
Custom Search