Re: [AMBER-Developers] gpu_update_box

From: David Cerutti <dscerutti.gmail.com>
Date: Fri, 18 Mar 2022 15:52:42 -0400

Cut factor is intended to account for that, but from what I recall (and I'm
struggling to recall things I did 3-4 years ago) it's an imperfect fix.
The general case is solved by computing the Hessian norms as I'm doing,
determining those unit cell face-to-face spans, and then determining the
number of domain subdivisions that will fit. Realize also that, unlike
pmemd's CPU code, which dynamically reallocates the pair list
subdivision as the box changes volume, pmemd.cuda is confined to a single
decomposition and if that dips below the allowable limits then the program
shuts down.
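[Editor's note: a minimal sketch of the span/subdivision idea described above, not the actual HessianNorms code from mdgx; the function names, the row-major storage of invU, and the exact call signature are assumptions.]

  #include <cmath>

  // Sketch only: the face-to-face span of a triclinic cell along lattice
  // direction i equals 1 / ||row i of invU||, where invU is the inverse of the
  // 3x3 matrix whose columns are the cell vectors.  invU assumed row-major.
  static void CellSpans(const double invU[9], double spans[3]) {
    for (int i = 0; i < 3; i++) {
      double rowNorm = sqrt(invU[3*i    ] * invU[3*i    ] +
                            invU[3*i + 1] * invU[3*i + 1] +
                            invU[3*i + 2] * invU[3*i + 2]);
      spans[i] = 1.0 / rowNorm;
    }
  }

  // The number of pair-list subdivisions that fit along a dimension is then the
  // span divided by the minimum allowed cell width (cutoff plus pair-list margin).
  static int SubdivisionCount(double span, double cut, double skin) {
    return static_cast<int>(floor(span / (cut + skin)));
  }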

But this does jog my memory a bit--the problem wasn't with the way you were
combining the cutoff and the unit cell's fractional-to-real-space
transformation matrix per se, but rather the fact that the cutoff plus the
pair list margin needed to be combined before applying this analysis. I think
the issue was that, before pmemd.cuda realized that the simulation cell had
indeed shrunk too much, hit the all-stop, and reported the user's positive
energy in that error message, it could dip below the safe margin defined by
something akin to cut_factor. That could leave you in situations where the
box was really too small for the cutoff + pair list margin but the code
didn't yet realize it. A more general, precise fix was what I think was
needed; I just fell back on what I knew--you can find HessianNorms back in
mdgx. But it may be possible that one can assess the columns as you say, so
long as it is with the combined cutoff and pair list margin, to determine the
subdivision count. I know that a bug got fixed by the change I submitted,
although one would have to delve back into the logs to find out exactly which
one. Being a bug that depends on certain combinations of pair list margins
and box dimensions, it's very much like the tricky precision issues: it will
be hard to know whether things are safe if we simply roll back. But follow
the reasoning and things should be all right.
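[Editor's note: a hedged illustration of the failure mode described above; the names (BoxIsSafe, spans, cells) are hypothetical and not pmemd.cuda's. The point is that a box can still pass a cutoff-only check while already being too thin once the pair-list margin is included, so the two must be summed before testing the spans.]

  // Illustrative only: each pair-list cell must be at least (cut + skin) thick.
  // A check against cut alone can report the box as safe when it is not.
  bool BoxIsSafe(const double spans[3], const int cells[3],
                 double cut, double skin) {
    for (int i = 0; i < 3; i++) {
      if (spans[i] / cells[i] < cut + skin) {
        return false;
      }
    }
    return true;
  }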

Dave


On Fri, Mar 18, 2022 at 3:34 PM Scott Le Grand <varelse2005.gmail.com>
wrote:

> Also, isn't cut_factor (pbc.F90) supposed to account for this? It looks like
> the ucell matrix remains upper triangular under all circumstances?
>
> if (is_orthog .ne. 0) then
>   ucell(:, :) = 0.d0
>   ucell(1, 1) = a
>   ucell(2, 2) = b
>   ucell(3, 3) = c
>   cut_factor(:) = 1.d0
> else
>   factor = PI / 180.d0
>   ucell(1, 1) = a
>   ucell(2, 1) = 0.d0
>   ucell(3, 1) = 0.d0
>   ucell(1, 2) = b * cos(factor * gamma)
>   ucell(2, 2) = b * sin(factor * gamma)
>   ucell(3, 2) = 0.d0
>   ucell(1, 3) = c * cos(factor * beta)
>   ucell(2, 3) = (b * c * cos(factor * alpha) - ucell(1, 3) * &
>                  ucell(1, 2)) / ucell(2, 2)
>   ucell(3, 3) = sqrt(c * c - ucell(1, 3) * ucell(1, 3) - &
>                      ucell(2, 3) * ucell(2, 3))
>
>   ! Cut factors are used to correct for "spherical cutoff protrusion" into
>   ! adjacent unit cells.  The problem is that the point at which a cutoff
>   ! sphere is tangent to a unit cell side is not the contact point for
>   ! projection of an orthogonal x, y, or z vector in a nonorthogonal unit
>   ! cell.  We thus have to increase the cutoff a bit to allow for the longer
>   ! distance for the orthogonal projection.
>
>   cut_factor(1) = 1.d0 / (sin(factor * beta) * sin(factor * gamma))
>   cut_factor(2) = 1.d0 / (sin(factor * alpha) * sin(factor * gamma))
>   cut_factor(3) = 1.d0 / (sin(factor * alpha) * sin(factor * beta))
> end if
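[Editor's note: a quick, illustrative check of the arithmetic in the excerpt above, not part of pmemd. For a truncated-octahedron box, alpha = beta = gamma = 109.4712 degrees, so each cut_factor is 1 / sin(109.4712 deg)^2, about 1.125; that is, the cutoff is effectively stretched by roughly 12.5% along each axis.]

  #include <cstdio>
  #include <cmath>

  int main() {
    const double PI = 3.141592653589793;
    const double factor = PI / 180.0;            // degrees to radians, as in pbc.F90
    const double alpha = 109.4712, beta = 109.4712, gamma = 109.4712;
    double cf1 = 1.0 / (sin(factor * beta)  * sin(factor * gamma));
    double cf2 = 1.0 / (sin(factor * alpha) * sin(factor * gamma));
    double cf3 = 1.0 / (sin(factor * alpha) * sin(factor * beta));
    printf("cut_factor = %.4f %.4f %.4f\n", cf1, cf2, cf3);   // prints ~1.1250 each
    return 0;
  }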
>
>
>
> On Fri, Mar 18, 2022 at 12:21 PM Scott Le Grand <varelse2005.gmail.com>
> wrote:
>
> > Finally, the fix is surprisingly atomic, modulo working the fixed-point
> > math into cellOffset.
> >
> > On Fri, Mar 18, 2022 at 12:20 PM Scott Le Grand <varelse2005.gmail.com>
> > wrote:
> >
> >> Also, the fix is in(tm), but I need to address gpu_update_box, and I need
> >> to know whether ucell matrices are still upper triangular, because if not,
> >> that will require some further work: I had to update the cell offsets to
> >> be precalculated to make the fixed-point math consistent and properly
> >> bounded so there can never be an overflow.
> >>
> >>
> >>
> >> On Fri, Mar 18, 2022 at 12:17 PM Scott Le Grand <varelse2005.gmail.com>
> >> wrote:
> >>
> >>> First up, thanks for replying! Appreciated!
> >>>
> >>> But... for triclinic cells, I never saw a ucell matrix fully occupied.
> >>> u(2,1), u(3,1), and u(3,2) are always zero, yes?
> >>>
> >>> If so, then the column space defines a, b, and c by the respective column
> >>> lengths, which *are* the distances between those separating planes. Check
> >>> it out for yourself. Can you point me to a case that violates this? If
> >>> so, we have to fix the whole shebang, not just this routine. Because I
> >>> agree with you if we have fully occupied ucell matrices, otherwise no.
> >>>
> >>>
> >>>
> >>> On Fri, Mar 18, 2022 at 12:10 PM David Cerutti <dscerutti.gmail.com>
> >>> wrote:
> >>>
> >>>> What's going on here is that not all unit cells are orthorhombic--that
> >>>> would imply that the box angles alpha, beta, and gamma are always 90
> >>>> degrees. The general unit cell is triclinic, where alpha, beta, and
> >>>> gamma can take on any values (with some degree of excluded territory, as
> >>>> certain combinations would fail to make a shape that can exist). The
> >>>> code you are seeing was created in response to another aspect of the
> >>>> original pmemd.cuda pair list that, if I recall correctly, was missing
> >>>> interactions from time to time due to a miscommunication between the way
> >>>> the pair list was generated versus the way the box had changed. I don't
> >>>> recall whether it was creating energy drift or causing an outright
> >>>> crash. It was also around the time I added the -AllowSmallBox flag: note
> >>>> that the code will not allow users to run simulations that are less than
> >>>> three pair list subdivisions along any dimension without an explicit
> >>>> instruction to do so, as I wasn't sure that the way the particles get
> >>>> re-imaged at the beginning of each time step was safe if there are only
> >>>> two subdivisions.
> >>>>
> >>>> But the HessianNorms function is calculating the actual thickness of
> >>>> each subdomain cell, as defined by the distance separating the planes of
> >>>> opposing faces. This is integral to the reason that, unless you do some
> >>>> sophisticated stuff like the honeycomb idea I had, a truncated
> >>>> octahedron is not as efficient as the 22% reduction in overall particle
> >>>> count would imply: the total volume of each cell needs to expand as it
> >>>> tilts, and this implies more tiles to cover the same volume. It's still
> >>>> advantageous to use a truncated octahedron, as the Hilbert space curve
> >>>> does pretty well to localize what is needed and of course the cutoff
> >>>> shell of any given particle remains spherical.
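[Editor's note: some illustrative arithmetic for the point above, not pmemd code; the numbers assume an ideal truncated-octahedron cell with a = b = c and all angles 109.4712 degrees. The cell volume falls to about 77% of a^3 (the ~22% particle-count saving), but the face-to-face thickness falls to about 81.6% of a, so each pair-list subdomain must be fatter in the skewed frame to keep the same minimum thickness.]

  #include <cstdio>
  #include <cmath>

  int main() {
    const double PI = 3.141592653589793;
    const double deg = PI / 180.0;
    const double a = 1.0, alpha = 109.4712 * deg;    // a = b = c, equal angles
    double ca = cos(alpha);
    // Triclinic cell volume for equal edges and equal angles.
    double vol = a * a * a * sqrt(1.0 - 3.0 * ca * ca + 2.0 * ca * ca * ca);
    // Distance between the two faces spanned by b and c:  V / |b x c|.
    double span = vol / (a * a * sin(alpha));
    printf("volume = %.4f a^3   face-to-face span = %.4f a\n", vol, span);
    return 0;                                        // ~0.7698 and ~0.8165
  }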
> >>>>
> >>>> What's the ETA on the energy drift fix? Is this an atomic change, or
> >>>> does it also come with other refactoring / rollback of various precision
> >>>> trims?
> >>>>
> >>>> Dave
> >>>>
> >>>>
> >>>> On Fri, Mar 18, 2022 at 2:14 PM Scott Le Grand <varelse2005.gmail.com>
> >>>> wrote:
> >>>>
> >>>> > Let's look at this, here's the new code:
> >>>> >
> >>>> >   HessianNorms(invU, cdepth);
> >>>> >   double skin  = cdepth[0] / gpu->sim.xcells - gpu->sim.cut;
> >>>> >   double yskin = cdepth[1] / gpu->sim.ycells - gpu->sim.cut;
> >>>> >   double zskin = cdepth[2] / gpu->sim.zcells - gpu->sim.cut;
> >>>> >
> >>>> > The array cdepth is filled with the new a, b, and c by means of an
> >>>> > external call. But as far as I know, all unit cells are orthorhombic,
> >>>> > so we've always been able to use:
> >>>> >
> >>>> >   // Recalculate a, b, c
> >>>> >   double a = pNTPData->ucell[0];
> >>>> >   double b = sqrt(pNTPData->ucell[1] * pNTPData->ucell[1] +
> >>>> >                   pNTPData->ucell[4] * pNTPData->ucell[4]);
> >>>> >   double c = sqrt(pNTPData->ucell[2] * pNTPData->ucell[2] +
> >>>> >                   pNTPData->ucell[5] * pNTPData->ucell[5] +
> >>>> >                   pNTPData->ucell[8] * pNTPData->ucell[8]);
> >>>> >
> >>>> > because of the constraints that imposes on the column space of the
> >>>> > unit cell matrix.
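[Editor's note: a hedged sketch of the distinction being debated here; the layout and names are assumptions, not pmemd.cuda's actual storage. With an upper-triangular ucell whose columns are the cell vectors, the column norms do reproduce a, b, and c, but the distances between opposing faces are V/|b x c|, V/|c x a|, and V/|a x b|, which are strictly smaller than a, b, and c whenever the cell is non-orthogonal.]

  #include <cmath>

  // ucell assumed column-major here: ucell[3*j + i] holds component i of cell vector j.
  static double ColumnNorm(const double ucell[9], int j) {
    return sqrt(ucell[3*j    ] * ucell[3*j    ] +
                ucell[3*j + 1] * ucell[3*j + 1] +
                ucell[3*j + 2] * ucell[3*j + 2]);
  }

  // Perpendicular distance between the pair of faces not containing cell vector j.
  static double FaceSeparation(const double ucell[9], int j) {
    int j1 = (j + 1) % 3, j2 = (j + 2) % 3;
    const double *u = &ucell[3*j1], *v = &ucell[3*j2];
    double cx = u[1]*v[2] - u[2]*v[1];              // cross product of the other
    double cy = u[2]*v[0] - u[0]*v[2];              // two cell vectors
    double cz = u[0]*v[1] - u[1]*v[0];
    double vol = fabs(ucell[3*j]*cx + ucell[3*j + 1]*cy + ucell[3*j + 2]*cz);
    return vol / sqrt(cx*cx + cy*cy + cz*cz);       // V / |cross|
  }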
> >>>> >
> >>>> > So either the first code clause is overkill and repeated work, or we
> >>>> > have to revisit the previous orthorhombic assumption. Which is it?
> >>>> >
> >>>> > But even then, why would one change the unit cell matrix without
> >>>> > updating the a, b, and c derived from it? And yes, I'm impatient w/r/t
> >>>> > fixing bugs. Guilty.
> >>>> >
> >>>> >
> >>>> > On Fri, Mar 18, 2022 at 10:42 AM Scott Le Grand <varelse2005.gmail.com>
> >>>> > wrote:
> >>>> >
> >>>> > > If the unit cell changes, wouldn't a, b, and c change as well, since
> >>>> > > they are derived from said unit cell?
> >>>> > >
> >>>> > > As for the speed of response: if it's important to anyone, they'll
> >>>> > > respond. If not, well, they can make their principled stand against
> >>>> > > me and my impatience later. Otherwise, I acknowledge your point, but
> >>>> > > also my okeydokey here: nobody is debugging this thing. It's an easy
> >>>> > > fix. And I need to do that fix to make it exploit the fixed-point
> >>>> > > local coordinate system, but I could leave the seemingly sloppy,
> >>>> > > incorrect behavior here. Drive-by contributions to PMEMD like this,
> >>>> > > however, are just not good IMO, and why I threw my hands up in
> >>>> > > frustration after trying to refactor for performance in 2020. It's a
> >>>> > > Sisyphean effort, and the tsunami of unattended warning messages and
> >>>> > > dead code in the thing alone is enough to make one run screaming into
> >>>> > > the night IMO. It was a single uninitialized variable that broke
> >>>> > > PMEMD for Silicon Therapeutics in 2020. But also, why is there so
> >>>> > > much dead code?
> >>>> > >
> >>>> > > The author of this code added a new and completely superfluous method
> >>>> > > of recalculating a, b, and c without exploiting the properties of the
> >>>> > > orthorhombic coordinate system to do it more simply, as is done in
> >>>> > > gpu_pressure_scale, which itself does the same thing internally as
> >>>> > > gpu_update_box if there's an MC barostat. And I hope you can forgive
> >>>> > > me for being annoyed at the needless repetition of functionality here
> >>>> > > without any notes as to why this was necessary.
> >>>> > >
> >>>> > > On Fri, Mar 18, 2022 at 10:22 AM David A Case <david.case.rutgers.edu>
> >>>> > > wrote:
> >>>> > >
> >>>> > >> On Fri, Mar 18, 2022, Scott Le Grand wrote:
> >>>> > >>
> >>>> > >> >Okeydokey, nobody cares. Therefore it's a bug and I'm going to fix
> >>>> > >> >it. Test will be updated to reflect this. What SM revision should I
> >>>> > >> >use for the test reference files?
> >>>> > >>
> >>>> > >> ??!? How fast do you expect people to respond? I looked at this and
> >>>> > >> didn't see any obvious error -- the two routines are called at
> >>>> > >> different places: gpu_pressure_scale only when ntp>0 and barostat=1,
> >>>> > >> whereas gpu_update_box is only called when re-weighting trajectories.
> >>>> > >>
> >>>> > >> But I don't know the code here. Clearly, you must be convinced that
> >>>> > >> a, b, c need to be updated when reweight is > 0. I'll try to look at
> >>>> > >> what behavior changes when you make the change. But I'm travelling
> >>>> > >> to the ACS meeting tomorrow, so you may be disappointed if you
> >>>> > >> require immediate feedback.
> >>>> > >>
> >>>> > >> ....dac
> >>>> > >>
> >>>> > >>
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Fri Mar 18 2022 - 13:00:03 PDT