Re: [AMBER-Developers] gpu_update_box

From: Scott Le Grand <varelse2005.gmail.com>
Date: Fri, 18 Mar 2022 12:58:18 -0700

OK, so I think I will just update a, b, and c here for now, since this is part
of the fixed point stuff, then dive into it more rigorously later.

On Fri, Mar 18, 2022 at 12:53 PM David Cerutti <dscerutti.gmail.com> wrote:

> Cut factor is intended to account for that, but from what I recall (and I'm
> struggling to recall things I did 3-4 years ago) it's an imperfect fix.
> The general case is solved by computing the Hessian norms as I'm doing,
> determining those unit cell face-to-face spans, and then determining the
> number of domain subdivisions that will fit. Realize also that, unlike
> pmemd's CPU code, which dynamically reallocates the pair list subdivision
> as the box changes volume, pmemd.cuda is confined to a single
> decomposition, and if that dips below the allowable limits the program
> shuts down.
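>
> (For reference, a minimal sketch of what that Hessian-norm calculation
> amounts to -- the function name, signature, and row-major invU layout here
> are illustrative assumptions, not the actual mdgx/pmemd.cuda declaration.
> Row i of the inverse cell matrix is the reciprocal vector normal to the
> pair of faces spanned by the other two cell vectors, so the face-to-face
> span is the reciprocal of that row's length.)
>
>   #include <math.h>
>
>   // Sketch: face-to-face spans of the unit cell from the inverse cell
>   // matrix (Hessian normal form of the opposing face planes).
>   static void HessianNormsSketch(const double invU[9], double cdepth[3])
>   {
>     for (int i = 0; i < 3; i++) {
>       double rowlen = sqrt(invU[3*i    ] * invU[3*i    ] +
>                            invU[3*i + 1] * invU[3*i + 1] +
>                            invU[3*i + 2] * invU[3*i + 2]);
>       cdepth[i] = 1.0 / rowlen;  // span between opposing faces, axis i
>     }
>   }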
>
> But this does jog my memory a bit--the problem wasn't with the way you were
> combining the cutoff and the unit cell fractional-to-real-space
> transformation matrix per se, but rather the fact that the cutoff plus the
> pair list margin needed to be combined before applying this analysis. I
> think the issue was that, before pmemd.cuda realized that the simulation
> cell had indeed shrunk too much, hit the all-stop, and boosted the user's
> positive energy with that error message, it could dip below the safe margin
> defined by something akin to cut_factor. That could leave you in situations
> where the box was really too small for the cutoff + pair list margin but
> the code didn't yet realize it. I think a more general, precise fix was
> needed; I just fell back on what I knew--you can find HessianNorms back in
> mdgx. But it may be possible to assess the columns as you say, so long as
> it is done with the combined cutoff and pair list margin, to determine the
> subdivision count. I know that a bug got fixed by the change I submitted,
> although one would have to delve back into the logs to find out exactly
> which one. Being a bug that depends on certain combinations of pair list
> margins and box dimensions, it's very much like the tricky precision
> issues, and it will be hard to know whether things are safe if we simply
> roll back. But follow the reasoning and things should be all right.
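>
> (A hedged sketch of that check, with illustrative names -- cutoff, skinnb,
> and ncells are stand-ins, not the actual pmemd.cuda identifiers. The point
> is simply that the pair list margin is added to the cutoff before the
> comparison against the subdomain span.)
>
>   // Sketch: combine cutoff and pair list margin first, then test each
>   // subdomain span (face-to-face span / cell count) against the sum.
>   int boxIsSafe(const double cdepth[3], const int ncells[3],
>                 double cutoff, double skinnb)
>   {
>     double needed = cutoff + skinnb;
>     for (int i = 0; i < 3; i++) {
>       if (cdepth[i] / (double)ncells[i] < needed) {
>         return 0;  // the box has dipped below the safe margin
>       }
>     }
>     return 1;
>   }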
>
> Dave
>
>
> On Fri, Mar 18, 2022 at 3:34 PM Scott Le Grand <varelse2005.gmail.com>
> wrote:
>
> > Also, isn't cut_factor (pbc.F90) supposed to account for this? It looks
> > like the ucell matrix remains upper triangular under all circumstances?
> >
> >   if (is_orthog .ne. 0) then
> >     ucell(:, :) = 0.d0
> >     ucell(1, 1) = a
> >     ucell(2, 2) = b
> >     ucell(3, 3) = c
> >     cut_factor(:) = 1.d0
> >   else
> >     factor = PI / 180.d0
> >     ucell(1, 1) = a
> >     ucell(2, 1) = 0.d0
> >     ucell(3, 1) = 0.d0
> >     ucell(1, 2) = b * cos(factor * gamma)
> >     ucell(2, 2) = b * sin(factor * gamma)
> >     ucell(3, 2) = 0.d0
> >     ucell(1, 3) = c * cos(factor * beta)
> >     ucell(2, 3) = (b * c * cos(factor * alpha) - ucell(1, 3) * &
> >                    ucell(1, 2)) / ucell(2, 2)
> >     ucell(3, 3) = sqrt(c * c - ucell(1, 3) * ucell(1, 3) - &
> >                        ucell(2, 3) * ucell(2, 3))
> >
> >     ! Cut factors are used to correct for "spherical cutoff protrusion"
> >     ! into adjacent unit cells.  The problem is that the point at which
> >     ! a cutoff sphere is tangent to a unit cell side is not the contact
> >     ! point for projection of an orthogonal x, y, or z vector in a
> >     ! nonorthogonal unit cell.  We thus have to increase the cutoff a bit
> >     ! to allow for the longer distance for the orthogonal projection.
> >     cut_factor(1) = 1.d0 / (sin(factor * beta) * sin(factor * gamma))
> >     cut_factor(2) = 1.d0 / (sin(factor * alpha) * sin(factor * gamma))
> >     cut_factor(3) = 1.d0 / (sin(factor * alpha) * sin(factor * beta))
> >   end if
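> >
> > (Worked example for concreteness: in a truncated octahedron alpha = beta
> > = gamma ~ 109.47 degrees, so the sine of each angle is ~0.9428 and each
> > cut_factor is 1/0.9428^2 = 1.125, i.e. the effective cutoff is stretched
> > by about 12.5% along each axis.)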
> >
> >
> >
> > On Fri, Mar 18, 2022 at 12:21 PM Scott Le Grand <varelse2005.gmail.com>
> > wrote:
> >
> > > Finally, the fix is surprisingly atomic, modulo working the fixed point
> > > math into cellOffset.
> > >
> > > On Fri, Mar 18, 2022 at 12:20 PM Scott Le Grand <varelse2005.gmail.com>
> > > wrote:
> > >
> > >> Also, the fix is in(tm), but I need to address gpu_update_box, and I
> > >> need to know whether ucell matrices are still upper triangular, because
> > >> if not, that will require some further work: I had to precalculate the
> > >> cell offsets to make the fixed point math consistent and properly
> > >> bounded so there can never be an overflow.
> > >>
> > >>
> > >>
> > >> On Fri, Mar 18, 2022 at 12:17 PM Scott Le Grand <varelse2005.gmail.com>
> > >> wrote:
> > >>
> > >>> First up, thanks for replying! Appreciated!
> > >>>
> > >>> But... for triclinic cells, I never saw a ucell matrix fully occupied.
> > >>> u(2,1), u(3,1), and u(3,2) are always zero, yes?
> > >>>
> > >>> If so, then the column space defines a, b, and c by the respective
> > >>> column lengths, which *are* the distances between those separating
> > >>> planes. Check it out for yourself. Can you point me to a case that
> > >>> violates this? If so, we have to fix the whole shebang, not just this
> > >>> routine, because I agree with you if we have fully occupied ucell
> > >>> matrices; otherwise, no.
> > >>>
> > >>>
> > >>> On Fri, Mar 18, 2022 at 12:10 PM David Cerutti <dscerutti.gmail.com>
> > >>> wrote:
> > >>>
> > >>>> What's going on here is that not all unit cells are orthorhombic--
> > >>>> that would imply that the box angles alpha, beta, and gamma are
> > >>>> always 90 degrees. The general unit cell is triclinic, where alpha,
> > >>>> beta, and gamma can take on any values (with some degree of excluded
> > >>>> territory, as certain combinations would fail to make a shape that
> > >>>> can exist). The code you are seeing was created in response to
> > >>>> another aspect of the original pmemd.cuda pair list that, if I recall
> > >>>> correctly, was missing interactions from time to time due to a
> > >>>> miscommunication between the way the pair list was generated and the
> > >>>> way the box had changed. I don't recall whether it was creating
> > >>>> energy drift or causing an outright crash. It was also around the
> > >>>> time I added the -AllowSmallBox flag: note that the code will not
> > >>>> allow users to run simulations that are less than three pair list
> > >>>> subdivisions along any dimension without an explicit instruction to
> > >>>> do so, as I wasn't sure that the way the particles get re-imaged at
> > >>>> the beginning of each time step was safe if there are only two
> > >>>> subdivisions.
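> > >>>>
> > >>>> (A hedged sketch of that guard -- the flag name is from the
> > >>>> paragraph above, but every other identifier is illustrative, not an
> > >>>> actual pmemd.cuda variable.)
> > >>>>
> > >>>>   #include <stdio.h>
> > >>>>   #include <stdlib.h>
> > >>>>
> > >>>>   // Sketch: refuse fewer than three pair list subdivisions along
> > >>>>   // any dimension unless the user explicitly overrides.
> > >>>>   static void checkSmallBox(int xcells, int ycells, int zcells,
> > >>>>                             int allowSmallBox)
> > >>>>   {
> > >>>>     if ((xcells < 3 || ycells < 3 || zcells < 3) && !allowSmallBox) {
> > >>>>       fprintf(stderr, "Fewer than three pair list subdivisions "
> > >>>>                       "along some dimension; re-run with "
> > >>>>                       "-AllowSmallBox to override.\n");
> > >>>>       exit(1);
> > >>>>     }
> > >>>>   }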
> > >>>>
> > >>>> But the HessianNorms function is calculating the actual thickness of
> > >>>> each subdomain cell, as defined by the distance separating the planes
> > >>>> of opposing faces. This is integral to the reason that, unless you do
> > >>>> some sophisticated stuff like the honeycomb idea I had, a truncated
> > >>>> octahedron is not as efficient as the 22% reduction in overall
> > >>>> particle count would imply: the total volume of each cell needs to
> > >>>> expand as it tilts, and this implies more tiles to cover the same
> > >>>> volume. It's still advantageous to use a truncated octahedron, as the
> > >>>> Hilbert space curve does pretty well to localize what is needed and,
> > >>>> of course, the cutoff shell of any given particle remains spherical.
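> > >>>>
> > >>>> (To put a number on the tilt penalty: with alpha = beta = gamma ~
> > >>>> 109.47 degrees, the cell volume works out to about 0.770*abc, while
> > >>>> |b x c| = bc*sin(alpha) ~ 0.943*bc, so the face-to-face span along a
> > >>>> is V/|b x c| ~ 0.816*a. Each subdomain must still be at least one
> > >>>> cutoff thick, so covering that shortened span costs roughly 22% in
> > >>>> effective cell thickness, which is why the naive 22% particle count
> > >>>> savings doesn't fully materialize.)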
> > >>>>
> > >>>> What's the ETA on the energy drift fix? Is this an atomic change, or
> > >>>> does it also come with other refactoring / rollback of various
> > >>>> precision trims?
> > >>>>
> > >>>> Dave
> > >>>>
> > >>>>
> > >>>> On Fri, Mar 18, 2022 at 2:14 PM Scott Le Grand <varelse2005.gmail.com>
> > >>>> wrote:
> > >>>>
> > >>>> > Let's look at this, here's the new code:
> > >>>> >
> > >>>> >   HessianNorms(invU, cdepth);
> > >>>> >   double skin  = cdepth[0] / gpu->sim.xcells - gpu->sim.cut;
> > >>>> >   double yskin = cdepth[1] / gpu->sim.ycells - gpu->sim.cut;
> > >>>> >   double zskin = cdepth[2] / gpu->sim.zcells - gpu->sim.cut;
> > >>>> >
> > >>>> > The array cdepth is filled with the new a, b, and c by means of an
> > >>>> > external call. But as far as I know, all unit cells are
> > >>>> > orthorhombic, so we've always been able to use:
> > >>>> >
> > >>>> >   // Recalculate a, b, c
> > >>>> >   double a = pNTPData->ucell[0];
> > >>>> >   double b = sqrt(pNTPData->ucell[1] * pNTPData->ucell[1] +
> > >>>> >                   pNTPData->ucell[4] * pNTPData->ucell[4]);
> > >>>> >   double c = sqrt(pNTPData->ucell[2] * pNTPData->ucell[2] +
> > >>>> >                   pNTPData->ucell[5] * pNTPData->ucell[5] +
> > >>>> >                   pNTPData->ucell[8] * pNTPData->ucell[8]);
> > >>>> >
> > >>>> > because of the constraints that imposes on the column space of the
> > >>>> > unit cell matrix.
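> > >>>> >
> > >>>> > (Spelling out why that works: with ucell upper triangular, as built
> > >>>> > in the pbc.F90 excerpt above, column 1 is (a, 0, 0), column 2 is
> > >>>> > (b*cos(gamma), b*sin(gamma), 0), and column 3 has norm c, so the
> > >>>> > three column norms recover a, b, and c exactly. If the lower
> > >>>> > triangle were ever nonzero, ucell[3] and ucell[6] would have to
> > >>>> > enter the sum for a, and ucell[7] the sum for b.)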
> > >>>> >
> > >>>> > So either the first code clause is overkill and repeated work, or
> > >>>> > we have to revisit the previous orthorhombic assumption. Which is
> > >>>> > it?
> > >>>> >
> > >>>> > But even then, why would one change the unit cell matrix without
> > >>>> > updating the a, b, and c derived from it? And yes, I'm impatient
> > >>>> > with respect to fixing bugs. Guilty.
> > >>>> >
> > >>>> >
> > >>>> > On Fri, Mar 18, 2022 at 10:42 AM Scott Le Grand
> > >>>> > <varelse2005.gmail.com> wrote:
> > >>>> >
> > >>>> > > If the unit cell changes, wouldn't a, b, and c change as well,
> > >>>> > > since they are derived from said unit cell?
> > >>>> > >
> > >>>> > > As for the speed of response: if it's important to anyone,
> > >>>> > > they'll respond. If not, well, they can make their principled
> > >>>> > > stand against me and my impatience later. Otherwise, I stand by
> > >>>> > > my okeydokey here: nobody is debugging this thing. It's an easy
> > >>>> > > fix, and I need to make that fix so it exploits the fixed point
> > >>>> > > local coordinate system, but I could leave the seemingly sloppy,
> > >>>> > > incorrect behavior here. Drive-by contributions to PMEMD like
> > >>>> > > this, however, are just not good IMO, and they are why I threw up
> > >>>> > > my hands in frustration after trying to refactor for performance
> > >>>> > > in 2020. It's a Sisyphean effort, and the tsunami of unattended
> > >>>> > > warning messages and dead code in the thing alone is enough to
> > >>>> > > make one run screaming into the night IMO. It was a single
> > >>>> > > uninitialized variable that broke PMEMD for Silicon Therapeutics
> > >>>> > > in 2020. But also, why is there so much dead code?
> > >>>> > >
> > >>>> > > The author of this code added a new and completely superfluous
> > >>>> > > method of recalculating a, b, and c without exploiting the
> > >>>> > > properties of the orthorhombic coordinate system to do it more
> > >>>> > > simply, as is done in gpu_pressure_scale, which itself does the
> > >>>> > > same thing internally as gpu_update_box if there's an MC
> > >>>> > > barostat. And I hope you can forgive me for being annoyed at the
> > >>>> > > needless repetition of functionality here without any notes as
> > >>>> > > to why it was necessary.
> > >>>> > >
> > >>>> > > On Fri, Mar 18, 2022 at 10:22 AM David A Case
> > >>>> > > <david.case.rutgers.edu> wrote:
> > >>>> > >
> > >>>> > >> On Fri, Mar 18, 2022, Scott Le Grand wrote:
> > >>>> > >>
> > >>>> > >> >Okeydokey, nobody cares. Therefore it's a bug and I'm going to
> > >>>> > >> >fix it. Test will be updated to reflect this. What SM revision
> > >>>> > >> >should I use for the test reference files?
> > >>>> > >>
> > >>>> > >> ??!? How fast do you expect people to respond? I looked at this
> > >>>> > >> and didn't see any obvious error -- the two routines are called
> > >>>> > >> in different places: gpu_pressure_scale only when ntp > 0 and
> > >>>> > >> barostat = 1, whereas gpu_update_box is only called when
> > >>>> > >> re-weighting trajectories.
> > >>>> > >>
> > >>>> > >> But I don't know the code here. Clearly, you must be convinced
> > >>>> > >> that a, b, and c need to be updated when reweight is > 0. I'll
> > >>>> > >> try to look at what behavior changes when you make the change.
> > >>>> > >> But I'm travelling to the ACS meeting tomorrow, so you may be
> > >>>> > >> disappointed if you require immediate feedback.
> > >>>> > >>
> > >>>> > >> ....dac
> > >>>> > >>
> > >>>> > >>
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Fri Mar 18 2022 - 13:00:03 PDT