Re: [AMBER-Developers] gpu_update_box

From: Scott Le Grand <varelse2005.gmail.com>
Date: Fri, 18 Mar 2022 13:09:33 -0700

So... I think recalculating the cut_factors in gpu_pressure_scale addresses
the issue entirely, no? This exploits the upper diagonal nature of ucell.
Nice catch overall.

On Fri, Mar 18, 2022 at 12:58 PM Scott Le Grand <varelse2005.gmail.com>
wrote:

> OK, so I think I will just update a, b, and c here for now, as this is
> part of the fixed point stuff, then dive into it more rigorously later.
>
> On Fri, Mar 18, 2022 at 12:53 PM David Cerutti <dscerutti.gmail.com>
> wrote:
>
>> Cut factor is intended to account for that, but from what I recall (and
>> I'm struggling to recall things I did 3-4 years ago) it's an imperfect
>> fix. The general case is solved by computing the Hessian norms as I'm
>> doing, determining those unit cell face-to-face spans, and then
>> determining the number of domain subdivisions that will fit. Realize
>> also that, unlike pmemd's CPU code, which dynamically reallocates the
>> pair list subdivision as the box changes volume, pmemd.cuda is confined
>> to a single decomposition, and if that dips below the allowable limits
>> the program shuts down.
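>>
>> To make that concrete, here's a minimal sketch of the span-and-subdivision
>> arithmetic (illustrative names only, not the pmemd.cuda code; the real
>> span calculation lives in HessianNorms):
>>
>>   #include <cmath>
>>   #include <cstdio>
>>
>>   // ucell[row][col]; the columns are the cell vectors a, b, and c.
>>   // span[i] = distance between the two cell faces that do not contain
>>   // column i, i.e. cell volume divided by the area of that face.
>>   static void FaceSpans(const double u[3][3], double span[3]) {
>>     for (int i = 0; i < 3; i++) {
>>       const int j = (i + 1) % 3, k = (i + 2) % 3;
>>       const double nx = u[1][j] * u[2][k] - u[2][j] * u[1][k];
>>       const double ny = u[2][j] * u[0][k] - u[0][j] * u[2][k];
>>       const double nz = u[0][j] * u[1][k] - u[1][j] * u[0][k];
>>       const double vol = u[0][i] * nx + u[1][i] * ny + u[2][i] * nz;
>>       span[i] = fabs(vol) / sqrt(nx * nx + ny * ny + nz * nz);
>>     }
>>   }
>>
>>   int main() {
>>     // Monoclinic example: a = b = c = 60 A, gamma = 120 deg.  The first
>>     // column has length 60, but the x span is 60 * sin(120) ~ 51.96 A.
>>     const double g = 120.0 * 3.141592653589793 / 180.0;
>>     const double u[3][3] = {{60.0, 60.0 * cos(g), 0.0},
>>                             {0.0, 60.0 * sin(g), 0.0},
>>                             {0.0, 0.0, 60.0}};
>>     double span[3];
>>     FaceSpans(u, span);
>>     const double cut = 8.0, margin = 2.0;  // cutoff + pair list margin
>>     for (int i = 0; i < 3; i++)
>>       printf("span %d: %.2f A -> %d subdivisions\n", i, span[i],
>>              (int)(span[i] / (cut + margin)));
>>     return 0;
>>   }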
>>
>> But this does jog my memory a bit--the problem wasn't with the way you
>> were combining the cutoff and the unit cell fractional-to-real-space
>> transformation matrix per se, but rather the fact that the cutoff plus
>> the pair list margin needed to be combined before applying this
>> analysis. I think the issue was that, before pmemd.cuda realized the
>> simulation cell had indeed shrunk too much, hit the all-stop, and
>> boosted the user's positive energy with that error message, it could
>> dip below the safe margin defined by something akin to cut_factor. That
>> could leave you in situations where the box was really too small for
>> the cutoff + pair list margin but the code didn't yet realize it. A
>> more general, precise fix was what I think was needed. I just fell back
>> on what I knew--you can find HessianNorms back in mdgx. But it may be
>> that one can assess the columns as you say, so long as it is with the
>> combined cutoff and pair list margin, to determine the subdivision
>> count. I know that a bug got fixed by the change I submitted, although
>> one would have to delve back into the logs to find out exactly what.
>> Being a bug that depends on certain combinations of pair list margins
>> and box dimensions, it's very much like the tricky precision issue: it
>> will be hard to know whether things are safe if we simply roll back.
>> But follow the reasoning and things should be all right.
>>
>> Dave
>>
>>
>> On Fri, Mar 18, 2022 at 3:34 PM Scott Le Grand <varelse2005.gmail.com>
>> wrote:
>>
>> > Also, isn't cut_factor (pbc.F90) supposed to account for this? It looks
>> > like the ucell matrix remains upper diagonal under all circumstances?
>> >
>> > if (is_orthog .ne. 0) then
>> >   ucell(:, :) = 0.d0
>> >   ucell(1, 1) = a
>> >   ucell(2, 2) = b
>> >   ucell(3, 3) = c
>> >   cut_factor(:) = 1.d0
>> > else
>> >   factor = PI / 180.d0
>> >   ucell(1, 1) = a
>> >   ucell(2, 1) = 0.d0
>> >   ucell(3, 1) = 0.d0
>> >   ucell(1, 2) = b * cos(factor * gamma)
>> >   ucell(2, 2) = b * sin(factor * gamma)
>> >   ucell(3, 2) = 0.d0
>> >   ucell(1, 3) = c * cos(factor * beta)
>> >   ucell(2, 3) = (b * c * cos(factor * alpha) - ucell(1, 3) * &
>> >                  ucell(1, 2)) / ucell(2, 2)
>> >   ucell(3, 3) = sqrt(c * c - ucell(1, 3) * ucell(1, 3) - &
>> >                      ucell(2, 3) * ucell(2, 3))
>> >
>> >   ! Cut factors are used to correct for "spherical cutoff protrusion"
>> >   ! into adjacent unit cells. The problem is that the point at which a
>> >   ! cutoff sphere is tangent to a unit cell side is not the contact
>> >   ! point for projection of an orthogonal x, y, or z vector in a
>> >   ! nonorthogonal unit cell. We thus have to increase the cutoff a bit
>> >   ! to allow for the longer distance for the orthogonal projection.
>> >
>> >   cut_factor(1) = 1.d0 / (sin(factor * beta) * sin(factor * gamma))
>> >   cut_factor(2) = 1.d0 / (sin(factor * alpha) * sin(factor * gamma))
>> >   cut_factor(3) = 1.d0 / (sin(factor * alpha) * sin(factor * beta))
>> > end if
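>> >
>> > (Worked example, for scale: a truncated octahedron has alpha = beta =
>> > gamma ~ 109.47 degrees, where sin^2(109.47 deg) = 8/9, so each
>> > cut_factor above is 9/8 = 1.125 -- the effective cutoff gets a 12.5%
>> > pad along every axis.)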
>> >
>> >
>> >
>> > On Fri, Mar 18, 2022 at 12:21 PM Scott Le Grand <varelse2005.gmail.com>
>> > wrote:
>> >
>> > > Finally, the fix is surprisingly atomic, modulo working the fixed
>> > > point math into cellOffset.
>> > >
>> > > On Fri, Mar 18, 2022 at 12:20 PM Scott Le Grand <varelse2005.gmail.com>
>> > > wrote:
>> > >
>> > >> Also, the fix is in(tm), but I need to address gpu_update_box, and I
>> > >> need to know whether ucell matrices are still upper diagonal. If not,
>> > >> that will require some further work, since I had to update the cell
>> > >> offsets to be precalculated to make the fixed point math consistent
>> > >> and properly bounded so there can't ever be an overflow.
>> > >>
>> > >>
>> > >>
>> > >> On Fri, Mar 18, 2022 at 12:17 PM Scott Le Grand <varelse2005.gmail.com>
>> > >> wrote:
>> > >>
>> > >>> First up, thanks for replying! Appreciated!
>> > >>>
>> > >>> But... for triclinic cells, I never saw a ucell matrix fully
>> > >>> occupied. u(2,1), u(3,1), and u(3,2) are always zero, yes?
>> > >>>
>> > >>> If so, then the column space defines a, b, and c by the respective
>> > >>> column lengths, which *are* the distances between those separating
>> > >>> planes. Check it out for yourself. Can you point me to a case that
>> > >>> violates this? If so, we have to fix the whole shebang, not just
>> > >>> this routine. I agree with you if we have fully occupied ucell
>> > >>> matrices; otherwise, no.
>> > >>>
>> > >>>
>> > >>>
>> > >>> On Fri, Mar 18, 2022 at 12:10 PM David Cerutti <dscerutti.gmail.com>
>> > >>> wrote:
>> > >>>
>> > >>>> What's going on here is that not all unit cells are
>> > >>>> orthorhombic--such would imply that the box angles alpha, beta, and
>> > >>>> gamma are always 90 degrees. The general unit cell is triclinic,
>> > >>>> where alpha, beta, and gamma can take on any values (with some
>> > >>>> degree of excluded territory, as certain combinations would fail to
>> > >>>> make a shape that can exist). The code you are seeing was created
>> > >>>> in response to another aspect of the original pmemd.cuda pair list
>> > >>>> that, if I recall correctly, was missing interactions from time to
>> > >>>> time due to a miscommunication between the way the pair list was
>> > >>>> generated versus the way the box had changed. I don't recall if it
>> > >>>> was creating energy drift or causing an outright crash. It was also
>> > >>>> around the time I added the -AllowSmallBox flag: note that the code
>> > >>>> will not allow users to run simulations that are less than three
>> > >>>> pair list subdivisions along any dimension without an explicit
>> > >>>> instruction to do so, as I wasn't sure that the way the particles
>> > >>>> get re-imaged at the beginning of each time step was safe if there
>> > >>>> are only two subdivisions.
>> > >>>>
>> > >>>> But the HessianNorms function is calculating the actual thickness
>> > >>>> of each subdomain cell, as defined by the distance separating the
>> > >>>> planes of opposing faces. This is integral to the reason that,
>> > >>>> unless you do some sophisticated stuff like the honeycomb idea I
>> > >>>> had, a truncated octahedron is not as efficient as the 22%
>> > >>>> reduction in overall particle count would imply: the total volume
>> > >>>> of each cell needs to expand as it tilts, and this implies more
>> > >>>> tiles to cover the same volume. It's still advantageous to use a
>> > >>>> truncated octahedron, as the Hilbert space curve does pretty well
>> > >>>> to localize what is needed, and of course the cutoff shell of any
>> > >>>> given particle remains spherical.
>> > >>>>
>> > >>>> What's the ETA on the energy drift fix? Is this an atomic change,
>> > >>>> or does it also come with other refactoring / rollback of various
>> > >>>> precision trims?
>> > >>>>
>> > >>>> Dave
>> > >>>>
>> > >>>>
>> > >>>> On Fri, Mar 18, 2022 at 2:14 PM Scott Le Grand <varelse2005.gmail.com>
>> > >>>> wrote:
>> > >>>>
>> > >>>> > Let's look at this, here's the new code:
>> > >>>> >
>> > >>>> >   HessianNorms(invU, cdepth);
>> > >>>> >   double skin  = cdepth[0] / gpu->sim.xcells - gpu->sim.cut;
>> > >>>> >   double yskin = cdepth[1] / gpu->sim.ycells - gpu->sim.cut;
>> > >>>> >   double zskin = cdepth[2] / gpu->sim.zcells - gpu->sim.cut;
>> > >>>> >
>> > >>>> > The array cdepth is filled with the new a, b, and c by means of an
>> > >>>> > external call. But as far as I know, all unit cells are
>> > >>>> > orthorhombic, so we've always been able to use:
>> > >>>> >
>> > >>>> >   // Recalculate a, b, and c as the column lengths of ucell,
>> > >>>> >   // taking u(2,1), u(3,1), and u(3,2) to be zero
>> > >>>> >   double a = pNTPData->ucell[0];
>> > >>>> >   double b = sqrt(pNTPData->ucell[1] * pNTPData->ucell[1] +
>> > >>>> >                   pNTPData->ucell[4] * pNTPData->ucell[4]);
>> > >>>> >   double c = sqrt(pNTPData->ucell[2] * pNTPData->ucell[2] +
>> > >>>> >                   pNTPData->ucell[5] * pNTPData->ucell[5] +
>> > >>>> >                   pNTPData->ucell[8] * pNTPData->ucell[8]);
>> > >>>> >
>> > >>>> > because of the constraints that this imposes on the column space
>> > >>>> > of the unit cell matrix.
>> > >>>> >
>> > >>>> > So either the first code clause is overkill and repeated work, or
>> > >>>> > we have to revisit the previous orthorhombic assumption. Which is
>> > >>>> > it?
>> > >>>> >
>> > >>>> > But even then, why would one change the unit cell matrix without
>> > >>>> > updating the a, b, and c derived from it? And yes, I'm impatient
>> > >>>> > w/r to fixing bugs. Guilty.
>> > >>>> >
>> > >>>> >
>> > >>>> > On Fri, Mar 18, 2022 at 10:42 AM Scott Le Grand <varelse2005.gmail.com>
>> > >>>> > wrote:
>> > >>>> >
>> > >>>> > > If the unit cell changes, wouldn't a, b, and c change as well,
>> > >>>> > > since they are derived from said unit cell?
>> > >>>> > >
>> > >>>> > > As for the speed of response: if it's important to anyone,
>> > >>>> > > they'll respond. If not, well, they can make their principled
>> > >>>> > > stand against me and my impatience later. Otherwise, I
>> > >>>> > > acknowledge your point, but also my okeydokey here; nobody is
>> > >>>> > > debugging this thing. It's an easy fix. And I need to do that
>> > >>>> > > fix to make it exploit the fixed point local coordinate system,
>> > >>>> > > but I could leave the seemingly sloppy, incorrect behavior here.
>> > >>>> > > Drive-by contributions to PMEMD like this, however, are just
>> > >>>> > > not good IMO, and why I threw my hands up in frustration after
>> > >>>> > > trying to refactor for performance in 2020. It's a Sisyphean
>> > >>>> > > effort, and the tsunami of unattended warning messages and dead
>> > >>>> > > code in the thing alone is enough to make one run screaming
>> > >>>> > > into the night IMO. It was a single uninitialized variable that
>> > >>>> > > broke PMEMD for Silicon Therapeutics in 2020. But also, why is
>> > >>>> > > there so much dead code?
>> > >>>> > >
>> > >>>> > > The author of this code added a new and completely superfluous
>> > >>>> > > method of recalculating a, b, and c without exploiting the
>> > >>>> > > properties of the orthorhombic coordinate system to do it more
>> > >>>> > > simply, as is done in gpu_pressure_scale, which itself does the
>> > >>>> > > same thing internally as gpu_update_box if there's an MC
>> > >>>> > > barostat. And I hope you can forgive me for being annoyed at
>> > >>>> > > the needless repetition of functionality here w/o any notes as
>> > >>>> > > to why this was necessary.
>> > >>>> > >
>> > >>>> > > On Fri, Mar 18, 2022 at 10:22 AM David A Case <david.case.rutgers.edu>
>> > >>>> > > wrote:
>> > >>>> > >
>> > >>>> > >> On Fri, Mar 18, 2022, Scott Le Grand wrote:
>> > >>>> > >>
>> > >>>> > >> >Okeydokey, nobody cares. Therefore it's a bug and I'm going
>> > >>>> > >> >to fix it. Test will be updated to reflect this. What SM
>> > >>>> > >> >revision should I use for the test reference files?
>> > >>>> > >>
>> > >>>> > >> ??!? How fast do you expect people to respond? I looked at
>> > >>>> > >> this and didn't see any obvious error -- the two routines are
>> > >>>> > >> called at different places: gpu_pressure_scale only when
>> > >>>> > >> ntp > 0 and barostat = 1, whereas gpu_update_box is only
>> > >>>> > >> called when re-weighting trajectories.
>> > >>>> > >>
>> > >>>> > >> But I don't know the code here. Clearly, you must be
>> > >>>> > >> convinced that a, b, and c need to be updated when reweight
>> > >>>> > >> is > 0. I'll try to look at what behavior changes when you
>> > >>>> > >> make the change. But I'm travelling to the ACS meeting
>> > >>>> > >> tomorrow, so you may be disappointed if you require immediate
>> > >>>> > >> feedback.
>> > >>>> > >>
>> > >>>> > >> ....dac
>> > >>>> > >>
>> > >>>> > >>
>> > >>>> > >
>> > >>>
>
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers