Re: [AMBER-Developers] gpu_update_box

From: David Cerutti <dscerutti.gmail.com>
Date: Sat, 19 Mar 2022 13:15:34 -0400

The unit cell angles do not change in our simulations--they could, in
principle, we just don't have the code to make that happen. I think what
was happening is that the cut factors were just not being brought to bear.
But the entire implementation looks a bit hackey to me--see the additional
factors of 1.0d in the cit.F90 code when the cut_factor three-element array
is used to update the number of subdivisions. It smells like Bob was
working with some second-hand math advice and wasn't sure if it was going
to be conservative enough, so he was preparing to improvise.

Dave


On Sat, Mar 19, 2022 at 12:44 PM Scott Le Grand <varelse2005.gmail.com>
wrote:

> So I *think* what happened here is while cut_factor is a function of the
> unit cell angles, if those unit cell angles change, it's no longer valid.
> Was this happening in NTP and the repartitioning kept the skin from getting
> too small?
>
> Still think we need to update a, b, and c. They should never go stale.
>
> On Sat, Mar 19, 2022 at 9:19 AM David Cerutti <dscerutti.gmail.com> wrote:
>
> > So I gave some more thought to your problem, and I think you need to try
> > the following to ensure that your method is safe. Consider a truncated
> > octahedral unit cell, which is a very symmetric solid: a = b = c = L,
> alpha
> > = beta = gamma = 109.4712206 degrees (the tetrahedral angle). The number
> > of subdivisions of this solid should be the same in all directions. If
> you
> > use the method above, you will get the same cut_factor along all three
> > dimensions, which may or may not be what you want. Based on what I see
> in
> > cit.F90, this seems to be what happens. Again, I know the Hessian norms
> > approach, not sure whether scaling the minimum subdivision cell length by
> > the inverse sines of the other two angles will be a perfect solution but
> it
> > certainly pushes the result in the right direction, and if it turns out
> to
> > be overkill it won't be by much. As of 2016, cut_factor was copied over
> to
> > the simulationConst global object but unused. If you can bring it to
> bear,
> > you will sweep up the vast, vast majority and probably all of a problem
> > that occasionally became apparent.
> >
> > Dave
> >
> >
> > On Fri, Mar 18, 2022 at 4:09 PM Scott Le Grand <varelse2005.gmail.com>
> > wrote:
> >
> > > So... I think recalculating the cut_factors in gpu_pressure_scale
> > addresses
> > > the issue entirely, no? This exploits the upper diagonal nature of
> ucell.
> > > Nice catch overall.
> > >
> > > On Fri, Mar 18, 2022 at 12:58 PM Scott Le Grand <varelse2005.gmail.com
> >
> > > wrote:
> > >
> > > > OK so I think I will just update a, b, and c here for now as this is
> > part
> > > > of the fixed point stuff. Then dive into it more rigorously later.
> > > >
> > > > On Fri, Mar 18, 2022 at 12:53 PM David Cerutti <dscerutti.gmail.com>
> > > > wrote:
> > > >
> > > >> Cut factor is intended to account for that, but from what I recall
> > (and
> > > >> I'm
> > > >> struggling to recall things I did 3-4 years ago) it's an imperfect
> > fix.
> > > >> The general case is solved by computing the Hessian norms as I'm
> > doing,
> > > >> determining those unit cell face-to-face spans, and then determining
> > the
> > > >> number of domain subdivisions than will fit. Realize also that,
> > unlike
> > > >> pmemd's CPU code, which dynamically reallocates the pair list
> > > >> subdivision as the box changes volume, pmemd.cuda is confined to a
> > > single
> > > >> decomposition and if that dips below the allowable limits then the
> > > program
> > > >> shuts down.
> > > >>
> > > >> But this does jog my memory a bit--the problem wasn't with the way
> you
> > > >> were
> > > >> combining the cutoff and unit cell fractional to real space
> > > transformation
> > > >> matrix per se, but rather the fact that the cutoff plus the pair
> list
> > > >> margin needed to be combined before applying this analysis. I think
> > the
> > > >> issue was that, before pmemd.cuda realized that the simulation cell
> > had
> > > >> indeed shrunk too much and hit the all-stop and boosted the user's
> > > >> positive
> > > >> energy with that error message, it could dip below the safe margin
> > > defined
> > > >> by something akin to cut_factor. That could leave you in situations
> > > where
> > > >> the box was really too small for the cutoff + pair list margin but
> the
> > > >> code
> > > >> didn't yet realize it. A more general, precise fix was what I think
> > was
> > > >> needed. I just fell back on what I knew--you can find HessianNorms
> > back
> > > >> in
> > > >> mdgx. But it may be possible that one can assess the columns as you
> > > say,
> > > >> so long as it is with the combined cutoff and pair list margin, to
> > > >> determine the subdivision count. I know that a bug got fixed by the
> > > >> change
> > > >> I submitted, although one would have to delve back into the logs to
> > find
> > > >> out exactly what. Being a bug that depends on certain combinations
> of
> > > >> pair
> > > >> list margins and box dimensions, it's very much like the tricky
> > > precision
> > > >> issue and will be hard to know if things are safe if we simply roll
> > > back.
> > > >> But follow the reasoning and things should be all right.
> > > >>
> > > >> Dave
> > > >>
> > > >>
> > > >> On Fri, Mar 18, 2022 at 3:34 PM Scott Le Grand <
> varelse2005.gmail.com
> > >
> > > >> wrote:
> > > >>
> > > >> > Also, isn't cut_factor (pbc.F90) supposed to account for this?
> Looks
> > > >> like
> > > >> > the ucell matrix remains upper diagonal under all circumstance?
> > > >> >
> > > >> > 114 if (is_orthog .ne. 0) then
> > > >> > 115 ucell(:, :) = 0.d0
> > > >> > 116 ucell(1, 1) = a
> > > >> > 117 ucell(2, 2) = b
> > > >> > 118 ucell(3, 3) = c
> > > >> > 119 cut_factor(:) = 1.d0
> > > >> > 120 else
> > > >> > 121 factor = PI / 180.d0
> > > >> > 122 ucell(1, 1) = a
> > > >> > 123 ucell(2, 1) = 0.d0
> > > >> > 124 ucell(3, 1) = 0.d0
> > > >> > 125 ucell(1, 2) = b * cos(factor * gamma)
> > > >> > 126 ucell(2, 2) = b * sin(factor * gamma)
> > > >> > 127 ucell(3, 2) = 0.d0
> > > >> > 128 ucell(1, 3) = c * cos(factor * beta)
> > > >> > 129 ucell(2, 3) = (b * c * cos(factor * alpha) - ucell(1,
> > 3)
> > > *
> > > >> &
> > > >> > 130 ucell(1, 2))/ucell(2, 2)
> > > >> > 131 ucell(3, 3) = sqrt(c * c - ucell(1, 3) * ucell(1, 3) -
> > > >> ucell(2,
> > > >> > 3) * &
> > > >> > 132 ucell(2, 3))
> > > >> > 133
> > > >> > 134 ! Cut factors are used to correct for "spherical
> cutoff
> > > >> > protrusion" into
> > > >> > 135 ! adjacent unit cells. The problem is that the point
> at
> > > >> which
> > > >> > a cutoff
> > > >> > 136 ! sphere is tangent to a unit cell side is not the
> > contact
> > > >> > point for
> > > >> > 137 ! projection of an orthogonal x, y, or z vector in a
> > > >> > nonorthogonal unit
> > > >> > 138 ! cell. We thus have to increase the cutoff a bit to
> > > allow
> > > >> for
> > > >> > the longer
> > > >> > 139 ! distance for the orthogonal projection.
> > > >> > 140
> > > >> > 141 cut_factor(1) = 1.d0 / (sin(factor * beta) *
> sin(factor
> > *
> > > >> > gamma))
> > > >> > 142 cut_factor(2) = 1.d0 / (sin(factor * alpha) *
> > sin(factor *
> > > >> > gamma))
> > > >> > 143 cut_factor(3) = 1.d0 / (sin(factor * alpha) *
> > sin(factor *
> > > >> > beta))
> > > >> > 144 end if
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Fri, Mar 18, 2022 at 12:21 PM Scott Le Grand <
> > > varelse2005.gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> > > Finally, the fix is surprisingly atomic modulo working the fixed
> > > point
> > > >> > > math into cellOfffset.
> > > >> > >
> > > >> > > On Fri, Mar 18, 2022 at 12:20 PM Scott Le Grand <
> > > >> varelse2005.gmail.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > >> Also the fix is in(tm), but I need to address gpu_update_box
> and
> > I
> > > >> need
> > > >> > >> to know if ucell matrices are still upper diagonal because if
> not
> > > >> that
> > > >> > will
> > > >> > >> require some further work as I had to update the cell offsets
> to
> > be
> > > >> > >> precalculated to make fixed point consistent and properly
> bounded
> > > so
> > > >> > there
> > > >> > >> can't ever be an overflow.
> > > >> > >>
> > > >> > >>
> > > >> > >>
> > > >> > >> On Fri, Mar 18, 2022 at 12:17 PM Scott Le Grand <
> > > >> varelse2005.gmail.com>
> > > >> > >> wrote:
> > > >> > >>
> > > >> > >>> First up, thanks for replying! Appreciated!
> > > >> > >>>
> > > >> > >>> But... For triclinic cells, I never saw a ucell matrix full
> > > >> occupied.
> > > >> > >>> u(2,1), u(3,1), and u(3,2) are always zero, yes?
> > > >> > >>>
> > > >> > >>> If so, then the column space defines a, b, and c by the
> > respective
> > > >> > >>> column lengths that *are* the distances between those
> separating
> > > >> > planes.
> > > >> > >>> Check it out for yourself. Can you point me to a case that
> > > violates
> > > >> > this?
> > > >> > >>> If so, we have to fix the whole shebang, not just this
> routine.
> > > >> > Because I
> > > >> > >>> agree with you if we have fully occupied ucell matrices,
> > otherwise
> > > >> no.
> > > >> > >>>
> > > >> > >>>
> > > >> > >>>
> > > >> > >>> On Fri, Mar 18, 2022 at 12:10 PM David Cerutti <
> > > dscerutti.gmail.com
> > > >> >
> > > >> > >>> wrote:
> > > >> > >>>
> > > >> > >>>> What's going on here is that not all unit cells are
> > > >> orthorhombic--such
> > > >> > >>>> would imply that the box angles alpha, beta, and gamma are
> > always
> > > >> 90
> > > >> > >>>> degrees. The general unit cell is triclinic, where alpha,
> > beta,
> > > >> and
> > > >> > >>>> gamma
> > > >> > >>>> can take on any values (with some degree of excluded
> territory,
> > > as
> > > >> > >>>> certain
> > > >> > >>>> combinations would fail to make a shape that can exist). The
> > > code
> > > >> you
> > > >> > >>>> are
> > > >> > >>>> seeing are seeing was created in response to another aspect
> of
> > > the
> > > >> > >>>> original
> > > >> > >>>> pmemd.cuda pair list that, if I recall correctly, was missing
> > > >> > >>>> interactions
> > > >> > >>>> from time to time due to a miscommunication between the way
> the
> > > >> pair
> > > >> > >>>> list
> > > >> > >>>> was generated versus the way the box had changed. I don't
> > recall
> > > >> if
> > > >> > it
> > > >> > >>>> was
> > > >> > >>>> creating energy drift or causing an outright crash. It was
> > also
> > > >> > around
> > > >> > >>>> the
> > > >> > >>>> time I added the -AllowSmallBox flag: note that the code will
> > not
> > > >> > allow
> > > >> > >>>> users to run simulations that are less than three pair list
> > > >> > subdivisions
> > > >> > >>>> along any dimension without an explicit instruction to do so,
> > as
> > > I
> > > >> > >>>> wasn't
> > > >> > >>>> sure that the way the particles get re-imaged at the
> beginning
> > of
> > > >> each
> > > >> > >>>> time
> > > >> > >>>> was safe if there are only two subdivisions.
> > > >> > >>>>
> > > >> > >>>> But the HessianNorms function is calculating the actual
> > thickness
> > > >> of
> > > >> > >>>> each
> > > >> > >>>> subdomain cell, as defined by the distance separating the
> > planes
> > > of
> > > >> > >>>> opposing faces. This is integral to the reason that, unless
> > you
> > > do
> > > >> > some
> > > >> > >>>> sophisticated stuff like the honeycomb idea I had, a
> truncated
> > > >> > >>>> octahedron
> > > >> > >>>> is not as efficient as the 22% reduction in overall particle
> > > count
> > > >> > would
> > > >> > >>>> imply: the total volume of each cell needs to expand as it
> > tilts,
> > > >> and
> > > >> > >>>> this
> > > >> > >>>> implies more tiles to cover the same volume. It's still
> > > >> advantageous
> > > >> > to
> > > >> > >>>> use a truncated octahedron, as the Hilbert Space curve does
> > > pretty
> > > >> > well
> > > >> > >>>> to
> > > >> > >>>> localize what is needed and of course the cutoff shell of any
> > > given
> > > >> > >>>> particle remains spherical.
> > > >> > >>>>
> > > >> > >>>> What's the ETA on the energy drift fix? Is this an atomic
> > > change,
> > > >> or
> > > >> > >>>> does
> > > >> > >>>> it also come with other refactoring / rollback of various
> > > precision
> > > >> > >>>> trims?
> > > >> > >>>>
> > > >> > >>>> Dave
> > > >> > >>>>
> > > >> > >>>>
> > > >> > >>>> On Fri, Mar 18, 2022 at 2:14 PM Scott Le Grand <
> > > >> varelse2005.gmail.com
> > > >> > >
> > > >> > >>>> wrote:
> > > >> > >>>>
> > > >> > >>>> > Let's look at this, here's the new code:
> > > >> > >>>> > HessianNorms(invU, cdepth);
> > > >> > >>>> > double skin = cdepth[0]/gpu->sim.xcells - gpu->sim.cut;
> > > >> > >>>> > double yskin = cdepth[1]/gpu->sim.ycells - gpu->sim.cut;
> > > >> > >>>> > double zskin = cdepth[2]/gpu->sim.zcells - gpu->sim.cut;
> > > >> > >>>> >
> > > >> > >>>> > The array cdepth is filed with the new a, b, and c by means
> > of
> > > an
> > > >> > >>>> external
> > > >> > >>>> > call. But as far as I know, all unit cells are
> orthorhombic,
> > so
> > > >> > we've
> > > >> > >>>> > always been able to use:
> > > >> > >>>> > // Recalculate a, b, c
> > > >> > >>>> > double a
> =
> > > >> > >>>> > pNTPData->ucell[0];
> > > >> > >>>> > double b
> =
> > > >> > >>>> > sqrt(pNTPData->ucell[1] * pNTPData->ucell[1] +
> > > >> > >>>> >
> > > >> > >>>> > pNTPData->ucell[4] * pNTPData->ucell[4]);
> > > >> > >>>> > double c
> =
> > > >> > >>>> > sqrt(pNTPData->ucell[2] * pNTPData->ucell[2] +
> > > >> > >>>> >
> > > >> > >>>> > pNTPData->ucell[5] * pNTPData->ucell[5] +
> > > >> > >>>> >
> > > >> > >>>> > pNTPData->ucell[8] * pNTPData->ucell[8]);
> > > >> > >>>> >
> > > >> > >>>> > because of the constraints that imposes on the column space
> > of
> > > >> the
> > > >> > >>>> unit
> > > >> > >>>> > cell matrix.
> > > >> > >>>> >
> > > >> > >>>> > So either the first code clause is overkill and repeated
> work
> > > or
> > > >> we
> > > >> > >>>> have to
> > > >> > >>>> > revisit the previous orthorhombic assumption. Which is it?
> > > >> > >>>> >
> > > >> > >>>> > But even then, why would one change the unit cell matrix
> > > without
> > > >> > >>>> updating
> > > >> > >>>> > the a, b, and c derived from it? And yes, I'm impatient w/r
> > to
> > > >> > fixing
> > > >> > >>>> bugs.
> > > >> > >>>> > Guilty.
> > > >> > >>>> >
> > > >> > >>>> >
> > > >> > >>>> > On Fri, Mar 18, 2022 at 10:42 AM Scott Le Grand <
> > > >> > >>>> varelse2005.gmail.com>
> > > >> > >>>> > wrote:
> > > >> > >>>> >
> > > >> > >>>> > > If the unit cell changes, wouldn't a, b, and c change as
> > well
> > > >> > since
> > > >> > >>>> they
> > > >> > >>>> > > are derived from said unit cell?
> > > >> > >>>> > >
> > > >> > >>>> > > As for the speed of response. If it's important to
> anyone,
> > > >> they'll
> > > >> > >>>> > > respond. If not, well they can make their principled
> stand
> > > >> against
> > > >> > >>>> me and
> > > >> > >>>> > > my impatience later. Otherwise, I acknowledge your point
> > but
> > > >> also
> > > >> > my
> > > >> > >>>> > > okeydokey here, nobody is debugging this thing. It's an
> > easy
> > > >> fix.
> > > >> > >>>> And I
> > > >> > >>>> > > need to do that fix to make it exploit the fixed point
> > local
> > > >> > >>>> coordinate
> > > >> > >>>> > > system, but I could leave the seemingly sloppy incorrect
> > > >> behavior
> > > >> > >>>> here.
> > > >> > >>>> > > Driveby contributions to PMEMD like this however are just
> > not
> > > >> good
> > > >> > >>>> IMO
> > > >> > >>>> > and
> > > >> > >>>> > > why I threw my hands up in frustration after trying to
> > > refactor
> > > >> > for
> > > >> > >>>> > > performance in 2020. It's a Sisyphusian effort and the
> > > tsunami
> > > >> of
> > > >> > >>>> > > unattended warning messages and dead code in the thing
> > alone
> > > is
> > > >> > >>>> enough to
> > > >> > >>>> > > make one run screaming into the night IMO. It was a
> single
> > > >> > >>>> unitialized
> > > >> > >>>> > > variable that broke PMEMD for Silicon Therapeutics in
> 2020.
> > > But
> > > >> > >>>> also, why
> > > >> > >>>> > > is there is so much dead code?
> > > >> > >>>> > >
> > > >> > >>>> > > The author of this code added a new and completely
> > > superfluous
> > > >> > >>>> method of
> > > >> > >>>> > > recalculating a, b,and c without exploiting the
> properties
> > of
> > > >> the
> > > >> > >>>> > > orthorhombic coordinate system to do it more simply as it
> > is
> > > >> done
> > > >> > in
> > > >> > >>>> > > gpu_pressure_scale, which itself does the same thing
> > > >> internally as
> > > >> > >>>> > > gpu_update_box if there's an MC barostat. And I hope you
> > can
> > > >> > >>>> forgive me
> > > >> > >>>> > for
> > > >> > >>>> > > being annoyed at the needless repetition of functionality
> > > here
> > > >> w/o
> > > >> > >>>> any
> > > >> > >>>> > > notes as to why this was necessary.
> > > >> > >>>> > >
> > > >> > >>>> > > On Fri, Mar 18, 2022 at 10:22 AM David A Case <
> > > >> > >>>> david.case.rutgers.edu>
> > > >> > >>>> > > wrote:
> > > >> > >>>> > >
> > > >> > >>>> > >> On Fri, Mar 18, 2022, Scott Le Grand wrote:
> > > >> > >>>> > >>
> > > >> > >>>> > >> >Okeydokey, nobody cares. Therefore it's a bug and I'm
> > going
> > > >> to
> > > >> > >>>> fix it.
> > > >> > >>>> > >> Test
> > > >> > >>>> > >> >will be updated to reflect this. What SM revision
> should
> > I
> > > >> use
> > > >> > >>>> for the
> > > >> > >>>> > >> test
> > > >> > >>>> > >> >reference files?
> > > >> > >>>> > >>
> > > >> > >>>> > >> ??!? How fast do you expect people to respond? I
> looked
> > at
> > > >> this
> > > >> > >>>> and
> > > >> > >>>> > >> didn't
> > > >> > >>>> > >> see any obvious error -- the two routines are called at
> > > >> different
> > > >> > >>>> > places:
> > > >> > >>>> > >> gpu_pressure_scale only when ntp>0 and barostat=1,
> whereas
> > > >> > >>>> > gpu_update_box
> > > >> > >>>> > >> is only called when re-weighting trajectories.
> > > >> > >>>> > >>
> > > >> > >>>> > >> But I don't know the code here. Clearly, you must be
> > > >> convinced
> > > >> > >>>> that
> > > >> > >>>> > a,b,c
> > > >> > >>>> > >> need to be updated when reweight is > 0. I'll try to
> look
> > > at
> > > >> > what
> > > >> > >>>> > >> behavior
> > > >> > >>>> > >> changes when you make the change. But I'm travelling to
> > the
> > > >> ACS
> > > >> > >>>> meeting
> > > >> > >>>> > >> tomorrow, so you may disappointed if you require
> immediate
> > > >> > >>>> feedback.
> > > >> > >>>> > >>
> > > >> > >>>> > >> ....dac
> > > >> > >>>> > >>
> > > >> > >>>> > >>
> > > >> > >>>> > >> _______________________________________________
> > > >> > >>>> > >> AMBER-Developers mailing list
> > > >> > >>>> > >> AMBER-Developers.ambermd.org
> > > >> > >>>> > >>
> > http://lists.ambermd.org/mailman/listinfo/amber-developers
> > > >> > >>>> > >>
> > > >> > >>>> > >
> > > >> > >>>> > _______________________________________________
> > > >> > >>>> > AMBER-Developers mailing list
> > > >> > >>>> > AMBER-Developers.ambermd.org
> > > >> > >>>> > http://lists.ambermd.org/mailman/listinfo/amber-developers
> > > >> > >>>> >
> > > >> > >>>> _______________________________________________
> > > >> > >>>> AMBER-Developers mailing list
> > > >> > >>>> AMBER-Developers.ambermd.org
> > > >> > >>>> http://lists.ambermd.org/mailman/listinfo/amber-developers
> > > >> > >>>>
> > > >> > >>>
> > > >> > _______________________________________________
> > > >> > AMBER-Developers mailing list
> > > >> > AMBER-Developers.ambermd.org
> > > >> > http://lists.ambermd.org/mailman/listinfo/amber-developers
> > > >> >
> > > >> _______________________________________________
> > > >> AMBER-Developers mailing list
> > > >> AMBER-Developers.ambermd.org
> > > >> http://lists.ambermd.org/mailman/listinfo/amber-developers
> > > >>
> > > >
> > > _______________________________________________
> > > AMBER-Developers mailing list
> > > AMBER-Developers.ambermd.org
> > > http://lists.ambermd.org/mailman/listinfo/amber-developers
> > >
> > _______________________________________________
> > AMBER-Developers mailing list
> > AMBER-Developers.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber-developers
> >
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers
>
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Sat Mar 19 2022 - 10:30:02 PDT
Custom Search