Re: [AMBER-Developers] Faster NetFrc from Scott Le Grand on 2020-06-30 (Amber Developers Archive Jun 2020)

From: Scott Le Grand <varelse2005.gmail.com>
Date: Tue, 30 Jun 2020 09:42:20 -0700

OK, I spent some more time thinking this over before coding it up. Your
proposal works for extra points. But do extra points (always) have a mass
of zero? If so, I could just check their mass and not subtract netforce
from them. I am thinking an extra read/write pass on memory is more
expensive than a predicate based on the already loaded mass in the update
routine. Correct me if I am wrong?
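
A minimal CPU-side sketch of the mass-predicate idea, assuming (this is the open question above) that extra points always carry zero mass. `Vec3` and `removeNetForce` are hypothetical names for illustration, not pmemd code; the divisor mirrors the `atm_cnt - numextra` average in pme_ene.F90.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// Hypothetical helper: sum the force over ALL atoms (extra points included),
// average over the atoms with nonzero mass, and subtract that average only
// from those atoms. Assumes mass == 0 identifies an extra point.
inline void removeNetForce(std::vector<Vec3>& frc, const std::vector<double>& mass) {
    Vec3 total{0.0, 0.0, 0.0};
    std::size_t massive = 0;
    for (std::size_t i = 0; i < frc.size(); ++i) {
        total.x += frc[i].x;
        total.y += frc[i].y;
        total.z += frc[i].z;
        if (mass[i] != 0.0) ++massive;
    }
    if (massive == 0) return;  // nothing to correct

    const double n = static_cast<double>(massive);
    const Vec3 share{total.x / n, total.y / n, total.z / n};
    for (std::size_t i = 0; i < frc.size(); ++i) {
        if (mass[i] == 0.0) continue;  // the predicate: skip massless atoms
        frc[i].x -= share.x;
        frc[i].y -= share.y;
        frc[i].z -= share.z;
    }
}
```

The predicate reuses a value the update step already has in hand, so the sketch needs no separate pass that rewrites the extra-point forces afterwards.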

With respect to softcore TI, I set up two accumulators, one for each TI
region. If TI is active and I read each atom's region, I can normalize by
the respective region's atom count. I would just need to pre-calculate the
number of atoms in each region and add a local index for this ditty:
    PMEAccumulator nx = NFX / cSim.atoms + (pos < (NFX % cSim.atoms));
    PMEAccumulator ny = NFY / cSim.atoms + (pos < (NFY % cSim.atoms));
    PMEAccumulator nz = NFZ / cSim.atoms + (pos < (NFZ % cSim.atoms));

This makes sure there is *perfect* conservation in fixed point before
converting to double precision for the update.
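
To illustrate why the quotient-plus-remainder split conserves exactly, here is a small host-side sketch. `netForceShare` and `sumOfShares` are hypothetical names, and the 64-bit signed accumulator type is an assumption. One detail the sketch adds: C++ integer division truncates toward zero, so for a negative total the remainder is negative and a plain `pos < rem` test never fires. The sketch converts to floored division so the remainder always lands in [0, atoms).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the share of the fixed-point net force `total` that
// atom `pos` (0 <= pos < atoms) subtracts. The first `r` atoms take one
// extra unit so the shares add back up to `total` exactly.
inline std::int64_t netForceShare(std::int64_t total, std::int64_t atoms, std::int64_t pos) {
    assert(atoms > 0);
    std::int64_t q = total / atoms;  // truncates toward zero
    std::int64_t r = total % atoms;  // zero or same sign as total
    if (r < 0) {                     // convert to floored division
        q -= 1;
        r += atoms;
    }
    return q + (pos < r ? 1 : 0);
}

// Sum of the shares over all atoms; equals `total` with no rounding loss.
inline std::int64_t sumOfShares(std::int64_t total, std::int64_t atoms) {
    std::int64_t sum = 0;
    for (std::int64_t pos = 0; pos < atoms; ++pos) {
        sum += netForceShare(total, atoms, pos);
    }
    return sum;
}
```

For example, a total of 10 over 3 atoms yields shares 4, 3, 3, and a total of -10 yields -3, -3, -4; both sum back to the original total.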

Eliminating one read and one read/write pass is already 1-7% faster for
vanilla systems. I am working towards getting rid of as much serialized GPU
stuff as I can so as to speed up parallel runs. I am hoping to hit ~4x on 8
GPUs connected to NVSWITCH by end of Summer and I think that's attainable
even if we never get a faster distributed FFT. The latter however, if we
had it, would unlock a fully distributed state that would allow AMBER to do
those 100M+ atom hero runs efficiently.

Scott

On Fri, May 22, 2020 at 2:16 PM <taisung.gmail.com> wrote:

> Actually, I don't think you need to deal w/ the extra points at all if you
> do netfrc right after PME gradsum. Just do the total accumulate and get rid
> of the resulting net forces for all atoms (including extra points). The
> forces on extra points will be transferred to nearby heavy atoms (hence
> become zero) in a later stage.
>
> The results could be slightly different though. But theoretically I believe
> it is OK.
>
> Taisung
>
> -----Original Message-----
> From: Scott Le Grand [mailto:varelse2005.gmail.com]
> Sent: Friday, May 22, 2020 4:31 PM
> To: AMBER Developers Mailing List <amber-developers.ambermd.org>
> Subject: Re: [AMBER-Developers] Faster NetFrc
>
> So if I accumulate them from the extra points in gradsum, but then only
> apply to atoms that are not extra points, we are good?
>
> On Fri, May 22, 2020 at 1:29 PM David Cerutti <dscerutti.gmail.com> wrote:
>
> > Our emails may have crossed in writing, but it appears that in both
> > CPU and GPU codes the netfrc is being subtracted after extra points
> > have had their forces transmitted to their massive frame atoms. The
> > solutions to dealing with the pollution that netfrc correction causes
> > with extra points differ in each code.
> >
> > Dave
> >
> >
> > On Fri, May 22, 2020 at 4:27 PM Scott Le Grand <varelse2005.gmail.com>
> > wrote:
> >
> > > Cool, and if it's an extra point should I or should I not subtract
> > > the netfrc? Dumb questions asked upfront to save time later?
> > >
> > > On Fri, May 22, 2020 at 1:20 PM <cancersimulation.gmail.com> wrote:
> > >
> > > > It is static. Usually, it should be the same as "numextra" here
> > > >
> > > > use prmtop_dat_mod, only : numextra
> > > >
> > > > > I believe Dave Cerutti has implemented some extra-point stuff and I
> > > > > don't know if the above statement still holds with his stuff.
> > > >
> > > > Taisung
> > > >
> > > > -----Original Message-----
> > > > From: Scott Le Grand [mailto:varelse2005.gmail.com]
> > > > Sent: Friday, May 22, 2020 3:02 PM
> > > > To: AMBER Developers Mailing List <amber-developers.ambermd.org>
> > > > Subject: Re: [AMBER-Developers] Faster NetFrc
> > > >
> > > > So also...
> > > >
> > > > > I would assume the value of "ignored" is static. How do I calculate
> > > > > it a priori? And it looks like I don't apply it to the extra points.
> > > > > Feel free to describe off-list, but I want to drill down deep and get
> > > > > this right on the first swing...
> > > >
> > > > On Thu, May 21, 2020 at 6:08 PM Scott Le Grand
> > > > <varelse2005.gmail.com>
> > > > wrote:
> > > >
> > > > > Excellent, that makes this much more straightforward. Should
> > > > > have something by early next week.
> > > > >
> > > > > On Thu, May 21, 2020 at 6:04 PM <taisung.gmail.com> wrote:
> > > > >
> > > > >> The force thresholds are for those "dummies" (not the
> > > > >> alchemical dummies but something like lone-pair points of water
> > > > >> models). The forces of those atoms are "transferred" to other
> > > > >> real atoms at the final force collection stage--and hence need
> > > > >> to be kept to zero during the netfrc stage. Of course, you may
> > > > >> find better ways to do things. For example, as Scott
> > > > >> mentioned, the only non-conserved force part is PME reciprocal
> > > > > >> part. If the netfrc is done in the PME reciprocal part, there is
> > > > > >> no need to have such force thresholds.
> > > > >>
> > > > >> Taisung
> > > > >>
> > > > >> -----Original Message-----
> > > > >> From: David Cerutti [mailto:dscerutti.gmail.com]
> > > > >> Sent: Thursday, May 21, 2020 8:37 PM
> > > > >> To: AMBER Developers Mailing List
> > > > >> <amber-developers.ambermd.org>
> > > > >> Subject: Re: [AMBER-Developers] Faster NetFrc
> > > > >>
> > > > >> As implied, the GTI code is the revision that introduced this.
> > > > > >> Taisung can comment more on his logic, but the presence of this
> > > > > >> "small" term reminds me of something he's got in the non-bonded
> > > > > >> inner loop as well. I'm not sure we ever determined why these
> > > > > >> conditionals were needed; I think the one in the non-bonded loop
> > > > > >> should just go away after some other revisions I made, but I'll
> > > > > >> wait for more input.
> > > > >>
> > > > >> Dave
> > > > >>
> > > > >>
> > > > >> On Thu, May 21, 2020 at 8:29 PM Scott Le Grand
> > > > >> <varelse2005.gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > Hey guys,
> > > > >> > Back as an AMBER developer and one of the first things I'd
> > > > >> > like to do is optimize netfrc.
> > > > >> >
> > > > >> > Fixed point conservative forces have no net force. They
> > > > >> > automagically cancel out 100%. So we can ignore everything
> > > > > >> > except non-conserved forces.
> > > > > >> >
> > > > > >> > The only non-conserved force I know of is the Ewald Gradient Sum.
> > > > > >> > So if I add up the net force there, and handle it upon either
> > > > > >> > force reduction or update, I can eliminate two passes on memory
> > > > > >> > and the two kernels currently dedicated to adding and then
> > > > > >> > subtracting them.
> > > > >> >
> > > > >> > But... Are there any other non-conserved forces?
> > > > >> >
> > > > > >> > And... It looks like the FORTRAN and the CUDA code do different
> > > > > >> > things. FORTRAN simply averages the forces and subtracts. But
> > > > > >> > CUDA only averages over atoms with force thresholds above a
> > > > > >> > predefined small amount. What's up with that?
> > > > >> >
> > > > > >> > pme_ene.F90:
> > > > > >> > if (netfrc .gt. 0 .and. onstep) then
> > > > > >> >
> > > > > >> >   if (ti_mode .eq. 0) then
> > > > > >> >     do i = 1, atm_cnt
> > > > > >> >       net_frcs(:) = net_frcs(:) + frc(:, i)
> > > > > >> >     end do
> > > > > >> >
> > > > > >> >     ! Now do the correction:
> > > > > >> >
> > > > > >> >     net_frcs(:) = net_frcs(:) / dble(atm_cnt - numextra)
> > > > > >> >
> > > > > >> >     do i = 1, atm_cnt
> > > > > >> >       frc(:, i) = frc(:, i) - net_frcs(:)
> > > > > >> >     end do
> > > > > >> >   else
> > > > > >> >     do i = 1, atm_cnt
> > > > > >> >       ti_net_frcs(1, :) = ti_net_frcs(1, :) + ti_nb_frc(1, :, i)
> > > > > >> >       ti_net_frcs(2, :) = ti_net_frcs(2, :) + ti_nb_frc(2, :, i)
> > > > > >> >     end do
> > > > > >> >
> > > > > >> >     ti_net_frcs(1,:) = &
> > > > > >> >       ti_net_frcs(1,:)/dble(ti_atm_cnt(1)-ti_numextra_pts(1))
> > > > > >> >     ti_net_frcs(2,:) = &
> > > > > >> >       ti_net_frcs(2,:)/dble(ti_atm_cnt(2)-ti_numextra_pts(2))
> > > > > >> >     net_frcs(:) = ti_net_frcs(1,:) + ti_net_frcs(2,:)
> > > > > >> >
> > > > > >> >     do i = 1, atm_cnt
> > > > > >> >       ! This matches how sander removes netfrcs in TI runs
> > > > > >> >       if (ti_lst(1,i) .ne. 0) then
> > > > > >> >         frc(:, i) = frc(:, i) - ti_net_frcs(1,:)
> > > > > >> >       else if (ti_lst(2,i) .ne. 0) then
> > > > > >> >         frc(:, i) = frc(:, i) - ti_net_frcs(2,:)
> > > > > >> >       else
> > > > > >> >         frc(:, i) = frc(:, i) - net_frcs(:)
> > > > > >> >       end if
> > > > > >> >     end do
> > > > > >> >   end if
> > > > > >> >   ! Any extra points must have their 0.d0 forces reset...
> > > > > >> >
> > > > > >> >   if (numextra .gt. 0 .and. frameon .ne. 0) &
> > > > > >> >     call zero_extra_pnts_vec(frc, ep_frames, gbl_frame_cnt)
> > > > > >> >
> > > > > >> > end if
> > > > >> >
> > > > >> >
> > > > >> > GTI path:
> > > > > >> > while (pos < cSim.atoms) {
> > > > > >> >   PMEFloat fx = converter(pX[pos], ONEOVERFORCESCALE);
> > > > > >> >   PMEFloat fy = converter(pY[pos], ONEOVERFORCESCALE);
> > > > > >> >   PMEFloat fz = converter(pZ[pos], ONEOVERFORCESCALE);
> > > > > >> >   if (abs(fx) > small || abs(fy) > small || abs(fz) > small) {
> > > > > >> >     pX[pos] -= nfX;
> > > > > >> >     pY[pos] -= nfY;
> > > > > >> >     pZ[pos] -= nfZ;
> > > > > >> >   }
> > > > > >> >   pos += increment;
> > > > > >> > }
> > > > >> >
> > > > >> > Scott
> > > > >> > _______________________________________________
> > > > >> > AMBER-Developers mailing list AMBER-Developers.ambermd.org
> > > > >> > http://lists.ambermd.org/mailman/listinfo/amber-developers
> > > > >> >
> > > > >> _______________________________________________
> > > > >> AMBER-Developers mailing list
> > > > >> AMBER-Developers.ambermd.org
> > > > >> http://lists.ambermd.org/mailman/listinfo/amber-developers
> > > > >>
> > > > >>
> > > > >> _______________________________________________
> > > > >> AMBER-Developers mailing list
> > > > >> AMBER-Developers.ambermd.org
> > > > >> http://lists.ambermd.org/mailman/listinfo/amber-developers
> > > > >>
> > > > >
> > > > _______________________________________________
> > > > AMBER-Developers mailing list
> > > > AMBER-Developers.ambermd.org
> > > > http://lists.ambermd.org/mailman/listinfo/amber-developers
> > > >
> > > >
> > > > _______________________________________________
> > > > AMBER-Developers mailing list
> > > > AMBER-Developers.ambermd.org
> > > > http://lists.ambermd.org/mailman/listinfo/amber-developers
> > > >
> > > _______________________________________________
> > > AMBER-Developers mailing list
> > > AMBER-Developers.ambermd.org
> > > http://lists.ambermd.org/mailman/listinfo/amber-developers
> > >
> > _______________________________________________
> > AMBER-Developers mailing list
> > AMBER-Developers.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber-developers
> >
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers
>
>
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers
>
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Tue Jun 30 2020 - 10:00:03 PDT