Re: [AMBER-Developers] Faster NetFrc from David Cerutti on 2020-06-30 (Amber Developers Archive Jun 2020)

From: David Cerutti <dscerutti.gmail.com>
Date: Tue, 30 Jun 2020 12:50:09 -0400

You can count on extra points always having a mass of zero. In fact, they
always have type "EP" by the code's hard-wired inanity (that was an issue I
had to correct before mdgx could print conforming topologies).
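
Extra points are guaranteed to have zero mass, so the update-time predicate Scott proposes can be sketched on the host as below. This is a minimal C++ sketch, not the pmemd.cuda kernel. `Vec3`, `countMassiveAtoms`, and `removeNetForce` are hypothetical names, and the sketch assumes the net force has already been accumulated elsewhere (for example, during the PME gradient sum).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical types and names; not taken from pmemd.cuda.
struct Vec3 {
  double x, y, z;
};

// Atoms that share the net-force correction are the massive ones. Because
// extra points always have zero mass, this count equals atm_cnt - numextra.
// A real kernel would compute it once, not on every step.
std::size_t countMassiveAtoms(const std::vector<double>& mass) {
  std::size_t n = 0;
  for (double m : mass) {
    if (m > 0.0) ++n;
  }
  return n;
}

// Subtract an equal share of netForce from every massive atom. The test
// reuses the mass that the update step already holds, instead of running a
// separate pass that has to know which atoms are extra points.
void removeNetForce(std::vector<Vec3>& frc, const std::vector<double>& mass,
                    const Vec3& netForce) {
  const std::size_t nMassive = countMassiveAtoms(mass);
  if (nMassive == 0) return;
  const double inv = 1.0 / static_cast<double>(nMassive);
  for (std::size_t i = 0; i < frc.size(); ++i) {
    if (mass[i] > 0.0) {  // extra points (mass 0) are left untouched
      frc[i].x -= netForce.x * inv;
      frc[i].y -= netForce.y * inv;
      frc[i].z -= netForce.z * inv;
    }
  }
}
```

The branch costs only a compare on a value that is already loaded. This is the trade-off Scott weighs against an extra read/write pass over memory.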

There are other problems with the 100M-atom hero runs, namely what tleap
can cobble together and a hard limit of 16M atoms imposed by an integer bit
encoding elsewhere in the GPU code, but otherwise it would be great to get a
4x speedup on 8 GPUs for medium-large systems of 250K-1M atoms.

Dave

On Tue, Jun 30, 2020 at 12:42 PM Scott Le Grand <varelse2005.gmail.com>
wrote:

> OK, I spent some more time thinking this over before coding it up. Your
> proposal works for extra points. But do extra points (always) have a mass
> of zero? If so, I could just check their mass and not subtract netforce
> from them. I am thinking an extra read/write pass on memory is more
> expensive than a predicate based on the already loaded mass in the update
> routine. Correct me if I am wrong?
>
> W.r.t. softcore TI, I set up two accumulators, one for each TI region. If
> TI is active and I read each atom's region, I can normalize off the
> respective region. I would just need to pre-calculate the number of atoms
> in each region and add a local index for this ditty:
>
>   PMEAccumulator nx = NFX / cSim.atoms + (pos < (NFX % cSim.atoms));
>   PMEAccumulator ny = NFY / cSim.atoms + (pos < (NFY % cSim.atoms));
>   PMEAccumulator nz = NFZ / cSim.atoms + (pos < (NFZ % cSim.atoms));
>
> This makes sure there is *perfect* conservation in fixed point before
> converting to double precision for the update.
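
The remainder trick in the ditty can be checked on its own. The sketch below is a hypothetical host-side version. `floorDiv` and `splitShare` are illustrative names, and the accumulator is assumed to be a signed 64-bit integer. It uses floor division so that the remainder stays in [0, n) even when the net force is negative. With C/C++ truncating division and a signed accumulator, `NF % n` is negative for a negative `NF` with a nonzero remainder, so `pos < (NF % n)` would never add the remainder back.

```cpp
#include <cassert>
#include <cstdint>

// Floor division for signed 64-bit integers (C++ '/' truncates toward zero).
int64_t floorDiv(int64_t a, int64_t b) {
  int64_t q = a / b;
  if ((a % b != 0) && ((a < 0) != (b < 0))) --q;
  return q;
}

// Share of the fixed-point net force nf assigned to atom pos out of n atoms.
// Every atom gets floor(nf / n). The first r = nf - n * floor(nf / n) atoms
// (0 <= r < n) get one extra unit, so the shares sum to nf exactly.
int64_t splitShare(int64_t nf, int64_t n, int64_t pos) {
  const int64_t q = floorDiv(nf, n);
  const int64_t r = nf - q * n;
  return q + (pos < r ? 1 : 0);
}
```

If atom `pos` subtracts `splitShare(NF, n, pos)` for every `pos`, the net force is removed down to the last fixed-point unit before anything is converted to double precision.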
>
> Eliminating one read and one read/write pass is already 1-7% faster for
> vanilla systems. I am working towards getting rid of as much serialized GPU
> stuff as I can so as to speed up parallel runs. I am hoping to hit ~4x on 8
> GPUs connected to NVSWITCH by the end of summer, and I think that's attainable
> even if we never get a faster distributed FFT. The latter, however, if we
> had it, would unlock a fully distributed state that would allow AMBER to do
> those 100M+ atom hero runs efficiently.
>
> Scott
>
>
>
> On Fri, May 22, 2020 at 2:16 PM <taisung.gmail.com> wrote:
>
> > Actually, I don't think you need to deal w/ the extra points at all if you
> > do netfrc right after the PME gradsum. Just accumulate the total and get rid
> > of the resulting net force on all atoms (including extra points). The
> > forces on extra points will be transferred to nearby heavy atoms (hence
> > become zero) in a later stage.
> >
> > The results could be slightly different, though. But theoretically I
> > believe it is OK.
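
Here is a one-component sketch of this ordering. The mean force is subtracted from every atom, extra points included, and each extra point's force is then moved onto its frame atoms. The sketch assumes the frame weights sum to 1, as for a TIP4P-style site that is a weighted average of its parent atoms. `EpFrame`, `removeMeanFromAll`, `transferExtraPointForces`, and `totalForce` are hypothetical names, not pmemd routines. Because the transfer conserves the total force, the system ends with zero net force and zero force on the extra points. Individual atoms can still differ slightly from the current ordering.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical extra-point frame: the EP's force goes to its parent atoms
// with weights that sum to 1 (e.g. a TIP4P-style site that is a weighted
// average of the O, H1, and H2 positions).
struct EpFrame {
  std::size_t ep;
  std::vector<std::size_t> parents;
  std::vector<double> weights;
};

// Step 1 (right after the reciprocal-space gradient sum): remove the mean
// force from every atom, extra points included. One Cartesian component only.
void removeMeanFromAll(std::vector<double>& f) {
  double sum = 0.0;
  for (double v : f) sum += v;
  const double mean = sum / static_cast<double>(f.size());
  for (double& v : f) v -= mean;
}

// Step 2 (later): transfer each extra point's force to its frame atoms.
// Since the weights sum to 1, this step leaves the total force unchanged.
void transferExtraPointForces(std::vector<double>& f,
                              const std::vector<EpFrame>& frames) {
  for (const EpFrame& fr : frames) {
    for (std::size_t k = 0; k < fr.parents.size(); ++k) {
      f[fr.parents[k]] += fr.weights[k] * f[fr.ep];
    }
    f[fr.ep] = 0.0;
  }
}

double totalForce(const std::vector<double>& f) {
  double sum = 0.0;
  for (double v : f) sum += v;
  return sum;
}
```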
> >
> > Taisung
> >
> > -----Original Message-----
> > From: Scott Le Grand [mailto:varelse2005.gmail.com]
> > Sent: Friday, May 22, 2020 4:31 PM
> > To: AMBER Developers Mailing List <amber-developers.ambermd.org>
> > Subject: Re: [AMBER-Developers] Faster NetFrc
> >
> > So if I accumulate them from the extra points in gradsum, but then only
> > apply to atoms that are not extra points, we are good?
> >
> > On Fri, May 22, 2020 at 1:29 PM David Cerutti <dscerutti.gmail.com> wrote:
> >
> > > Our emails may have crossed in writing, but it appears that in both
> > > CPU and GPU codes the netfrc is being subtracted after extra points
> > > have had their forces transmitted to their massive frame atoms. The two
> > > codes deal differently with the pollution that the netfrc correction
> > > causes on extra points.
> > >
> > > Dave
> > >
> > >
> > > On Fri, May 22, 2020 at 4:27 PM Scott Le Grand <varelse2005.gmail.com>
> > > wrote:
> > >
> > > > Cool, and if it's an extra point should I or should I not subtract
> > > > the netfrc? Dumb questions asked upfront to save time later?
> > > >
> > > > On Fri, May 22, 2020 at 1:20 PM <cancersimulation.gmail.com> wrote:
> > > >
> > > > > It is static. Usually, it should be the same as "numextra" here
> > > > >
> > > > > use prmtop_dat_mod, only : numextra
> > > > >
> > > > > I believe Dave Cerutti has implemented some extra-point stuff and I
> > > > > don't know if the above statement still holds with his stuff.
> > > > >
> > > > > Taisung
> > > > >
> > > > > -----Original Message-----
> > > > > From: Scott Le Grand [mailto:varelse2005.gmail.com]
> > > > > Sent: Friday, May 22, 2020 3:02 PM
> > > > > To: AMBER Developers Mailing List <amber-developers.ambermd.org>
> > > > > Subject: Re: [AMBER-Developers] Faster NetFrc
> > > > >
> > > > > So also...
> > > > >
> > > > > I would assume the value of "ignored" is static. How do I calculate it
> > > > > a priori? And it looks like I don't apply it to the extra points. Feel
> > > > > free to describe off-list, but I want to drill down deep and get this
> > > > > right on the first swing...
> > > > >
> > > > > On Thu, May 21, 2020 at 6:08 PM Scott Le Grand <varelse2005.gmail.com> wrote:
> > > > >
> > > > > > Excellent, that makes this much more straightforward. Should
> > > > > > have something by early next week.
> > > > > >
> > > > > > On Thu, May 21, 2020 at 6:04 PM <taisung.gmail.com> wrote:
> > > > > >
> > > > > >> The force thresholds are for those "dummies" (not the alchemical
> > > > > >> dummies, but things like the lone-pair points of water models). The
> > > > > >> forces on those atoms are "transferred" to other real atoms at the
> > > > > >> final force-collection stage, and hence need to be kept at zero
> > > > > >> during the netfrc stage. Of course, you may find better ways to do
> > > > > >> things. For example, as Scott mentioned, the only non-conserved force
> > > > > >> part is the PME reciprocal part. If the netfrc is done in the PME
> > > > > >> reciprocal part, there is no need for such force thresholds.
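
A small sketch shows how the GTI threshold test differs from the zero-mass test discussed in this thread. `gtiSelects` mirrors the condition in the quoted GTI loop. `massSelects` and the value of `small` used below are hypothetical. Both tests skip an extra point whose force is still exactly zero. Only the mass test, however, still corrects a real atom whose force happens to fall below the threshold.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
  double x, y, z;
};

// Mirrors the condition in the GTI loop: correct an atom only if some force
// component exceeds the threshold `small`.
bool gtiSelects(const Vec3& f, double small) {
  return std::fabs(f.x) > small || std::fabs(f.y) > small ||
         std::fabs(f.z) > small;
}

// Alternative discussed in the thread: correct every atom with nonzero mass,
// since extra points always have zero mass.
bool massSelects(double mass) { return mass > 0.0; }
```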
> > > > > >>
> > > > > >> Taisung
> > > > > >>
> > > > > >> -----Original Message-----
> > > > > >> From: David Cerutti [mailto:dscerutti.gmail.com]
> > > > > >> Sent: Thursday, May 21, 2020 8:37 PM
> > > > > >> To: AMBER Developers Mailing List
> > > > > >> <amber-developers.ambermd.org>
> > > > > >> Subject: Re: [AMBER-Developers] Faster NetFrc
> > > > > >>
> > > > > >> As implied, the GTI code is the revision that introduced this.
> > > > > >> Taisung can comment more on his logic, but the presence of this
> > > > > >> "small" term reminds me of something he's got in the non-bonded
> > > > > >> inner loop as well. I'm not sure we ever determined why these
> > > > > >> conditionals were needed; I think the one in the non-bonded loop
> > > > > >> should just go away after some other revisions I made, but I'll
> > > > > >> wait for more input.
> > > > > >>
> > > > > >> Dave
> > > > > >>
> > > > > >>
> > > > > >> On Thu, May 21, 2020 at 8:29 PM Scott Le Grand <varelse2005.gmail.com> wrote:
> > > > > >>
> > > > > >> > Hey guys,
> > > > > >> > Back as an AMBER developer and one of the first things I'd
> > > > > >> > like to do is optimize netfrc.
> > > > > >> >
> > > > > >> > Fixed-point conservative forces have no net force. They
> > > > > >> > automagically cancel out 100%. So we can ignore everything except
> > > > > >> > non-conserved forces.
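
Here is a minimal sketch of why fixed-point pair forces cancel exactly. Each pair force is converted to an integer once, then added to one atom and subtracted from the other. Integer addition is exact and order-independent, so the accumulated total is exactly zero. The scale factor 2^40 and the names `toFixed`, `PairForce`, `accumulatePairs`, and `netFixed` are assumptions for illustration, and only one Cartesian component is shown. Forces that are not applied as equal-and-opposite pairs, such as those interpolated from the PME reciprocal-space grid, get no such guarantee. That is the argument for measuring the net force only there.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-point scale: forces stored as integer multiples of 2^-40.
const double kForceScale = 1099511627776.0;  // 2^40

int64_t toFixed(double f) {
  return static_cast<int64_t>(std::llround(f * kForceScale));
}

struct PairForce {
  std::size_t i, j;
  double fij;  // force on i due to j; j receives -fij (one component)
};

std::vector<int64_t> accumulatePairs(std::size_t nAtoms,
                                     const std::vector<PairForce>& pairs) {
  std::vector<int64_t> acc(nAtoms, 0);
  for (const PairForce& p : pairs) {
    const int64_t q = toFixed(p.fij);  // convert once, then apply +q and -q
    acc[p.i] += q;
    acc[p.j] -= q;
  }
  return acc;
}

int64_t netFixed(const std::vector<int64_t>& acc) {
  int64_t sum = 0;
  for (int64_t v : acc) sum += v;
  return sum;
}
```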
> > > > > >> >
> > > > > >> > The only non-conserved force I know of is the Ewald gradient sum.
> > > > > >> > So if I add up the net force there and handle it during either
> > > > > >> > force reduction or the update, I can eliminate two passes over
> > > > > >> > memory and the two kernels currently dedicated to adding and then
> > > > > >> > subtracting the net force.
> > > > > >> >
> > > > > >> > But... are there any other non-conserved forces?
> > > > > >> >
> > > > > >> > And... it looks like the FORTRAN and the CUDA code do different
> > > > > >> > things. FORTRAN simply averages the forces and subtracts. But CUDA
> > > > > >> > only averages over atoms whose forces exceed a predefined small
> > > > > >> > threshold. What's up with that?
> > > > > >> >
> > > > > >> > pme_ene.F90:
> > > > > >> >   if (netfrc .gt. 0 .and. onstep) then
> > > > > >> >
> > > > > >> >     if (ti_mode .eq. 0) then
> > > > > >> >       do i = 1, atm_cnt
> > > > > >> >         net_frcs(:) = net_frcs(:) + frc(:, i)
> > > > > >> >       end do
> > > > > >> >
> > > > > >> >       ! Now do the correction:
> > > > > >> >
> > > > > >> >       net_frcs(:) = net_frcs(:) / dble(atm_cnt - numextra)
> > > > > >> >
> > > > > >> >       do i = 1, atm_cnt
> > > > > >> >         frc(:, i) = frc(:, i) - net_frcs(:)
> > > > > >> >       end do
> > > > > >> >     else
> > > > > >> >       do i = 1, atm_cnt
> > > > > >> >         ti_net_frcs(1, :) = ti_net_frcs(1, :) + ti_nb_frc(1, :, i)
> > > > > >> >         ti_net_frcs(2, :) = ti_net_frcs(2, :) + ti_nb_frc(2, :, i)
> > > > > >> >       end do
> > > > > >> >
> > > > > >> >       ti_net_frcs(1,:) = ti_net_frcs(1,:)/dble(ti_atm_cnt(1)-ti_numextra_pts(1))
> > > > > >> >       ti_net_frcs(2,:) = ti_net_frcs(2,:)/dble(ti_atm_cnt(2)-ti_numextra_pts(2))
> > > > > >> >       net_frcs(:) = ti_net_frcs(1,:) + ti_net_frcs(2,:)
> > > > > >> >
> > > > > >> >       do i = 1, atm_cnt
> > > > > >> >         ! This matches how sander removes netfrcs in TI runs
> > > > > >> >         if (ti_lst(1,i) .ne. 0) then
> > > > > >> >           frc(:, i) = frc(:, i) - ti_net_frcs(1,:)
> > > > > >> >         else if (ti_lst(2,i) .ne. 0) then
> > > > > >> >           frc(:, i) = frc(:, i) - ti_net_frcs(2,:)
> > > > > >> >         else
> > > > > >> >           frc(:, i) = frc(:, i) - net_frcs(:)
> > > > > >> >         end if
> > > > > >> >       end do
> > > > > >> >     end if
> > > > > >> >     ! Any extra points must have their 0.d0 forces reset...
> > > > > >> >
> > > > > >> >     if (numextra .gt. 0 .and. frameon .ne. 0) &
> > > > > >> >       call zero_extra_pnts_vec(frc, ep_frames, gbl_frame_cnt)
> > > > > >> >
> > > > > >> >   end if
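
The TI branch of the Fortran above can be restated as a one-component C++ sketch. This may help when porting it to per-region accumulators like the ones Scott describes. The encoding is hypothetical: `region[i]` stands in for `ti_lst` (0 = common atoms, 1 or 2 = the TI region), and `effCount[r]` stands for `ti_atm_cnt(r) - ti_numextra_pts(r)`. The function name is also illustrative.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// One-component restatement of the ti_mode /= 0 branch of pme_ene.F90.
// tiNbFrc[r][i] plays the role of ti_nb_frc(r+1, :, i), effCount[r] is
// ti_atm_cnt(r+1) - ti_numextra_pts(r+1), and region[i] encodes ti_lst.
void removeNetForceTI(std::vector<double>& frc,
                      const std::array<std::vector<double>, 2>& tiNbFrc,
                      const std::array<double, 2>& effCount,
                      const std::vector<int>& region) {
  std::array<double, 2> tiNet{0.0, 0.0};
  for (std::size_t i = 0; i < frc.size(); ++i) {
    tiNet[0] += tiNbFrc[0][i];
    tiNet[1] += tiNbFrc[1][i];
  }
  tiNet[0] /= effCount[0];
  tiNet[1] /= effCount[1];
  const double net = tiNet[0] + tiNet[1];  // used for atoms outside both regions
  for (std::size_t i = 0; i < frc.size(); ++i) {
    if (region[i] == 1) {
      frc[i] -= tiNet[0];
    } else if (region[i] == 2) {
      frc[i] -= tiNet[1];
    } else {
      frc[i] -= net;
    }
  }
}
```

As in the Fortran, the region sums come from the TI force arrays, while the correction is applied to `frc`.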
> > > > > >> >
> > > > > >> >
> > > > > >> > GTI path:
> > > > > >> >   while (pos < cSim.atoms) {
> > > > > >> >     PMEFloat fx = converter(pX[pos], ONEOVERFORCESCALE);
> > > > > >> >     PMEFloat fy = converter(pY[pos], ONEOVERFORCESCALE);
> > > > > >> >     PMEFloat fz = converter(pZ[pos], ONEOVERFORCESCALE);
> > > > > >> >     if (abs(fx) > small || abs(fy) > small || abs(fz) > small) {
> > > > > >> >       pX[pos] -= nfX;
> > > > > >> >       pY[pos] -= nfY;
> > > > > >> >       pZ[pos] -= nfZ;
> > > > > >> >     }
> > > > > >> >     pos += increment;
> > > > > >> >   }
> > > > > >> >
> > > > > >> > Scott
> > > > > >> > _______________________________________________
> > > > > >> > AMBER-Developers mailing list AMBER-Developers.ambermd.org
> > > > > >> > http://lists.ambermd.org/mailman/listinfo/amber-developers
> > > > > >> >
Received on Tue Jun 30 2020 - 10:00:05 PDT