Re: [AMBER-Developers] Faster NetFrc from Scott Le Grand on 2020-06-30 (Amber Developers Archive Jun 2020)

From: Scott Le Grand <varelse2005.gmail.com>
Date: Tue, 30 Jun 2020 09:54:31 -0700

We can get past the 16M atom encoding pretty easily. Future proofing is
best done in the future where you know what broke IMO.

But yes, I think the 100K-10M atom range is the sweet spot. But if we can
throw the marketing sorts a bone here or there without significant
additional effort (and we're just talking throwing more bits at the spatial
hash I believe for large systems), why not? That part of the code scales
linearly anyway (yes it needs a little work to make that happen, but TLDR
bucket, chuck it, and sort it).
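
A minimal sketch of the "bucket and sort" step, assuming each atom has already been given an integer cell id by the spatial hash. The names `sort_atoms_by_cell`, `cell_of_atom`, and `num_cells` are hypothetical and not taken from the AMBER GPU code. A counting sort like this runs in O(atoms + cells), which is one way to get the linear scaling mentioned above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns atom indices ordered by cell id using a counting (bucket) sort.
// Assumes every entry of cell_of_atom is < num_cells (hypothetical layout).
inline std::vector<std::uint32_t> sort_atoms_by_cell(
    const std::vector<std::uint32_t>& cell_of_atom, std::uint32_t num_cells) {
    // start[c] becomes the first output slot for cell c (exclusive prefix sum).
    std::vector<std::uint32_t> start(num_cells + 1, 0);
    for (std::uint32_t c : cell_of_atom) ++start[c + 1];
    for (std::uint32_t c = 0; c < num_cells; ++c) start[c + 1] += start[c];

    // Scatter each atom into the next free slot of its cell (stable order).
    std::vector<std::uint32_t> order(cell_of_atom.size());
    for (std::size_t i = 0; i < cell_of_atom.size(); ++i) {
        order[start[cell_of_atom[i]]++] = static_cast<std::uint32_t>(i);
    }
    return order;
}
```

Widening the cell id (more bits in the hash) only changes the size of the `start` table in this sketch; the two passes over the atoms stay the same.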

On Tue, Jun 30, 2020 at 9:50 AM David Cerutti <dscerutti.gmail.com> wrote:

> You can count on extra points always having a mass of zero. In fact, they
> always have type "EP" by the code's hard-wired inanity (that was an issue I
> had to correct before mdgx could print conforming topologies).
>
> There are other problems with the 100M atom hero runs, namely what tleap
> can cobble together and a hard limit of 16M atoms based on some other
> integer bit encoding that is in the GPU code, but otherwise it would be
> great to have a 4x speedup for 8 GPUs in a medium-large system of 250K-1M
> atoms.
>
> Dave
>
>
> On Tue, Jun 30, 2020 at 12:42 PM Scott Le Grand <varelse2005.gmail.com> wrote:
>
> > OK, I spent some more time thinking this over before coding it up. Your
> > proposal works for extra points. But do extra points (always) have a mass
> > of zero? If so, I could just check their mass and not subtract netforce
> > from them. I am thinking an extra read/write pass on memory is more
> > expensive than a predicate based on the already loaded mass in the update
> > routine. Correct me if I am wrong?
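
A minimal host-side sketch of the mass-predicate idea, assuming extra points can be recognized by a mass of exactly zero and that the number of massive atoms is precomputed once. `Vec3`, `subtract_net_force`, and `num_massive` are hypothetical names for illustration, not the pmemd update routine:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// Subtract an equal share of the net force from every atom with nonzero mass.
// Massless extra points are skipped using the mass the update already loads,
// so no separate read/write pass over the force array is needed.
inline void subtract_net_force(std::vector<Vec3>& frc,
                               const std::vector<double>& mass,
                               const Vec3& net, std::size_t num_massive) {
    if (num_massive == 0) return;
    const double inv = 1.0 / static_cast<double>(num_massive);
    const Vec3 share{net.x * inv, net.y * inv, net.z * inv};
    for (std::size_t i = 0; i < frc.size(); ++i) {
        if (mass[i] == 0.0) continue;  // assumed marker for an extra point
        frc[i].x -= share.x;
        frc[i].y -= share.y;
        frc[i].z -= share.z;
    }
}
```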
> >
> > W/r to Softcore TI, I set up two accumulators, one for each TI region. If
> > TI is active, if I read the region, I can normalize off the respective
> > region. I would just need to pre-calculate the number of atoms in each
> > region and add a local index for this ditty:
> > PMEAccumulator nx = NFX / cSim.atoms + (pos < (NFX % cSim.atoms));
> > PMEAccumulator ny = NFY / cSim.atoms + (pos < (NFY % cSim.atoms));
> > PMEAccumulator nz = NFZ / cSim.atoms + (pos < (NFZ % cSim.atoms));
> >
> > This makes sure there is *perfect* conservation in fixed point before
> > converting to double precision for the update.
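
A small sketch of why the quotient-plus-remainder split conserves the total exactly, written for one component. `fixed_point_share` and `sum_of_shares` are hypothetical helpers. One assumption goes beyond the snippet above: C++ `/` and `%` truncate toward zero, so for a negative total the remainder is negative and the `pos < remainder` test never fires; the sketch converts to floor division so negative totals also sum exactly. Whether the real code handles the sign this way is not stated in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Share of an integer (fixed-point) total given to element `pos` of `n`,
// chosen so the shares over pos = 0..n-1 add up to `total` exactly.
inline std::int64_t fixed_point_share(std::int64_t total, std::int64_t n,
                                      std::int64_t pos) {
    std::int64_t q = total / n;  // truncates toward zero
    std::int64_t r = total % n;  // carries the sign of `total`
    if (r < 0) {                 // switch to floor division: 0 <= r < n
        q -= 1;
        r += n;
    }
    return q + (pos < r ? 1 : 0);  // first r elements absorb the remainder
}

// Sum of all shares; equals `total` for any n > 0.
inline std::int64_t sum_of_shares(std::int64_t total, std::int64_t n) {
    std::int64_t s = 0;
    for (std::int64_t pos = 0; pos < n; ++pos) {
        s += fixed_point_share(total, n, pos);
    }
    return s;
}
```

For a total of 10 over 3 atoms the shares are 4, 3, 3; for -10 they are -3, -3, -4.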
> >
> > Eliminating one read and one read/write pass is already 1-7% faster for
> > vanilla systems. I am working towards getting rid of as much serialized
> > GPU stuff as I can so as to speed up parallel runs. I am hoping to hit
> > ~4x on 8 GPUs connected to NVSWITCH by end of Summer and I think that's
> > attainable even if we never get a faster distributed FFT. The latter
> > however, if we had it, would unlock a fully distributed state that would
> > allow AMBER to do those 100M+ atom hero runs efficiently.
> >
> > Scott
> >
> >
> >
> > On Fri, May 22, 2020 at 2:16 PM <taisung.gmail.com> wrote:
> >
> > > Actually, I don't think you need to deal w/ the extra points at all if
> > > you do netfrc right after PME gradsum. Just do the total accumulate and
> > > get rid of the resulting net forces for all atoms (including extra
> > > points). The forces on extra points will be transferred to nearby heavy
> > > atoms (hence become zero) in a later stage.
> > >
> > > The results could be slightly different though. But theoretically I
> > > believe it is OK.
> > >
> > > Taisung
> > >
> > > -----Original Message-----
> > > From: Scott Le Grand [mailto:varelse2005.gmail.com]
> > > Sent: Friday, May 22, 2020 4:31 PM
> > > To: AMBER Developers Mailing List <amber-developers.ambermd.org>
> > > Subject: Re: [AMBER-Developers] Faster NetFrc
> > >
> > > So if I accumulate them from the extra points in gradsum, but then only
> > > apply to atoms that are not extra points, we are good?
> > >
> > > On Fri, May 22, 2020 at 1:29 PM David Cerutti <dscerutti.gmail.com> wrote:
> > >
> > > > Our emails may have crossed in writing, but it appears that in both
> > > > CPU and GPU codes the netfrc is being subtracted after extra points
> > > > have had their forces transmitted to their massive frame atoms. The
> > > > solutions to dealing with the pollution that netfrc correction causes
> > > > with extra points differ in each code.
> > > >
> > > > Dave
> > > >
> > > >
> > > > On Fri, May 22, 2020 at 4:27 PM Scott Le Grand <varelse2005.gmail.com> wrote:
> > > >
> > > > > Cool, and if it's an extra point should I or should I not subtract
> > > > > the netfrc? Dumb questions asked upfront to save time later?
> > > > >
> > > > > On Fri, May 22, 2020 at 1:20 PM <cancersimulation.gmail.com> wrote:
> > > > >
> > > > > > It is static. Usually, it should be the same as "numextra" here
> > > > > >
> > > > > > use prmtop_dat_mod, only : numextra
> > > > > >
> > > > > > I believe Dave Cerutti has implemented some extra-point stuff and
> > > > > > I don't know if the above statement still holds with his stuff.
> > > > > >
> > > > > > Taisung
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Scott Le Grand [mailto:varelse2005.gmail.com]
> > > > > > Sent: Friday, May 22, 2020 3:02 PM
> > > > > > To: AMBER Developers Mailing List <amber-developers.ambermd.org>
> > > > > > Subject: Re: [AMBER-Developers] Faster NetFrc
> > > > > >
> > > > > > So also...
> > > > > >
> > > > > > I would assume the value of "ignored" is static. How do I
> > > > > > calculate it a priori? And it looks like I don't apply it to the
> > > > > > extra points. Feel free to describe off-list, but I want to drill
> > > > > > down deep and get this right on the first swing...
> > > > > >
> > > > > > On Thu, May 21, 2020 at 6:08 PM Scott Le Grand <varelse2005.gmail.com> wrote:
> > > > > >
> > > > > > > Excellent, that makes this much more straightforward. Should
> > > > > > > have something by early next week.
> > > > > > >
> > > > > > > On Thu, May 21, 2020 at 6:04 PM <taisung.gmail.com> wrote:
> > > > > > >
> > > > > > >> The force thresholds are for those "dummies" (not the
> > > > > > >> alchemical dummies but something like lone-pair points of
> > > > > > >> water models). The forces of those atoms are "transferred" to
> > > > > > >> other real atoms at the final force collection stage--and hence
> > > > > > >> need to be kept to zero during the netfrc stage. Of course, you
> > > > > > >> may find better ways to do things. For example, as Scott
> > > > > > >> mentioned, the only non-conserved force part is the PME
> > > > > > >> reciprocal part. If the netfrc is done in the PME reciprocal
> > > > > > >> part, there is no need to have such force thresholds.
> > > > > > >>
> > > > > > >> Taisung
> > > > > > >>
> > > > > > >> -----Original Message-----
> > > > > > >> From: David Cerutti [mailto:dscerutti.gmail.com]
> > > > > > >> Sent: Thursday, May 21, 2020 8:37 PM
> > > > > > >> To: AMBER Developers Mailing List
> > > > > > >> <amber-developers.ambermd.org>
> > > > > > >> Subject: Re: [AMBER-Developers] Faster NetFrc
> > > > > > >>
> > > > > > >> As implied, the GTI code is the revision that introduced this.
> > > > > > >> Taisung can comment more on his logic, but the presence of
> > > > > > >> this "small" term reminds me of something he's got in the
> > > > > > >> non-bonded inner loop as well. I'm not sure we ever determined
> > > > > > >> why these conditionals were needed; I think the one in the
> > > > > > >> non-bonded loop should just go away after some other revisions
> > > > > > >> I made, but I'll wait for more input.
> > > > > > >>
> > > > > > >> Dave
> > > > > > >>
> > > > > > >>
> > > > > > >> On Thu, May 21, 2020 at 8:29 PM Scott Le Grand <varelse2005.gmail.com> wrote:
> > > > > > >>
> > > > > > >> > Hey guys,
> > > > > > >> > Back as an AMBER developer and one of the first things I'd
> > > > > > >> > like to do is optimize netfrc.
> > > > > > >> >
> > > > > > >> > Fixed point conservative forces have no net force. They
> > > > > > >> > automagically cancel out 100%. So we can ignore everything
> > > > > > >> > except non-conserved forces.
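
A minimal sketch of why equal-and-opposite forces cancel exactly in fixed point, assuming forces are scaled by a power of two and rounded to 64-bit integers before accumulation. The scale `kForceScale` (2^40 here), `to_fixed`, `Pair`, and `net_fixed_force` are illustrative assumptions, not the actual GPU accumulators. Integer addition is exact and order-independent, so each +q/-q contribution sums to zero no matter how the work is scheduled:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed scale for illustration only (2^40); the real force scale may differ.
constexpr double kForceScale = 1099511627776.0;

// Convert a floating-point force component to a fixed-point integer.
inline std::int64_t to_fixed(double f) { return std::llrint(f * kForceScale); }

struct Pair { int i, j; double f; };  // force +f on atom i, -f on atom j

// Accumulate pair forces in fixed point and return the sum over all atoms.
inline std::int64_t net_fixed_force(const std::vector<Pair>& pairs,
                                    std::size_t num_atoms) {
    std::vector<std::int64_t> acc(num_atoms, 0);
    for (const Pair& p : pairs) {
        const std::int64_t q = to_fixed(p.f);  // rounded once per interaction
        acc[p.i] += q;
        acc[p.j] -= q;
    }
    std::int64_t net = 0;
    for (std::int64_t a : acc) net += a;
    return net;  // always 0 for pairwise (conserved) forces, barring overflow
}
```

A force that is interpolated per atom rather than applied as a +q/-q pair, like the reciprocal-space sum discussed in this thread, has no such guarantee, which is why only that term needs the correction.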
> > > > > > >> >
> > > > > > >> > The only non-conserved force I know of is the Ewald Gradient
> > > > > > >> > Sum. So if I add up the net force there, and handle it upon
> > > > > > >> > either force reduction or update, I can eliminate two passes
> > > > > > >> > on memory and the two kernels currently dedicated to adding
> > > > > > >> > and then subtracting them.
> > > > > > >> >
> > > > > > >> > But... Are there any other non-conserved forces?
> > > > > > >> >
> > > > > > >> > And... It looks like the FORTRAN and the CUDA code do
> > > > > > >> > different things. FORTRAN simply averages the forces and
> > > > > > >> > subtracts. But CUDA only averages over atoms with force
> > > > > > >> > thresholds above a predefined small amount. What's up with
> > > > > > >> > that?
> > > > > > >> >
> > > > > > >> > pme_ene.F90:
> > > > > > >> >   if (netfrc .gt. 0 .and. onstep) then
> > > > > > >> >
> > > > > > >> >     if (ti_mode .eq. 0) then
> > > > > > >> >       do i = 1, atm_cnt
> > > > > > >> >         net_frcs(:) = net_frcs(:) + frc(:, i)
> > > > > > >> >       end do
> > > > > > >> >
> > > > > > >> >       ! Now do the correction:
> > > > > > >> >
> > > > > > >> >       net_frcs(:) = net_frcs(:) / dble(atm_cnt - numextra)
> > > > > > >> >
> > > > > > >> >       do i = 1, atm_cnt
> > > > > > >> >         frc(:, i) = frc(:, i) - net_frcs(:)
> > > > > > >> >       end do
> > > > > > >> >     else
> > > > > > >> >       do i = 1, atm_cnt
> > > > > > >> >         ti_net_frcs(1, :) = ti_net_frcs(1, :) + ti_nb_frc(1, :, i)
> > > > > > >> >         ti_net_frcs(2, :) = ti_net_frcs(2, :) + ti_nb_frc(2, :, i)
> > > > > > >> >       end do
> > > > > > >> >
> > > > > > >> >       ti_net_frcs(1,:) = ti_net_frcs(1,:)/dble(ti_atm_cnt(1)-ti_numextra_pts(1))
> > > > > > >> >       ti_net_frcs(2,:) = ti_net_frcs(2,:)/dble(ti_atm_cnt(2)-ti_numextra_pts(2))
> > > > > > >> >       net_frcs(:) = ti_net_frcs(1,:) + ti_net_frcs(2,:)
> > > > > > >> >
> > > > > > >> >       do i = 1, atm_cnt
> > > > > > >> >         ! This matches how sander removes netfrcs in TI runs
> > > > > > >> >         if (ti_lst(1,i) .ne. 0) then
> > > > > > >> >           frc(:, i) = frc(:, i) - ti_net_frcs(1,:)
> > > > > > >> >         else if (ti_lst(2,i) .ne. 0) then
> > > > > > >> >           frc(:, i) = frc(:, i) - ti_net_frcs(2,:)
> > > > > > >> >         else
> > > > > > >> >           frc(:, i) = frc(:, i) - net_frcs(:)
> > > > > > >> >         end if
> > > > > > >> >       end do
> > > > > > >> >     end if
> > > > > > >> >     ! Any extra points must have their 0.d0 forces reset...
> > > > > > >> >
> > > > > > >> >     if (numextra .gt. 0 .and. frameon .ne. 0) &
> > > > > > >> >       call zero_extra_pnts_vec(frc, ep_frames, gbl_frame_cnt)
> > > > > > >> >
> > > > > > >> >   end if
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > GTI path:
> > > > > > >> >   while (pos < cSim.atoms) {
> > > > > > >> >     PMEFloat fx = converter(pX[pos], ONEOVERFORCESCALE);
> > > > > > >> >     PMEFloat fy = converter(pY[pos], ONEOVERFORCESCALE);
> > > > > > >> >     PMEFloat fz = converter(pZ[pos], ONEOVERFORCESCALE);
> > > > > > >> >     if (abs(fx) > small || abs(fy) > small || abs(fz) > small) {
> > > > > > >> >       pX[pos] -= nfX;
> > > > > > >> >       pY[pos] -= nfY;
> > > > > > >> >       pZ[pos] -= nfZ;
> > > > > > >> >     }
> > > > > > >> >     pos += increment;
> > > > > > >> >   }
> > > > > > >> >
> > > > > > >> > Scott
> > > > > > >> > _______________________________________________
> > > > > > >> > AMBER-Developers mailing list AMBER-Developers.ambermd.org
> > > > > > >> > http://lists.ambermd.org/mailman/listinfo/amber-developers
> > > > > > >> >
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Tue Jun 30 2020 - 10:00:05 PDT