Re: [AMBER-Developers] Desmond SUX

From: David Cerutti <>
Date: Thu, 13 May 2021 18:02:26 -0400

For reference, here are the benchmarks that I think people are talking
about: Desmond-GPU Performance as of April 2021.pdf

Desmond uses a different long-ranged summation, the "U-series," which was
a bit of a dance of the seven veils and then turned out to be very similar
to other P3M techniques, SPME included. The U-series was the way they got
to 8 fs between updates to the long-ranged component of the electrostatics.
Regardless of what it is, though, I'll say that my own experiences in
multiple time stepping (see mdgx GB) don't leave much room to go higher
than 5 fs in any component of the force (a minimal sketch of the scheme
appears below). Long ago, circa 2015, their DHFR benchmark was much faster
than Amber's (the 50% Scott is alluding to), which seems to be a ratio they
have maintained over the years, but it's now more in line with the rest of
the benchmarks--one can compute the number of atoms moved by the code in a
given time and see that the STMV case is, indeed, moving substantially more
than DHFR. It's pretty impressive that they can do ten million atoms, but
of course that's more of a stunt (I would have been more inclined to do
eight virions in a cube). That said, the Desmond folks do some pretty wild
orchestration of how many fused multiply-adds and other arithmetic ops they
can pull out of each cycle, so while their numbers may be tweaked according
to any given standard, my feeling is that "sales" are not a big incentive
for them to cook the books.

You can surely get more performance out of pmemd on the smaller systems if
you have it simulate multiple systems at one time: 2300 ns per day with
DHFR on one of the top-end Ampere cards shouldn't be out of the question.
This should be one of the highest priorities in any renovations to the
engine, as most pharma outfits study problems of 25-50k atoms, must run
many windows before getting a single answer, and always have more compounds
to test than GPUs to run them on. I would also point out that anything
happening to pmemd's CUDA component is stuck behind some very old Fortran
code, with Pieces of a System flying around in a manner that's almost as
depressing as the film with Vanessa Kirby. Rebuild the 100k lines of
Fortran in C++ with accessible, well-engineered structs that are hard to
break. Topologies, coordinates, and simulation protocols can all be
structs passed around and created or destroyed as needed by a protocol.
Give them each pointer structs that can be copied to the GPU in a manner
analogous to cSim today, or preferably as multiple, focused pointer structs
that become kernel arguments when the actual kernel is launched (the
long-ranged electrostatic kernel doesn't need to know about the bonded
parameter constants, for example--a Prmtop struct can have multiple pointer
substructures tailored for different parts of the force calculation). Make
the kernels for producing work units operate on arrays of such structs, so
that a force kernel will seamlessly stride from one system to the next as
it plays its part in any given time step (see the sketch after this
paragraph). You should const as much as possible, but const auto may be
something to use sparingly, so that new developers become immersed in the
actual nuts and bolts of the code by seeing the actual data types. That
will give upcoming graduate students more to work with and help them
understand the CUDA code as something much more C/C++-like.
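
To make that concrete, here is a minimal sketch of one focused pointer
substruct and a batched force kernel. Every name in it (SysBondArgs,
kBondForces, and so on) is hypothetical rather than existing pmemd code,
and the plain atomicAdd accumulation is a stand-in for SPFP's fixed-point
sums:

  #include <cuda_runtime.h>

  // Focused pointer substruct: the bond kernel sees only what it needs.
  struct SysBondArgs {
    int nbond;             // number of bond terms in this system
    const int2 *bondIDs;   // atom index pairs for each bond term
    const float2 *bondKEq; // stiffness (x) and equilibrium length (y)
    const float4 *crd;     // coordinates of this system's atoms
    float4 *frc;           // force accumulators for this system's atoms
  };

  // One launch covers a whole batch of systems: blocks stride across the
  // systems in y and across each system's bond terms in x.  The sys array
  // must already reside on the device.
  __global__ void kBondForces(const SysBondArgs *sys, int nsys) {
    for (int is = blockIdx.y; is < nsys; is += gridDim.y) {
      const SysBondArgs s = sys[is];  // per-system pointers go to registers
      for (int ib = blockIdx.x * blockDim.x + threadIdx.x; ib < s.nbond;
           ib += gridDim.x * blockDim.x) {
        const int2 id = s.bondIDs[ib];
        const float4 ci = s.crd[id.x];
        const float4 cj = s.crd[id.y];
        const float3 dr = { cj.x - ci.x, cj.y - ci.y, cj.z - ci.z };
        const float r = sqrtf(dr.x * dr.x + dr.y * dr.y + dr.z * dr.z);
        const float2 ke = s.bondKEq[ib];
        const float fmag = -2.0f * ke.x * (r - ke.y) / r;
        atomicAdd(&s.frc[id.x].x, -fmag * dr.x);  // stand-in for SPFP
        atomicAdd(&s.frc[id.x].y, -fmag * dr.y);  //   fixed-point adds
        atomicAdd(&s.frc[id.x].z, -fmag * dr.z);
        atomicAdd(&s.frc[id.y].x,  fmag * dr.x);
        atomicAdd(&s.frc[id.y].y,  fmag * dr.y);
        atomicAdd(&s.frc[id.y].z,  fmag * dr.z);
      }
    }
  }

Launched over a two-dimensional grid, e.g. dim3 grid(blocksPerWork,
batchDepth), one kernel call walks every system in the batch, and other
kernels can take different Prmtop-owned substructs covering just the slice
of the topology they need.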

Don't gnash your teeth over what DE Shaw's guys have achieved. The things
that drive sales are utility and unique capabilities, two things that Amber
has done pretty well with despite being the product of a handful of
research groups who mostly prefer to see everyone staying in their
respective lanes. Standardize what a "topology" is and make a clean,
efficient, extensible tool for creating systems. That should be the first
stop for anyone thinking of adding new components to the force field or a
free energy protocol. Document the hell out of everything. Stop relying
on one Bob, or Scott, or me, or Taisung, or Scott again to do it. That
needs to be a community activity, and it will improve the employment
prospects of your students to have them involved in professional
Python / C++ / CUDA programming. Be honest about your
benchmarks and make a new section of the website as an exposition of
Amber's free energy capabilities. It shouldn't take five years for
advertising that doesn't support the group interest to be taken off the
website, or for a researcher with unique ideas and much stronger
associations to the consortium to finally get priority over an
undergraduate who left the group years earlier. Even an academic
organization with $350,000 annual revenue shouldn't continue to rely on a
former member to donate his time and money just to keep its CI up and
running, regardless of his generosity in doing so. The DE Shaw Group is a
professional organization of extremely talented, probably overworked
individuals united by their goals of advancing molecular simulations. Stop
comparing the benchmarks unless you want to start comparing the budgets.


On Thu, May 13, 2021 at 4:48 PM Scott Le Grand <> wrote:

> To me, it's a sales trick until they demonstrate numerical stability to the
> level Ross and I did with SPFP and SPDP. Have they? But even if it's not
> that stable, at least customers can make an informed choice with such data,
> no? Also, how often are they rebuilding the neighbor list? Is it a fixed
> interval like GROMACS or is there a skin test?
> I am rethinking all this currently, and I have friends who think neighbor
> lists are obsolete if we move to higher timesteps and larger nonbonded
> cutoffs, but that brings us to how we handle exclusions, and that's a
> rabbit hole. But... Coincidentally, SPFP's perfect force conservation can
> let you add and subtract them if you cap their magnitudes or use some
> variant of softcore to control dynamic range. But are they doing anything
> like this? Details are everything!
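
For context on the skin test mentioned above: the usual alternative to a
fixed rebuild interval is to build the list out to cutoff + skin, then
rebuild only once some atom has moved more than half the skin since the
last build (two atoms closing on each other can each cover half the gap).
A minimal host-side sketch, with all names hypothetical:

  #include <cstddef>
  #include <vector>

  struct Vec3 { double x, y, z; };

  // True when the neighbor list, built with pair cutoff + skin, may have
  // gone stale: some atom has moved more than skin / 2 since the build.
  bool NeighborListStale(const std::vector<Vec3> &crd,
                         const std::vector<Vec3> &crdAtLastBuild,
                         double skin) {
    const double limit2 = 0.25 * skin * skin;  // (skin / 2)^2, avoids sqrt
    for (std::size_t i = 0; i < crd.size(); i++) {
      const double dx = crd[i].x - crdAtLastBuild[i].x;
      const double dy = crd[i].y - crdAtLastBuild[i].y;
      const double dz = crd[i].z - crdAtLastBuild[i].z;
      if (dx * dx + dy * dy + dz * dz > limit2) {
        return true;  // rebuild before the next force evaluation
      }
    }
    return false;
  }

A fixed interval skips the per-step displacement scan but has to be set
conservatively enough to cover the worst case.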
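And on SPFP's "perfect force conservation": the reproducibility comes from
accumulating forces as 64-bit integers, which makes the sum exact and
independent of the order in which threads contribute. A sketch of the
conversion, with an illustrative scale factor rather than pmemd's actual
one:

  #include <cmath>

  // 2^40 fractional bits; the choice trades precision against the dynamic
  // range available for large force magnitudes (illustrative value only).
  constexpr double kForceScale = 1099511627776.0;

  inline long long ToFixedPoint(float f) {
    return std::llrint(static_cast<double>(f) * kForceScale);
  }

  inline double FromFixedPoint(long long acc) {
    return static_cast<double>(acc) / kForceScale;
  }

Capping contribution magnitudes, as Scott describes, is about keeping every
term inside that fixed-point dynamic range.
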
> On Thu, May 13, 2021 at 1:39 PM Michael R Shirts <> wrote:
> > > and they skipped calculating the Ewald Sum every other iteration
> > > (thanks Adrian!).
> >
> > In their semi-defense, IIRC, their default on all DESMOND simulations
> > for a while has been to do multiple timestepping of forces, including
> > the Ewald sum every other timestep. It's not entirely clear to me if
> > this is sufficiently accurate, and they definitely should make it
> > clearer that they are doing something different, but it's a valid
> > approach (that more people should be investigating!) and it's not just
> > a sales trick. Not that there aren't also sales tricks out there.
> >
> > Best,
> > ~~~~~~~~~~~~~~~~
> > Michael Shirts
> > Associate Professor
> >
> >
> > Phone: (303) 735-7860
> > Office: JSCBB C123
> > Department of Chemical and Biological Engineering
> > University of Colorado Boulder
> >
> >
> > On 5/13/21, 1:27 PM, "Scott Le Grand" <> wrote:
> >
So, we're all getting our knickers in a bunch over an Apples to
> > Oranges Desmond to AMBER performance comparison.
> >
> > Please don't...
> >
> > They cheated, because that's what they do to keep their investors
> > happy. They used a 32^3 grid, and they skipped calculating the Ewald
> > Sum every other iteration (thanks Adrian!). Rather than get upset here,
> > point and laugh at DE Shaw et al. that they are afraid to go head to
> > head with AMBER, and if they do (and they won't because they're chicken
> > bawk bawk bawk), we have the people to address that as well.
> >
> > At our end, there's a ~50% performance deficit in AMBER 20 we need to
> > fix. I've already fixed 2/3 of that building PMEMD 2.0 (770 ns/day
> > DHFR at 2 fs already). Let them prance about with their greasy kid
> > stuff desperate approximations and cheats; SPFP remains performance and
> > accuracy without compromise, and if they want to pick a fight with
> > SPFP, make them do the work to demonstrate equivalent numerical
> > stability (spoilers: they won't because they can't, but oh the
> > bellyaching and handwaving they are willing to do, just watch).
> >
> > Scott
Received on Thu May 13 2021 - 15:30:02 PDT