Re: [AMBER-Developers] Desmond SUX

From: Gustavo Seabra <gustavo.seabra.gmail.com> Date: Fri, 14 May 2021 10:16:36 -0400

--
Gustavo Seabra.
On Thu, May 13, 2021 at 6:03 PM David Cerutti <dscerutti.gmail.com> wrote:
> For reference, here are the benchmarks that I think people are talking
> about:
> Desmond-GPU Performance as of April 2021.pdf (deshawresearch.com)
> <
> https://www.deshawresearch.com/publications/Desmond-GPU%20Performance%20as%20of%20April%202021.pdf
> >
>
> Desmond uses a different long-ranged summation, the "U-series" which was a
> bit of a dance of the seven veils and then turned out to be very similar to
> other P3M techniques, SPME included.  The U-series was the way they go to
> 8fs between updates to the long-ranged component of the electrostatics.
> Regardless of what it is, though, I'll say that my own experiences in
> multiple time stepping (see mdgx GB) don't leave much room to go higher
> than 5fs in any component of the force.  Long ago, circa 2015 their DHFR
> benchmark was much faster than Amber (the 50% Scott is alluding to), which
> seems to be a ratio they have maintained over the years, but it's now more
> in line with the rest of the benchmarks--one can compute the number of
> atoms moved by the code in a given time and see that the STMV case is,
> indeed, moving substantially more than DHFR.  It's pretty impressive that
> they can do ten million atoms, but of course that's more of a stunt (I
> would have been more inclined to do eight virions in a cube).  That said,
> the Desmond folks do some pretty wild orchestration of how many fmafs and
> other arithmetic ops they can pull out of each cycle, so while their
> numbers may be tweaked according to any given standard my feeling is that
> "sales" are not a big incentive for them to cook the books.
>
> You can surely get more performance out of pmemd on the smaller systems if
> you multiply the systems it simulates at one time.  2300ns per day with
> DHFR on one of the top-end Ampere cards shouldn't be out of the question.
> This should be one of the highest priorities in any renovations to the
> engine, as most pharma outfits study problems of 25-50k atoms, must run
> many windows before getting a single answer, and always have more compounds
> to test than GPUs to do it.  What I would also suggest is that anything
> happening to pmemd's CUDA component is stuck behind some very old Fortran
> code, with Pieces of a System flying around in a manner that's almost as
> depressing as the film with Vanessa Kirby.  Rebuild the 100k lines of
> Fortran in C++ with accessible, well-engineered structs that are hard to
> break.  Topologies, coordinates, and simulation protocols can all be
> structs passed around and created or destroyed as needed by a protocol.
> Give them each pointer structs that can be copied to the GPU in a manner
> analogous to cSim today, or preferably as multiple, focused pointer structs
> that become kernel arguments when the actual kernel is launched (the
> long-ranged electrostatic kernel doesn't need to know about the bonded
> parameter constants, for example--a Prmtop struct can have multiple pointer
> substructures tailored for different parts of the force calculation).  Make
> the kernels for producing work units operate on arrays of such structs, so
> that a force kernel will seamlessly stride from one system to the next as
> it plays its part in any given time step.  You should const as much as
> possible but const auto may be something to use sparingly, so that new
> developers will become better immersed in the actual nuts and bolts of the
> code by seeing the actual data types.  That will give upcoming
> graduate students more to work with and help them to understand the CUDA
> code as something much more C / C++ -like.
>
> Don't gnash your teeth over what DE Shaw's guys have achieved.  The things
> that drive sales are utility and unique capabilities, two things that Amber
> has done pretty well with despite being the product of a handful of
> research groups who mostly prefer to see everyone staying in their
> respective lanes.  Standardize what a "topology" is and make a clean,
> efficient, extensible tool for creating systems.  That should be the first
> stop for anyone thinking of adding new components to the force field or a
> free energy protocol.  Document the hell out of everything.  Stop relying
> on one Bob, or Scott, or me, or Taisung, or Scott again to
> MakeABetterEngine.cu.  That needs to be a community activity, and it will
> improve the employment prospects of your students to have them involved in
> professional python / C++ / CUDA programming.  Be honest about your
> benchmarks and make a new section of the website as an exposition of
> Amber's free energy capabilities.  It shouldn't take five years for
> advertising that doesn't support the group interest to be taken off the
> website, or for a researcher with unique ideas and much stronger
> associations to the consortium to finally get priority over an
> undergraduate who left the group years earlier.  Even an academic
> organization with $350,000 annual revenue shouldn't continue to rely on a
> former member to donate his time and money just to keep their CI up and
> running, regardless of his generosity in doing so.  The DE Shaw Group is a
> professional organization of extremely talented, probably overworked
> individuals united by their goals of advancing molecular simulations.  Stop
> comparing the benchmarks unless you want to start comparing the
> organizations.
>
> Dave
>
>
> On Thu, May 13, 2021 at 4:48 PM Scott Le Grand <varelse2005.gmail.com>
> wrote:
>
> > To me, it's a sales trick until they demonstrate numerical stability to
> the
> > level Ross and I did with SPFP and SPDP. Have they? But even if it's not
> > that stable, at least customers can make an informed choice with such
> data,
> > no? Also, how often are they rebuilding the neighbor list? Is it a fixed
> > interval like GROMACS or is there a skin test?
> >
> > I am rethinking all this currently and I have friends who think Neighbor
> > lists are obsolete if we move to higher timesteps and larger nonbond
> > cutoffs, but that brings us to how do we handle exclusions and that's a
> > rabbit hole. But... Coincidentally, SPFP's perfect force conservation can
> > let you add and subtract them if you cap their magnitudes or use some
> > variant of softcore to control dynamic range. But are they doing anything
> > like this? Details are everything!
> >
> > On Thu, May 13, 2021 at 1:39 PM Michael R Shirts <
> > Michael.Shirts.colorado.edu> wrote:
> >
> > > > and they skipped calculating the Ewald Sum every other iteration
> > (thanks
> > > Adrian!).
> > >
> > > In their semi-defense, IIRC, their default on all DESMOND simulations
> for
> > > a while has been to do multiple timestepping of forces, including Ewald
> > sum
> > > every other timestep. It's not entirely clear to me if this is
> > sufficiently
> > > accurate, and they definitely should make that clearer that they are
> > doing
> > > something different, but it's a valid approach (that more people should
> > be
> > > investigating!) and it's not just a sales trick.  Not that there aren't
> > > also sales tricks out there.
> > >
> > > Best,
> > > ~~~~~~~~~~~~~~~~
> > > Michael Shirts
> > > Associate Professor
> > > michael.shirts.colorado.edu
> > > http://www.colorado.edu/lab/shirtsgroup/
> > > Phone: (303) 735-7860
> > > Office: JSCBB C123
> > > Department of Chemical and Biological Engineering
> > > University of Colorado Boulder
> > >
> > >
> > > On 5/13/21, 1:27 PM, "Scott Le Grand" <varelse2005.gmail.com> wrote:
> > >
> > >     So, we're all getting our knickers in a bunch over an Apples to
> > Oranges
> > >     Desmond to AMBER performance comparison.
> > >
> > >     Please don't...
> > >
> > >     They cheated, because that's what they do to keep their investors
> > > happy.
> > >     They used a 32^3 grid, and they skipped calculating the Ewald Sum
> > every
> > >     other iteration (thanks Adrian!). Rather than get upset here, point
> > and
> > >     laugh at DE Shaw et al. that they are afraid to go head to head
> with
> > > AMBER,
> > >     and if they do (and they won't because they're chicken bawk bawk
> > > bawk), we
> > >     have the people to address that as well.
> > >
> > >     At our end, there's a ~50% or so performance deficit in AMBER 20 we
> > > need to
> > >     fix. I've already fixed 2/3 of that building PMEMD 2.0 (770 ns/day
> > > DHFR 2
> > >     fs already). Let them prance about with their greasy kids stuff
> > > desperate
> > >     approximations and cheats, SPFP remains performance and accuracy
> with
> > >     compromise and if they want to pick a fight with SPFP, make them do
> > the
> > >     work to demonstrate equivalent numerical stability (spoilers: they
> > > won't
> > >     because they can't but oh the bellyacheing and handwaving they are
> > > willing
> > >     to do, just watch).
> > >
> > >     Scott
> > >     _______________________________________________
> > >     AMBER-Developers mailing list
> > >     AMBER-Developers.ambermd.org
> > >     http://lists.ambermd.org/mailman/listinfo/amber-developers
> > >
> > > _______________________________________________
> > > AMBER-Developers mailing list
> > > AMBER-Developers.ambermd.org
> > > http://lists.ambermd.org/mailman/listinfo/amber-developers
> > >
> > _______________________________________________
> > AMBER-Developers mailing list
> > AMBER-Developers.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber-developers
> >
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers
>
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers