PPS The future* would appear to be more cores of approximately the same
per-core power as Ampere's, not the same number of cores but beefier ones.
As such, we need to figure out how to distribute the same basis set of
calculations across more cores going forward. Doubly so now that we have
NVLINK, an interconnect that makes multi-GPU not suck.
*A prediction pulled entirely from my Easter Bonnet(tm) based on the
progression from SM 5 to SM 8, and which should not at all be construed as
insider information, because it's not.
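By "not suck" I mean, among other things, real peer access between cards, so
halo exchange can go GPU-to-GPU over NVLINK instead of bouncing through the
host. A throwaway sketch of the CUDA side of that, and only a sketch: the
device loop and copy call are placeholders, not pmemd's actual decomposition.

    // Enable peer-to-peer access between every pair of GPUs that supports it.
    // With P2P on, device-to-device copies (and direct loads/stores) ride
    // NVLINK where it exists instead of staging through host memory.
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int nDev = 0;
        cudaGetDeviceCount(&nDev);
        for (int i = 0; i < nDev; ++i) {
            cudaSetDevice(i);
            for (int j = 0; j < nDev; ++j) {
                if (i == j) continue;
                int ok = 0;
                cudaDeviceCanAccessPeer(&ok, i, j);
                if (ok) cudaDeviceEnablePeerAccess(j, 0);  // 0 is the only legal flag
            }
        }
        // A halo exchange then becomes, per neighbor pair:
        // cudaMemcpyPeerAsync(dstPtr, dstDev, srcPtr, srcDev, nBytes, stream);
        printf("peer access enabled where supported across %d devices\n", nDev);
        return 0;
    }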
On Sun, Apr 4, 2021 at 12:50 PM Scott Le Grand <varelse2005.gmail.com>
wrote:
> PS I'm killing off both TI paths and writing the path I wanted to write in
> the first place, one that exploits both the original Uber-kernels and
> Taisung's multi-streaming variant, incorporating Darren and Taisung's
> improvements to the science whilst doing so. After those six impossible
> things or so, breakfast...
>
> On Sun, Apr 4, 2021 at 12:46 PM Scott Le Grand <varelse2005.gmail.com>
> wrote:
>
>> 1) Because that's the benchmark we've used since day one. Apples to
>> apples and all that. It's a relatively small system for a single GPU, which
>> makes it the perfect stand-in for large-system multi-GPU efficiency. My
>> goal is 4x scaling on 8 GPUs, with positive scaling beyond that, relaxing
>> system size limits up to 1B atoms in the process. If JAC gets faster, we
>> can scale farther.
>> 2) Because the path to AMBER 20 broke multiple implicit assumptions in my
>> design* for AMBER, so I went back in time to change the future. All
>> relevant functionality will be restored over time, but I spent 6 months of
>> 2020 trying to do exactly that before throwing my hands up in utter
>> frustration. The alternative is walking away from all the code and starting
>> a new framework.
>> 3) RTX3090**
>> 4) Remember, this is PMEMD 2.0 we're building here. It's been almost 12
>> years; it's time to rewrite.
>>
>> But... Your local force code still shows an acceleration over the
>> original local force code even at full 64-bit accumulation. So that's
>> getting refactored along the way. Everything else so far is a perf
>> regression without the precision model changes, alas. But... you and I have
>> accidentally created a working variant of SPXP. Your stuff will live again
>> in its revival, and you get first authorship IMO, because while it's great
>> work, it's *not* SPFP with those precision changes in place (18-bit
>> mantissa? C'mon man(tm)...)
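>> For anyone following along, that's the line I'm drawing: SPFP does the
>> arithmetic in FP32 but accumulates forces into 64-bit fixed point, so
>> summation order can't change the answer. A rough sketch from memory, and
>> only a sketch: the scale factor and kernel shape here are illustrative,
>> not what actually ships in pmemd.
>>
>>     // SPFP-style accumulation: FP32 math, 64-bit fixed-point sums.
>>     #include <cuda_runtime.h>
>>
>>     __device__ __forceinline__ void addForce(unsigned long long* acc, float f) {
>>         const double FORCESCALE = (double)(1ull << 40);        // illustrative scale
>>         long long q = __double2ll_rn((double)f * FORCESCALE);  // round to fixed point
>>         atomicAdd(acc, (unsigned long long)q);  // two's-complement wrap handles negatives
>>     }
>>
>>     __global__ void accumulateForces(const float* fx, unsigned long long* fxAcc, int n) {
>>         int i = blockIdx.x * blockDim.x + threadIdx.x;
>>         if (i < n) addForce(&fxAcc[i], fx[i]);
>>     }
>>
>>     // The host converts back with (long long)fxAcc[i] / FORCESCALE after
>>     // the force pass. Shrink that accumulator, or the mantissa feeding it,
>>     // and you're in SPXP territory, not SPFP.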
>>
>> *Should have spelled them out, but even I couldn't predict a priori the
>> end of the CUDA Fellow program, which ended any support for further work.
>> Now my bosses have let me work on it again as my day job.
>> **
>> https://www.exxactcorp.com/blog/Molecular-Dynamics/rtx3090-benchmarks-for-hpc-amber-a100-vs-rtx3080-vs-2080ti-vs-rtx6000
>>
>> On Sun, Apr 4, 2021 at 12:03 PM David Cerutti <dscerutti.gmail.com>
>> wrote:
>>
>>> "Meanwhile, AMBER16 refactored to SM 7 and beyond is already hitting 730
>>> ns/day on JAC NVE 2 fs. AMBER20 with the grid interpolation and local
>>> force
>>> precision sub FP32 force hacks removed hits 572 ns/day (down from 632 if
>>> left in as we shipped it). That puts me nearly 1/3 to my goal of doubling
>>> overall AMBER performance which is what is important to me and where I'm
>>> going to focus my efforts..."
>>>
>>> Please explain here.
>>> 1.) Why are we back to using the old JAC NVE 2fs benchmark? The new
>>> benchmarks were redesigned several years ago to make the tests more
>>> uniform and to use settings that standard practitioners now use.
>>> 2.) Why is Amber16 being refactored rather than Amber20?
>>> 3.) What does it mean to be hitting 730 ns/day? What card is being
>>> compared here--the Amber20 benchmarks look like they could be a V100,
>>> Titan-V, or perhaps an RTX-2080Ti.
>>>
>>>
>>> On Sun, Apr 4, 2021 at 12:11 PM Scott Le Grand <varelse2005.gmail.com>
>>> wrote:
>>>
>>> > But getting back on topic, CUDA 7.5 is a 2015 toolkit, and SM 5.x and
>>> > below are deprecated now. SM 6 is a huge jump over SM 5, enabling true
>>> > virtual memory, and I suggest deprecating support for SM 5 across the
>>> > board. SM 7 and beyond, alas, mostly complicated warp programming and
>>> > introduced tensor cores, which currently seem useless for straight MD
>>> > but perfect for running AI models inline with MD.
>>> >
>>> > CUDA 8 is a 2017 toolkit. That's way too soon to deprecate IMO, and if
>>> > cmake has issues with it, that's a reason not to use cmake, not a reason
>>> > to deprecate CUDA 8.
>>> >
>>> >
>>> > On Sun, Apr 4, 2021 at 8:55 AM Scott Le Grand <varelse2005.gmail.com>
>>> > wrote:
>>> >
>>> > > Ross sent me two screenshots of cmake losing its mind with an 11.x
>>> > > toolkit. I'll file an issue, but no, I'm not going to fix cmake issues
>>> > > myself at all. I'm open to someone convincing me cmake is better than
>>> > > the configure script, but no one has made that argument yet beyond
>>> > > "because cmake", and until that happens, that just doesn't work for
>>> > > me. I'm happy to continue helping with the build script that worked,
>>> > > until convinced otherwise. Related: I still use nvprof, fight me.
>>> > >
>>> > > Meanwhile, AMBER16 refactored to SM 7 and beyond is already hitting
>>> > > 730 ns/day on JAC NVE 2 fs. AMBER20 with the grid interpolation and
>>> > > sub-FP32 local force precision hacks removed hits 572 ns/day (down
>>> > > from 632 if left in as we shipped it). That puts me nearly 1/3 of the
>>> > > way to my goal of doubling overall AMBER performance, which is what is
>>> > > important to me and where I'm going to focus my efforts, as opposed to
>>> > > the new shiny build system. That system is getting better (and I
>>> > > *hate* cmake for cmake's sake), but we rushed it to production IMO,
>>> > > like America reopening before the end of the pandemic.
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > On Sun, Apr 4, 2021 at 5:51 AM David A Case <david.case.rutgers.edu>
>>> > > wrote:
>>> > >
>>> > >> On Sat, Apr 03, 2021, Scott Le Grand wrote:
>>> > >>
>>> > >> >cmake is still not quite ready for prime-time disruption of
>>> > >> >configure. It's getting there though.
>>> > >>
>>> > >> If there are problems with cmake, please create an issue on gitlab,
>>> > >> and mention .multiplemonomials to get Jamie's attention. Please try
>>> > >> to avoid the syndrome of saying "I can get this to work with
>>> > >> configure, and I'm too busy right now to do anything else."
>>> > >>
>>> > >> I have removed the documentation for the configure process in the
>>> > >> Amber21 Reference Manual, although the files are still present. We
>>> > >> can't continue to support and test two separate build systems, each
>>> > >> with its own bugs.
>>> > >>
>>> > >> ...thx...dac
>>> > >>
>>> > >>
>>
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Sun Apr 04 2021 - 13:00:03 PDT