Hi Jason,
My point is that benchmark numbers obtained by hand-tuning the options for a given machine, such that only an expert user willing to spend days on end tweaking things could reproduce them, are not typically very useful.
My personal opinion is that codes should either be auto-tuning or should be designed to give close to optimum performance without one needing to tweak a whole bunch of options. This is where codes radically differ. AMBER GPU, for example, was designed to give close to optimum performance without needing any special settings. Gromacs, on the other hand, has literally hundreds of little parameters one can tune and rarely gives good performance out of the box.
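To make the distinction concrete, here is a minimal sketch of what I mean by auto-tuning (Python pseudocode; run_short_benchmark, fft_grid, and nonbonded_kernel are made-up names for illustration, not real AMBER or Gromacs options): the code times a handful of candidate configurations on the actual hardware at startup and silently keeps the fastest, so the user never has to touch the knobs.

    import time

    def autotune(run_short_benchmark, candidates):
        # Time a short trial run (say, a few hundred MD steps) for each
        # candidate configuration and return the fastest one.
        best, best_time = None, float("inf")
        for settings in candidates:
            start = time.perf_counter()
            run_short_benchmark(settings)  # hypothetical hook into the MD engine
            elapsed = time.perf_counter() - start
            if elapsed < best_time:
                best, best_time = settings, elapsed
        return best

    # Hypothetical knobs, for illustration only:
    candidates = [{"fft_grid": g, "nonbonded_kernel": k}
                  for g in ("coarse", "fine")
                  for k in ("tiled", "shuffle")]
    # best = autotune(my_engine_trial_run, candidates)

The point is that a trial like this costs the user a few seconds once, rather than days of hand tweaking, and it is tuned to the machine the production run will actually use.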
It really comes down to what one is trying to do here. Is one trying to generate benchmarks that will be of general use to 95+% of users, which for most users means 'what do I get out of the box', or is one trying to do some kind of hero run and demonstrate, with an epic amount of tuning, that 'my code is faster than yours(tm)'? Personally I find the latter to be of limited use in helping users do better / more effective science.
My 0.02 btc
All the best
Ross
> On Feb 3, 2022, at 10:43, Jason Swails <jason.swails@gmail.com> wrote:
>
> On Thu, Feb 3, 2022 at 8:44 AM Ross Walker <ross@rosswalker.co.uk> wrote:
>
>> And therein lies the real problem. IMHO benchmark numbers should be what
>> the average user will get out of the box on reasonably priced hardware
>> without tinkering. If you want numbers that are actually useful in the real
>> world, I'd suggest benchmarking by finding a grad student or postdoc who is
>> using MD in their work but is not a developer. Give them a typical
>> workstation that can be purchased for <$6K and a list of PDB IDs to run.
>> Ask them to report back the performance they get with various codes.
>>
>> That will give you a real world benchmark that is actually useful.
>>
>
> ...for running MD codes on commodity workstations purchased for <$6K.
> Otherwise you'd need to run the benchmark on whatever hardware you actually
> plan on using for them to be actually useful. [1]
>
> You'll obviously get some performance variability when performance depends
> on so many things, but to declare that the only useful benchmarks are those
> that run under conditions known to be optimal for one code at the expense
> of another is disingenuous. [2] All benchmarks are useful [3] if you know
> how to read them and understand the caveats (I'd wager most people do to a
> reasonable extent). Performance differences that don't come close to an
> order of magnitude won't do all that much to move the needle. You aren't
> going to go from the µs -> ms regime by running 50% faster.
>
> Dhruv, I'd be interested in seeing those benchmarks.
>
> Thanks,
> Jason
>
> [1] For narrow definitions of "useful"
> [2] Recall the only benchmarks we published pre-2012 were performed on
> supercomputers
> [3] In the sense that it allows you to make more informed choices
>
> --
> Jason M. Swails
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers@ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers