Re: [AMBER-Developers] PMEMD now built by default from Robert Duke on 2010-03-02 (Amber Developers Archive Mar 2010)

From: Robert Duke <rduke.email.unc.edu>
Date: Tue, 2 Mar 2010 17:35:11 -0500

Hi Ross,
I think there are two main factors that influenced optimization, just off
the top of my head: 1) RISC vs. CISC, and 2) amount of cache. Vectorization
is these days more of a CISC feature, I would think. At any rate, it is
probably harder to predict whether DIRFRC_NOVEC helps or hurts due to the
interplay of several factors (there actually are pre-em64t chips where it
clearly helps). I should try to get my hands on a Nehalem to run my tuning
suite and see what is really optimal these days. FFTW helps about 10% on a
single processor, but the benefit is quickly lost in the noise past 2-4
processors. I generally don't bother with it because the compile/link is a
pain (which is actually how I feel about NetCDF and, especially, MKL these
days). FFTLOADBAL_2PROC is an occasionally useful optimization, but not
critical; it allows assignment of all FFT work to one of two processors,
followed by redistribution as necessary. It is one of those things where the
benefit is hard to predict, due to the interplay between interconnect speed
and caching effects in the FFT code. I may be incorrect in assuming this,
but I do assume most folks run on more than 2 processors these days, so it
is not a big deal whether we make it a default or not.
Regards - Bob
----- Original Message -----
From: "Ross Walker" <ross.rosswalker.co.uk>
To: "'AMBER Developers Mailing List'" <amber-developers.ambermd.org>
Sent: Tuesday, March 02, 2010 5:02 PM
Subject: RE: [AMBER-Developers] PMEMD now built by default

> Hi Bob,
>
>> A few quick notes -
>> The DIRFRC_* defines are important; you need to at least default to what is
>> being used for em64t on these. A good example is DIRFRC_EFS which enables
>> direct force calc splining optimizations that give you single processor
>> speedups in the range of 30%. The other stuff is in the 5-10% range
>> typically, and if you get the em64t defaults, you probably won't be far off
>> target.
>
> Indeed. Right now when you issue ./configure intel
>
> You get:
>
> PMEMD_FPP=cpp -traditional -P -DBINTRAJ -DDIRFRC_EFS -DDIRFRC_COMTRANS -DDIRFRC_NOVEC -DFFTLOADBAL_2PROC -DPUBFFT
> PMEMD_F90=ifort
> PMEMD_FOPTFLAGS=-fast
>
> This uses the correct ifdefs, I think. I know there are things like FFTW,
> but I never saw much speedup with that, so I figured it wasn't worth the
> hassle. The one that seems strange is FFTLOADBAL_2PROC. This only does
> anything if you are running mpirun -np 2, yes? Is there any harm in having
> it there, or should we consider stripping it out?
>
> I assume the vec version of DIRFRC_ is mainly aimed at RISC systems,
> correct? Have you tried it lately on any EM64T systems, the ones with SSE5
> etc.? The -fast flag does vectorization and full interprocedural
> optimization at the link stage, so the difference may not be so great now,
> but I was wondering whether you had tried it on Nehalem systems?
>
>> But you do need to pick up this level of optimization that is basically
>> implemented through alternative code paths. The slow MPI stuff is pretty
>> unimportant now, and more likely to cause problems because folks include
>> it for non-ethernet systems, so seeing that go away is good (there is a
>> potential for small systems to run out of MPI buffer space however, which
>> can cause fairly annoying system hangs; something to watch out for).
>
> Yeah, I saw that a lot because people (including me) would use MPICH for
> running just on our desktops. When these were dual-core it made no
> difference, but now, with 8, 12 or even 16 cores in a desktop, it starts to
> hurt.
>
>> It is true that a lot of the alternative codepath complexity has to do with
>> supporting various RISC architectures and Itanium, both of which have lost
>> the wars. On NetCDF, well, in reality, at least in my experience, I
>
> Yeah, a real shame, but cheapness is everything these days... :-( In the
> minds of the politicians they are extending the American ideal to
> supercomputers as well, so it is stated that '...all flops are created
> equal'.
>
>> install would typically take. Gosh, I hope IBM Power systems become
>> available again somewhere; they are the best machines available in my
>> opinion; I guess they are expensive too though.
>
> Yeah, there are a couple of bright sparks with POWER7 systems coming
> online, but getting access to them will be hard, and I doubt anyone can
> convince a university to buy one in place of a cheap, flop-rich cluster
> these days. :-(
>
> All the best
> Ross
>
> /\
> \/
> |\oss Walker
>
> | Assistant Research Professor |
> | San Diego Supercomputer Center |
> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> | http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
>
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers
>
>

Received on Tue Mar 02 2010 - 15:00:03 PST