Re: [AMBER-Developers] PMEMD now built by default

From: Robert Duke <>
Date: Tue, 2 Mar 2010 17:41:25 -0500

Oops - said it wrong. I have some pre-EM64T chips where vectorizing the
direct force routines (here actually meaning local array caching so code
vectorization will be more likely) is helpful; i.e., using DIRFRC_NOVEC.
- Bob
----- Original Message -----
From: "Robert Duke" <>
To: "AMBER Developers Mailing List" <>
Sent: Tuesday, March 02, 2010 5:35 PM
Subject: Re: [AMBER-Developers] PMEMD now built by default

> Hi Ross,
> I think there are two main factors that influenced optimization, just off
> the top of my head: 1) RISC vs. CISC, and 2) amount of cache.
> Vectorization is these days more of a CISC feature, I would think. At any
> rate, it is probably harder to predict whether DIRFRC_NOVEC helps or hurts
> due to the interplay of several factors (so there actually are pre-EM64T
> chips where it clearly helps). I should try to get my hands on a Nehalem
> to run my tuning suite, and see what is really optimal these days. FFTW
> helps about 10% on a single processor, but the benefit is quickly lost in
> the noise past 2-4 processors. I generally don't bother with it because
> it is a pain to do the compile/link (which is also how I feel about netcdf
> and, especially, mkl these days). FFTLOADBAL_2PROC is a once-in-a-while
> useful optimization, but not critical; it allows assignment of all fft
> work to one of two processors, followed by redistribution as necessary.
> It is one of those things where it is hard to predict the benefit due to
> interplay between interconnect speed vs. caching effects in the fft code.
> I may be incorrect in assuming this, but I do assume most folks run on
> more than 2 processors these days, so it is not a big deal whether we make
> it a default or not.
> Regards - Bob
> ----- Original Message -----
> From: "Ross Walker" <>
> To: "'AMBER Developers Mailing List'" <>
> Sent: Tuesday, March 02, 2010 5:02 PM
> Subject: RE: [AMBER-Developers] PMEMD now built by default
>> Hi Bob,
>>> A few quick notes -
>>> The DIRFRC_* defines are important; you need to at least default to
>>> what is being used for em64t on these. A good example is DIRFRC_EFS,
>>> which enables direct force calc splining optimizations that give you
>>> single processor speedups in the range of 30%. The other stuff is in
>>> the 5-10% range typically, and if you get the em64t defaults, you
>>> probably won't be far off target.
>> Indeed. Right now, when you issue ./configure intel, you get:
>> PMEMD_F90=ifort
>> Which uses the correct ifdefs, I think. I know there are things like
>> FFTW, but I never saw much speedup with that, so figured it wasn't worth
>> the hassle.
>> The one that seems strange is the FFTLOADBAL_2PROC. This only does
>> anything if you are running mpirun -np 2, yes? Is there any harm in
>> having this there, or should we consider stripping it out?
>> I assume the vec version of DIRFRC_ is mainly aimed at RISC systems,
>> correct? Have you tried it of late on any EM64T systems, the ones with
>> SSE4.2 etc.? The -fast does vectorization and full interprocedural
>> optimization at the linking stage, so the difference may not be so great
>> now, but I was wondering if you had tried it with Nehalem systems?
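For reference on the -fast question: on the Intel 11.x compilers of that period, -fast was documented as an umbrella for roughly the flags below. This expansion varies by compiler version, so treat it as an assumption and check `ifort -help` locally; the source file name is hypothetical:

```shell
# Approximate expansion of ifort's -fast on Intel 11.x (verify per version):
#   -ipo          interprocedural optimization across files, applied at link
#   -O3           aggressive loop and memory-access transformations
#   -no-prec-div  faster, slightly less precise division
#   -static       static linking
#   -xHost        vectorize for the build machine's highest SIMD level
ifort -ipo -O3 -no-prec-div -static -xHost -c my_kernel.f90
```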
>>> But you do need to pick up this level of optimization that is
>>> basically implemented through alternative code paths. The slow mpi
>>> stuff is pretty unimportant now, and more likely to cause problems
>>> because folks include it for non-ethernet systems, so seeing that go
>>> away is good (there is a potential for small systems to run out of mpi
>>> buffer space, however, which can cause fairly annoying system hangs;
>>> something to watch out for).
>> Yeah, I saw that a lot because people (including me) would use mpich for
>> running just on our desktops. When these were dual core it made no
>> difference, but now with 8 or 12 or even 16 cores in a desktop it starts
>> to hurt.
>>> It is true that a lot of the alternative codepath complexity has to do
>>> with supporting various RISC architectures and Itanium, both of which
>>> have lost the wars. On netcdf, well, in reality, at least in my
>>> experience, I
>> Yeah, a real shame, but cheapness is everything these days... :-( In the
>> minds of the politicians, they are extending the American ideal to
>> supercomputers as well, so it is stated that '...all flops are created
>> equal'.
>>> install would typically take. Gosh, I hope ibm power systems become
>>> available again somewhere; they are the best machines available in my
>>> opinion; I guess they are expensive too though.
>> Yeah, there are a couple of bright sparks with POWER7 systems coming
>> online, but getting access to them will be hard, and I doubt anyone can
>> convince a university to buy one in place of a cheap, flop-rich cluster
>> these days. :-(
>> All the best
>> Ross
>> /\
>> \/
>> |\oss Walker
>> | Assistant Research Professor |
>> | San Diego Supercomputer Center |
>> | Tel: +1 858 822 0854 | EMail:- |
>> | | |
>> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
>> be read every day, and should not be used for urgent or sensitive issues.
>> _______________________________________________
>> AMBER-Developers mailing list

Received on Tue Mar 02 2010 - 15:00:04 PST