Re: [AMBER-Developers] SSE2 / SSE3 settings no longer used for ifort?

From: Scott Brozell <>
Date: Mon, 26 Oct 2009 02:44:16 -0400


On Fri, Oct 23, 2009 at 04:45:19PM -0700, Ross Walker wrote:
> It seems that the config.h prepared by ./configure is no longer setting the
> SSE2 / SSE3 optimization flags for the Intel compiler. This can reduce
> performance by >30% in some cases. Does anyone know why this was turned off?

Changes were made in revision 1.25, and they were advertised
on amber-developers (in fact you replied!) and in the logs.
So the obvious question is: how exactly did you invoke configure?
When I do this
./configure -sse intel
I get
OCFLAGS=-O3 -ip -axS ...
FOPTFLAGS= -O3 -ip -axS ...
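A quick way to see which optimization flags configure actually wrote is to pull the two flag lines out of config.h. This is a minimal sketch, not part of AMBER: the helper name show_optflags is hypothetical, and it assumes config.h uses plain VAR=value lines as in the OCFLAGS/FOPTFLAGS excerpt above.

```shell
#!/bin/sh
# Hypothetical helper (not part of AMBER): print the C and Fortran
# optimization flag lines that configure wrote into config.h.
# Assumes plain VAR=value lines, as in the OCFLAGS/FOPTFLAGS excerpt.
show_optflags() {
    cfg="${1:-config.h}"   # default to config.h in the current directory
    if [ ! -r "$cfg" ]; then
        echo "show_optflags: cannot read $cfg" >&2
        return 1
    fi
    # Only lines that start with OCFLAGS= or FOPTFLAGS= are printed.
    grep -E '^(OCFLAGS|FOPTFLAGS)=' "$cfg"
}
```

Usage, after configuring: ./configure -sse intel && show_optflags config.h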

> Specifically this is things like:
> ifort -axWPS
> The problem is that different versions of the Intel compilers use different
> sets of flags. Once again an example of the patients running the asylum.
> However, we could set -fast as the default option which would turn on:
> -xP -O3 -ipo

-fast does not activate those options on all platforms.
But -fast seems like the best optimized-flags option precisely because it
activates different options on different platforms.
So I don't know why it has not been used in configure for a long time.
Of course, if we make this change now, then we have to test it...

> The ipo is a problem since it means all the optimization is done at link
> stage and you lose all the benefits of parallel builds and each new make is
> very expensive. We could specify

IMO parallel building is irrelevant: our users will build once and
run often, so we should choose the options that produce the fastest
still-correct executables. (Besides, quick make turnaround is
senseless speed IMHO; surely you can contemplate the universe
during the extra minutes of a serial make, or maybe watch a film on
one of the four monitors of your Bat-scope [no doubt upgraded
to 16 widescreens by now... <*8+-])

> -axWPS
> but a lot of this is now changed and deprecated in ifort >11.0
> Hence should we have an intel9 and intel10 target which uses -O3 -ip -axWPS
> and an Intel11 target which uses -O3 -ip -axSSE4.2,SSE4.1,SSSE3,SSE3,SSE2
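Ross's version-split proposal could be expressed in a configure script roughly as below. This is a hypothetical sketch, not existing AMBER code: the function name intel_optflags and the grouping of 9.x/10.x versus 11 and later are assumptions taken from the proposal; only the two flag strings come from the thread.

```shell
#!/bin/sh
# Hypothetical sketch of the proposal: choose ifort optimization flags
# by compiler major version. Flag strings are quoted from the thread;
# the function name and version grouping are illustrative assumptions.
intel_optflags() {
    major="${1%%.*}"    # strip the minor version: "11.1" -> "11"
    case "$major" in
        9|10)
            printf '%s\n' "-O3 -ip -axWPS" ;;
        ''|*[!0-9]*)
            # Reject empty or non-numeric input before any numeric test.
            echo "intel_optflags: bad version '$1'" >&2
            return 1 ;;
        *)
            if [ "$major" -ge 11 ]; then
                printf '%s\n' "-O3 -ip -axSSE4.2,SSE4.1,SSSE3,SSE3,SSE2"
            else
                echo "intel_optflags: ifort $1 not supported" >&2
                return 1
            fi ;;
    esac
}
```

For example, FOPTFLAGS="$(intel_optflags 11.1)". The non-numeric check sits before the catch-all branch so that the -ge comparison never sees a malformed version string.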

On Fri, Oct 23, 2009 at 05:18:25PM -0700, Ross Walker wrote:
> In principle if you use -ax... you get all the code paths so it will run on
> multiple chip versions. This should be perfectly good. In practice though
> you should be compiling specifically for each architecture you have these
> days since the idea of a global 'i386/x86' instruction set is long dead. :-(
> These days the acronyms include:
> x86,x86_64,ia64,mmx,sse,sse2,sse3,ssse3,sse4.1,sse4.2,3dnow and lots more...
> The issue is that these can make such a big performance improvement that we
> should definitely be including them if we can.

Now this is where the problems arise, and why I commented in my
advertisement that this still needs work.
Before 1.25 we had
        ocflags="-O3 -ip -axN"
        foptflags="-ip -O3 -axP"
which is clearly outdated and inconsistent.
Now we get:
OCFLAGS=-O3 -ip -axS ...
FOPTFLAGS= -O3 -ip -axS ...
According to my reading of the Intel 10 man pages, -axS should get
all the possible SSE vectorizations. So the next question is: are you
saying that is incorrect? Probably, since you mention -axWPS.
That's an easy change to make, but it won't be all things on all
platforms, so we are back to -fast.
There is also the compiler version wrinkle as you mentioned.
If someone has specific recommendations on options, then make them;
otherwise, I suggest we make the small -axS -> -axWPS change for
the AmberTools release and then try -fast, which will give us months
to test-drive it before Amber 11.

I feel the need - the need for speed,

PS: check out the opening speed sequence in
Make it a Lynchian double feature:
Mr Pumpkin Head gives it two seeds up, but keep the kiddies
down in the cellar for safety !

AMBER-Developers mailing list
Received on Mon Oct 26 2009 - 00:00:03 PDT