Hi,
On Fri, Oct 23, 2009 at 04:45:19PM -0700, Ross Walker wrote:
> It seems that the config.h prepared by ./configure is no longer setting the
> SSE2 / SSE3 optimization flags for the Intel compiler. This can reduce
> performance by >30% in some cases. Does anyone know why this was turned off?
Changes were made in revision 1.25; they were announced
on amber-developers (in fact, you replied!) and in the logs.
So the obvious question is: how exactly did you invoke configure?
When I do this:
./configure -sse intel
I get:
OCFLAGS=-O3 -ip -axS ...
FOPTFLAGS= -O3 -ip -axS ...
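To compare what configure actually wrote on your side, something like the following could be used. This is only a sketch: the helper name show_opt_flags is hypothetical, and it assumes the flags appear in config.h as lines starting with OCFLAGS= or FOPTFLAGS=, as in the output above.

```shell
# Hypothetical helper: print the optimization-flag lines from a config file.
# Assumes the file (default: config.h) contains lines beginning with
# OCFLAGS= or FOPTFLAGS=, as in the configure output shown in this thread.
show_opt_flags() {
    grep -E '^(OCFLAGS|FOPTFLAGS)=' "${1:-config.h}"
}
```

Run it as "show_opt_flags config.h" after each configure invocation and compare the results.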
> Specifically this is things like:
> ifort -axWPS
>
> The problem is that different versions of the Intel compilers use different
> sets of flags. Once again an example of the patients running the asylum.
> However, we could set -fast as the default option which would turn on:
> -xP -O3 -ipo
-fast does not activate those options on all platforms.
But -fast seems like the best choice of optimization flags precisely
because it activates different options on different platforms.
So I don't know why it has not been used in configure for a long time.
Of course, if we make this change now then we have to test it...
> The ipo is a problem since it means all the optimization is done at link
> stage and you lose all the benefits of parallel builds and each new make is
> very expensive. We could specify
IMO parallel building is irrelevant; our users will build once and
run often; so we should choose the options that produce the fastest
still-correct executables. (In addition, this quick make turnaround
is senseless speed imho; surely you can contemplate the universe
in the extra minutes of a serial make - or maybe view a film on
one of the four monitors of your Bat-scope [ no doubt upgraded
to 16 wide screens by now... <*8+-])
> -axWPS
> but a lot of this is now changed and deprecated in ifort >11.0
>
> Hence should we have an intel9 and intel10 target which uses -O3 -ip -axWPS
> and an Intel11 target which uses -O3 -ip -axSSE4.2,SSE4.1,SSSE3,SSE3,SSE2
On Fri, Oct 23, 2009 at 05:18:25PM -0700, Ross Walker wrote:
> In principle if you use -ax... you get all the code paths so it will run on
> multiple chip versions. This should be perfectly good. In practice though
> you should be compiling specifically for each architecture you have these
> days since the idea of a global 'i386/x86' instruction set is long dead. :-(
>
> These days the acronyms include:
> x86,x86_64,ia64,mmx,sse,sse2,sse3,ssse3,sse4.1,sse4.2,3dnow and lots more...
>
> The issue is that these can make such a big performance improvement that we
> should definitely be including them if we can.
Now this is where the problems arise, and why I commented in my
announcement that this still needs work.
Before 1.25 we had
ocflags="-O3 -ip -axN"
foptflags="-ip -O3 -axP"
which is clearly outdated and inconsistent.
Now we get:
OCFLAGS=-O3 -ip -axS ...
FOPTFLAGS= -O3 -ip -axS ...
According to my reading of the Intel 10 man pages, -axS should get
all the possible SSE vectorizations. So the next question is: are you
saying that is incorrect? Probably, since you mention -axWPS.
That's an easy change to make, but it won't be all things on all
platforms; so we are back to -fast.
There is also the compiler version wrinkle, as you mentioned.
If someone has specific recommendations on options, then make them;
otherwise, I suggest we make the small -axS -> -axWPS change for
the AmberTools release and then try -fast, which will give us months
to test drive it before Amber 11.
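For the compiler version wrinkle, a minimal sketch of how per-version targets might look in a configure-style shell script. The function name intel_opt_flags is hypothetical, the flag sets are simply the ones proposed earlier in this thread (not tested here), and the fallback for unknown versions is an assumption.

```shell
# Hypothetical helper: map an Intel compiler major version to optimization flags.
# The flag sets for 9/10 and 11 are the ones proposed in this thread.
intel_opt_flags() {
    major="$1"
    case "$major" in
        9|10)
            echo "-O3 -ip -axWPS"
            ;;
        11)
            echo "-O3 -ip -axSSE4.2,SSE4.1,SSSE3,SSE3,SSE2"
            ;;
        *)
            # Assumption: for unknown versions, drop -ax and keep plain -O3 -ip.
            echo "-O3 -ip"
            ;;
    esac
}
```

For example, foptflags=$(intel_opt_flags 11) would select the Intel 11 set; how the major version is detected is left open here.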
I feel the need - the need for speed,
Scott
PS: check out the opening speed sequence in
http://en.wikipedia.org/wiki/The_Hidden
Make it a Lynchian double feature:
http://www.icanseeyoumovie.com/
Mr Pumpkin Head gives it two seeds up, but keep the kiddies
down in the cellar for safety!
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Mon Oct 26 2009 - 00:00:03 PDT