Hi Scott,
> Changes were made in revision 1.25; and they were advertised
> on amber-developers (in fact u replied!) and in the logs.
> So the obvious question is how exactly did you invoke configure ?
> When i do this
> ./configure -sse intel
> i get
> OCFLAGS=-O3 -ip -axS ...
> FOPTFLAGS= -O3 -ip -axS ...
Ah, I did not see this option; it is hidden down the list of flags. This
likely means the average user will not see it either. Perhaps we should
make SSE the default and have a -nosse flag to turn it off? There can't be
many machines left these days that don't support at least SSE2.
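
As a rough illustration of the "SSE on by default" idea, here is a minimal
sketch in plain sh. The function name parse_sse_option and the variable sse
are hypothetical and are not taken from the actual configure script; -nosse
is the flag proposed above and does not exist yet.

```shell
#!/bin/sh
# Hypothetical sketch only: not the real AMBER configure logic.
# SSE is assumed to be enabled unless -nosse is given.
parse_sse_option() {
    sse=yes                       # proposed new default
    for arg in "$@"; do
        case "$arg" in
            -sse)   sse=yes ;;    # still accepted so old command lines keep working
            -nosse) sse=no  ;;    # proposed opt-out flag
        esac
    done
    echo "$sse"
}

parse_sse_option intel            # prints "yes"
parse_sse_option -nosse intel     # prints "no"
```

Keeping -sse as an accepted no-op means anyone already running
./configure -sse intel would see no change in behaviour.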
> > The ipo is a problem since it means all the optimization is done at link
> > stage and you lose all the benefits of parallel builds and each new make is
> > very expensive. We could specify
>
> IMO parallel building is irrelevant; our users will build once and
> run often; so we should choose the options that produce the fastest
> still-correct executables. (In addition, this quick make turnaround
> is senseless speed imho; surely, u can contemplate the universe
> in the extra minutes of a serial make - or maybe view a film on
> one of the four monitors of your Bat-scope [ no doubt upgraded
> to 16 wide screens by now... <*8+-])
Ha ha. Screens are so old school! I just have it lasered onto my retina
these days. The big problem with the -fast option is not actually the
parallel build but the fact that all optimization is deferred until link
time. This causes two things to happen. First, it makes the linking require
LOTS of memory. Second, it means that editing just one file and running
make again takes ages at the link stage, so it is as if you had run make clean
each time. Of course this does not affect users, who we assume will compile
only once. However, anyone doing development will get very frustrated with
this, and thus we will end up using different compiler flags for development
than we use for release.
> Now this is where the problems arise and why i commented in my
> advertisement that this still needs work.
> Before 1.25 we had
> ocflags="-O3 -ip -axN"
> foptflags="-ip -O3 -axP"
> which is clearly outdated and inconsistent.
> Now we get:
> OCFLAGS=-O3 -ip -axS ...
> FOPTFLAGS= -O3 -ip -axS ...
> According to my reading of the intel 10 man pages, -axS should get
> all the possible sse vectorizations. So the next question is - are you
I'm not sure on this. I have the following from the 10.1.018 man pages:
-axS
    Can generate specialized code paths using Intel(R) Streaming SIMD
    Extensions 4 (SSE4) Vectorizing Compiler and Media Accelerators
    instructions for future Intel processors that support the instruction
    set and it can optimize for the architecture.
-axT
    Can generate specialized code paths for SSSE3, SSE3, SSE2, and SSE
    instructions for Intel processors, and it can optimize for the
    Intel(R) Core(TM)2 Duo processor family.
etc.
Hence my read on this is that -axS only supports SSE4, which is the latest
Nehalem chips. It does not cover SSE3 and below. Hence you need all the
options, which for ifort 10.1 would be -axSTPW.
However, the warning here is that compilers older than 10.0 do not support
the S option, I believe. Additionally, not all the sub-options are supported
by all compiler versions. Then, to make things more complicated, in ifort 11.0
and 11.1 Intel has marked the -axS... options as deprecated and now says you
need to use the more verbose SSE4.2,SSE4.1 options etc. I assume that come
ifort v12 they will remove the -axS... options entirely and hence things
will no longer build. Of course SSE4.2 etc. does not work with pre-11.0
compilers, and thus the -fast option has its appeal, except for the fact that
it introduces additional issues. So maybe for now we should use -axWPS, which
should work with most post-10.0 Intel compilers for the time being - maybe
including the T option as well. Then we can issue a patch if Intel breaks
this in 12.0.
> There is also the compiler version wrinkle as you mentioned.
> If someone has specific recommendations on options then make them;
> otherwise, i suggest we make the small -axS -> -axWPS change for
> the ambertools release and then try fast which will give us months
> to test drive it b4 amber11.
The only other thing is to try to detect the compiler version, but this may
get horribly complicated. Asking the user to provide the compiler version to
configure is probably a recipe for disaster, so if we do have settings that
are specific to compiler versions they should probably be selected
automatically.
All the best
Ross
/\
\/
|\oss Walker
| Assistant Research Professor |
| San Diego Supercomputer Center |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
| http://www.rosswalker.co.uk | PGP Key available on request |
Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Mon Oct 26 2009 - 09:00:02 PDT