Thanks for reminding me about those compiler options. I don't think that
-msse4.1 does much for it, but at least I know now to be trying to
optimize for old P4s. I did some more reading on this, and a number of
things in my loops prevent the compiler from vectorizing them effectively.
I have a lot of non-uniform memory access.
Some of this I can fix without too much trouble, but I'm not sure how
useful it would ultimately be, because my loops will still have to contain
things like the double-to-int conversion / random access spline table
lookup. Besides, I seem to be getting good mileage out of manually
unrolling these loops to saturate the registers (and it seems there is
still more to gain there).
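
To make the loop shape concrete, here is a minimal C sketch, not the actual
AMBER code. The SplineTable layout, the function names, and the unroll
factor of 4 are all hypothetical, chosen only for illustration. It shows a
double-to-int conversion feeding a data-dependent table lookup, which is the
kind of access that tends to defeat auto-vectorization, next to a manually
unrolled variant that uses independent accumulators.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: four cubic coefficients (a, b, c, d) per interval. */
typedef struct {
    double inv_dx;        /* 1 / table spacing */
    size_t n_intervals;   /* number of intervals in the table */
    const double *coef;   /* 4 * n_intervals values */
} SplineTable;

/* Evaluate the spline at r: convert to an index, then do a random-access
 * lookup whose address depends on the data. */
static inline double spline_eval(const SplineTable *t, double r)
{
    double s = r * t->inv_dx;
    int idx = (int)s;                 /* double-to-int conversion (truncates) */
    /* Debug-only bounds check; compiled out when NDEBUG is defined. */
    assert(idx >= 0 && (size_t)idx < t->n_intervals);
    double f = s - (double)idx;       /* fractional position in the interval */
    const double *c = t->coef + 4 * (size_t)idx;
    return c[0] + f * (c[1] + f * (c[2] + f * c[3]));   /* Horner form */
}

/* Straightforward loop: one accumulator, one lookup per iteration. */
double sum_plain(const SplineTable *t, const double *r, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += spline_eval(t, r[i]);
    return acc;
}

/* Manually unrolled by 4 with independent accumulators, so the four
 * lookups in each pass do not depend on one another. */
double sum_unrolled4(const SplineTable *t, const double *r, size_t n)
{
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += spline_eval(t, r[i]);
        acc1 += spline_eval(t, r[i + 1]);
        acc2 += spline_eval(t, r[i + 2]);
        acc3 += spline_eval(t, r[i + 3]);
    }
    for (; i < n; i++)                /* remainder when n is not a multiple of 4 */
        acc0 += spline_eval(t, r[i]);
    return (acc0 + acc1) + (acc2 + acc3);
}
```

The unrolled version sums in a different order, so its result can differ
from the plain loop in the last few bits. Whether it is actually faster
depends on the compiler and the hardware, so it would need to be timed.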
Dave
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Sun Oct 23 2011 - 17:00:02 PDT