Re: [AMBER-Developers] PMEMD now built by default

From: Robert Duke <rduke.email.unc.edu>
Date: Tue, 2 Mar 2010 12:53:21 -0500

Hi Ross -
A few quick notes -
The DIRFRC_* defines are important; you need to at least default to what is
being used for em64t on these. A good example is DIRFRC_EFS, which enables
splining optimizations in the direct force calculation that give
single-processor speedups in the range of 30%. The other defines are
typically worth 5-10%, so if you pick up the em64t defaults you probably
won't be far off target. But you do need to pick up this level of
optimization, which is basically implemented through alternative code paths.

The slow MPI support is pretty unimportant now, and is more likely to cause
problems because folks enable it on non-ethernet systems, so seeing that go
away is good (there is a potential for small systems to run out of MPI
buffer space, however, which can cause fairly annoying system hangs;
something to watch out for).
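
Just to be concrete about what I mean by defines selecting alternative code
paths at build time, here is a toy illustration; this is not pmemd source,
only the DIRFRC_EFS name is real, and gfortran is just standing in for
whatever compiler or MPI wrapper the build actually uses:

# Toy demo: a build-time define selects which code path gets compiled in.
cat > path_demo.F90 <<'EOF'
program path_demo
  implicit none
#ifdef DIRFRC_EFS
  print *, 'optimized (splined) direct force path compiled in'
#else
  print *, 'plain direct force path compiled in'
#endif
end program path_demo
EOF
gfortran -cpp path_demo.F90 -o plain_path && ./plain_path
gfortran -cpp -DDIRFRC_EFS path_demo.F90 -o fast_path && ./fast_path

The real defines work the same way; the alternative paths are the splined
versus unsplined direct force code, which is where that 30% comes from.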

It is true that a lot of the alternative-codepath complexity has to do with
supporting various RISC architectures and Itanium, both of which have lost
the wars.

On netcdf: in reality, at least in my experience, I have usually found it
already installed at the big installations I have dealt with, so you don't
always have to build it anyway. But it is true that sander doing replica
exchange is something you can run to advantage on a big installation. In
the past there was active resistance to supporting sander because it was so
slow; that is probably less of an issue now though.

I have always found it very convenient to just fly onto a new system and
have pmemd up and running in 30 minutes instead of the hours the full
install would typically take. Gosh, I hope IBM Power systems become
available again somewhere; they are the best machines available in my
opinion, though I guess they are expensive too.
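
On the netcdf point, by the way, a quick way to check whether a site already
has it installed (assuming the install provides the nc-config helper that
newer netcdf versions ship) is:

  nc-config --version      # which netcdf is installed
  nc-config --includedir   # where the include/module files live
  nc-config --flibs        # the Fortran link flags to hand to the build

If those come back sensibly you can skip building the bundled copy entirely.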
Regards - Bob
----- Original Message -----
From: "Ross Walker" <ross.rosswalker.co.uk>
To: "'AMBER Developers Mailing List'" <amber-developers.ambermd.org>
Sent: Tuesday, March 02, 2010 12:25 PM
Subject: RE: [AMBER-Developers] PMEMD now built by default


> Hi Bob,
>
>> optimized
>> that this generic build is as fast as the tuned builds? In particular, how
>> do you handle the issue of all the performance-related defines in pmemd?
>> One thought I have had would be to include a define in this generic build
>> that causes printout of the fact that it is a generic, basically not
>> super-optimized build.
>
> Well, it isn't really generic, and likely there will be no real difference
> in performance, at least for most systems. The big issue is that most of the
> build scripts in PMEMD have not been kept up to date, so I would bet people
> actually end up building versions that are slower. E.g. the Athlon flags
> have been out of date for ages but people still use them for new AMD chips.
>
> Right now the build uses the optimization flags for Infiniband, ifort and
> em64t. Basically the majority (I bet >95%) of the machines people run AMBER
> on today are largely the same regardless of which manufacturer they come
> from. Just looking at the TeraGrid machines for example:
>
> Abe = Intel Quad Core Clovertown EM64T Infiniband
> Queen Bee = Intel Quad Core Clovertown EM64T Infiniband
> Lonestar = Intel Dual Core EM64T NetBurst Infiniband
> Steele = Intel Quad Core Nehalem EM64T Infiniband
> Ranger = AMD Quad Core Barcelona Infiniband
> Kraken = Cray XT5 AMD Hex Core SeaStar2
>
> So the only machine out of these that is actually 'different' in any way
> from the default is Kraken, and we have a build target in the configure
> script for XT5, although I haven't updated it for PMEMD yet. The scripts for
> this in the main pmemd directory are out of date anyway.
>
> So right now, defaulting to the best optimization flags for EM64T chips and
> Infiniband will give the best performance on almost all the machines people
> are likely to build on, including most Apple machines, which also use Intel
> multicore EM64T chips now, plus desktops etc., which are mostly multicore
> these days. Thus the optimizations for 2 cores are not really needed
> anymore, and the support for slow non-blocking MPI over Ethernet has been
> made largely redundant by the advent of multicore, which has made Ethernet
> hopelessly inadequate anyway.
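>
> To be concrete about the kind of flags I mean, think of an ifort line
> roughly along these lines (illustrative only, not the literal output of the
> configure script):
>
>   F90_FLAGS="-O3 -ip -xHost"   # aggressive optimization, interprocedural
>                                # optimization, tune for the build host's ISA
>
> plus whatever DIRFRC_* defines need to be carried along with it.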
>
> As for the other architectures: there are very few IA64 machines left, and
> most people who have them (Altix systems, for example) are in the process of
> retiring them or upgrading to UV systems, which will use Intel EM64T chips.
> The IA64 TeraGrid Myrinet clusters are gone, I don't know anyone running on
> large Altix systems anymore, and there are no publicly available Power
> systems left. Blue Waters will be Power 7, but that will be allocated to a
> very small group who will optimize specifically for it.
>
> Most of the performance gains on the NSF and DOE machines right now come
> from tweaking the MPI runtime options and node layout rather than from the
> compilation options.
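>
> As a (hypothetical) example of the sort of thing I mean, with Open MPI the
> rank placement alone can matter as much as the compile flags; the exact
> option names differ between MPI stacks and versions:
>
>   # Pin ranks to cores and spread them round-robin across the sockets
>   # rather than filling one socket first:
>   mpirun -np 64 --bind-to-core --bysocket \
>       ./pmemd.MPI -O -i mdin -o mdout -p prmtop -c inpcrd
>
>   # Node layout can be thinned out with a hostfile, e.g. 4 ranks per node:
>   #   node001 slots=4
>   #   node002 slots=4
>
> That sort of run-time tuning often buys more than another compiler flag.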
>
> Thus I think the actual performance impact of building PMEMD as part of
> AMBER's regular build will be minimal for most users, and if anything it
> will increase the exposure of PMEMD, which is a good thing.
>
>> The other thought on the old framework - it was basically there not only
>> to allow a different level of effort on optimization, but also because
>> pmemd had to hit a whole bunch of real supercomputer targets, and
>> frequently these sites would just be interested in deploying pmemd, not
>> the entire amber package on a given supercomputer. We still need to
>> maintain that.
>
> Well, I think requiring them to build sander and pmemd is reasonable.
> Besides, they need to build netcdf anyway, and once you have done that you
> are halfway to building sander. One can build sander.MPI and pmemd.MPI
> fairly simply, so I would encourage anyone deploying on supercomputers to
> build both. There are still plenty of things one can do on big machines
> that pmemd doesn't currently support, replica exchange for example.
>
>> to do a bit of tweaking around some of these issues on the pmemd side?
>
> I would suggest taking a look at any quick tweaks that can be done to
> improve performance on multicore (quad- and hex-core) chips running over
> Infiniband, since this is the majority of what people will use. Whatever
> flags / ifdefs you come up with, I can then easily migrate them to the
> AMBER configure build.
>
> I still have a few things to tweak myself as well, including finishing the
> GPU implementation and finishing the addition of IPS, which I have been
> doing locally. Thus if you plan to make any large changes please let me
> know, so I can make sure everything still works together okay.
>
> All the best
> Ross
>
> /\
> \/
> |\oss Walker
>
> | Assistant Research Professor |
> | San Diego Supercomputer Center |
> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> | http://www.rosswalker.co.uk | http://www.wmd-lab.org/ |
>
> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
> be read every day, and should not be used for urgent or sensitive issues.



_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Tue Mar 02 2010 - 10:00:05 PST