amber-developers: pmemd with extra points, early release code ready

From: Robert Duke <>
Date: Fri, 20 Apr 2007 16:47:52 -0400

Folks -
Okay, I am finally done adding extra points support into pmemd. This has
only been tested for the tip4p and tip5p water models so far. There are
some extra points residues in GAFF also, but I did not have something
readily available to test. There are also amino acid and nucleic acid
forcefields that have extra points, but these are so far only used with the
earlier amber polarizable forcefields. I would very much like to see
development of a nonpolarizable ff for aa's and na's with extra points
myself. Anyway, anything with extra points is in theory supported in the
uniprocessor code, but as I say, only the tip4p and tip5p residues have been
tested. For mpi code, extra points will currently only work (and should
otherwise fail with a message) if you have a scenario where a residue
containing extra points does not contain all the frame atoms that support
placement of the extra points. I can fix this once I get something to test,
but did not want to release this sort of code without thorough test. I have
so far also only tested tip4p and tip5p as neat solvent; having an actual
solute should of course work but I just have not built a model (the extra
points test cases we have in the distribution are sort of minimal).

Extra points code has been a bit problematic in a number of ways.
Everything I have done of course starts with Tom Darden's code. The biggest
ugly feature in terms of the infrastructure is that there never really was
decent extra points support in the prmtop and leap; a very kludgy model was
used in terms of the prmtop input, and Tom had to write a ton of code to
postprocess the extra points stuff to fit into the default extra points
model. By the way, the default model is really the only model that makes
any sense to me; the "frameon" and "chngmask" &ewald namelist variables
should not be dinked with, as changing the defaults does not make much sense
and is totally untested; I would strongly advocate removal of these namelist
variables in amber 10.

Now, do you get the exact same results with pmemd and sander for extra
points. Yes, with a couple of stipulations. First of all, this is only
true if netfrc=0 (in &ewald), which says that the reciprocal force net force
correction will not be done. If you do use netfrc=1 (the default) you will
get ever so slightly different results between sander and pmemd. This is
"by design". The way extra points + netfrc=1 was done in all versions of
sander so far is actually slightly inconsistent, and slightly wrong (a very
small effect, but still wrong). I have passed this by Tom D.; we should
change sander to match pmemd. Secondly, if you output velocities, sander
and pmemd will differ for the extra point "atoms". This is also by design.
The velocity values for the extra point "atoms" that are output in sander
are basically bogus values, and just how bogus they are depends on whether
shake is being used. Because it would take real work to get the values
correct, and because the value for the velocity of an extra point is not
particularly useful (they are massless, after all, so don't contribute to
KE), the decision was made (Tom D., Dave C., me) to just zero any ep
velocities on output. This should also be done in sander so the mdvel files
match (and we should probably make a note somewhere to prevent user

Another difference between sander and this version of pmemd. In doing this
work, I discovered that if nrespa .gt. 1 (ie., respa is being used), then
netfrc corrections are not done properly. Once again, the error is small
but noticeable. I have fixed this in this release of pmemd. We should
probably issue patches form pmemd 9/sander 9. This fix has not been
commented on by Darden as of yet but Case agrees with me.

Why did I find all this stuff? Mostly because I was making changes that
influenced scalability and noticed the inconsistencies and errors because I
did things a bit differently for performance reasons.

Okay, final point on code functionality. This code also has full pmemd
amoeba support (ie., parallel support). Amoeba, as many have undoubtedly
heard me whine, is basically slug slow, with not much hope of getting a lot
faster. The pmemd version will scale out to about 20 processors though, and
makes an untenable situation barely tenable in terms of using amoeba. Once
again you will observe differences in the sander and pmemd outputs for
amoeba runs; the pmemd runs are all "by design", fixing various things that
made no sense in the sander 9 amoeba release (more stuff to fix in sander).
To get amoeba functionality, you should stick -DAMOEBA into the F90_FLAGS
line in your config.h file when you build - there is as of yet no automatic
configuration support for this, and the amoeba code is so slow to compile
and whacks the performance of other code sufficiently that I always release
a special amoeba executable when one is needed, building without amoeba for
pme and gb.

Okay, where to get the tarball? Well, send me email and I will email you
back a copy. That way I know who has it.

Performance. It is not bad, all considering. I have not compared it to
tip3p, but since you have 33% more "atoms" in the solvent, one may expect
you will have to build models with more "atoms" and due to various jinxing
around required to support extra points, the code is probably a bit slower.
Nonetheless, it scales reasonably well I built a simple little neat tip4p
solvent box containing 3902 tip4pew waters, or 15608 "atoms" - this is
basically 2/3 the size of JAC, and that means you will top out on scaling
below 128 proc for all but the best hardware. So I did quick benching of
pmemd and sander on our new incarnation of topsail at UNC - an infiniband -
xeon cluster. Okay - sad news. Clusters all over the place are getting
"upgraded" by replacing 3.6 GHz xeon dual cpu's with 2.4 GHz quad core dual
cpu's. Well, these have more cache and I presume faster transcendentals
because GB and things like GAMESS are faster (and pmemd looks better if you
use < 8 cpu's, thereby increasing cache and memory bandwidth). However,
these new machines are clocking at 2/3 of the previous speed, and very
little has been done to improve interconnect bandwidth. So with pmemd on
the "new improved" topsail, I see pme performance that is about 80-85% of
what it used to be for factor ix. PMEMD mostly does addition, subtraction,
multiplication, memory moves, and that's about it, so clock rate matters.
Now actually, for small systems like JAC, the performance is closer to 95%
of previous performance; we here at UNC are more apt to run something 5-10x
larger than JAC though. Well at least it is not just 67%, and there are now
4x as many processors. Still, your topend throughput on a job will go
down - nothing to be done for it (compile ifort on these machines with -xP
instead of -xW). I did not get non-power of two processor count results for
sander because the mpi implementation had the allgather problem (could have
compiled differently, but didn't know at the time, and it does not really
matter). Scaling is done with 8 processors considered as 100% scaling -
only fair way to do it on nodes with shared components.

Okay, the numbers - for 3902 tip4pew waters (15608 "atoms"), 300K, NTP, 8
angstrom cut, shake H, 1 fs time step, trajectory every psec:

#proc pmemd pmemd sander sander
          nsec/day scaling(%) nsec/day scaling(%)

  8 3.15 100 2.08 100
 16 5.61 89 2.71 65
 24 8.31 88 nd -
 32 10.41 83 2.92 35
 48 13.71 73 nd -
 64 13.94 55 2.51 15
 80 14.40 46 nd -
 96 15.71 42 nd -

Not bad, considering that JAC (standard - 1 fsec, NVE, 9 angstrom cut, shake
H, no trajectory, but 10K steps) does the following for pmemd now on

#proc nsec/day
  32 6.13
  64 9.29
  96 10.16
 128 10.9

Regards - Bob
Received on Sun Apr 22 2007 - 06:07:47 PDT
Custom Search