Re: amber-developers: Pmemd vs sander from Robert Duke on 2003-11-16 (Amber Developers Archive Nov 2003)

From: Robert Duke <rduke.email.unc.edu>
Date: Sun, 16 Nov 2003 09:56:57 -0700

Yong -
Well, we all have a heck of a time distinguishing between roundoff error
and something real.

In sander 7 mode, there are some real differences between pmemd and sander
7 in the calculation of virials, but I don't believe there are other
significant differences. In amber7 mode, pmemd retains the amber 6
molecular virials calculation algorithm, which treats crosslinked
molecules differently (neither approach being necessarily more correct - it
is a result of changes in 7 associated with Tom D.'s extra points code).
So are you comparing sander 7 to pmemd/amber7 mode? If so, there will
definitely be differences, rather quickly. Look at the differences between
sander 6 and sander 7, by the way.

If not, then let's talk about levels of difference. I typically run
somewhere around 20 different tests (my test machine is not up at the
moment), and typically expect identical results, within reason, for pmemd
vs. the appropriate sander for somewhere between 30 and 100 steps (a lot
less than 10 ps, or 10000 steps). I say within reason because for some
models you may have an energy value that is calculating out internally
very near a point where a visible digit changes, so you get a
121.4561499999999 vs. a 121.456150000000, for example, if you look at the
number in more precision. What happens in such a test is that the given
value will be different, perhaps in step 1, and then match in step 10. I
chose these bounds (30-100 steps) from observations of what happens with
sander running on different architectures and with different numbers of
processors (i.e., you start seeing differences after that even with the
same piece of code).
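
The visible-digit effect described above can be sketched in a few lines
of Python. This is only an illustration, not code from pmemd or sander:
the helper name printed() and the choice of four printed decimal places
are assumptions made for the example. It uses the decimal module so the
rounding boundary is exact.

```python
from decimal import Decimal, ROUND_HALF_UP

def printed(value: str, places: str = "0.0001") -> str:
    """Round a decimal string to an assumed printed precision (4 places)."""
    return str(Decimal(value).quantize(Decimal(places), rounding=ROUND_HALF_UP))

# The two internal values quoted in the text differ by about 1e-13 ...
a = "121.4561499999999"
b = "121.456150000000"
internal_gap = abs(Decimal(a) - Decimal(b))

# ... but they sit on opposite sides of a rounding boundary, so the
# printed energies differ in the last visible digit.
print(printed(a), printed(b), internal_gap)
```

A one-digit mismatch of this kind in the printed output says nothing
about whether the two runs have really diverged.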

In regard to mpi, what I observe is that pmemd may show a clearly
observable, but not significant, difference from sander 6/7 in runs past
around 300 steps (still a heck of a lot less than 10000 steps). This is
due to dynamic loadbalancing in pmemd. Under mpi, on systems with any
other load whatsoever (heck, just OS scheduling), you are virtually
guaranteed that two identically set up runs will not produce identical
results for more than about 300 steps in pmemd. This is purely due to the
impact of rounding errors when the workload is split between processors
differently, and pmemd will for all intents and purposes never split the
workload the same way past the first list build, because the workload
split is dependent on relative timings, and even 2 otherwise idle machines
will have ever so slightly different throughputs from one second to the
next.
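
The reason a different workload split changes the answer is that
floating-point addition is not associative. A minimal sketch follows,
assuming two hypothetical "ranks" that each sum their share of the
contributions before the partial sums are combined; the function names
and the three input values are made up for illustration and have nothing
to do with the actual pmemd reduction code.

```python
def partial_sum(values):
    """Plain left-to-right accumulation, as one rank would do locally."""
    total = 0.0
    for v in values:
        total += v
    return total

def reduce_split(contributions, split):
    """Sum on two hypothetical ranks, then combine the two partial sums."""
    return partial_sum(contributions[:split]) + partial_sum(contributions[split:])

contributions = [0.1, 0.2, 0.3]

# Same data, two different workload splits.
run_a = reduce_split(contributions, 2)   # (0.1 + 0.2) + 0.3
run_b = reduce_split(contributions, 1)   # 0.1 + (0.2 + 0.3)

print(run_a, run_b, run_a == run_b)
```

The two results differ only in the last bit, but in a chaotic system
like MD that is enough to make trajectories drift apart after a few
hundred steps.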

Also, in pmemd, depending on the version you are using, the axis
reorientation algorithm can produce different results really quickly in
unstable systems. In pmemd 3.00-3.03, axis reorientation was done under
mpi but not in uniprocessor code. In 3.13, I changed the defaults to do
axis reorientation only under mpi if the box had an aspect ratio of 1.5 or
higher. Axis reorientation has a subtle effect on gridding and/or fft
calcs in pme, and in fact could even affect the nfft values that are
auto-computed (due to the evenness requirement in the first dimension of
rcfft's). I think there are even more significant impacts in shake. I am
a little unsure about the shake code in general, after observing how
different the results can be if one simply reorients the box. I only see
this in systems very far from equilibrium, though (vacuum bubbles, etc.).

SO, could there be bugs in pmemd that affect energy output past 300 steps?
You bet. Could there be bugs in sander 6/7 that affect energy output past
300 steps? You bet. I worry about all this stuff a lot with mpi, where
complete coverage of the atoms (i.e., being sure that all the calcs for
each atom occur once and only once) is an issue. I worry about listbuilds
picking up all atoms. I don't completely understand some of the sander 6
listbuilding code, but I seem to recollect that sander 6 may actually miss
putting some atoms at the edge of the skin on the list that I get (this
may only be for nonorthogonal code). I have more faith in my code here
than the sander 6 code. There could be memory overruns with subtle
effects. When I first decided to start hacking on sander 6, I did it
primarily due to a bug in sander 6 that I never did figure out. I was
doing long runs on an sgi dual processor workstation, and I would go for
250-500 psec and crash with an obscure error in the pme fft code. Not
reproducible at all. Some sort of memory stomp, but given the "let's hack
a huge piece of memory into lots of little pieces" memory allocation
scheme, I never was successful in finding it. So I just started rewriting
the code and the problem went away. Here I definitely have much more
faith in my code, because using dynamic memory allocation in f90 is more
reliable, and can be debugged more effectively. My point on both of these
things is that I don't trust sander 6 or 7, and with good reasons. The
way we do software engineering, we could have subtle bugs, and we would
never know it. I am including myself in this. I think I have tried to
develop pmemd to a higher level of quality, but I am still very
dissatisfied with it, because I have been in a rush, it derives from
poorly documented and structured code, and the rewards associated with
doing a really excellent job of the engineering are not there.
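
The "once and only once" coverage worry raised above is the kind of thing
that can be checked mechanically. The following is a hypothetical sketch,
not anything taken from pmemd or sander: the function name, the
list-of-lists layout for per-rank atom ownership, and the toy data are
all assumptions for illustration.

```python
from collections import Counter

def coverage_errors(n_atoms, assignments):
    """Return (missing, duplicated) atom indices for a per-rank assignment.

    assignments is a list with one entry per rank, each entry being the
    list of atom indices that rank claims to own.
    """
    counts = Counter(i for owned in assignments for i in owned)
    missing = [i for i in range(n_atoms) if counts[i] == 0]
    duplicated = sorted(i for i, c in counts.items() if c > 1)
    return missing, duplicated

good = [[0, 1, 2], [3, 4, 5]]   # every atom owned exactly once
bad = [[0, 1, 2], [2, 4, 5]]    # atom 3 dropped, atom 2 owned twice

print(coverage_errors(6, good))
print(coverage_errors(6, bad))
```

A check like this only covers atom ownership; verifying that every pair
interaction in a list build is computed exactly once would need a
similar count over pairs.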

I guess the bottom line on all this is, if you see a difference, is it a
statistically significant difference in a large sample of runs, and is it
"large enough" to indicate a possible bug? So I think you have to do
dozens of runs for each system and then do statistics. I suppose that on
a uniprocessor, this should not be the case; the math should be
reproducible. Still, how do you know that the differences are significant,
and not just attributable to rounding errors, different implementations of
sqrt(), sin(), exp() or what have you? Sander 6 uses 4 byte constants in
its wrapping algorithms; sander 7 uses 8 byte constants. Pmemd is sander 6
in this regard; you see differences real quick if wrapping. Is it
significant? IF there is a bug that produces a difference, why do you
think it is in pmemd? (;-)). I can give you two answers on this one
myself, in two different directions. 1) The bugs are more apt to be in
pmemd, because it does more complicated things to get better scaling. The
code complexity associated with all this workload redistribution and
spatial optimization is significant. 2) The bugs are less apt to be in
pmemd, because it is written in f90 in modules, with all the engineering
reliability benefits that pertain - like interface checking and dynamic
memory allocation.
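
One way to "do statistics" on dozens of runs is to compare the average
energies of the two programs with Welch's t statistic, which does not
assume equal variances. This is a sketch under stated assumptions: the
choice of Welch's test is a suggestion, not something prescribed above,
and the numbers are placeholders, not real sander or pmemd energies.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    std_err = sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / std_err

# Placeholder values standing in for per-run average energies;
# in practice each list would hold dozens of independent runs.
sander_runs = [1.0, 2.0, 3.0]
pmemd_runs = [2.0, 3.0, 4.0]

t = welch_t(sander_runs, pmemd_runs)
print(t)
```

The statistic would then be compared against a t distribution to decide
whether the difference in means stands out from run-to-run scatter.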

I think several doctoral degrees could be done on all this stuff, and the
field would benefit. I don't think any of us really know what we are
doing in this area, and I think most folks have way too much confidence in
their own code, as well as the code of others. I know that is
pessimistic. I have a lot of interest in the whole area of error
analysis, starting with the force fields themselves down through the
algorithms that generate the energetics from the force fields. I am also
very interested in driving the code base toward faster, more reliable
algorithms that we can have more confidence in. In the meantime, in
looking at the differences between pmemd and sander 6/7, ask yourself "is
it significant?". And in doing this, look at the differences between
sander 6, 7, pmemd, gromacs, namd, commercial product x, ... If there are
problems, I do want to know about them and fix them.

Regards - Bob

----- Original Message -----
From: "Yong Duan" <yduan.udel.edu>
To: <amber-developers.scripps.edu>
Sent: Sunday, November 16, 2003 12:39 AM
Subject: amber-developers: Pmemd vs sander

>
> This is probably better addressed to bob, but I think it may be of
> general interest.
>
> Although the first step energies of sander and pmemd agree to the last
> digit, after a while (within 10ps) the energies of the two may start to
> show a difference, with a notable change in the energies by pmemd. The
> pmemd energies stabilize at a new level which is not too different from
> the one by sander, but nevertheless noticeable (about the level of
> fluctuation). Has anybody else noticed it?
>
> Secondly, for some reason, I keep getting "FLUSH--FAILED:: Resource
> temporarily unavailable" from sander on Xeon/mpich/ifc/static
> combination.
>
> yong
Received on Wed Apr 05 2006 - 23:50:07 PDT