amber-developers: dsum_tol and shake tolerance on energy conservation

From: Ross Walker <>
Date: Fri, 5 Sep 2008 09:01:56 -0700

This was a discussion between several of us that originally centered on
performance in parallel but it raises some interesting issues about energy
conservation in NVE in sander and pmemd as it relates to dsum_tol and tol.
We have default values of 10^-5 for dsum_tol and 10^-5 for tol (the shake
tolerance).

However, I ran some simple tests looking at energy conservation in NVE. The
system is ACE ALA ALA NME solvated in 1207 waters. Granted this is a small
system and sander may not make optimal choices for NFFT, but it is a useful
test. I minimized, then heated and pressure equilibrated for 100ps.

I then ran the following:

Production MD NVE
   ntx=5, irest=1,
   ntc=2, ntf=2,
   ntpr=1000, ntwx=0,
   dt=0.002, cut=8.,
   ntt=0, ntb=1, ntp=0,

which gives the defaults for dsum_tol and tol. I then also tried setting
dsum_tol to 10^-6, tol to 10^-6, and both to 10^-6. Granted this is a 2fs
time step with only an 8 angstrom cutoff, so the energy conservation won't
be stellar, but I am amazed at what a difference changing tol and dsum_tol
makes. See attached plot.
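
One way to put a number on the energy conservation compared above is a least-squares slope of total energy against time. The following is only a minimal sketch: the function name `energy_drift` is hypothetical, and the energies in the usage lines are placeholder values chosen to make the arithmetic easy to check, not output from the runs described here. With ntpr=1000 and dt=0.002, energies are printed every 2 ps, which is the spacing assumed in the example.

```python
def energy_drift(times_ps, etot):
    """Least-squares slope of total energy vs. time (energy units per ps).

    times_ps : sample times in ps
    etot     : total energies at those times (e.g. Etot values from mdout)
    """
    n = len(times_ps)
    if n != len(etot) or n < 2:
        raise ValueError("need two or more matching (time, energy) samples")
    t_mean = sum(times_ps) / n
    e_mean = sum(etot) / n
    # Standard linear-regression slope: cov(t, E) / var(t)
    sxx = sum((t - t_mean) ** 2 for t in times_ps)
    sxy = sum((t - t_mean) * (e - e_mean) for t, e in zip(times_ps, etot))
    return sxy / sxx


# Placeholder data only: samples every 2 ps (ntpr=1000, dt=0.002).
times = [0.0, 2.0, 4.0, 6.0]
energies = [-100.0, -99.0, -98.0, -97.0]
print(energy_drift(times, energies))
```

Running the same fit on the Etot column of each run (default, tighter tol, tighter dsum_tol, both) would give one drift figure per setting to compare alongside the plot.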

Changing the shake tolerance, I think everyone agrees, does not present any
complicated issues; it perhaps affects performance slightly, but the effect
is very minor. Dsum_tol has no effect on performance, although as you can
see in the threads below Bob had some concerns about changing this without
adjusting anything else. In his latest email, though, he thinks it may not
be such a worry.

So, given this, should we consider changing the default values of tol and
dsum_tol for Amber 11? If so, what should we set them to? 10^-6 each, or
perhaps 3*10^-6 as Bob suggests? This will of course mean we have to update
the test cases, and reproducing amber10 numbers will require explicitly
setting these values back to the amber10 defaults in the mdin.


-----Original Message-----
From: Robert Duke []
Sent: Wednesday, September 03, 2008 9:41 AM
To: Ross Walker; 'Ken Merz'
Cc: 'Adrian Roitberg'; 'Carlos Simmerling'; 'Thomas Cheatham III'
Subject: Re: Performance slides?

Hi Ross,
Okay, last note for now on the dsum_tol stuff. I think the value you are
using is fine. For jac, there is improvement in Darden's ewald error
estimate over the range of 1.e-5 (default dsum_tol) to 1.e-6 (the value you
used). Now, the pme error estimate may or may not be such a good number to
base setting these params on, but it does have some basis. The actual delta
in energies that one gets as dsum_tol is made smaller is noticeable around
the default but quickly levels off as you go into the 1.e-6 range. For
something like factor ix, there is actually a rather steep descent in the
error in the range of dsum_tol = 3.e-6 (error drops an order of magnitude),
but the error then again gets worse, but not as bad as the default.
Overall, it may not be a bad idea to decrease the default to 3.e-6, perhaps
smaller. But one has to definitely keep in mind - as you decrease dsum_tol
more, at some point things actually do start getting worse due to the effect
on the reciprocal space calc. The interesting thing about dsum_tol - since
there are no performance impacts, there are effectively no tradeoffs, and in
theory we should just select a more ideal value; I do think Darden may have
missed the optimum by a little bit, but I have not looked at a large number
of systems.
Best Regards - Bob
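
To make the link between dsum_tol and the Ewald coefficient concrete, here is a minimal sketch. It assumes the coefficient beta is chosen so that erfc(beta*cut)/cut equals dsum_tol, which is my reading of how the direct sum tolerance is applied and not something stated in this thread. The function name `ewald_coefficient` is hypothetical, and the root is found by plain bisection.

```python
import math


def ewald_coefficient(cut, dsum_tol):
    """Solve erfc(beta * cut) / cut = dsum_tol for beta by bisection.

    Assumed relation (see lead-in); requires 0 < dsum_tol < 1 / cut.
    cut is in angstroms, beta in 1/angstrom.
    """
    def excess(beta):
        # Positive while the direct-space term at the cutoff exceeds dsum_tol.
        return math.erfc(beta * cut) / cut - dsum_tol

    lo, hi = 0.0, 1.0
    while excess(hi) > 0.0:      # grow the bracket until the root is inside
        hi *= 2.0
    for _ in range(60):          # halve the bracket; 60 steps is ample
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# The 8 angstrom cutoff from the test above, at the three tolerances discussed.
for tol in (1e-5, 3e-6, 1e-6):
    print(f"dsum_tol={tol:g}  beta={ewald_coefficient(8.0, tol):.5f}")
```

Under this assumption a smaller dsum_tol gives a larger beta at fixed cutoff, so the direct sum is better converged at the cutoff while more of the work shifts onto the reciprocal sum, which is the tradeoff described above.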

----- Original Message -----
From: "Ross Walker" <>
To: "'Robert Duke'" <>; "'Ken Merz'" <>
Cc: "'Adrian Roitberg'" <>; "'Carlos Simmerling'"
<>; "'Thomas Cheatham III'" <>
Sent: Tuesday, September 02, 2008 4:58 PM
Subject: RE: Performance slides?

> Hi Bob,
>> I think you do have to be very careful dinking with dsum_tol - just
>> increasing this does improve the correctness of the direct sum (by
>> modifying the ewald coefficient), but it does it at the cost of increased
>> error in
> Sure but it seems that cranking up the shake tolerance and dsum_tol by an
> order of magnitude significantly improves energy conservation. So it
> doesn't look to me like there is a problem, sure there could be longer
> range subtle problems but just from the conservation of energy in NVE it
> looks to make a big improvement.
>> the reciprocal sum (if beta gives you a smoother "tail" region, I believe
>> it does it with increased emphasis of the reciprocal sum near the atom).
>> So then you have to increase fft grid density and interpolation order to
>> get THAT number to be okay, and doing both of those things is very very
>> expensive. So while I have not checked recently about what is optimal for
>> accuracy in the pme calc, I would be very very careful about just pushing
>> up dsum_tol (this is based on old memories, and I have not dug out all the
>> notes I have, but I dinked with this stuff a lot once, trying to figure
>> out better ways to get a certain level of accuracy and also improve
>> performance).
> If you have some data on this it would be great to see it. It doesn't seem
> to hurt performance too much here, at least the NVE run is still faster
> than the ntt=1 NVT run although perhaps NVE could be faster if tweaked.
>> Also, when we float these benchmarks, I think it is important to note
>> that the reason jac is so much faster is that we moved to a 2 fsec
>> timestep like everyone else, not that there were any fundamental changes
>> in the code.
> Sure, although to be honest the NAMD people and the SHAW people are not
> reporting the 'real' JAC benchmark when they report it anyway, since both
> SHAW and NAMD use multiple timestepping with it and they use lower
> precision. Hence why I just want to produce some AMBER numbers for 'real'
> simulations that 'AMBER users' will want to do - and then people on the
> review panel can check what users state they get against these numbers,
> and also to highlight to people that sometimes just leaving some of the
> cores on these multi-core chips idle can actually give you better
> performance for a given SU cost than using all the cores even though you
> are always charged for all of them.
>> The "problem" with increased timestep size from a performance
>> perspective is that it reduces the number of steps between nonbonded pair
>> list builds, thereby shifting the importance of various pieces of code in
>> the overall performance equation.
> Sure, but isn't 2fs, shake, NVE, NVT or NPT what most people want to run?
> I have tried to make these close to what I would consider 'real world'
> simulation parameters. If anyone else wants to chip in please do. Can
> anyone justify 9 angstrom cutoff? Or a timestep of <2fs? - what do you all
> run?
> We should probably have a TIP4PEW test case here as well.
> All the best
> Ross
> /\
> \/
> |\oss Walker
> | Assistant Research Professor |
> | San Diego Supercomputer Center |
> | Tel: +1 858 822 0854 | EMail:- |
> | | PGP Key available on request |
> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
> be read every day, and should not be used for urgent or sensitive issues.

Received on Sun Sep 07 2008 - 06:07:50 PDT