[AMBER-Developers] Please read: we need to look at pmemd.cuda pair list fidelity

From: David Cerutti <dscerutti.gmail.com>
Date: Mon, 15 Jan 2018 22:11:15 -0500

OK peeps,

*TL;DR: if you have run any simulations using an octahedral box, please
help me out by running them again in the NVE ensemble using Amber16 as well
as the new SpeedBoostSM code. If you have any simulations in octahedral
boxes that did pressure equilibration in pmemd.cuda, run those again
starting from the low-density initial configuration with the new code in
SpeedBoostSM and see whether the code still allows you to complete the
run. If you have any simulations of small or thin boxes (only slightly
more than twice the cutoff in thickness; they need not be octahedral, but
this applies only to pmemd.cuda) and you ever saw anything suspicious with
them, write to me and tell me about it.*

Bad news. It's looking pretty certain that there is a bug in Amber that we
will have to deal with before the next release. It's not hard to patch,
but the bug goes deeper than pmemd.cuda and therefore will break some test
cases. What's happening is that, for non-orthogonal unit cells (and the
only such case that tleap really knows how to construct is the truncated
octahedron, translating it into a parallelepiped with all angles at
109.47 degrees), sander and pmemd do not know how to properly gauge the
distance between the box faces, and therefore the number of hash cells to
create when building their pair lists. The problem is worse in pmemd.cuda:
that code gets its speed partly by rebuilding the pair list as rarely as
possible, and it miscalculates not only the number of cells it should
allocate but also the width of each cell, and therefore the pair list
margin it has to work with. When the unit cell is not a cube or a
shoebox, all of the codes can create pair lists that expire well before
they expect them to, pmemd.cuda particularly so.
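
To put a number on that: the separation between a pair of cell faces is
d = V / |u x v|, where V is the cell volume and u and v are the two cell
vectors that span the face--NOT the length of the third cell vector, which
is what a naive calculation uses. For the 109.47-degree cell the true
separation is about 18% smaller than the box edge. Here is a minimal
sketch of the correct calculation (my own illustration using the usual
crystallographic cell construction, not the pmemd source):

#include <math.h>
#include <stdio.h>

static double dot(const double u[3], const double v[3]) {
  return u[0]*v[0] + u[1]*v[1] + u[2]*v[2];
}

static void cross(const double u[3], const double v[3], double r[3]) {
  r[0] = u[1]*v[2] - u[2]*v[1];
  r[1] = u[2]*v[0] - u[0]*v[2];
  r[2] = u[0]*v[1] - u[1]*v[0];
}

/* Distance between the pair of faces spanned by b and c: d = V / |b x c| */
static double plane_sep(const double a[3], const double b[3],
                        const double c[3]) {
  double bxc[3];
  cross(b, c, bxc);
  return fabs(dot(a, bxc)) / sqrt(dot(bxc, bxc));
}

int main(void) {
  /* Truncated octahedron as a parallelepiped: all edges 40 A,
     all angles 109.4712 degrees                                */
  const double L = 40.0, ang = 109.4712190 * 3.14159265358979 / 180.0;
  const double cg = cos(ang), sg = sin(ang);
  double a[3] = { L, 0.0, 0.0 };
  double b[3] = { L*cg, L*sg, 0.0 };
  double c[3] = { L*cg, L*cg*(1.0 - cg)/sg, 0.0 };
  c[2] = sqrt(L*L - c[0]*c[0] - c[1]*c[1]);

  printf("Naive thickness (box edge): %.3f\n", L);                /* 40.000  */
  printf("True plane separation:      %.3f\n", plane_sep(a, b, c)); /* 32.660 */
  return 0;
}

By symmetry all three face pairs of the truncated octahedron give the same
number, so a code that budgets 40 A of room between faces really has about
32.7 A, and hash cells sized from the naive figure leave the pair list
with less margin than it thinks it has.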

I have a version of the code that gets the plane separation right and
correctly calculates how much room the code has to work with, currently
committed in SpeedBoostSM *(commit
86a86e6e07f36f1d0de9b361ee801fefc5dfa46e)*. I would like to make a
separate version of the master branch that has only the changes needed for
the patch so we can test it that way, but I am having some trouble
compiling things from the master branch:

make[2]: Entering directory `/home/cerutti/amberPBC/AmberTools/src/sff'
...
prm.c: In function ‘rdparm’:
prm.c:769: error: ‘for’ loop initial declarations are only allowed in C99 mode
prm.c:769: note: use option -std=c99 or -std=gnu99 to compile your code
prm.c:886: error: ‘for’ loop initial declarations are only allowed in C99 mode
prm.c:1457: error: ‘for’ loop initial declarations are only allowed in C99 mode
prm.c:1472: error: ‘for’ loop initial declarations are only allowed in C99 mode
make[2]: *** [prm.o] Error 1

is what I get on one platform; on another, I get complaints about
typecasts on certain integers in the RISM code. We can deal with these
problems in time (Dave Case seems to have just noticed them from the
repository threads), but for now I think it's fine to have people download
the SpeedBoostSM branch and see whether their results change. The new
version WILL fail 15 of the test cases--this is a consequence of correcting
the pair list layout.
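
For reference, the errors above are gcc in its default pre-C99 mode
rejecting loop-scoped declarations. Illustrative only--natom and
process() are placeholders, not the actual prm.c lines:

/* Rejected in C89/C90 mode (the default for older gcc): */
for (int i = 0; i < natom; i++)
    process(i);

/* Fix 1: compile with -std=c99 (or -std=gnu99), as the note suggests.
   Fix 2: hoist the declaration, which is valid in any C dialect:     */
int i;
for (i = 0; i < natom; i++)
    process(i);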

We need to do some fairly broad and lengthy testing of this problem to see
how widespread the damage is. I have specific tests in mind to quantify
how severe it is, but for now anyone who did lengthy simulations with
octahedral cells is asked to please help out by firing them off a second
time with separate executables compiled from the SpeedBoostSM code. Run
'em a long time if possible. What we should probably be looking at are
long-timescale NVE comparisons, because the danger is that the pair list
is incomplete for some or all steps of the simulation. I don't think that
many simulations are truly problematic, but proving this will take some
additional coding and carefully constructed experiments.
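
If it helps, one simple way to score such a comparison: pull the total
energy out of each mdout as a two-column "time  Etot" series (the
extraction is up to you) and fit the drift. This little checker is my own
sketch, not part of Amber, and the input format is assumed:

/* Hypothetical helper: least-squares energy drift from a two-column
   "time(ps)  Etot(kcal/mol)" series extracted from an mdout file.  */
#include <stdio.h>

int main(int argc, char **argv) {
  if (argc != 2) {
    fprintf(stderr, "Usage: %s series.dat\n", argv[0]);
    return 1;
  }
  FILE *fp = fopen(argv[1], "r");
  if (fp == NULL) {
    perror(argv[1]);
    return 1;
  }
  double t, e, st = 0.0, se = 0.0, stt = 0.0, ste = 0.0;
  long n = 0;
  while (fscanf(fp, "%lf %lf", &t, &e) == 2) {
    st += t;  se += e;  stt += t*t;  ste += t*e;  n++;
  }
  fclose(fp);
  if (n < 2) {
    fprintf(stderr, "need at least two samples\n");
    return 1;
  }
  /* Slope of the least-squares line E(t), in kcal/mol per ps */
  double slope = (n*ste - st*se) / (n*stt - st*st);
  printf("%ld samples, drift = %.6f kcal/mol per ns\n", n, slope * 1000.0);
  return 0;
}

Run the Amber16 and SpeedBoostSM trajectories through the same check; a
pair list that silently drops interactions should show up as anomalous
drift or as discontinuities in the total energy.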

The fix will have to come in stages: the fix for pmemd.cuda is in hand,
because I know that code best of the three. For the pmemd Fortran code and
sander, it will take perhaps an afternoon to track down the calculations
and patch them.

There is one other issue on my radar--I'm not sure whether it is a bug,
but it is specific to pmemd.cuda and looks suspicious to me. It could
affect any small simulation box (shoebox or octahedral). This again
pertains to pair list counting, but it is a separate issue that I will
look into once the cell-dimensions bug is under control.

Dave
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Mon Jan 15 2018 - 19:30:02 PST