Re: [AMBER-Developers] Please read: we need to look at pmemd.cuda pair list fidelity

From: David Cerutti <>
Date: Tue, 16 Jan 2018 01:00:24 -0500

Hi Chris,

If you look in the non-bonded loop kernel kNLCPNE.h, on or about lines
564-587 you will find the following code:

      // Translate all atoms into a local coordinate system within one unit
      // cell of the first atom read to avoid PBC handling within inner
      // loops
      int cell = shAtom.ID & NLATOM_CELL_TYPE_MASK;
#if defined(PME_VIRIAL) && defined(PME_IS_ORTHOGONAL)
      shAtom.x += sUcellf[0] * cSim.cellOffset[cell][0];
      shAtom.y += sUcellf[4] * cSim.cellOffset[cell][1];
      shAtom.z += sUcellf[8] * cSim.cellOffset[cell][2];
#else
      shAtom.x += cSim.cellOffset[cell][0];
      shAtom.y += cSim.cellOffset[cell][1];
      shAtom.z += cSim.cellOffset[cell][2];
#endif
#ifndef PME_IS_ORTHOGONAL
  #ifdef PME_VIRIAL
      shAtom.x = sUcellf[0]*shAtom.x + sUcellf[1]*shAtom.y +
                 sUcellf[2]*shAtom.z;
      shAtom.y = sUcellf[4]*shAtom.y + sUcellf[5]*shAtom.z;
      shAtom.z = sUcellf[8]*shAtom.z;
  #else
      shAtom.x = cSim.ucellf[0][0]*shAtom.x + cSim.ucellf[0][1]*shAtom.y +
                 cSim.ucellf[0][2]*shAtom.z;
      shAtom.y = cSim.ucellf[1][1]*shAtom.y + cSim.ucellf[1][2]*shAtom.z;
      shAtom.z = cSim.ucellf[2][2]*shAtom.z;
  #endif
#endif

There's a lot of pre-processor stuff there to make different kernels for
when there is or is not a virial and for when the box is orthogonal or not.
Beyond that, this imaging only appears once we get past the "home tiles,"
where atoms from the same hash cell interact (the home-tile work is a
nearly identical block of code on lines 210-550). Atoms in the home tile
are already within one (cutoff +
pair list margin) of each other and so do not have to be further imaged.
But atoms from separate hash cells do need to be re-imaged. Look at the
comment: that was written by the original author, and while I have
overhauled the non-bonded loop quite a bit over the past eight months that
aspect of it has remained.

Imagine what would happen if you've only got two hash cells in any one box
dimension. When atoms from two hash cells along that dimension interact
with one another, there will be one collection of them from each hash cell,
and all atoms will be imaged with respect to the position of the first atom
in the first collection. Now imagine that the collections each roughly
span their hash cell--the Hilbert space-filling curve tries to avoid this
sort of thing, but it may happen that the collection of atoms it selects
snakes around a bit. The idea is that, between them, the atoms in both
collections span the simulation box, making a chain of sorts from one side
to the other, but they're all getting imaged with respect to just one
atom. The imaged atoms on the far ends of the chain will register in the
non-bonded loop as being well outside the cutoff and therefore
non-interacting, but in fact they're really close to one another as could
be seen if the imaging were done differently, or done for every dx, dy, dz
the code computes (this of course would go against the very spirit of the
comment--image once to avoid handling Periodic Boundary Conditions in the
inner loops). The PBCs are handled once for every 32 interactions and so
hardly make a dent in the performance--it's probably cheaper to do it that
way than to build the ghost atoms I'm preparing in my new formulation. But, if
you're only two hash cells thick, my feeling is that you can't image just
once and forget about it.

There may be other aspects of Scott's pair list that I haven't yet taken
into account, and I see that he makes note of a "SMALL_BOX"
condition, but I haven't yet understood what differences it leads to.
It's therefore plausible to me that I am not fully understanding his method
and that he does have a mechanism in there to accommodate this, but
ultimately the non-bonded forces need to be computed by kNLCPNE.h and I
don't see SMALL_BOX doing anything at that level. Hence, I'm worried, but
not entirely sure yet that there is a problem.

You are from Los Alamos. This is where my parents live.


On Tue, Jan 16, 2018 at 12:24 AM, Neale, Christopher Andrew <
> wrote:

> Dear David:
> the last part of your email is terrifying... I'm on the edge of my seat to
> know if "a bug or not" might be affecting all small simulation boxes, even
> rectilinear ones (and how small you consider "small"). I routinely check
> single-point energies for the same system in amber vs gromacs vs charmm,
> etc., to ensure accuracy, but missing neighborlist inclusions will not be
> caught by such tests and goes to the very heart of a simulation software.
> Given that most of us use rectilinear boxes, can you please describe the
> "suspicious" part of pmemd?
> Thank you,
> Chris.
> ________________________________
> From: David Cerutti <>
> Sent: Monday, January 15, 2018 8:11:15 PM
> To: AMBER Developers Mailing List
> Subject: [AMBER-Developers] Please read: we need to look at pmemd.cuda
> pair list fidelity
> OK peeps,
> *TL;DR: if you have run any simulations using an octahedral box, please
> help me out by running them again in the NVE ensemble using Amber16 as well
> as the new SpeedBoostSM code. If you have any simulations in octahedral
> boxes that did pressure equilibration in pmemd.cuda, run those again
> starting from the low-density initial configuration with the new code in
> SpeedBoostSM and see whether the code still allows you to complete the
> run. If you have any simulations of small or thin boxes (only slightly
> more than twice the cutoff in thickness, need not be octahedral but only
> with pmemd.cuda) and ever saw anything suspicious with them, write to me
> and tell me about it.*
> Bad news. It's looking pretty certain that there is a bug in Amber that we
> will have to deal with before the next release. It's not hard to patch,
> but the bug goes deeper than pmemd.cuda and therefore will break some test
> cases. What's happening is that, for non-orthogonal unit cells (and the
> only case that tleap really knows how to construct is the truncated
> octahedron, translated into a parallelepiped with all angles at
> 109.47 degrees), sander and pmemd do not know how to properly gauge the distance
> between the box faces, and therefore the number of hash cells to create
> when building their pair lists. The problem is worse in pmemd.cuda,
> because the way the code works means that it is fastest to rebuild the pair
> list as seldom as possible, and the code miscalculates not only the number
> of cells it should allocate but also the width of each cell, and therefore
> the pair list margin that it has to work with. When the unit cell is not a
> cube or a shoebox, all of the codes can create pair lists that expire well
> before they expect, pmemd.cuda particularly so.
> I have a version of the code that gets the plane separation right and
> correctly calculates how much room the code has to work with, currently
> committed in SpeedBoostSM *(commit
> 86a86e6e07f36f1d0de9b361ee801fefc5dfa46e)*. I would like to make a
> separate version of the master branch that has only the changes needed for
> the patch so we can test it that way, but I am having some trouble
> compiling things from the master branch:
> make[2]: Entering directory `/home/cerutti/amberPBC/AmberTools/src/sff'
> ...
> prm.c: In function ‘rdparm’:
> prm.c:769: error: ‘for’ loop initial declarations are only allowed in C99
> mode
> prm.c:769: note: use option -std=c99 or -std=gnu99 to compile your code
> prm.c:886: error: ‘for’ loop initial declarations are only allowed in C99
> mode
> prm.c:1457: error: ‘for’ loop initial declarations are only allowed in C99
> mode
> prm.c:1472: error: ‘for’ loop initial declarations are only allowed in C99
> mode
> make[2]: *** [prm.o] Error 1
> is what I get on one platform, and on another I get complaints about
> typecasting of certain integers in the RISM code. We can deal with these
> problems in time (Dave Case seems to have just noticed them from the
> repository threads), but for now I think it's fine to have people download
> the SpeedBoostSM branch and see whether their results change. The new
> version WILL fail 15 of the test cases--this is a consequence of correcting
> the pair list layout.
> We need to do some fairly broad and lengthy testing of this problem to see
> how widespread the damage is. I have some specific tests in mind, but for
> now anyone who did lengthy simulations with octahedral cells is asked to
> please help out by firing them off a second time with separate executables
> compiled based on code in SpeedBoostSM. Run 'em a long time if possible.
> Probably what we should be looking at are long-timescale NVE comparisons,
> because the danger is that the pair list is incomplete for some or all
> steps of the simulation. There are specific tests that I have in mind to
> quantify how severe the problem is. I don't think that many simulations
> are truly problematic, but proving this will take some additional coding
> and carefully constructed experiments.
> The fix will have to come in stages: the fix for pmemd.cuda is in hand,
> because I know that code the best of the three. For pmemd Fortran code and
> sander it will take perhaps an afternoon to track down the calculations and
> patch them.
> There is one other issue on my radar--I'm not sure if it is a bug or not,
> but it is specific to pmemd.cuda and looks suspicious to me. It could
> affect any small simulation boxes (shoeboxes or octahedral). This again
> pertains to pair list counting but it is a separate issue which I will be
> looking into once the cell dimensions bug is under control.
> Dave
> _______________________________________________
> AMBER-Developers mailing list
Received on Mon Jan 15 2018 - 22:30:02 PST