[AMBER-Developers] Occasional failures in pmemd serial

From: <dcerutti.rci.rutgers.edu>
Date: Thu, 14 Mar 2013 18:17:01 -0400 (EDT)


Following the developers' meeting and the emergence of the "domdec" branch
of pmemd, I'm looking at the state of the CPU code and trying to clear up
some things as we work to make that one competitive with its
counterparts in Gromacs and now CHARMM. To ensure that what I do is sane,
I'm running the test suite on pmemd, and out of dozens of tests I notice
a few FAILURE warnings.

The first that appears is

cd 4096wat && ./Run.pure_wat_nmr_temp_reg

The results are not just different in the last decimal place; they are
significantly different over all steps (mdout.pure_wat_nmr_temp.dif has
many entries in it). Then comes

cd trx && ./Run.trx.cpln.pmemd

which also contains lots of items with significantly different numbers.
In both of these cases, it seems that there is a Langevin thermostat in
place and the differences start at step 2, so the random number sequence
may somehow be different (did Monte Carlo barostat initialization perhaps
set this off, even if the barostat is not ultimately used?). I see other
cases among the gb_trx tests where a Langevin thermostat works fine, but
maybe these GB cases do not invoke any barostat initialization.
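
To illustrate the hypothesis (this is a toy sketch, not pmemd code): if
some initialization routine draws even one number from the same seeded
stream that the thermostat uses, every later thermostat kick changes,
although the seed is identical. The function name, the barostat_init flag
and the seed value below are made up for the illustration.

```python
import random

def langevin_kicks(n_steps, barostat_init=False, seed=12345):
    """Return one Gaussian 'kick' per step from a single seeded stream.

    Hypothetical stand-in for a thermostat drawing from a shared RNG;
    the seed is arbitrary.
    """
    rng = random.Random(seed)
    if barostat_init:
        # Assumption for illustration: initialization consumes one draw
        # from the shared stream before the thermostat starts.
        rng.random()
    return [rng.gauss(0.0, 1.0) for _ in range(n_steps)]

reference = langevin_kicks(5)                     # e.g. the saved test output
same_seed = langevin_kicks(5)                     # reproducible rerun
shifted = langevin_kicks(5, barostat_init=True)   # one extra draw up front

print(reference == same_seed)   # identical streams
print(reference == shifted)     # streams no longer match
```

If something like this is what happened, the saved reference outputs for
the two failing tests would simply predate the extra draw.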

In any case, I just want to make sure that I have a strong reference
before I start hacking.

First order of business will be to special-case the scalar_sum and
scalar_sumrc functions to compute pot_enes and virials only when
requested, in line with the special-casing that happens in the direct
space sum. It's a small cost in the overall run time, but it's chiseling
away at something that a few of the processors end up spending a
significant amount of their time on, so it's one of those things that
should help us in the long run.
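
Roughly what I have in mind, as a toy Python sketch rather than the actual
Fortran: the grid scaling always happens, but the energy accumulation is
skipped unless the caller asks for it. The names scalar_sum_sketch, eterm
and need_energy are invented for this example, and the virial (omitted
here) would be handled the same way.

```python
def scalar_sum_sketch(grid, eterm, need_energy=True):
    """Scale each reciprocal-space grid element by its kernel factor.

    grid  : list of complex values (stand-in for the transformed charge grid)
    eterm : list of floats (stand-in for the per-point kernel factors)
    Returns (scaled_grid, energy); energy is None when not requested.
    """
    if not need_energy:
        # Fast path: no accumulation work at all.
        return [e * q for e, q in zip(eterm, grid)], None

    energy = 0.0
    scaled = []
    for e, q in zip(eterm, grid):
        scaled.append(e * q)
        # Accumulate 0.5 * eterm * |q|^2 only on steps that need it.
        energy += 0.5 * e * (q.real * q.real + q.imag * q.imag)
    return scaled, energy

grid = [1 + 1j, 2 + 0j]
eterm = [0.5, 0.25]
scaled_fast, energy_fast = scalar_sum_sketch(grid, eterm, need_energy=False)
scaled_full, energy_full = scalar_sum_sketch(grid, eterm, need_energy=True)
```

Both paths return the same scaled grid; only the bookkeeping differs, which
is the point of the special-casing.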


AMBER-Developers mailing list
Received on Thu Mar 14 2013 - 15:30:02 PDT