On Fri, Dec 1, 2017 at 12:26 PM, Yinglong Miao <yinglong.miao.gmail.com> wrote:
> Hi All,
> When I tried the MPI code of the latest AMBER, for both the CPU and GPU
> versions, the energies output by the MPI code started to deviate from those
> of the serial code after a few hundred MD steps. The energies are different
> enough to give different temperatures (and probably trajectories) of the
> system:
> diff md-1-cpu.out md-1-cpu-mpi.out | grep "NSTEP " | more
> < NSTEP = 583 TIME(PS) = 41.166 TEMP(K) = 306.91 PRESS = 0.0
> > NSTEP = 583 TIME(PS) = 41.166 TEMP(K) = 306.90 PRESS = 0.0
> ...
> < NSTEP = 900 TIME(PS) = 41.800 TEMP(K) = 302.37 PRESS = 0.0
> > NSTEP = 900 TIME(PS) = 41.800 TEMP(K) = 300.97 PRESS = 0.0
> The system I tested was the very small alanine dipeptide (input files are
> attached). I would expect the deviation to appear sooner for bigger
> systems like proteins. Yes, there could be accumulated rounding errors, but
> how do we evaluate whether our MPI simulations are still accurate, with no
> errors in the code?
Compare ensemble properties of a model system that is fast and easy to
converge to a known value. But what leads you to trust the MPI
implementation less than the serial implementation? The former has been
*much* more extensively used (and therefore tested) in production
simulations. Most people rely (implicitly) on this body of work to claim
the validity of the implementation of a force field that they use.
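
As a rough illustration of that kind of comparison, here is a minimal Python
sketch that pulls the TEMP(K) values out of two outputs and checks whether
their averages agree within statistical error. The function names, the
block count and the 3-sigma threshold are all assumptions made for this
example, not part of AMBER. It also assumes the text passed in holds only
per-step records; the averages and fluctuations sections of a real mdout
file carry a TEMP(K) field as well and would have to be stripped first.

```python
import math
import re

# Matches the temperature field of a per-step energy record.
TEMP_RE = re.compile(r"TEMP\(K\)\s*=\s*([-+]?\d+\.?\d*)")


def temperatures(text):
    """Return every TEMP(K) value found in the text, in order."""
    return [float(m.group(1)) for m in TEMP_RE.finditer(text)]


def block_mean_and_sem(values, n_blocks=5):
    """Mean and standard error estimated from block averages.

    Block averaging is used because successive MD steps are correlated,
    so the naive standard error would be too small.
    """
    size = len(values) // n_blocks
    if n_blocks < 2 or size < 1:
        raise ValueError("need at least two non-empty blocks")
    block_means = [
        sum(values[i * size:(i + 1) * size]) / size for i in range(n_blocks)
    ]
    mean = sum(block_means) / n_blocks
    var = sum((b - mean) ** 2 for b in block_means) / (n_blocks - 1)
    return mean, math.sqrt(var / n_blocks)


def consistent(series_a, series_b, n_sigma=3.0):
    """True if the two averages agree within n_sigma combined errors."""
    mean_a, sem_a = block_mean_and_sem(series_a)
    mean_b, sem_b = block_mean_and_sem(series_b)
    return abs(mean_a - mean_b) <= n_sigma * math.hypot(sem_a, sem_b)
```

A call such as consistent(temperatures(serial_text), temperatures(mpi_text))
then asks whether the two runs sample the same average temperature, which
is a different question from whether they follow the same trajectory.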
This is an issue with literally every package out there (how do you trust
*any* implementation of any force field?). pmemd.MPI makes no claim to be
deterministic (and, because of the dynamic load balancer, it usually is not).
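
One generic reason a parallel sum need not match a serial one bit for bit is
that floating-point addition is not associative. The Python sketch below
shows this; it is not pmemd code, and the two "ranks" are a hypothetical
stand-in for however the work happens to be divided. Explicit loops are used
instead of the built-in sum(), which may apply compensated summation in
recent Python versions.

```python
def ordered_sum(values):
    """Add the values strictly left to right, as a serial loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def two_rank_sum(values, split):
    """Sum two chunks separately, then combine the partial sums.

    This mimics (very loosely) two processes each reducing their own
    share of the terms before a final reduction.
    """
    return ordered_sum(values[:split]) + ordered_sum(values[split:])


terms = [0.1, 0.2, 0.3]
serial = ordered_sum(terms)        # (0.1 + 0.2) + 0.3
parallel = two_rank_sum(terms, 1)  # 0.1 + (0.2 + 0.3)
print(serial == parallel)          # False
print(abs(serial - parallel))      # about 1e-16
```

A difference this small in a force or energy sum is enough for two chaotic
MD trajectories to separate after some hundreds of steps, which is why the
comparison has to be made on ensemble averages and not step by step.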
Jason M. Swails
AMBER-Developers mailing list
Received on Fri Dec 01 2017 - 13:00:02 PST