Re: [AMBER-Developers] Deviated energy outputs from MPI code

From: Yinglong Miao <yinglong.miao.gmail.com>
Date: Fri, 1 Dec 2017 15:58:22 -0600

Thanks for your suggestions and insights, Gerald and Jason.

I may still like to use the 2 fs timestep, especially for long production simulations ;-). For the CPU code, it's true that many people have used the MPI version more in their application studies. For the GPU code, though, I have actually used the serial version more, since it's fast enough for lots of systems. Besides, from a programming point of view, I may trust the serial version more because it avoids message passing and the hardware problems that often show up with the MPI code. So I guess we usually make sure the serial code works fine and use it as a reference to develop the MPI code …
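
As a rough sketch of the kind of comparison I have in mind (this is not AMBER tooling; the file names, the regular expression, and the field list below are assumptions based on the usual pmemd mdout layout), a small script could parse two mdout files, print the per-step differences, and print the ensemble averages with their standard errors, which is closer to what Jason suggests checking:

#!/usr/bin/env python3
# Rough sketch only -- not part of AMBER. File names, the regex, and the
# field list are assumptions based on the usual pmemd mdout layout.
import re
import statistics

FIELDS = ("TEMP(K)", "Etot")
PATTERN = re.compile(r"(NSTEP|TEMP\(K\)|Etot)\s*=\s*(-?\d+\.?\d*)")

def parse_mdout(path):
    """Return {nstep: {field: value}} from the energy records in an mdout file."""
    records, current = {}, None
    with open(path) as fh:
        for line in fh:
            for key, val in PATTERN.findall(line):
                if key == "NSTEP":
                    current = int(float(val))
                    records[current] = {}
                elif current is not None:
                    records[current][key] = float(val)
    return records

def compare(serial_path, mpi_path):
    serial, mpi = parse_mdout(serial_path), parse_mdout(mpi_path)
    common = sorted(set(serial) & set(mpi))
    for field in FIELDS:
        # Per-step agreement: expected to degrade over the run, since
        # pmemd.MPI does not claim to be deterministic.
        diffs = [abs(serial[n][field] - mpi[n][field])
                 for n in common if field in serial[n] and field in mpi[n]]
        if diffs:
            print("%-8s max per-step |serial - MPI| = %.4f" % (field, max(diffs)))
        # Ensemble agreement: averages should match within the standard errors.
        for label, rec in (("serial", serial), ("MPI", mpi)):
            vals = [rec[n][field] for n in common if field in rec[n]]
            if len(vals) > 1:
                sem = statistics.stdev(vals) / len(vals) ** 0.5
                print("%-6s <%s> = %10.4f +/- %.4f"
                      % (label, field, statistics.mean(vals), sem))

if __name__ == "__main__":
    # Placeholder names taken from the diff below; adjust to the actual outputs.
    compare("md-1-cpu.out", "md-1-cpu-mpi.out")

The per-step differences are expected to grow with the run length; it is the ensemble averages that should agree within the statistical error bars.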

Yinglong



> On Dec 1, 2017, at 2:34 PM, Jason Swails <jason.swails.gmail.com> wrote:
>
> On Fri, Dec 1, 2017 at 12:26 PM, Yinglong Miao <yinglong.miao.gmail.com> wrote:
>
>> Hi All,
>>
>> As I tried the MPI code of the latest AMBER, for both the CPU and GPU versions,
>> the energies output by the MPI code start to deviate from those of the serial
>> code after hundreds of MD steps. The energies are so different that they
>> lead to different temperatures (and probably trajectories) of the system:
>>
>> diff md-1-cpu.out md-1-cpu-mpi.out | grep "NSTEP " | more
>> < NSTEP = 583 TIME(PS) = 41.166 TEMP(K) = 306.91 PRESS = 0.0
>> > NSTEP = 583 TIME(PS) = 41.166 TEMP(K) = 306.90 PRESS = 0.0
>> ...
>> < NSTEP = 900 TIME(PS) = 41.800 TEMP(K) = 302.37 PRESS = 0.0
>> > NSTEP = 900 TIME(PS) = 41.800 TEMP(K) = 300.97 PRESS = 0.0
>>
>> The system I tested was the very small alanine dipeptide (input files are
>> attached). I would expect the deviation to happen sooner for bigger
>> systems like proteins. Yes, there could be accumulated rounding errors, but
>> how do we evaluate whether our MPI simulations are still accurate, assuming
>> there are no errors in the code?
>>
>
> Compare ensemble properties of a model system that is fast and easy to
> converge to a known value. But what leads you to trust the MPI
> implementation less than the serial implementation? The former has been
> *much* more extensively used (and therefore tested) in production
> simulations. Most people rely (implicitly) on this body of work to claim
> the validity of the implementation of a force field that they use.
>
> This is an issue with literally every package out there (how do you trust
> *any* implementation of any force field?). pmemd.MPI makes no claim to be
> deterministic (and, due to the dynamic load balancer, usually is not).
>
> HTH,
> Jason
>
> --
> Jason M. Swails
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Fri Dec 01 2017 - 14:00:02 PST