To confirm: the same issue is observed both with and without P2P on a Tesla V100 with CUDA 9.2.88 + driver R396.26.
The differing totals (Etot = -2707218.6220 vs. Etot = -2709883.4871) are expected, because the P2P and non-P2P code paths can perform their reductions in different orders.
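For context on why two correct code paths can report different totals: floating-point addition is not associative, so two valid reduction orders generally round differently. Below is a minimal host-side C++ sketch (illustrative only, not AMBER code; the data is synthetic) contrasting a sequential sum with the pairwise order a tree-style parallel reduction tends to use.

    // Minimal sketch: two valid summation orders over the same data.
    // Floating-point addition is not associative, so the results can
    // disagree -- the same mechanism by which a P2P reduction and a
    // host-staged reduction may report different Etot values.
    #include <cstdio>
    #include <vector>
    #include <random>

    // Sequential left-to-right sum, as a serial reduction might do.
    double sum_sequential(const std::vector<double>& v) {
        double s = 0.0;
        for (double x : v) s += x;
        return s;
    }

    // Pairwise (tree) sum, the shape a parallel reduction tends to take.
    double sum_pairwise(const std::vector<double>& v, size_t lo, size_t hi) {
        if (hi - lo == 1) return v[lo];
        size_t mid = lo + (hi - lo) / 2;
        return sum_pairwise(v, lo, mid) + sum_pairwise(v, mid, hi);
    }

    int main() {
        // Synthetic per-term energies with mixed magnitudes, so rounding
        // differences between the two orders become visible.
        std::mt19937 gen(2018);
        std::uniform_real_distribution<double> dist(-1e6, 1e6);
        std::vector<double> terms(1 << 20);
        for (double& x : terms) x = dist(gen);

        double a = sum_sequential(terms);
        double b = sum_pairwise(terms, 0, terms.size());
        std::printf("sequential: %.4f\npairwise:   %.4f\ndiff:       %.4e\n",
                    a, b, a - b);
        return 0;
    }

The two printed sums typically disagree in the trailing digits; the same effect applies whenever the P2P and non-P2P paths accumulate energy terms in different orders.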
-----Original Message-----
From: David A Case <david.case.rutgers.edu>
Sent: Thursday, June 14, 2018 5:43 PM
To: AMBER Developers Mailing List <amber-developers.ambermd.org>
Subject: Re: [AMBER-Developers] Random crashes AMBER 18 on GPUs
On Thu, Jun 14, 2018, Ross Walker wrote:
>
> I keep seeing failures with AMBER 18 when running GPU validation
> tests.
Ross: I'm not used to looking at these sorts of logs. Can you summarize a bit:
1. Does the problem ever happen in serial runs, or only in parallel?
2. Are you getting "just" crashes (illegal memory access, failed sync, etc.), or do you get jobs that appear to finish OK but give the wrong result? That is, are jobs that report Etot = -2707218.6220 really supposed to be the same as the ones that report Etot = -2709883.4871?
...thx...dac
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Thu Jun 14 2018 - 19:30:03 PDT