Re: [AMBER-Developers] Random crashes AMBER 18 on GPUs

From: Gerald Monard <Gerald.Monard.univ-lorraine.fr>
Date: Fri, 15 Jun 2018 09:07:55 +0200

Hello,

On P100, amber18 with gcc-5.4.0 and cuda-8.0, same behavior:
3.0: Etot = -2707313.8447 EKtot = 663835.5000 EPtot = -3371149.3447
3.1: Etot = -2707313.8447 EKtot = 663835.5000 EPtot = -3371149.3447
3.2: Etot = -2707313.8447 EKtot = 663835.5000 EPtot = -3371149.3447
3.3: Etot = -2707313.8447 EKtot = 663835.5000 EPtot = -3371149.3447
3.4: Etot = -2707313.8447 EKtot = 663835.5000 EPtot = -3371149.3447
3.5: 3.6: Etot = -2707313.8447 EKtot = 663835.5000 EPtot = -3371149.3447
3.7: Etot = -2707313.8447 EKtot = 663835.5000 EPtot = -3371149.3447
3.8: Etot = -2707313.8447 EKtot = 663835.5000 EPtot = -3371149.3447
3.9: Etot = -2707313.8447 EKtot = 663835.5000 EPtot = -3371149.3447

cudaMemcpy GpuBuffer::Download failed an illegal memory access was encountered

Gerald.


On 06/15/2018 04:01 AM, Ke Li wrote:
> To confirm: the same issue is observed, both with and without P2P, on Tesla V100 with CUDA 9.2.88 + R396.26.
>
> The different Etot = -2707218.6220 and Etot = -2709883.4871 are expected because P2P and non-P2P could generate different reductions.
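
To illustrate the remark above about reductions: floating-point addition is not associative, so accumulating the same terms in a different order can change the result. The sketch below is plain Python, not AMBER code; the function names and input values are made up for illustration. It shows only the general mechanism and does not by itself establish that the two Etot values quoted are equivalent.

```python
import math


def sequential_sum(values):
    """Accumulate left to right, as a single thread would."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Accumulate as a binary tree, as a parallel reduction might."""
    if not values:
        return 0.0
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


# Hypothetical terms chosen to make the rounding visible in double precision.
values = [1e16, 1.0, -1e16, 1.0]

print("sequential:", sequential_sum(values))  # 1.0
print("pairwise:  ", pairwise_sum(values))    # 0.0
print("exact:     ", math.fsum(values))       # 2.0
```

The three results differ only because of the order in which the same four numbers are added. In a long MD run, small differences of this kind in the accumulated forces can grow as the trajectories diverge.
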
>
> -----Original Message-----
> From: David A Case <david.case.rutgers.edu>
> Sent: Thursday, June 14, 2018 5:43 PM
> To: AMBER Developers Mailing List <amber-developers.ambermd.org>
> Subject: Re: [AMBER-Developers] Random crashes AMBER 18 on GPUs
>
> On Thu, Jun 14, 2018, Ross Walker wrote:
>>
>> I keep seeing failures with AMBER 18 when running GPU validation
>> tests.
>
> Ross: I'm not used to looking at these sorts of logs. Can you summarize a bit:
>
> 1. Does the problem ever happen in serial runs, or only in parallel?
>
> 2. Are you getting "just" crashes (illegal memory access, failed sync, etc.), or do you get jobs that appear to finish OK but give the wrong result? That is, are jobs that report Etot = -2707218.6220 really supposed to be the same as the ones that report Etot = -2709883.4871?
>
> ...thx...dac
>
>
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers

-- 
____________________________________________________________________________
  Prof. Gerald MONARD
  Directeur du mésocentre EXPLOR
  Université de Lorraine
  Boulevard des Aiguillettes B.P. 70239
  F-54506 Vandoeuvre-les-Nancy, FRANCE
  e-mail : Gerald.Monard.univ-lorraine.fr
  phone  : +33 (0)372.745.279
  mobile : +33 (0)678.006.443
  web    : http://www.monard.info
____________________________________________________________________________
Received on Fri Jun 15 2018 - 00:30:02 PDT