This looks like a bug of some sort to me:
/projects2/joao.ribeiro/amber_lbsr/amber_lbsr/amber20/test/cuda/chamber/dhfr
66c66
<  Etot   =     -3050.6670  EKtot   =      2230.2661  EPtot      =     -5280.9331
>  Etot   =    -10504.5648  EKtot   =      2230.2661  EPtot      =    -12734.8309
70c70
<  EELEC  =    -10036.4148  EGB     =     -2483.6659  RESTRAINT  =         0.
>  EELEC  =    -10036.4148  EGB     =     -9937.5636  RESTRAINT  =         0.
That is not roundoff error...
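
For a sense of scale, here is a minimal Python sketch (not part of the AMBER test suite; the function name and the 1e-6 tolerance are illustrative assumptions of mine) that takes the numbers from the diff above. It computes the relative difference in EGB and checks whether the shift in EPtot is accounted for by the shift in EGB alone:

```python
def relative_difference(a, b):
    """Return |b - a| / |a| for two energies (hypothetical helper)."""
    return abs(b - a) / abs(a)

# Values copied from the "<" and ">" sides of the diff above.
egb_lt, egb_gt = -2483.6659, -9937.5636
eptot_lt, eptot_gt = -5280.9331, -12734.8309

# Illustrative tolerance only; assumed, not the test suite's actual threshold.
ASSUMED_TOLERANCE = 1.0e-6

rel = relative_difference(egb_lt, egb_gt)
egb_shift = egb_gt - egb_lt
eptot_shift = eptot_gt - eptot_lt

print(f"relative difference in EGB: {rel:.3f}")
print("within assumed tolerance:", rel <= ASSUMED_TOLERANCE)
print(f"EGB shift:   {egb_shift:.4f}")
print(f"EPtot shift: {eptot_shift:.4f}")
```

The two shifts agree to the printed precision of the log, so the EPtot (and hence Etot) discrepancy in this test appears to come entirely from the EGB term, while EELEC and EKtot match.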
On Fri, Jul 9, 2021 at 11:31 AM Charles Lin <charles.lin.roivant.com> wrote:
> Hi Dave,
>
> I attached a diff and log file.  From what I can tell almost everything
> non-remd fails.  We’ve tried multiple different MPI builds (including
> building one against the system), and I’m testing these in DPFP.
>
> -Charlie
>
> From: David Cerutti <dscerutti.gmail.com>
> Reply-To: AMBER Developers Mailing List <amber-developers.ambermd.org>
> Date: Friday, July 9, 2021 at 11:30 AM
> To: AMBER Developers Mailing List <amber-developers.ambermd.org>
> Subject: Re: [AMBER-Developers] A40 pmemd CUDA MPI
>
> This smells like a random numbers thing. I may have some time in the
> coming week to look into it, but I sure don't have an A40 in my hands yet.
> Are the issues spread throughout NVE, NPT, NTT tests, GB as well as PME
> setups? From your mail it looks like some (but not all) of the non-REMD
> PME tests are failing, and the non-REMD GB tests are failing in the kinetic
> energies from step 1 onward. Do any PME non-REMD tests pass? Are you
> running the tests in DPFP or SPFP mode?
>
> Dave
>
>
> On Fri, Jul 9, 2021 at 11:17 AM Charles Lin <charles.lin.roivant.com>
> wrote:
>
> > Hi all,
> >
> > I was wondering if anyone has tried running CUDA MPI on the NVIDIA A40
> > cards. I’m currently using CUDA 11.0 and AMD CPUs. I’ve gotten the
> > following to pass:
> > pmemd
> > pmemd.MPI
> > pmemd.cuda
> >
> > It seems all REMD passes for pmemd.cuda.MPI, but for non-REMD jobs the
> > tests fail. The issue seems to stem from the kinetic energies for some
> > tests and the EGB+Kinetic Energies for GB tests (all other energy terms
> > including potential energy look fine in step 1). The velocities are coming
> > out different, so I’m wondering if it’s an MPI issue in the CUDA code (?),
> > but I’m not well-versed in that part of the code, so was wondering if
> > someone could investigate that.
> >
> > Thanks!
> > Charlie
> > _______________________________________________
> > AMBER-Developers mailing list
> > AMBER-Developers.ambermd.org
> > http://lists.ambermd.org/mailman/listinfo/amber-developers
> >
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Sat Jul 10 2021 - 17:00:02 PDT