Re: [AMBER-Developers] NetCDF & Restart Files from Daniel Roe on 2017-05-18 (Amber Developers Archive May 2017)

From: Daniel Roe <daniel.r.roe.gmail.com>
Date: Thu, 18 May 2017 16:22:05 -0400

I assume these results are with pmemd, correct? I can't reproduce this
behavior with sander but I can with pmemd.

I've done some quick testing and it seems like the problem is the axes
are not properly flipped in some cases:

$ head ASCII.rst7

408609 0.2003000E+02
  26.3227445 95.9962558 72.5120593 26.9114296 95.5496312 71.8992165

$ head converted.rst7
Cpptraj Generated Restart
408609 2.0030000E+01
  95.9962558 72.5120593 26.3227445 95.5496312 71.8992165 26.9114296

Note that the ASCII order (x, y, z) differs from the NetCDF order
(y, z, x). I checked the code, and indeed the axis flip is applied to
a buffer, but the original coords are written out instead of the
buffer. This fixes it:

diff --git a/src/pmemd/src/binrestart.F90 b/src/pmemd/src/binrestart.F90
index 60cea87..15d907e 100644
--- a/src/pmemd/src/binrestart.F90
+++ b/src/pmemd/src/binrestart.F90
@@ -382,7 +382,7 @@ subroutine write_nc_coords(ncid,VID,natom,arrayIn,ord1,ord2,ord3)
       buf(3, i) = arrayIn(ord3, i)
     end do

- call checkNCerror(nf90_put_var(ncid, VID, arrayIn(:,:), &
+ call checkNCerror(nf90_put_var(ncid, VID, buf(:,:), &
                                    start=(/ 1, 1 /), &
                                    count=(/ 3, natom /)), &
                       'NetCDF write flipped coords')
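
For anyone who wants to see the effect of the one-line change without
reading the Fortran, here is a minimal Python sketch of the same
pattern. It is not the pmemd code: the function names are made up, plain
lists stand in for the Fortran arrays, and the 0-based axis order
(2, 0, 1) is an assumption inferred from the two file heads above, not
something read out of pmemd.

```python
# Illustrative sketch only; names and the axis order are assumptions.

def flip_axes(coords, ord1, ord2, ord3):
    """Return a new list with each atom's three components reordered.

    coords is a list of [a, b, c] triples in the internal order;
    ord1..ord3 are 0-based indices into each triple.
    """
    return [[atom[ord1], atom[ord2], atom[ord3]] for atom in coords]


def write_coords_buggy(coords, ord1, ord2, ord3):
    """Mimic the bug: the flipped buffer is built but never used."""
    buf = flip_axes(coords, ord1, ord2, ord3)  # built, then ignored
    return coords  # the original (internal-order) array is "written"


def write_coords_fixed(coords, ord1, ord2, ord3):
    """Mimic the fix: the flipped buffer is what gets "written"."""
    buf = flip_axes(coords, ord1, ord2, ord3)
    return buf


# First two atoms as they appear in converted.rst7 (assumed to be the
# internal y, z, x order).
internal = [
    [95.9962558, 72.5120593, 26.3227445],
    [95.5496312, 71.8992165, 26.9114296],
]

print(write_coords_buggy(internal, 2, 0, 1)[0])
print(write_coords_fixed(internal, 2, 0, 1)[0])
```

Under that assumed axis order, the fixed version gives back the (x, y, z)
values seen in ASCII.rst7, while the buggy version returns the input
unchanged.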

I've pushed the fix to master. I have to run and get my daughter at
day care right now so if someone wants to put this into a bugfix it
should probably happen soon (after some more tests). If no one
volunteers I'll try to get to it tonight.

Good catch Charlie.

-Dan

On Thu, May 18, 2017 at 1:22 PM, Charles Lin <clin92.ucsd.edu> wrote:
> Hi everyone,
>
> I was testing some stuff with restarts using Cellulose NVE from the Amber14 Benchmark Suite. Code I ran was using the amber16-with-patches branch.
>
> My runs were as follows:
> 1.) 100 steps from the original input file (ASCII)
> 2.) 100 steps using a restart file made from step 1 (Netcdf)
> 3.) 200 steps from the original input file (ASCII)
> 4.) 100 steps from the original input file (ASCII)
> 5.) 100 steps using a restart file made from step 4 (ASCII)
>
> So essentially there are three 200-step runs. I ran these on a Broadwell (Xeon) machine with the following builds: gnu + openmpi (4 MPI processes), intel + openmpi (4 MPI processes), intel + intelmpi (4 MPI processes), gnu (serial), and intel (serial).
> Intel version: 2016.0.1 (I saw issues when I was running on a skylake running 2017 compilers)
> GNU version: 4.8.5
>
> TL;DR: Every MPI run (openmpi or intelmpi) that started from a netcdf restart showed vmax issues starting at step 0, so energies instantly explode, but the same builds behaved properly with an ascii input. All the serial cases (intel & gnu) behaved properly. I didn't test cuda (CPU already took forever to test), but the output code path uses the cpu code so I'd assume we'd see the same issues.
>
> Any ideas? I'd assume this matters because ntxo defaults to 2 (writing binary restarts).
>
> Charlie
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers

-- 
-------------------------
Daniel R. Roe
Laboratory of Computational Biology
National Institutes of Health, NHLBI
5635 Fishers Ln, Rm T900
Rockville MD, 20852
https://www.lobos.nih.gov/lcb

Received on Thu May 18 2017 - 13:30:03 PDT