amber-developers: ene and ener array mess in sander from Ross Walker on 2008-11-21 (Amber Developers Archive Nov 2008)

From: Ross Walker <ross.rosswalker.co.uk>
Date: Fri, 21 Nov 2008 22:06:28 -0800

Hi All,

Does anyone actually understand how the crazy ene / ener mess in sander
actually works?

I am trying to add support for charmm impropers and urey-bradley terms to
this array but it seems to be a complete and utter disaster.

Lets start in sander.f:

_REAL_ ener(30)
_REAL_ ene(51)

call force(x,ix,ih,ipairs,x(lcrd),x(lforce),ener,vir, &
               x(l96), x(l97), x(l98), x(l99), qsetup, &
               do_list_update,n_force_calls)

So it passes ener (of size 30 into force).

It also passes ene to a ton of other routines all of which have ene either
hard coded as 51 or assumed size.

So in runmd we have a series of arrays all of fixed size 51:

   _REAL_ enert(51),enert2(51),ener(51)
   _REAL_ enert_old(51),enert2_old(51),ecopy(51),edvdl(51), &
         enert_tmp(51),enert2_tmp(51),etot_start,edvdl_r(51)

Not a comment in site for what the hell any of these are for or what they
contain.

So then we have the call to force inside runmd:

         call force(xx,ix,ih,ipairs,cartpos,f,ener(23),vir, &
                  xx(l96),xx(l97),xx(l98),xx(l99), qsetup, &
                  do_list_update,nstep)

So here it passes in ener with an OFFSET of 23!!! <sigh...!> - which as it
is defined as a fixed size of 51 means it is of size 29 inside force - even
though the ener passed to force from sander was of size 30!!!

So then inside force we have:

_REAL_ ener(*) ! offsets in this ener array = offsets in runmd ener - 22
_REAL_ ene(30)

So ener(*) is actually of size 29 inside force since it was passed in with
an offset of 23 and is size 51 outside (actually ener in force is of size 30
when called directly from sander since it was passed sander's ene here).

Tracing ener through force.f we see that typically it is used up to value
28. However on line 1165 it is passed to cns_xref_run:

call cns_xref_run(natom,ih(m04), x,f,ener)

Now looking inside cns_xref_run we have:

  subroutine cns_xref_run(natom,ih,x,fg,ener)
    _REAL_, intent(inout) :: ener(*)
   ...
   call rdxraycns(natcns,medcns,ener,fg)
   ...
   ener(16) = ener(47)
   ener(18) = ener(49)
   ener(19) = ener(50)

So cns_xref_run accesses elements 47, 49 and 50 of ener even though ener is
only of size 1:29 in this scope! DOH!.

Then it gets EVEN WORSE:

inside rdxraycns:

   subroutine rdxraycns(natcns,medcns,ener,fg)

      implicit none
      _REAL_ ener(*),fg(*),xgrd1,xgrd2,xgrd3
      ...
      read(ipdb,'(E20.14)') ener(47)
      read(ipdb,'(A80)') line
      read(ipdb,'(E20.14)') ener(48)
      read(ipdb,'(A80)') line
      read(ipdb,'(F8.6,1X,F8.6)') ener(50),ener(49)

So this actually writes to elements ener(50), 49 etc etc. This is completely
illegal as far as I can tell and is just corrupting memory somewhere - I am
shocked it doesn't segfault.

And this is JUST TOUCHING THE SURFACE of this ene / ener mess!

Then in evb_ntrfc we have ener passed in from force and lines like:

ener(1:26) = 0.0d0

Any reason why it is zeroing just elements 1:26 of ener here and not the
full 1:29 (or 1:30 if force was called from sander)?

Then on line 1206 of force we have:

      if (master) call mpi_reduce(ener,totener(23),28,MPI_DOUBLE_PRECISION,
&
                      MPI_SUM,0,commmaster,ierr)

So only elements 1 to 28 of ener here are reduced.

Then inside fdist (called from force with ene) we have:

   do i = 2,30
      f(j+1+i) = ene(i)
   end do

so ene is reduced from 2 to 30 even though as far as I can tell it is only
used up to element 28.

and then we have horrendous statements like:

      call
mpi_allreduce(MPI_IN_PLACE,f(j),33,MPI_DOUBLE_PRECISION,mpi_sum,commsander,i
err)
      vir(1) = f(j)
      vir(2) = f(j+1)
      vir(3) = f(j+2)
      do i = 2,30
         ene(i) = f(j+i+1)
      end do

where the 33 is hard coded - change the size of ene and you get screwed but
you would be hard pressed to find this given it is hard coded as 33 (rather
than the 30 for ene) since it also contains the virials etc...

So, having spent 4 hours this evening trying to extend these arrays to
include the charmm urey-bradley and improper terms and finding that if I
change anything I end breaking things either in parallel, serial or both I
am hoping that someone here has a decent understanding of how these ene /
ener arrays are actually supposed to work.

And the memory errors in cns_xref etc (and possibly others elsewhere that I
haven't found) should be fixed.

All the best
Ross

/\
\/
|\oss Walker

| Assistant Research Professor |
| San Diego Supercomputer Center |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
| http://www.rosswalker.co.uk | PGP Key available on request |

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.
Received on Fri Dec 05 2008 - 16:19:45 PST