RE: amber-developers: ene and ener array mess in sander

From: Ross Walker <ross.rosswalker.co.uk>
Date: Sat, 22 Nov 2008 09:57:04 -0800

Hi Bob,

I agree entirely... I will take a look at the PMEMD code and see if I can
figure it out. Your code is so much easier to follow than sander :-). I
often think about taking a fork of the PMEMD code and replacing sander with
it and then trying to add the sander functionality back in. But at this
point I tend to wake up in a cold sweat ;-).

As for the CHARMM stuff in PMEMD, indeed we should definitely add it. Once
Mark, Mike and I get it working properly in sander, I'll show you all the
changes we have, and then hopefully it will be quite easy for you to add this
to PMEMD. We are keeping most of the code in a specific CHARMM module, so it
should be fairly simple to plug it in and then optimize it afterwards.
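
To give you a feel for the shape of it, here is a rough sketch (hypothetical
- the actual names and conventions in our tree may differ; the CHARMM
urey-bradley term is a harmonic 1-3 stretch, E = kub*(r13 - r0)**2):

module charmm_mod
   implicit none
   private
   public :: charmm_ub_energy

contains

   ! Urey-Bradley 1-3 term for one atom pair; accumulates energy and forces.
   subroutine charmm_ub_energy(crd, frc, i1, i3, kub, r0, e_ub)
      double precision, intent(in)    :: crd(3,*)
      double precision, intent(inout) :: frc(3,*), e_ub
      integer,          intent(in)    :: i1, i3
      double precision, intent(in)    :: kub, r0
      double precision :: dx(3), r13, dedr

      dx   = crd(:,i1) - crd(:,i3)
      r13  = sqrt(sum(dx*dx))
      e_ub = e_ub + kub*(r13 - r0)**2

      ! dE/dr = 2*kub*(r13 - r0); apply -dE/dr along the 1-3 unit vector.
      dedr = 2.0d0*kub*(r13 - r0)/r13
      frc(:,i1) = frc(:,i1) - dedr*dx
      frc(:,i3) = frc(:,i3) + dedr*dx
   end subroutine charmm_ub_energy

end module charmm_mod

The point is that force() would just call into this module once per term
type, so pulling it across to PMEMD later is mostly a matter of wiring up
the call sites.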

Will you be coming to the AMBER developers meeting? This would be a great
time to work on something like this. We also need to add CMAP support since
in my experience the CHARMM force field without CMAP is about as good as
FF94 at best.

All the best
Ross

> -----Original Message-----
> From: owner-amber-developers.scripps.edu [mailto:owner-amber-developers.scripps.edu] On Behalf Of Robert Duke
> Sent: Saturday, November 22, 2008 6:47 AM
> To: amber-developers.scripps.edu
> Subject: Re: amber-developers: ene and ener array mess in sander
>
> Hi Ross,
> I think a lot of this has to do with the layering of pme on top of an
> already-existing energy array structure, and it is indeed a mess. Tom D.
> would have to comment, but I think he attempted to get some isolation from
> the rest of the sander code with the second "virtual" ene array at an
> offset into ener(). I dredged through all of this so long ago that I don't
> recollect the sander specifics, but I DID clean up the logical structure
> for all this stuff somewhat in pmemd. By looking at that code, starting in
> runmd and sinking into pme_force (vs. gb_force), you may be able to see
> how I sorted some of this out, and that may give you some hints, or
> possibly confuse you further.
>
> I also created "state information" structures to contain a lot of this
> junk, specific to the electrostatics implementation (pme vs. gb). Thus I
> don't deal with arrays where only pieces of the data are relevant for a
> particular run.
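>
> To sketch the shape of it (hypothetically - the field names here are made
> up, and the actual pmemd types differ):
>
> type :: pme_pot_ene_rec               ! one record per electrostatics method
>    double precision :: bond     = 0.d0
>    double precision :: angle    = 0.d0
>    double precision :: dihedral = 0.d0
>    double precision :: vdw      = 0.d0
>    double precision :: elec     = 0.d0
>    double precision :: total    = 0.d0
> end type pme_pot_ene_rec
>
> Code reads and writes named fields like ene_rec%bond rather than anonymous
> slots in a shared ener() array, so there are no offsets to keep in sync,
> and the gb record simply carries only what gb actually computes.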
>
> It could be that I have abstracted things so far away from the original
> sander implementation that the current pmemd code won't be that much help;
> you may instead want to look at amber 8 pmemd, where I still have the
> original arrays and where a lot of the sander accretions are stripped
> away, leaving the core of what is necessary for pme (and I added some
> comments as I figured things out). The most relevant files in pmemd 8 are
> runmd_cit.f90 and force_cit.f90. Some of the array copies of course deal
> with energy averages, but I presume there may be some stuff there from TI,
> and possibly other stuff that is specific to added functionality.
>
> When/if the time comes to add the charmm stuff to pmemd, I would like to
> do it. I will probably also reimplement the scee/scnb stuff you added
> earlier to improve cache utilization (the most noticeable problem I have
> seen, though I have not studied it much - basically, data is getting
> knocked out of the cache by some of this stuff). Sander needs a total
> rewrite; of course no one has the time or money, so we will all just pay
> the price in lost productivity a little bit at a time, every day.
> Regards - Bob
>
> ----- Original Message -----
> From: "Ross Walker" <ross.rosswalker.co.uk>
> To: <amber-developers.scripps.edu>
> Sent: Saturday, November 22, 2008 1:06 AM
> Subject: amber-developers: ene and ener array mess in sander
>
>
> > Hi All,
> >
> > Does anyone actually understand how the crazy ene / ener mess in sander
> > actually works?
> >
> > I am trying to add support for charmm impropers and urey-bradley terms to
> > this array but it seems to be a complete and utter disaster.
> >
> > Let's start in sander.f:
> >
> > _REAL_ ener(30)
> > _REAL_ ene(51)
> >
> > call force(x,ix,ih,ipairs,x(lcrd),x(lforce),ener,vir, &
> > x(l96), x(l97), x(l98), x(l99), qsetup, &
> > do_list_update,n_force_calls)
> >
> > So it passes ener (of size 30) into force.
> >
> > It also passes ene to a ton of other routines, all of which have ene
> > either hard coded as 51 or assumed size.
> >
> > So in runmd we have a series of arrays all of fixed size 51:
> >
> > _REAL_ enert(51),enert2(51),ener(51)
> > _REAL_ enert_old(51),enert2_old(51),ecopy(51),edvdl(51), &
> > enert_tmp(51),enert2_tmp(51),etot_start,edvdl_r(51)
> >
> > Not a comment in sight for what the hell any of these are for or what
> > they contain.
> >
> > So then we have the call to force inside runmd:
> >
> > call force(xx,ix,ih,ipairs,cartpos,f,ener(23),vir, &
> > xx(l96),xx(l97),xx(l98),xx(l99), qsetup, &
> > do_list_update,nstep)
> >
> > So here it passes in ener with an OFFSET of 23!!! <sigh...!> - which, as
> > it is defined with a fixed size of 51, means it is of size 29 inside
> > force - even though the ener passed to force from sander was of size
> > 30!!!
> >
> > So then inside force we have:
> >
> > _REAL_ ener(*) ! offsets in this ener array = offsets in runmd ener - 22
> > _REAL_ ene(30)
> >
> > So ener(*) is actually of size 29 inside force, since it was passed in
> > with an offset of 23 and is of size 51 outside (actually ener in force
> > is of size 30 when called directly from sander, since it was passed
> > sander's ener here).
> >
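> > (For anyone who has not hit this Fortran quirk before: passing ener(23)
> > to an assumed-size dummy re-bases the indexing via sequence association,
> > so the dummy's element 1 aliases the actual's element 23. A minimal
> > standalone demo - all names here are made up:
> >
> > program offset_demo
> >    implicit none
> >    double precision :: ener(51)
> >    integer :: i
> >    ener = (/ (dble(i), i = 1, 51) /)
> >    call force_like(ener(23))
> > contains
> >    subroutine force_like(e)
> >       double precision :: e(*)
> >       print *, e(1), e(7)   ! prints 23.0 and 29.0: runmd's ener(23), ener(29)
> >    end subroutine force_like
> > end program offset_demo
> >
> > So force's "ener" is really just the 29 trailing elements ener(23:51) of
> > the runmd array.)
> >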
> > Tracing ener through force.f we see that typically it is used up to
> > element 28. However, on line 1165 it is passed to cns_xref_run:
> >
> > call cns_xref_run(natom,ih(m04), x,f,ener)
> >
> > Now looking inside cns_xref_run we have:
> >
> > subroutine cns_xref_run(natom,ih,x,fg,ener)
> > _REAL_, intent(inout) :: ener(*)
> > ...
> > call rdxraycns(natcns,medcns,ener,fg)
> > ...
> > ener(16) = ener(47)
> > ener(18) = ener(49)
> > ener(19) = ener(50)
> >
> > So cns_xref_run accesses elements 47, 49 and 50 of ener even though ener
> > is only of size 1:29 in this scope! DOH!
> >
> > Then it gets EVEN WORSE:
> >
> > inside rdxraycns:
> >
> > subroutine rdxraycns(natcns,medcns,ener,fg)
> >
> > implicit none
> > _REAL_ ener(*),fg(*),xgrd1,xgrd2,xgrd3
> > ...
> > read(ipdb,'(E20.14)') ener(47)
> > read(ipdb,'(A80)') line
> > read(ipdb,'(E20.14)') ener(48)
> > read(ipdb,'(A80)') line
> > read(ipdb,'(F8.6,1X,F8.6)') ener(50),ener(49)
> >
> > So this actually writes to elements ener(50), ener(49), etc. This is
> > completely illegal as far as I can tell and is just corrupting memory
> > somewhere - I am shocked it doesn't segfault.
> >
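> > (The likely reason it doesn't segfault: through the offset, force's
> > ener(47) is runmd's ener(23+47-1) = ener(69), which lands in whatever
> > happens to sit next to the 51-element array - silent corruption rather
> > than a trap. And because the dummies are declared assumed-size, ener(*),
> > the compiler has no extent to check against, so -fbounds-check stays
> > silent. One cheap way to flush these out - a sketch, with a made-up size
> > module that doesn't exist in sander today:
> >
> > module ene_extents
> >    implicit none
> >    integer, parameter :: force_ene_size = 29   ! extent of ener inside force
> > end module ene_extents
> >
> > subroutine cns_xref_run(natom, ih, x, fg, ener)
> >    use ene_extents
> >    implicit none
> >    integer :: natom, ih(*)
> >    double precision :: x(*), fg(*)
> >    double precision, intent(inout) :: ener(force_ene_size)  ! explicit shape
> >
> >    ener(16) = ener(47)   ! now trips the runtime bounds checker
> > end subroutine cns_xref_run
> >
> > With an explicit shape, the out-of-range accesses show up immediately
> > under gfortran's -fbounds-check instead of corrupting memory in silence.)
> >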
> > And this is JUST TOUCHING THE SURFACE of this ene / ener mess!
> >
> > Then in evb_ntrfc we have ener passed in from force and lines like:
> >
> > ener(1:26) = 0.0d0
> >
> > Any reason why it is zeroing just elements 1:26 of ener here and not the
> > full 1:29 (or 1:30 if force was called from sander)?
> >
> > Then on line 1206 of force we have:
> >
> > if (master) call mpi_reduce(ener, totener(23), 28, &
> >                             MPI_DOUBLE_PRECISION, MPI_SUM, 0, commmaster, ierr)
> >
> > So only elements 1 to 28 of ener here are reduced.
> >
> > Then inside fdist (called from force with ene) we have:
> >
> > do i = 2,30
> > f(j+1+i) = ene(i)
> > end do
> >
> > so ene is reduced from 2 to 30 even though, as far as I can tell, it is
> > only used up to element 28.
> >
> > and then we have horrendous statements like:
> >
> > call mpi_allreduce(MPI_IN_PLACE, f(j), 33, MPI_DOUBLE_PRECISION, &
> >                    mpi_sum, commsander, ierr)
> > vir(1) = f(j)
> > vir(2) = f(j+1)
> > vir(3) = f(j+2)
> > do i = 2,30
> > ene(i) = f(j+i+1)
> > end do
> >
> > where the 33 is hard coded - change the size of ene and you get screwed,
> > but you would be hard pressed to find this given it is hard coded as 33
> > (rather than the 30 for ene) since it also contains the virials etc...
> >
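> > (The obvious medicine, for what it's worth - a sketch, where none of
> > these names exist in sander today:
> >
> > module ene_dims
> >    implicit none
> >    integer, parameter :: n_vir  = 3              ! virial components
> >    integer, parameter :: n_ene  = 30             ! extent of ene
> >    integer, parameter :: n_pack = n_vir + n_ene  ! the magic "33"
> > end module ene_dims
> >
> > Then the allreduce count becomes n_pack, the pack/unpack loops run over
> > n_ene, and adding the urey-bradley and improper slots would mean bumping
> > n_ene in exactly one place.)
> >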
> > So, having spent 4 hours this evening trying to extend these arrays to
> > include the charmm urey-bradley and improper terms, and finding that if I
> > change anything I end up breaking things in parallel, serial or both, I
> > am hoping that someone here has a decent understanding of how these ene /
> > ener arrays are actually supposed to work.
> >
> > And the memory errors in cns_xref etc (and possibly others elsewhere that
> > I haven't found) should be fixed.
> >
> > All the best
> > Ross
> >
> > /\
> > \/
> > |\oss Walker
> >
> > | Assistant Research Professor |
> > | San Diego Supercomputer Center |
> > | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> > | http://www.rosswalker.co.uk | PGP Key available on request |
> >
> > Note: Electronic Mail is not secure, has no guarantee of delivery, may
> > not be read every day, and should not be used for urgent or sensitive
> > issues.
> >