Re: amber-developers: ene and ener array mess in sander

From: Robert Duke <rduke.email.unc.edu>
Date: Sat, 22 Nov 2008 13:27:03 -0500

Hi Ross -
Yes, my implementation is not perfect on the energy stuff, but it is much
easier to follow. There are no equivalences, mystery array offsets,
unparameterized constants, etc. It took me, I am sure (I tend to repress
such memories), at least several days to get this all straightened out, and I
did it because I couldn't take it any longer. The big hairy problem I have
always seen with sander is that with at least a dozen folks hacking away at
it, it is always going to be difficult to slip in new infrastructure. You
basically have to get everybody else out of the code for blocks of time that
may not be that small, and then let them sync up afterward. I do think my
"state information" abstraction is good - it is what you are really dealing
with - ene's, pressure, temp, vol, etc. I looked back at the code, and what
I really do is parameterize offsets into a "state information array" that is
used in all implementations. Then for each of the electrostatics models
(currently just pme and gb for pmemd), I create a "potential energy record
(or structure)" that clearly names everything associated with the given
electrostatics model. I do the same thing with the virials. What does get
more dicey is that you have to marshal some stuff for reduces, etc., in mpi
communications, so there is a buffer copy in and out, but it is really not a
big deal (you can use the 'sequence' keyword in the records to ensure that
the offsets are as presented). And I have this big ugly block after each force
call where I pull info out of the potential energy structure into the state
information array - a little kludgy, but very clear what is going on.
Sometimes you have to really think about efficiency vs. clarity. So there
are undoubtedly better ways to do this than what I have done, but it was a
first step toward removing a lot of the mysteries. I would really like to
see runmd functionality better encapsulated into routines, and then have a
pme_runmd() subroutine, a gb_runmd() subroutine, etc. I think runmd is
where absolutely everything seems to run together - not impossible to deal
with, but a mess. Of course, a fair bit of complexity has also crept down
into the sander force routines, as qm/mm, evb, remd, les, you name it has
been layered on top. This whole model of dev is going to eventually
collapse in on itself, especially if nothing is done to combat the increase
in entropy.
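
To make that concrete, here is roughly the shape of the pattern (a sketch
only - the names and offsets are illustrative, not the exact pmemd
identifiers):

  ! Sketch only - illustrative names, not the actual pmemd source.
  module state_info_mod
    implicit none
    ! Parameterized offsets into the flat "state information" array
    ! that all implementations share:
    integer, parameter :: si_pot_ene   = 1
    integer, parameter :: si_vdw_ene   = 2
    integer, parameter :: si_elect_ene = 3
    integer, parameter :: si_cnt       = 3

    ! One record per electrostatics model, with everything named.
    ! 'sequence' fixes the storage order, so the record can be copied
    ! into a flat buffer for mpi reduces and copied back out again.
    type pme_pot_ene_rec
      sequence
      double precision :: total
      double precision :: vdw
      double precision :: elec
    end type pme_pot_ene_rec
  end module state_info_mod

and the big ugly block after each force call is then just a set of
self-documenting copies:

  si(si_pot_ene)   = pot_ene%total
  si(si_vdw_ene)   = pot_ene%vdw
  si(si_elect_ene) = pot_ene%elec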

I am indeed hoping to come to the dev meeting, barring various catastrophes.

Best regards - Bob

----- Original Message -----
From: "Ross Walker" <ross.rosswalker.co.uk>
To: <amber-developers.scripps.edu>
Sent: Saturday, November 22, 2008 12:57 PM
Subject: RE: amber-developers: ene and ener array mess in sander


> Hi Bob,
>
> I agree entirely... I will take a look at the PMEMD code and see if I can
> figure it out. Your code is so much easier to follow than sander :-). I
> often think about taking a fork of the PMEMD code and replacing sander
> with it and then trying to add the sander functionality back in. But at
> this point I tend to wake up in a cold sweat ;-).
>
> As for the CHARMM stuff in PMEMD, indeed we should definitely add it. Once
> Mark, Mike and I get it working properly in sander I'll show you all the
> changes we have, and then hopefully it will be quite easy for you to add
> this to PMEMD. We are keeping most of the code in a specific CHARMM module
> so it should be fairly simple to plug it in and then optimize it
> afterwards.
>
> Will you be coming to the AMBER developers meeting? This would be a great
> time to work on something like this. We also need to add CMAP support,
> since in my experience the CHARMM force field without CMAP is about as
> good as FF94 at best.
>
> All the best
> Ross
>
>> -----Original Message-----
>> From: owner-amber-developers.scripps.edu [mailto:owner-amber-developers.scripps.edu] On Behalf Of Robert Duke
>> Sent: Saturday, November 22, 2008 6:47 AM
>> To: amber-developers.scripps.edu
>> Subject: Re: amber-developers: ene and ener array mess in sander
>>
>> Hi Ross,
>> I think a lot of this has to do with the layering of pme on top of an
>> already-existing energy array structure, and it is indeed a mess. Tom D.
>> would have to comment, but I think he attempted to get some isolation
>> from the rest of the sander code with the second "virtual" ene array at
>> an offset into ener(). I dredged through this all so long ago that I
>> don't recollect the sander specifics, but I DID clean up the logical
>> structure for all this stuff somewhat in pmemd. By looking at that code,
>> starting in runmd and sinking into pme_force (vs. gb_force), you may be
>> able to see how I sorted some of this stuff out, and that may give you
>> some hints, or possibly confuse you further. I also created "state
>> information" structures to contain a lot of this junk, specific to the
>> electrostatics implementation (pme vs. gb). Thus, I don't deal with
>> arrays with only pieces of the data relevant for a particular run. It
>> could be that I have abstracted things so far away from the original
>> sander implementation that it won't be that much help in the current
>> pmemd code; you may instead want to look at amber 8 pmemd, where I still
>> have the original arrays, and where a lot of the sander accretions are
>> stripped away, leaving the core of what is necessary for pme (and I
>> added some comments as I figured things out). The most relevant files in
>> pmemd 8 are runmd_cit.f90 and force_cit.f90. Some of the array copies of
>> course deal with energy avgs, but I presume there may be some stuff
>> there from TI, possibly other stuff that is specific to added
>> functionality. When/if the time comes to add charmm stuff to pmemd, I
>> would like to do it. I probably will also reimplement the scee/scnb
>> stuff you added earlier to improve cache utilization (the most
>> noticeable problem I have seen, but I have not studied it a bunch -
>> basically data is getting knocked out of the cache by some of this code,
>> though). Sander needs a total rewrite; of course no one has the time or
>> money, so we will just all pay the price in lost productivity a little
>> bit at a time, every day.
>> Regards - Bob
>>
>> ----- Original Message -----
>> From: "Ross Walker" <ross.rosswalker.co.uk>
>> To: <amber-developers.scripps.edu>
>> Sent: Saturday, November 22, 2008 1:06 AM
>> Subject: amber-developers: ene and ener array mess in sander
>>
>>
>> > Hi All,
>> >
>> > Does anyone actually understand how the crazy ene / ener mess in sander
>> > actually works?
>> >
>> > I am trying to add support for charmm impropers and urey-bradley terms
>> > to this array but it seems to be a complete and utter disaster.
>> >
>> > Let's start in sander.f:
>> >
>> > _REAL_ ener(30)
>> > _REAL_ ene(51)
>> >
>> > call force(x,ix,ih,ipairs,x(lcrd),x(lforce),ener,vir, &
>> > x(l96), x(l97), x(l98), x(l99), qsetup, &
>> > do_list_update,n_force_calls)
>> >
>> > So it passes ener (of size 30) into force.
>> >
>> > It also passes ene to a ton of other routines, all of which have ene
>> > either hard coded as 51 or assumed size.
>> >
>> > So in runmd we have a series of arrays all of fixed size 51:
>> >
>> > _REAL_ enert(51),enert2(51),ener(51)
>> > _REAL_ enert_old(51),enert2_old(51),ecopy(51),edvdl(51), &
>> > enert_tmp(51),enert2_tmp(51),etot_start,edvdl_r(51)
>> >
>> > Not a comment in sight for what the hell any of these are for or what
>> > they contain.
>> >
>> > So then we have the call to force inside runmd:
>> >
>> > call force(xx,ix,ih,ipairs,cartpos,f,ener(23),vir, &
>> > xx(l96),xx(l97),xx(l98),xx(l99), qsetup, &
>> > do_list_update,nstep)
>> >
>> > So here it passes in ener with an OFFSET of 23!!! <sigh...!> - which,
>> > as it is defined with a fixed size of 51, means it is of size 29 inside
>> > force - even though the ener passed to force from sander was of size
>> > 30!!!
>> >
>> > So then inside force we have:
>> >
>> > _REAL_ ener(*)  ! offsets in this ener array = offsets in runmd ener - 22
>> > _REAL_ ene(30)
>> >
>> > So ener(*) is actually of size 29 inside force since it was passed in
>> > with an offset of 23 and is size 51 outside (actually ener in force is
>> > of size 30 when called directly from sander since it was passed
>> > sander's ener here).
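>> >
>> > For anyone not steeped in this: it "works" through Fortran sequence
>> > association. A minimal standalone illustration (names made up):
>> >
>> >   program assoc_demo
>> >     double precision :: big(51)
>> >     call sub(big(23))  ! dummy associates with big(23:51) - 29 slots
>> >   end program assoc_demo
>> >
>> >   subroutine sub(a)
>> >     double precision :: a(*)  ! assumed size: no declared upper bound
>> >     a(1) = 0.0d0              ! this is the caller's big(23)
>> >   end subroutine sub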
>> >
>> > Tracing ener through force.f we see that typically it is used up to
>> > value 28. However on line 1165 it is passed to cns_xref_run:
>> >
>> > call cns_xref_run(natom,ih(m04), x,f,ener)
>> >
>> > Now looking inside cns_xref_run we have:
>> >
>> > subroutine cns_xref_run(natom,ih,x,fg,ener)
>> > _REAL_, intent(inout) :: ener(*)
>> > ...
>> > call rdxraycns(natcns,medcns,ener,fg)
>> > ...
>> > ener(16) = ener(47)
>> > ener(18) = ener(49)
>> > ener(19) = ener(50)
>> >
>> > So cns_xref_run accesses elements 47, 49 and 50 of ener even though
>> > ener is only of size 1:29 in this scope! DOH!
>> >
>> > Then it gets EVEN WORSE:
>> >
>> > inside rdxraycns:
>> >
>> > subroutine rdxraycns(natcns,medcns,ener,fg)
>> >
>> > implicit none
>> > _REAL_ ener(*),fg(*),xgrd1,xgrd2,xgrd3
>> > ...
>> > read(ipdb,'(E20.14)') ener(47)
>> > read(ipdb,'(A80)') line
>> > read(ipdb,'(E20.14)') ener(48)
>> > read(ipdb,'(A80)') line
>> > read(ipdb,'(F8.6,1X,F8.6)') ener(50),ener(49)
>> >
>> > So this actually writes to elements ener(50), ener(49), etc. This is
>> > completely illegal as far as I can tell and is just corrupting memory
>> > somewhere - I am shocked it doesn't segfault.
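>> >
>> > (The reason it doesn't segfault is that ener(*) is assumed size, so
>> > the compiler has no extent to check against and ener(47) just lands
>> > in whatever memory happens to follow the actual argument. A sketch
>> > of a checkable alternative - hypothetical module and routine names -
>> > using an assumed-shape dummy, which a bounds-checking build would
>> > trap at run time:)
>> >
>> >   module cns_xref_mod
>> >     implicit none
>> >   contains
>> >     subroutine cns_xref_run_checked(ener)
>> >       double precision, intent(inout) :: ener(:)  ! assumed shape
>> >       ! If the caller passes a 29-element section, size(ener) == 29
>> >       ! and ener(47) trips a runtime bounds check instead of
>> >       ! silently corrupting memory.
>> >       ener(16) = ener(1)
>> >     end subroutine cns_xref_run_checked
>> >   end module cns_xref_mod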
>> >
>> > And this is JUST TOUCHING THE SURFACE of this ene / ener mess!
>> >
>> > Then in evb_ntrfc we have ener passed in from force and lines like:
>> >
>> > ener(1:26) = 0.0d0
>> >
>> > Any reason why it is zeroing just elements 1:26 of ener here and not
>> > the full 1:29 (or 1:30 if force was called from sander)?
>> >
>> > Then on line 1206 of force we have:
>> >
>> > if (master) call mpi_reduce(ener, totener(23), 28, &
>> >      MPI_DOUBLE_PRECISION, MPI_SUM, 0, commmaster, ierr)
>> >
>> > So only elements 1 to 28 of ener here are reduced.
>> >
>> > Then inside fdist (called from force with ene) we have:
>> >
>> > do i = 2,30
>> > f(j+1+i) = ene(i)
>> > end do
>> >
>> > So ene is reduced from 2 to 30, even though as far as I can tell it is
>> > only used up to element 28.
>> >
>> > And then we have horrendous statements like:
>> >
>> > call mpi_allreduce(MPI_IN_PLACE, f(j), 33, &
>> >      MPI_DOUBLE_PRECISION, mpi_sum, commsander, ierr)
>> > vir(1) = f(j)
>> > vir(2) = f(j+1)
>> > vir(3) = f(j+2)
>> > do i = 2,30
>> > ene(i) = f(j+i+1)
>> > end do
>> >
>> > where the 33 is hard coded - change the size of ene and you get
>> > screwed, but you would be hard pressed to find this given it is hard
>> > coded as 33 (rather than the 30 for ene) since it also contains the
>> > virials etc...
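>> >
>> > (At minimum these magic numbers should be named parameters so the
>> > sizes cannot silently drift apart - something like this, with
>> > invented names:)
>> >
>> >   integer, parameter :: vir_size = 3
>> >   integer, parameter :: ene_size = 30
>> >   integer, parameter :: buf_size = vir_size + ene_size  ! the "33"
>> >
>> >   call mpi_allreduce(MPI_IN_PLACE, f(j), buf_size, &
>> >       MPI_DOUBLE_PRECISION, mpi_sum, commsander, ierr)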
>> >
>> > So, having spent 4 hours this evening trying to extend these arrays to
>> > include the charmm urey-bradley and improper terms, and finding that if
>> > I change anything I end up breaking things either in parallel, serial,
>> > or both, I am hoping that someone here has a decent understanding of
>> > how these ene / ener arrays are actually supposed to work.
>> >
>> > And the memory errors in cns_xref etc. (and possibly others elsewhere
>> > that I haven't found) should be fixed.
>> >
>> > All the best
>> > Ross
>> >
>> > /\
>> > \/
>> > |\oss Walker
>> >
>> > | Assistant Research Professor |
>> > | San Diego Supercomputer Center |
>> > | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
>> > | http://www.rosswalker.co.uk | PGP Key available on request |
>> >
>> > Note: Electronic Mail is not secure, has no guarantee of delivery, may
>> > not be read every day, and should not be used for urgent or sensitive
>> > issues.
>> >
>> >
>> >
>> >
>
>
Received on Fri Dec 05 2008 - 16:23:28 PST