Re: amber-developers: PIMD, NEB, LES - request for code inspection and tests from Carlos Simmerling on 2007-07-24 (Amber Developers Archive Jul 2007)

From: Carlos Simmerling <carlos.simmerling.gmail.com>
Date: Tue, 24 Jul 2007 12:10:57 -0400

hi all,
just to clarify: when I say "partial NEB" with multisander,
I mean that the whole system is replicated but spring forces
are only applied to a subset that we define with an atommask in
the mdin files. We use one mask for the fit and another for the NEB springs.
This means we can do what Ross suggested: use multisander to
run multiple MD beads in explicit solvent, but not apply springs to the
solvent. For example, I have duplex DNA in explicit water, and I
am fitting to only the duplex but applying springs just to one base
that is getting everted in the pathway. Water is not involved in NEB,
but since the springs are only applied to one base, the rest of the DNA
can do as it wishes and I'm not forcing it to look exactly like the
instantaneous bend, etc. in the endpoint structure. I think that this
does require a bit more thought as far as user choices go, but it will
work just like the old NEB code if the default mask is made to select
all atoms for both the fitmask and rmsmask. I can't see any easier way
to make multisander NEB work with explicit water.
My next step is to try to reduce the communication further by only
sharing the coordinates of atoms in the masks (i.e. not the water).
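
A minimal sketch of the masking idea described above, not the sander code. It assumes plain harmonic springs between neighbouring beads and leaves out both the fitting step and the tangent projection that a real NEB force uses. The names masked_spring_forces, spring_mask and k are hypothetical and do not correspond to mdin variables.

```python
def masked_spring_forces(beads, spring_mask, k=1.0):
    """Spring forces between neighbouring beads, restricted to masked atoms.

    beads:       list of images, each a list of (x, y, z) tuples
    spring_mask: set of atom indices that feel the springs
    Atoms outside the mask (e.g. water) and the two endpoint beads get zero.
    """
    n_beads = len(beads)
    n_atoms = len(beads[0])
    forces = [[(0.0, 0.0, 0.0)] * n_atoms for _ in range(n_beads)]
    for i in range(1, n_beads - 1):          # endpoints stay fixed
        for a in spring_mask:
            prev_xyz = beads[i - 1][a]
            here_xyz = beads[i][a]
            next_xyz = beads[i + 1][a]
            # Assumed simplification: F = k * (R[i-1] + R[i+1] - 2 * R[i])
            forces[i][a] = tuple(
                k * (p + n - 2.0 * h)
                for p, h, n in zip(prev_xyz, here_xyz, next_xyz)
            )
    return forces


# Three beads, two atoms: atom 0 is in the spring mask, atom 1 plays the water.
beads = [
    [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
    [(0.5, 0.0, 0.0), (9.0, 9.0, 9.0)],
    [(2.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
]
forces = masked_spring_forces(beads, spring_mask={0})
print(forces[1])
```

Selecting every atom in the mask gives the unmasked behaviour, which is the sense in which a default "all atoms" mask would reproduce the old code.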

Ross- good idea about the coding retreat, but I can't stay after the
amber meeting due to teaching.
carlos

On 7/24/07, Wei Zhang <zweig.scripps.edu> wrote:
>
>
> Ross Walker wrote:
>
> >Hi Carlos,
> >
> >
> >
> >>I've been discussing this with Ross and Dave M.
> >>I'm making changes needed to have NEB work with explicit water,
> >>which it won't currently do.
> >>
> >>
> >
> >Now I have some time to stop and look at this I remember that there actually
> >exists a version of NEB that works as part NEB with explicit water etc. I'm
> >not sure that it made it into Amber 9 but I believe it is in the Amber 10
> >tree somewhere - likely just before it was converted over to multisander. I
> >don't know exactly where in the tree this is - it is likely around revision
> >9.13 of sander:
> >
> >revision 9.13
> >date: 2006/10/17 15:58:02; author: zweig; state: Exp; lines: +3 -17
> >New pimd implementation based on multi-sander framework. Meanwhile, remove
> >the following exectables: sander.PIMD, sander.CMD, sander.PIMD.A1ST,
> >sander.CMD.A1ST from compilation
> >
> >I.e. just before this. Wei implemented this since the way NEB worked then,
> >through PIMD, it seemed quite easy to do. The approach was that for partial
> >NEB the non-replicated sections saw the average force of all of the replicas
> >- I believe this is the same way LES works?
> >
> Actually there is a slight difference between the LES-pme and PIMD-pme
> implementations. PIMD acted in the way you mentioned: the non-replicated
> sections saw the average force of all the replicas. The LES-pme implementation
> is more complicated than that. I think Carlos has a paper in JACS discussing
> this problem. It is complicated because it can handle multiple LES regions,
> i.e. you can have two regions that are each replicated separately.
>
> >
> >The aim of moving to multisander and ditching all of these extra options was
> >to remove the communication overhead of having everything replicated. That
> >overhead stemmed from the original sander (egb) implementation, where
> >everything was replicated anyway, so it made no difference what approach was
> >taken with regards to distribution of coordinates etc. But this was never
> >completed. Wei moved NEB out of PIMD and into multisander but left it with the
> >mpi_bcast's etc. What it now needs is modifying so that communication of
> >coordinates and forces only occurs between immediate neighbours - possibly
> >in only one direction but I'd have to check this. Ideally this should be
> >done with a non-blocking approach and then computation can be overlapped
> >with the communication and we should be able to scale to huge numbers of
> >processors, I would expect at least 16 to 32 cpus per replica...
> >
> >
> In fact I did not move NEB to multi-sander. The old code is left there
> since I think we might need partial NEB someday. Since we already have
> partial PIMD working well, it won't be difficult to implement partial NEB:
> one just needs to implement the subroutine part_neb_forces(), modelled
> on full_neb_forces().
>
> >But yes this does require changing the way in which NEB operates. At the
> >same time the output could be greatly reduced since at the moment there is
> >massive duplication of data in all the nreplica outputs - plus there is a
> >ton of multisander debug stuff being written to standard out by Amber 10
> >that should probably be turned off. - Plus the way the timings are done at
> >the end of the run needs to be radically changed to avoid bottlenecks - for
> >example running on 2048 cpus of abe at the moment it requires almost 30
> >minutes at the end of a run just to collate all the timings and write the
> >profiling etc. We need some way of turning this off when you don't need the
> >debugging info - I.e. Only the master thread writes the timings and then it
> >just writes its own timings rather than the average over all nodes. - Or we
> >just need a much smarter way of calculating the average.
> >
> >All of this has been on my todo list for ages :-( So Carlos if you want to
> >try and do it then that is great and I am happy to help out as I can.
> >
> >On an associated note I think we need a huge audit of the parallel code
> >since there is so much unnecessary communication going on in certain parts
> >of the code... We really should have some guidelines on use of MPI - I.e. it
> >is NOT okay to be using ANY of the all_to_all communicators or bcasts
> >outside of the initial setup unless you really really really can justify the
> >need for them. And "because it makes the code easier to read" is really not
> >a justification...
> >
> >Anyway, more stuff for the to do list - maybe we need a week long code
> >retreat somewhere where a number of us get together and do nothing but work
> >on cleaning up the code. We could tag this onto the end of the developers
> >meeting if anyone wants to stay in San Diego for longer.
> >
> >All the best
> >Ross
> >
> >/\
> >\/
> >|\oss Walker
> >
> >| HPC Consultant and Staff Scientist |
> >| San Diego Supercomputer Center |
> >| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> >| http://www.rosswalker.co.uk | PGP Key available on request |
> >
> >Note: Electronic Mail is not secure, has no guarantee of delivery, may not
> >be read every day, and should not be used for urgent or sensitive issues.
> >
>
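
The quoted text says that in the partial PIMD scheme the non-replicated sections saw the average force of all the replicas. A minimal sketch of that one statement follows. It is an illustration only, not the PIMD or LES code, and the names combine_forces and replicated are hypothetical.

```python
def combine_forces(per_replica_forces, replicated):
    """Give every non-replicated atom the mean force over all replicas.

    per_replica_forces: list (one entry per replica) of lists of (fx, fy, fz)
    replicated:         set of atom indices that exist separately per replica
    Replicated atoms keep the force computed in their own replica.
    """
    n_rep = len(per_replica_forces)
    n_atoms = len(per_replica_forces[0])
    combined = [list(f) for f in per_replica_forces]
    for a in range(n_atoms):
        if a in replicated:
            continue
        mean = tuple(
            sum(per_replica_forces[r][a][d] for r in range(n_rep)) / n_rep
            for d in range(3)
        )
        for r in range(n_rep):
            combined[r][a] = mean
    return combined


# Two replicas, two atoms: atom 0 is replicated, atom 1 is shared.
forces = [
    [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
]
print(combine_forces(forces, replicated={0}))
```

As the quoted text notes, the LES-pme case is more involved than this because it can handle several separately replicated regions.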

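On the end-of-run timings mentioned in the quoted text: one common way to get an average over many processes without collating everything on the master is a pairwise (tree) reduction, which is the kind of pattern a collective such as MPI_Reduce is typically built on. The sketch below simulates that pattern in plain Python on a list of per-process timings. It involves no MPI, tree_sum is a hypothetical name, and whether this fits the existing sander timing code is an assumption.

```python
def tree_sum(values):
    """Pairwise reduction of a non-empty list; returns (total, rounds).

    In each round, the element at index i absorbs the element at index
    i + stride, then the stride doubles, so the number of rounds grows
    like log2(len(values)) rather than len(values).
    """
    vals = list(values)
    rounds = 0
    stride = 1
    while stride < len(vals):
        for i in range(0, len(vals) - stride, 2 * stride):
            vals[i] += vals[i + stride]
        stride *= 2
        rounds += 1
    return vals[0], rounds


# 2048 simulated per-process timings of 1.0 second each.
timings = [1.0] * 2048
total, rounds = tree_sum(timings)
print(total / len(timings), rounds)
```

With 2048 entries the reduction takes 11 rounds, compared with 2047 sequential receives if the master collected every value itself.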
-- 
===================================================================
Carlos L. Simmerling, Ph.D.
Associate Professor                 Phone: (631) 632-1336
Center for Structural Biology       Fax:   (631) 632-1555
CMM Bldg, Room G80
Stony Brook University              E-mail: carlos.simmerling.gmail.com
Stony Brook, NY 11794-5115          Web: http://comp.chem.sunysb.edu
===================================================================

Received on Wed Jul 25 2007 - 06:07:37 PDT