Re: [AMBER-Developers] Experiences with sleap from Jason Swails on 2011-11-03 (Amber Developers Archive Nov 2011)

From: Jason Swails <jason.swails.gmail.com>
Date: Thu, 3 Nov 2011 20:31:33 -0400

On Thu, Nov 3, 2011 at 5:45 PM, Ross Walker <ross.rosswalker.co.uk> wrote:

> Hi All,
>
> Can I perhaps offer some thoughts and points for discussion here that we,
> as the developers of AMBER, might want to consider.
>
> Look at the example we have here. We have a working code, tleap / xleap,
> that had some problems and needed some extra stuff adding etc. But instead
> of people trying to fix stuff we get a whole new version written from
> scratch which has its own problems. Why oh why are we making yet another
> imperfect square wheel?
>

> If we keep going down this route we are just going to end up with a ton of
> tools that all partially duplicate each other, all of which have problems
> and most of which are not properly tested or maintained.
>

Amber is poorly documented for developers, with the exception of a couple of
programs (and actually, I think cpptraj is a prime example of rather
detailed API documentation). If you want to stop reinventing the wheel,
you have to start enforcing API documentation. If you lose flexibility
completely, you'll find that the number of people willing to contribute to
Amber drops considerably (they'll modify it for their own purposes, but not
share it). We don't want that, either. It's a balancing act. We're not
typically software engineers or professional programmers -- we're
scientists.

> For example, I think we have about 7 different sets of code for parsing
> prmtop files, in or out, in the tree right now. None of these work properly,
> and if we tweak anything in the prmtop file there is a multitude of stuff
> to update.
>

This I will defend. I think restricting the languages in which programs may
be written is a terrible policy. To say "anything you write must be
written in C" is silly. Thus, it stands to reason that each language in use
that needs prmtop data could have its own prmtop parser/API. There's
nothing unreasonable about that, and in fact it enhances the user experience.
But again, no other APIs are documented! Who is going to learn an API by
studying (sometimes difficult-to-understand) code and then modifying it to
do what they need, when it's way faster to write their own?

> The same goes for things like cpptraj and mdgx etc. Why are we continually
> reinventing things? We should be forcing people to fix the existing code,
> not going off and writing their own version in their own favorite language
> which nobody can maintain etc etc.

Do this and you'll see people simply stop writing contributed code. There
are times when the effort involved in "fixing" the existing code is so
large that a rewrite makes sense. Dan discussed this in detail at the
last developer meeting when describing why cpptraj was useful.

> Take mdgx. Is the intention here to make another dynamics engine that we
> need to maintain? Or is it to replace sander or pmemd? If so, what is the
> plan for this? If it is a sandbox for testing new things, why aren't you
> using pmemd for this? - in your own branch and looking for a way to
> ultimately incorporate the cool new technology in pmemd so it benefits
> from the existing user base and all the other framework? If it is a
> sandbox for your own development, then why is it being distributed
> publicly under AMBER in what is essentially an unfinished fashion?
>

Nothing in Amber is finished. That said, I will suggest that mdgx may
benefit more as a stand-alone project than as part of Amber/AmberTools.
In a closed-off git tree, mdgx development code will be open only to those
with access to the tree, who can choose to develop in either mdgx or
sander, and I think the overwhelming majority of them will choose to test
their stuff in sander. If mdgx were on GitHub, for instance, it would reach
a wider audience, have a better chance of attracting future developers,
and might even be extended beyond the Amber force fields (and there's
nothing preventing it from being featured on the Amber website).

I honestly don't know what would be better for mdgx here -- something like
GitHub, or AmberTools (where it will likely be ripped out if Dave C. leaves
the Amber fold for any significant chunk of time).

> Cpptraj is designed to replace ptraj, yes? Otherwise why do we have two
> codes that effectively do the same thing? If we truly want to replace
> ptraj then there should be agreement from everyone, cpptraj should be
> developed and tested locally and made to support everything ptraj does -
> unless there is stuff we agree to deprecate. Then we replace ptraj with
> cpptraj and go from there. We don't need 2 copies of it. Really...
>

Dan made good arguments here at the last developer meeting. cpptraj can do
things that ptraj would have to be redesigned from the ground up to
accomplish. As I was discussing with Ben recently, ptraj was a series of
functionalities piggy-backed onto a code initially designed to validate
topology files created by an infant (buggy) tleap (and there was no rleap
for people to fall back on!). Cpptraj was designed from the beginning
to be what ptraj became. That, and it's well documented for both users and
developers.

> Then we have Jason's parmed code, which is great, don't get me wrong,

Thanks for that ;)

> but it is yet another place we have to update if we change anything
> elsewhere with prmtops etc.

Actually, the only time you'll need to update parmed is if you completely
change the format of the prmtop altogether, which I don't see happening.

> And why is it standalone? - Another program to document and people to
> learn. Why was the functionality, which is desperately needed, not added
> to ptraj (rdparm)? That would be the logical place for it and would
> ultimately give a better user experience and a more consistent interface.
>

I'll defend this more at the meeting, if it finds its way in. parmed was
written almost specifically to address what you're talking about now.
Python is an easy language to understand (far easier to pick up than C,
C++, Java, or even Fortran), and topology editing is so cheap that the
10-100x speedup gained by using a compiled language hardly matters. I
think parmed is easily maintainable after I leave (if it's not, rip it
out), and is a good place for people to dump their prmtop-editing
requirements, rather than create a whole new program (add_hcp, add_pdb,
addles, etc. etc.).

If we were Google or Microsoft, we'd dump this all into rdparm, but I don't
*just* program. If you tell me my parmed functionality will only be
distributed if I implement it in rdparm, the functionality would simply not
exist in Amber. Which is better for Amber? Also, igb=8 requires a
different radius set, which is non-trivial to add to LEaP (or it would have
been done already). Had I not included it in parmed, would Carlos' group
have released a *specific* igb=8 radii-converting program? (They had a
Python script written, whose functionality I duplicated a little more
generally in my program.) I think Amber should include an (easy) way to
generate the radii that igb=8 requires, and if they were forced to
implement it in tleap, it could breed frustration and potentially an even
more painful process for the users (who may have to go to Carlos' website
to download a tool to do it because it wasn't included in Amber in the
first place). It's a trade-off in which we have to tolerate a little more
code bloat than we would in a software company setting.

> The code bloat these days is getting awful and unmaintainable and I move
> that at the developers meeting we should come up with a clear plan for how
> we will remove duplication from AMBER and get back to a clean set of tools
> that we can maintain going forward, can document efficiently and, most
> importantly, can teach to new graduate students and postdocs without
> confusing the hell out of them as to why we have 10 different ways to boil
> an egg.
>

Code bloat is inevitable in Amber. It is the cost of our flexible,
encouraging development model, which I like (the model, not the bloat).
Occasionally we have to clean house (enter developer meetings). It's
happened in the past, and it'll happen again in the future. It's a fine
line to walk, and what's "best" for Amber is a happy medium, IMO.

The End.
Jason

-- 
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers

Received on Thu Nov 03 2011 - 18:00:03 PDT