Re: [AMBER-Developers] Experiences with sleap from Jodi Ann Hadden on 2011-11-04 (Amber Developers Archive Nov 2011)

From: Jodi Ann Hadden <jodih.uga.edu>
Date: Fri, 4 Nov 2011 19:36:45 +0000

I don't know what the "official" AMBER development model is, but I feel the model as Jason describes it is in some ways unacceptable. I understand everyone is limited by time and resources and that the basic generosity of people who contribute code is what allows progress and helps to increase functionality in AMBER, but I feel that the failure of contributors to take the "big picture" into account is detrimental to the software in the long term.

I get the impression from Jason that much of the contribution is coming from independent users on a "crisis to crisis" basis: someone requires some functionality that is not present in AMBER and so writes their own code to do it, and because their concern is primarily their own work and not AMBER overall, they perform this task in whatever way is most convenient for them, in whatever language they know, without the extra effort of trying to incorporate it into an existing module. I'm not necessarily calling that lazy, because I know everyone has limited amounts of time to donate, and I do find it generous that users would share their code with AMBER, but I also think accepting so many standalone programs with overlapping functionality constitutes a poor development model, as it encourages a "cobbled" consistency to the software.

I don't think rules calling for "less flexibility" in development are a bad idea. It seems to me that restricting the languages that AMBER modules should be written in would be quite a good policy, actually, making everything easier to maintain in the long run, easier to condense and consolidate modules or transfer functionality between them when necessary, easier for developers to contribute to multiple modules, etc. Having a prmtop parser for every possible language is silly as it only encourages cobbling. An excess of code where subsets of users only understand subsets of the code due to a "language barrier" seems utterly detrimental to progress, whereas a few common languages could have a unifying effect on the code and improve communication among developers/contributors.

And I don't think such a restriction would discourage contributed development. There are plenty of people with enough passion for the science and the software who want to be involved in something bigger than their own work who will still be interested in contributing directly to AMBER -- in terms of the existing software as well as new modules with novel functionality -- even if it does mean learning a new language. What is so wrong with that? I relish an excuse to learn a new language! That people have a desire to be involved is evidenced by the fact that they share their code at all.

I guess I think, based on the impression I got from Jason's email, that perhaps development is being encouraged in the wrong way. If users want to contribute, they should be encouraged to work with other AMBER developers to incorporate their ideas into existing AMBER code or to design a coherent, foresighted new module in AMBER to accommodate them, instead of just cobbling on something else that is of limited use because of overlap, language barriers, or hard-to-read, poorly documented code. That way, the devs already familiar with the code get a handle on the new functionality, and the new contributor gains knowledge of how the existing code works. This cross-communication and redundancy, not in the code but in the people who understand the code, seems the best way to manage streamlined software maintained by a community. "Rip it out if you can't maintain it after I'm gone" is unacceptable.

Oeru txoa livu (forgive me).

Jodi

On Nov 3, 2011, at 8:31 PM, Jason Swails wrote:

> On Thu, Nov 3, 2011 at 5:45 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
>
>> Hi All,
>>
>> Can I perhaps offer some thoughts and points for discussion here that we, as
>> the developers of AMBER, might want to consider.
>>
>> Look at the example we have here. We have a working code, tleap / xleap, that
>> had some problems and needed some extra stuff adding etc. But instead of
>> people trying to fix stuff we get a whole new version written from scratch
>> which has its own problems. Why oh why are we making yet another imperfect
>> square wheel?
>>
>
>> If we keep going down this route we are just going to end up with a ton of
>> tools that all partially duplicate each other, all of which have problems
>> and most of which are not properly tested or maintained.
>>
>
> Amber is poorly documented for developers, with the exception of a couple
> programs (and actually, I think cpptraj is a prime example of rather
> detailed API documentation). If you want to stop reinventing the wheel,
> you have to start enforcing API documentation. If you lose flexibility
> completely, you'll find the number of people willing to contribute to Amber
> drop considerably (they'll modify it for their own purposes, but not
> share). We don't want that, either. It's a balancing game. We're not
> typically software engineers or professional programmers -- we're
> scientists.
>
>> For example, I think we have about 7 different sets of code for parsing
>> prmtop files, in or out, in the tree right now. None of these work properly
>> and if we tweak anything in the prmtop file there is a multitude of stuff to
>> update.
>>
>
> This I will defend. I think restricting languages that programs are to be
> written in is a terrible policy. To say "anything you write must be
> written in C" is silly. Thus, it stands to reason that each language used
> that needs prmtop data could have its own prmtop parser/API. There's
> nothing unreasonable about that, and in fact it enhances the user experience.
> But again, no other APIs are documented! Who is going to learn an API from
> studying (sometimes difficult-to-understand) code, then modifying it so it
> can do what they need when it's way faster to write their own?
>
>> The same goes for things like cpptraj and mdgx etc. Why are we continually
>> reinventing things? We should be forcing people to fix the existing code, not
>> going off and writing their own version in their own favorite language which
>> nobody can maintain etc etc.
>
>
> You do this and you'll see people just not write contributed code. There
> are times when the effort involved in "fixing" the existing code is so
> large that a rewrite makes sense. Dan discussed this in detail at the
> last developer meeting when describing why cpptraj was useful.
>
>
>> Take mdgx. Is the intention here to make
>> another dynamics engine that we need to maintain? Or is it to replace sander or
>> pmemd? If so what is the plan for this? If it is a sandbox for testing new
>> things why aren't you using pmemd for this? - in your own branch and looking
>> for a way to ultimately incorporate the cool new technology in pmemd so it
>> benefits from the existing user base and all the other framework? If it is a
>> sandbox for your own development then why is it being distributed publicly
>> under AMBER in what is essentially an unfinished fashion?
>>
>
> Nothing in Amber is finished. That said, I will suggest that mdgx may
> benefit more as a stand-alone project rather than as part of
> Amber/AmberTools. With a closed-off git tree, mdgx development code will
> be open only to those that will have the option of playing in mdgx or
> sander, for which I think the overwhelming majority of people will choose
> to test their stuff in sander. If mdgx was on github, for instance, it
> would benefit a wider audience and have a stronger possibility of
> attracting future developers and maybe extending beyond the reach of just
> the Amber force fields (and there's nothing preventing it from being
> featured on the Amber website).
>
> I honestly don't know what would be better for mdgx here -- something like
> Github, or AmberTools (where it will likely be ripped out if Dave C. leaves
> the Amber fold for any significant chunk of time).
>
>> Cpptraj is designed to replace ptraj yes? Otherwise why do we have two codes
>> that effectively do the same thing? If we truly want to replace ptraj then
>> there should be agreement from everyone, cpptraj should be developed and
>> tested locally and made to support everything ptraj does - unless there
>> is stuff we agree to deprecate. Then we replace ptraj with cpptraj and go
>> from there. We don't need 2 copies of it. Really...
>>
>
> Dan made good arguments here at the last developer meeting. cpptraj can do
> things that ptraj would have to be redesigned from the ground up to
> accomplish. As I was talking to Ben about recently, ptraj was a series of
> functionalities piggy-backed onto a code initially designed to validate
> topology files created by an infant (buggy) tleap (and there was no rleap
> there for people to fall back on!) Cpptraj was designed from the beginning
> to be what ptraj became. That, and it's well-documented for both users and
> developers.
>
>> Then we have Jason's parmed code, which is great don't get me wrong,
>
>
> Thanks for that ;)
>
>
>> but it is yet another place we have to update if we change anything
>> elsewhere with prmtops etc.
>
>
> Actually, the only time you'll need to update parmed is if you completely
> change the format of the prmtop altogether, which I don't see happening.
>
>
>> And why is it standalone? - Another program to document and for
>> people to learn. Why was the functionality, which is desperately needed, not
>> added to ptraj (rdparm)? That would be the logical place for it and would
>> ultimately give a better user experience and a more consistent interface.
>>
>
> I'll defend this more at the meeting, if it finds its way in. parmed was
> written almost specifically to address what you're talking about now.
> Python is an easy language to understand (far easier to pick up than C,
> C++, Java, even Fortran), and modifying topology files is almost completely
> insensitive to the 10-100x speedup gained by using a compiled language. I
> think parmed is easily maintainable after I leave (if it's not, rip it
> out), and is a good place for people to dump their prmtop-editing
> requirements, rather than create a whole new program (add_hcp, add_pdb,
> addles, etc. etc.).
>
> If we were Google or Microsoft, we'd dump this all into rdparm, but I don't
> *just* program. If you tell me my parmed functionality will only be
> distributed if I implement it in rdparm, the functionality would simply not
> exist in Amber. Which is better for Amber? Also, igb=8 requires a
> different radius set which is non-trivial to add to LEaP (or it would have
> been done). Had I not included it in parmed, would Carlos' group have
> released a *specific* igb-8-radii-converting program? (They had a Python
> script written, whose functionality I duplicated a little more generally in
> my program) I think Amber should include a (easy) way to generate the
> radii that igb=8 requires, and if it was forced upon them to implement it
> in tleap, it could breed frustration and potentially an even more painful
> process for the users (who may have to go to Carlos' website to download a
> tool to do it because it wasn't included in Amber in the first place).
> It's a trade-off in which we have to give in a little more to code bloat
> than we would in a software company setting.
>
>> The code bloat these days is getting awful and unmaintainable and I move
>> that at the developers meeting we should come up with a clear plan for how
>> we will remove duplication from AMBER and get back to a clean set of tools
>> that we can maintain going forward, can document efficiently and most
>> importantly can teach to new graduate students and postdocs without
>> confusing the hell out of them about why we have 10 different ways to boil an egg.
>>
>
> Code bloat is inevitable in Amber. It is the cost of our flexible,
> encouraging development model, which I like (the model, not the bloat).
> Occasionally we have to clean house (enter developers meetings). It's
> happened in the past, it'll happen again in the future. It's a fine line
> to walk, and what's "best" for amber is a happy middle, IMO.
>
> The End.
> Jason
>
> --
> Jason M. Swails
> Quantum Theory Project,
> University of Florida
> Ph.D. Candidate
> 352-392-4032
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers
>
>

Received on Fri Nov 04 2011 - 13:00:03 PDT