Re: [AMBER-Developers] git logs and squashing commits

From: Jason Swails <jason.swails.gmail.com>
Date: Wed, 24 Feb 2016 20:06:54 -0500

I'm late to the party, so I'll summarize some points:

- We need to squash coming from GH to Amber because otherwise the commits
from GH that get merged into the tree contain *only* the code from the
subtree that was merged in (the rest of the source code from those commits
disappears). Try doing
"git checkout 4c492eaf3315e89e2072e358f806728ffc442eaa".
This completely breaks git-bisect, which I have used *many* times to debug
problems that crept into sander and/or pmemd that took months to notice.
So I started deleting the huge squash commit logs to avoid precisely this
issue (which Ross complained about in a recent post).

- Hai nicely summarized the benefits of GitHub -- I won't rehash. The
Amber git repo, on the other hand, is heavy and does not promote good
software development practices in many instances. It discourages code
reviews. Lack of continuous integration (as a prerequisite to merging)
means even people who are careful don't notice when they break the tree --
even if it's as simple as forgetting to commit a file. This happens all
the time, and I have seen *everyone* who commits to the tree do this. No
exceptions. By contrast, the master branches of cpptraj, ParmEd, and
pytraj *always* pass the tests. Collaboration with others is far harder,
as our programs are not "discoverable" outside of AmberTools.
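
To make the git-bisect point in the first item concrete, here is a minimal
Python sketch of the binary search that bisecting performs over a linear
history. The commit ids, the is_good callback, and the convention of returning
None for a commit that cannot be built are all illustrative assumptions, not
git's actual implementation.

```python
from typing import Callable, List, Optional


def first_bad_commit(commits: List[str],
                     is_good: Callable[[str], Optional[bool]]) -> str:
    """Binary-search a linear history (oldest first) for the first bad commit.

    commits[0] is assumed good and commits[-1] is assumed bad.
    is_good returns True or False, or None when the commit cannot be tested
    (for example, because it only contains one subtree's files).
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        verdict = is_good(commits[mid])
        if verdict is None:
            # This sketch simply gives up on an untestable commit.
            raise RuntimeError(f"cannot build or test {commits[mid]}")
        if verdict:
            lo = mid
        else:
            hi = mid
    return commits[hi]


# Hypothetical history: eight commits, bug introduced in c5.
history = [f"c{i}" for i in range(8)]


def full_tree_test(commit: str) -> bool:
    """Every commit holds the full source tree, so every commit is testable."""
    return int(commit[1:]) < 5


def subtree_only_test(commit: str) -> Optional[bool]:
    """Pretend c3 came from an unsquashed merge and holds only a subtree."""
    if commit == "c3":
        return None
    return full_tree_test(commit)


print(first_bad_commit(history, full_tree_test))

try:
    first_bad_commit(history, subtree_only_test)
except RuntimeError as err:
    print("bisect failed:", err)
```

With a fully testable history the search needs only about log2(N) builds.
Once the midpoint lands on a commit that lacks the rest of the source tree,
the search has nothing to build, which is the failure mode described above.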

As for branding -- having different components in different repositories is
the *norm* -- both in industry and in open source. As examples, look at
the LLVM and clang projects (which themselves are separate from Swift) --
all are part of the same Apple ecosystem (all in their compiler technology
package), and all are separate repos. That doesn't detract from the Xcode
or Apple brand. Ditto with numpy and scipy (which are more interconnected
than many parts of Amber). Also, we already have an Amber github
organization: https://github.com/amber-md.

I'm involved in many projects and organizations -- Amber is the only one
that uses a single repo as an umbrella for so many disparate projects. And
many companies -- large and small -- put their code on GitHub to maximize
exposure. We're also not locked in -- if GH goes down, we simply host
somewhere else. But that's not likely to happen.

GitHub helps us (Hai, Dan, and me at the moment) develop our programs
efficiently and reliably with a higher degree of code quality (courtesy of
issue tracking and code reviews), and it has resulted in valuable
contributions from other researchers. Every component we have on GitHub
has received contributions from multiple people, including ParmEd, cpptraj,
pytraj, tleap, and antechamber. The upside of a GitHub presence for our
components so clearly outweighs the downsides from a development
perspective that moving *away* from that model is not an option for me
with ParmEd (nor for Dan and Hai with cpptraj and pytraj, per my
conversations with them). We should instead find a way to leverage
everything this
infrastructure has to offer to our advantage.

git submodules can help with that -- they exist *precisely* for managing a
superstructure that is composed of git repos (so that git knows where to go
off and clone all of its components). What I'd like to see is each
component in a separate repo with only pmemd on our own servers, all
enveloped within the Amber-MD GitHub organization. Code reviews, proper
issue tracking, ...
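
As a rough illustration of the submodule idea, the Python sketch below renders
the kind of .gitmodules file a superproject uses to record where each
component lives. The component paths and the example.org URLs are hypothetical
placeholders, not the real Amber layout.

```python
from typing import Dict, Tuple


def render_gitmodules(components: Dict[str, Tuple[str, str]]) -> str:
    """Render .gitmodules text from {name: (path, url)} entries."""
    lines = []
    for name, (path, url) in components.items():
        lines.append(f'[submodule "{name}"]')
        lines.append(f"\tpath = {path}")
        lines.append(f"\turl = {url}")
    return "\n".join(lines) + "\n"


# Hypothetical paths and URLs, for illustration only.
components = {
    "cpptraj": ("AmberTools/src/cpptraj",
                "https://example.org/amber-md/cpptraj.git"),
    "parmed": ("AmberTools/src/parmed",
               "https://example.org/amber-md/parmed.git"),
    "pytraj": ("AmberTools/src/pytraj",
               "https://example.org/amber-md/pytraj.git"),
}

print(render_gitmodules(components), end="")
```

In practice nobody writes this file by hand: "git submodule add <url> <path>"
creates the entries, and "git clone --recursive" (or "git submodule update
--init") fetches every component listed there.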

Some things to think about for the dev meeting.

All the best,
Jason

On Wed, Feb 24, 2016 at 6:30 PM, Ross Walker <ross.rosswalker.co.uk> wrote:

>
> > what's the benefit of thinking of AMBER as a whole package rather than a
> > suite of different packages?
>
> > why do we really need to centralize it? and why pain?
> >
>
>
> http://www.clarity-in-communication.com/getattachment/e69c0b29-934a-4b13-9f8f-d6a1d6274cfc/A-strong-brand.aspx
>
> Branding, branding, branding... People respect the AMBER developers (at
> least I hope they do), they respect and they trust the software.
>
> Also take a look at things like this:
> https://scholar.google.com/scholar?oi=bibs&hl=en&cites=1359192529121551417
>
> Amber gets thousands of citations. People who make solid contributions
> should be included on the author list and benefit from being associated
> with all of these citations.
>
> Try creating a stand alone tool - say pytraj - in isolation from Amber -
> see how many citations you get vs if you include it as part of Amber.
>
> Without a doubt it will get way more exposure and use and subsequently
> your scientific career will ultimately benefit if it is part of the wider
> AMBER development effort and you are considered part of the AMBER
> development team. If that is not the case then we should all just give up
> now and go our separate ways.
>
> > taking cpptraj as example (since people know it):
> >
> > 1. centralized AMBER: if people want to add code to cpptraj
> > + clone amber repo
> > + make a new branch, make code change, recompile, do a bunch of testing
> > with different combinations
> > + merge to amber master branch, push to remote
> >
> > 2. github: if people want to add code to cpptraj
> > + clone cpptraj repo on github
> > + make a new branch, make code change, push to github and let travis do
> > all the testing. No worry about old computer.
> > + Dan will review code, merge code if he thinks ok
> > + merge back to amber repo, push to remote.
> >
> > The only difference I can think of is that package manager (Dan for
> > cpptraj) needs to approve the code change (which is good) + many
> > advantages using travis.
> >
>
> This sounds reasonable - for an isolated case - expand that to every
> package and we have a complete mess. We also dilute the concept of the
> AMBER brand and the AMBER development team. Ultimately everyone that is
> associated with the AMBER team benefits from cpptraj being part of AMBER.
> That's the ultimate benefit here. We are way stronger as a collective than
> we are as a bunch of individual projects. I know Dan says the github branch
> is part of AMBER but I am not sure that message will always get propagated
> properly going forward and it definitely will become harder if we all go
> down this individualist route. For starters people will likely say things
> in a paper like: We used the github version of cpptraj[ref link to cpptraj
> github page accessed on blah - they 'might' also cite the cpptraj paper but
> there are many example tools in AMBER that don't have standalone
> publications] - bang there goes the AMBER citation - everybody loses
> [except maybe Dan].
>
> > Note: If anyone does not like github, they can still edit cpptraj code in
> > amber repo (but not encouraged to do so).
>
> It's not about liking or not liking github. It is that we need to preserve
> the concept of AMBER as a package and the idea of a coherent AMBER
> development team. If we have a public repository it should reflect this
> unity.
>
> My 0.02 BTC
>
> All the best
> Ross
>
> /\
> \/
> |\oss Walker
>
> ---------------------------------------------------------
> | Associate Research Professor |
> | San Diego Supercomputer Center |
> | Adjunct Associate Professor |
> | Dept. of Chemistry and Biochemistry |
> | University of California San Diego |
> | NVIDIA Fellow |
> | http://www.rosswalker.co.uk | http://www.wmd-lab.org |
> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> ---------------------------------------------------------
>
> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
> be read every day, and should not be used for urgent or sensitive issues.
>
>
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers
>



-- 
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher
Received on Wed Feb 24 2016 - 17:30:02 PST