In general, there are three common "families" of tests:
1. Unit tests
2. Regression tests
3. Integration tests
In my opinion, (2) and (3) are the most valuable, with (3) being by far the
best (this is the kind of test in which a whole study is reproduced and gives
the same converged answer as a reference). Category 1 is typically white-box
(i.e., the tests are written with knowledge of implementation details in
order to flex corner cases and test all aspects of the code), but such tests
are very limited in scope (generally they test only one subroutine, class,
or function at a time). They're usually very fast and the easiest way to
exercise each line of code in a code base. But many of Amber's programs
don't lend themselves to these kinds of tests (many subroutines are not
nearly short enough or independent enough to be realistically unit-tested).
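To make the distinction concrete, here is a minimal sketch of what a
white-box unit test looks like, written with Python's unittest module; the
distance() routine is a made-up stand-in for illustration, not actual Amber
code:

import math
import unittest

def distance(p1, p2):
    # Made-up stand-in for the routine under test.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

class TestDistance(unittest.TestCase):
    def test_simple_case(self):
        # Ordinary input with a known answer (3-4-5 triangle).
        self.assertAlmostEqual(distance((0, 0, 0), (3, 4, 0)), 5.0)

    def test_coincident_points(self):
        # Corner case: zero separation should be exactly 0, not NaN.
        self.assertEqual(distance((1, 1, 1), (1, 1, 1)), 0.0)

if __name__ == '__main__':
    unittest.main()

Each test isolates a single function and runs in a fraction of a second,
which is exactly why this family is fast but narrow in scope.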
We also really don't have any of (3): a couple of "big" studies that could
be run over several days, on enough computing resources, to reproduce a
real-world study with all of our codes (pmemd, pmemd.MPI, sander,
sander.MPI, pmemd.cuda, and pmemd.cuda.MPI, along with MMPBSA.py, cpptraj,
etc.). These are difficult to set up, and it is even more difficult to dig
up resources that someone is willing to divert from real research toward
these large-scale integration tests. We realistically don't have the
resources to carry out the tests that best practices in software
engineering dictate we should.
The next best thing we can do is for (at least some) developers to use the
development version of Amber for production work. This is what I always
did, and it often slowed my research. One time in particular, it cost me
~2.5 months just to realize that a bug had been introduced, and it took me
another couple of weeks to track it down and fix it (it was very subtle). While
this hurt my research efficiency, it prevented this nefarious bug from ever
being exposed to users. For reference, here's my commit that fixed the
bug: 1c6df84ba91e9ed93475e9fb942acdffdc8dde4e, and the commit that
introduced it: 56397676ea6e89fb5580917bd081ab279f51bef5. That change would
have passed all of the regression tests, and there is no way Ross could have
known that it would break my code. It's something that could only
have been caught by using the development code to do production simulations
and finding the bug via brute force. We just need many people to actually
*do* that.
This is another reason that I think it's important to migrate to a workflow
where continuous integration is enforced. It's risky enough to always use
an updated development version to do your research; when there's a decent
chance that the latest development version won't even *build* because a
commit wasn't vetted well enough, doing so becomes painfully annoying. I
think we need some kind of system that makes always using the head of
master as painless as possible.
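Even something as simple as a nightly "update, rebuild, and test" helper
would lower the barrier. Here is a rough sketch of the idea in Python; the
configure flags and make targets are placeholders, not a prescription for
how we actually build:

#!/usr/bin/env python
# Pull the head of master, rebuild, and run the test suite, stopping
# loudly at the first failure so you know not to run production on it.
import subprocess
import sys

STEPS = [
    'git pull origin master',
    './configure gnu',          # placeholder configure invocation
    'make -j4 install',
    'make test',
]

def main():
    for step in STEPS:
        print('>>> ' + step)
        if subprocess.call(step, shell=True) != 0:
            sys.exit('FAILED: ' + step)
    print('Build and tests passed; reasonably safe to run production.')

if __name__ == '__main__':
    main()

A proper continuous-integration server does the same thing on every push
and reports the result before anyone else has to find out the hard way.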
But in the absence of integration tests, what we're left with is our set of
regression tests, which for the most part actually seem to have been doing
a decent job so far.
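For what it's worth, the core of each regression test is conceptually
simple: run a short calculation and diff its output against a saved
reference to within a tolerance. A minimal sketch of that idea in Python
(the command and file names are placeholders, and this is not how our test
scripts are actually implemented):

import re
import subprocess

TOLERANCE = 1.0e-4  # acceptable absolute difference between printed values

def extract_numbers(filename):
    # Pull every floating-point number out of an output file.
    with open(filename) as f:
        return [float(x) for x in re.findall(r'-?\d+\.\d+', f.read())]

def run_regression(command, output_file, reference_file):
    # Run a short calculation and compare its output to a saved reference.
    subprocess.check_call(command, shell=True)
    new = extract_numbers(output_file)
    ref = extract_numbers(reference_file)
    assert len(new) == len(ref), 'output and reference differ in length'
    for i, (a, b) in enumerate(zip(new, ref)):
        assert abs(a - b) <= TOLERANCE, \
            'value %d differs: %g vs. %g' % (i, a, b)

# Example (placeholder paths, not real test files):
# run_regression('sander -O -i mdin -o mdout', 'mdout', 'mdout.save')

Such tests catch changes in the answers, but as the bug above shows, they
cannot catch problems that only surface in long production runs.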
All the best,
Jason
On Thu, Mar 10, 2016 at 1:35 PM, Scott Brozell <sbrozell.rci.rutgers.edu>
wrote:
> Hi,
>
> In addition to a discussion and decision on language standard and old
> compiler support (for Amber 17) at the Amber developers meeting,
> I suggest a discussion about release schedules.
>
> I recommend a stricter schedule and a more whole-community focus on
> quality assurance - to wit, a Halloween code freeze for a tax-day release,
> a separate release candidate branch, a mandate that developers use the
> release candidate branch to reproduce production work, etc.
>
>
> Grabbing from the [AMBER-Developers] pmemd.MPI build broken thread:
>
> On Sat, Mar 05, 2016 at 07:50:04AM +0000, Charles Lin wrote:
> > ... It's been thoroughly tested with intel, gnu, and mpich though.
>
> > >> > On Sat, Mar 05, 2016, Jason Swails wrote:
> > > 2) Wait to release it until a wider audience of developers have
> > > actually gotten a chance to use it.
> > > This is a large part of why we institute a code freeze well before
> > > release.
>
> Thoroughly tested is a high standard. It requires evidence. For a release
> candidate, in the absence of counterevidence, i would find this persuasive:
> source, tests, and docs in the release candidate branch for 1/2 year so
> that the whole community has had access and it's been routinely built and
> tested.
> Reproduction of published work by and vetting of new features in several
> research groups.
>
>
> > >> > On Sat, Mar 05, 2016, Jason Swails wrote:
> > > Mixed OpenMP-MPI has its place for sure -- MICs and dedicated
> > > supercomputers with many cores per node. But for commodity clusters
> > > and single workstations, I see this as more of an obstacle than a benefit.
>
> There will be different perspectives. Rushing the schedule tends to
> crank up the issues. Let's go slower, give more people a chance to
> comment, and give ourselves a better chance to meet our high standards.
>
> scott
>
>
--
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers