RE: amber-developers: Fw: How many atoms?

From: Thomas Cheatham <tec3.utah.edu>
Date: Wed, 5 Dec 2007 09:20:46 -0700 (Mountain Standard Time)

Well, since Bob asked if I'm on board: I wrote this late last night, and I
do not really know who is on this list, so I should be careful in what I
say, but...

> That's not really true. Simulations with 0.5M is about as routine today as
> 20,000 atoms a few years ago. We are talking about protein complexes in

I did not want to jump into this fray, but as I busily review proposals
for computer time at the NSF centers tonight (and I'm a bit late, as
usual), proposals that mostly involve running simulations of fewer than
100K atoms, I would contend that we do not need to reach that megascale
yet, and I do say "yet". NAMD works, and at present there are very few
large-scale simulations beyond those of a few groups, like Schulten
(viruses, BAR domains, ...), Voth (BAR domains), Sanbonmatsu
(ribosome), ...

I would contend that 500K is not as routine as 20K was a decade
ago. Moreover, we have many, many more people running simulations in the
10-100K range now than we ever did in *any* range a decade ago (including
in vacuo). The field has exploded.

My personal take on the large hero simulations is that they are easy.
With so many atoms, there is a much smaller chance of a small instability
or local error occurring that propagates to kill the system. Moreover,
with so many atoms, how can you possibly detect a funny alpha/gamma
artifact in DNA or an alpha-helix bias in the protein?

I have run a large set of proteasome simulations of 200K-500K atoms and
also DNA mini-circles at 200K atoms using sander. You can hot-start these
at 300K from a loosely minimized, solvated, crude protein model straight
from LEaP and everything is peachy, i.e., it runs fine. In addition to
being stable, you cannot really run long enough to find artifacts or see
problems. I think the model of these hero calculations is: stable one-off
simulation + cursory analysis + "benchmark" as first of its kind ==
publication.
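
For concreteness, such a hot start is nothing exotic. A minimal sander
input along the following lines is enough to get one of these large
systems moving (this is only an illustrative sketch; the run length,
thermostat choice, and other settings are placeholders, not a
recommendation):

    300 K hot start from a loosely minimized LEaP model (illustrative only)
     &cntrl
      imin=0, irest=0, ntx=1,
      tempi=300.0, temp0=300.0, ntt=3, gamma_ln=1.0,
      ntb=2, ntp=1, ntc=2, ntf=2,
      cut=8.0, dt=0.002, nstlim=500000,
     /

Here irest=0/ntx=1 read coordinates only and assign velocities at tempi,
ntt=3 with gamma_ln gives a Langevin thermostat at temp0, ntb=2/ntp=1
turn on constant pressure, and ntc=2/ntf=2 SHAKE the hydrogens so a 2 fs
step is safe. That is essentially the whole recipe, which is my point:
these big systems require surprisingly little care to "run".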

This is what Mike was stating and this is not what I think AMBER is about.

I think we need to think in terms of what AMBER's strengths are: validated
force fields, critical assessment of simulation methods and results, and
sampling/exploration that is as exhaustive as possible. Changing PMEMD or
sander to work on "one billion" atoms is not hard; it is only hard if you
want good performance and you are constrained by the machine (as Ross and
Bob mentioned). If you are doing a one-off simulation, performance does
not matter.

What I (and I think "we") want is the fastest set of methods to
exhaustively explore systems in the range of 10-100K atoms today, and
likely up to 1M or more atoms within 2 or so years. Given the machines
coming online, ideally what we want *now* is fast methods that work in
ensembles, so I can do replica-exchange, path-sampling, or dG runs
optimally on the emerging machines, i.e., chained or loosely coupled
parallel jobs (multisander) that each scale internally to 1K cores or
beyond. With multiple replicas, each at 1K cores, we will be able to use
the emerging machines very effectively.
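
To make "loosely coupled" concrete, the shape of such a job is already
what multisander gives us today; the sketch below is illustrative only
(file names, replica count, and core counts are made up, and whether each
replica really scales to 1K cores is exactly the open question). A
groupfile holds one sander command line per replica:

    -O -i mdin.000 -p prmtop -c inpcrd.000 -o mdout.000 -r restrt.000 -x mdcrd.000
    -O -i mdin.001 -p prmtop -c inpcrd.001 -o mdout.001 -r restrt.001 -x mdcrd.001
    -O -i mdin.002 -p prmtop -c inpcrd.002 -o mdout.002 -r restrt.002 -x mdcrd.002
    -O -i mdin.003 -p prmtop -c inpcrd.003 -o mdout.003 -r restrt.003 -x mdcrd.003

and a single launch hands each group its own slice of the machine, e.g.
4 replicas at 1024 cores each:

    mpirun -np 4096 sander.MPI -ng 4 -groupfile groupfile -rem 1

Drop -rem for plain independent runs; with -rem 1 (temperature replica
exchange) each mdin differs essentially only in temp0. The coupling
between groups is thin (occasional exchanges, or nothing at all), so the
scaling problem we actually need to solve is within each 10-100K atom
replica, not across a billion-atom box.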

In some sense, we are in a paradigm shift in the way we need to think
about the simulations (and their analysis). Running a billion-atom
simulation on the whole machine is not as useful as running thousands of
smaller systems that can be fully explored. But the real paradigm shift
will be in how we handle all the data. If these new machines do offer
100x the power we've seen previously, how do we handle 100x the
results (i.e., the trajectory sizes)?

--tom

P.S. Regarding formats, my opinion in the short run (i.e., amber10) is to
use what we have...