RE: amber-developers: Fw: How many atoms?

From: Ross Walker <ross.rosswalker.co.uk>
Date: Tue, 4 Dec 2007 14:20:04 -0800

Hi Bob,

My opinion on this is that if the physical limit of 999,999 atoms is
trivial to remove then yes, we should remove it. I realize it is largely
just for stunt runs, but people unfortunately do seem to put weight on
keeping up with the Joneses, so if we can do it without crippling something
else then we should.

I would not worry about someone wanting to run 1-million-plus-atom
simulations on a Blue Gene. In fact, I would not concentrate on Blue Gene
systems at all, since NSF will not be buying any more and the Blue Gene/Ls
will slowly die out. Blue Gene/P exists and has more memory per node, but
again I think it is an architecture we should not be considering a valid
option for MD simulations. For replica exchange, sure, but for regular MD
we should concentrate on getting the most out of the systems NSF and DOE
will be buying. The major ones, in no particular order, are:

Ranger (TACC) - AMD Quad Socket (2.0GHz???), Quad Core Barcelona chips, DDR
Infiniband Clos, 16 cores per node, 32GB per node (2GB/core).

Abe (NCSA) - Intel, Dual Socket, Quad Core Clovertown (2.33GHz), 8 cpus per
node, 8GB per node (1GB per core), SDR Infiniband Clos

ORNL NSF Machine - Cray XT4/XT5 - AMD Dual Socket?, Quad Core, 8 cpus per
node?, 2.4/3.0GHz???, CStar2 Torus / Baker???, 8GB per node (1GB/core???).

ORNL Jaguar - Cray XT3/XT4/XT5/Baker ???

NCSA Blue Waters (Track-1) - IBM Power 7 / Percs ???

(NCSA/SDSC) Teragrid IA64 clusters - Dual Socket Single Core Itanium 2 -
4GB/12GB per node (2GB/6GB per core), 1.5GHz, Myrinet.

SDSC DataStar - IBM Power 4, 8 way and 32 way nodes, Federation,
16GB/32GB/256GB per node (2GB/4GB/8GB per core).

PSC XT3 - AMD, single socket, dual core, 2.6GHz, 2GB per node (1GB per
core), Cstar Torus.

TACC Lonestar - dual socket dual core Intel, 8GB per node (2GB per core),
2.66GHz, SDR Infiniband.

There are also other machines on the horizon that should be considered but
are under non-disclosure right now. I would avoid putting much effort into
things like Blue Gene or LANL Roadrunner (Cell), since the payback is
generally very small for a heroic amount of effort.

I think you can assume at least 1GB per core, with 2GB more likely
(especially since you can run most of these machines with half the cores
idle and get better performance), on any future machine likely to be
allocated to US scientists running MD with Amber. In fact, per node you
should have at least 8GB, preferably more; NSF would be mad to buy a machine
with less than this in the 2008 to 2010 time frame.
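
As a rough back-of-the-envelope check in Python (the array count here is a
guess, not pmemd's actual memory layout), even a 10-million-atom system with
a couple of dozen natom-length double precision arrays on a single task
roughly fills a 2GB core, and is far beyond what a small-memory node offers:

# Back-of-the-envelope only; the number of arrays is an assumption,
# not pmemd's actual memory layout.
natom = 10_000_000              # a 10-million-atom "stunt run"
bytes_per_double = 8
arrays_per_task = 25            # assumed count of natom-length real arrays on one task
total_gib = natom * bytes_per_double * arrays_per_task / 2**30
print(f"{total_gib:.2f} GiB per task")
# ~1.9 GiB: workable at 2GB per core (comfortable at 16GB+ per node),
# nowhere near fitting on a small-memory node like BG/L.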

Almost all will consist of various multicore AMD/Intel chips, possibly with
some IBM Power 5/6/7 in the mix. Itanium is likely to die out of NSF's
portfolio over the next few years, although it should not necessarily be
dismissed, especially with people running Altix boxes.

So in short, I would not kill yourself trying to save memory just to get a
Blue Gene system to run at maybe half the speed one can get out of an
Infiniband cluster; concentrate on a target of at most around 2GB per core
(16GB+ per node).

All the best
Ross

/\
\/
|\oss Walker

| HPC Consultant and Staff Scientist |
| San Diego Supercomputer Center |
| Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
| http://www.rosswalker.co.uk | PGP Key available on request |

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.

> -----Original Message-----
> From: owner-amber-developers.scripps.edu
> [mailto:owner-amber-developers.scripps.edu] On Behalf Of Robert Duke
> Sent: Tuesday, December 04, 2007 11:14
> To: amber-developers.scripps.edu
> Subject: amber-developers: Fw: How many atoms?
>
> Hello folks!
> I am working hard on high-scaling pmemd code, and in the course of the
> work it became clear to me, due to large async i/o buffers and other
> issues, that going to very high atom counts may require a bunch of extra
> work, especially on certain platforms (BG/L in particular...). I posed
> the question below to Dave Case; he suggested I bounce it off the list,
> so here it is. The crux of the matter is how people feel about having an
> MD capability in pmemd for systems bigger than 999,999 atoms in the next
> release. Please respond to the dev list if you have strong feelings in
> either direction.
> Thanks much! - Bob
>
> ----- Original Message -----
> From: "Robert Duke" <rduke.email.unc.edu>
> To: "David A. Case" <case.scripps.edu>
> Sent: Tuesday, December 04, 2007 8:45 AM
> Subject: How many atoms?
>
>
> > Hi Dave,
> > Just thought I would pulse you about how strong the desire is to go
> > above 1,000,000-atom systems in the next release. I personally see
> > this as more an advertising issue than real science; it's hard to get
> > good statistics/good science on 100,000 atoms, let alone 10,000,000
> > atoms. However, we do have competition. So the prmtop is not an issue,
> > but the inpcrd format is, and one thing that could be done is to move
> > to supporting the same type of flexible format in the inpcrd as we do
> > in the new-style prmtop. Tom D. has an inpcrd format in amoeba that
> > would probably do the trick; I can easily read this in pmemd but not
> > yet write it (I actually have pulled the code out - left it in the
> > amoeba version of course, but can put it back in as needed). I ask the
> > question now because I am already hitting size issues on BG/L on
> > something like cellulose. Some of this I can fix; some of it really is
> > more appropriately fixed by running on 64-bit systems where there
> > actually is multi-GB physical memory. The problem is particularly bad
> > with some new code I am developing, due to extensive async i/o and
> > requirements for buffers that at least theoretically could be pretty
> > big (up to natom possible; by spending a couple of days writing really
> > complicated code I can actually handle this in small amounts of space
> > with effectively no performance impact - but it is the sort of thing
> > that will be touchy and require additional testing). Anyway, I do want
> > to gauge the desire to move up past 999,999 atoms, and make the point
> > that on something like BG/L it would actually require a lot more work
> > to be able to run multi-million-atom problems (basically got to go
> > back and look at all the allocations, make them dense rather than
> > sparse by doing all indexing through lists, allow for adaptive minimal
> > i/o buffers, etc. etc. - messy stuff, some of it stemming from having
> > to allocate lots of arrays dimensioned by natom).
> > Best Regards - Bob
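
The flexible format Bob describes above (the %FLAG/%FORMAT sectioned layout
of the new-style prmtop) removes the fixed-width atom count entirely: the
number of atoms just falls out of the data. A minimal sketch of reading such
a sectioned coordinate file, assuming a hypothetical layout modeled on the
new-style prmtop (the flag name ATOMIC_COORDS is made up, not an agreed
format):

# Sketch only: reads a %FLAG/%FORMAT sectioned file in the style of the
# new-style prmtop. The coordinate flag name below is hypothetical.
def read_sections(path):
    sections, flag = {}, None
    with open(path) as fh:
        for line in fh:
            if line.startswith('%FLAG'):
                flag = line.split()[1]
                sections[flag] = []
            elif line.startswith('%') or flag is None:
                continue                      # %VERSION, %FORMAT, %COMMENT lines
            else:
                sections[flag].extend(float(x) for x in line.split())
    return sections

# Usage (hypothetical file and flag name):
# crds = read_sections('inpcrd.new')['ATOMIC_COORDS']
# natom = len(crds) // 3    # no fixed-width field caps the atom count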
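
And the "dense rather than sparse" allocation at the end of Bob's note is
roughly the following idea (illustration only; none of the names below are
pmemd's): rather than every task allocating arrays of length natom and
indexing them by global atom id, each task stores only the atoms it owns and
translates global ids through a small index list.

# Sketch of dense per-task storage indexed through a list, versus
# allocating natom-length arrays on every task. Illustration only.
import numpy as np

natom = 10_000_000
my_atoms = np.arange(0, natom, 512)       # pretend this task owns every 512th atom

# Sparse: a full natom-length double array on every task (~76 MiB each,
# and there may be many such arrays).
# charges = np.zeros(natom)

# Dense: store only owned atoms; translate global ids through an index map.
local_index = {int(g): i for i, g in enumerate(my_atoms)}   # global id -> local slot
charges = np.zeros(len(my_atoms))                           # ~0.15 MiB instead of ~76 MiB

def set_charge(global_id, q):
    charges[local_index[global_id]] = q

set_charge(1024, -0.8340)    # atom 1024 happens to be owned by this task

The index lookup costs a little bookkeeping, but the per-task footprint then
scales with the atoms a task owns rather than with natom, which is what makes
multi-million-atom runs feasible on small-memory nodes.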
Received on Wed Dec 05 2007 - 06:07:32 PST