Hi Bob,
thanks for the work. Even though Ross says that BG/L machines are dead, some
of us do have them, and it would be great to use them well without giving up
too much in order to support huge simulations that not many of us are doing yet.
Carlos
On Dec 5, 2007 8:39 AM, Robert Duke <rduke.email.unc.edu> wrote:
> Hi Ross,
> Well, that is a lot of processors; all the eggs in two baskets, eh? Okay,
> we'll plan on the minimum to be able to run 1-10M (maybe more), may the user
> beware. Would whoever is responsible for xleap/gleap (sorry, but I have
> been bad at keeping track of that end of the amber wilderness) please let me
> know what is currently supported or what you will support. Ross, do you
> have bandwidth to hack this capability into sander? While on the one hand I
> think the amoeba inpcrd format is overkill, it does have the virtue of
> solving all future potential problems, and as I said before, we can
> currently read this stuff in an amoeba context. Thanks to all for input
> again; thanks to Ross for actually having a clue about what the funding
> agencies and supercomputer centers are doing (on the one hand I like
> tracking the technology, but on the other hand I am not fond of the
> politics). Carlos, Adrian, all you guys with that really big BG/L out in NY
> state somewhere, just bear in mind that multi-million atom simulations on
> that machine may not be real smooth ;-) (but hopefully the work I am doing
> now will have you in a good position to utilize the beast for
> reasonable-sized systems).
> Best Regards - Bob
>
> ----- Original Message -----
> *From:* Ross Walker <ross.rosswalker.co.uk>
> *To:* amber-developers.scripps.edu
> *Sent:* Wednesday, December 05, 2007 12:51 AM
> *Subject:* RE: amber-developers: Fw: How many atoms?
>
> Hi Bob,
>
> The key thing to remember here is that Blue Gene/L is old technology and
> will largely be defunct in the timeframe of the Amber10 lifespan. Of all the
> large scale machines that exist Blue Gene should be the very last one that
> we target. The main advantage of Blue Gene right now is it provides you easy
> access to a large number of processors to allow for testing / debugging.
> However, I would not envisage anyone asking for time on Blue Gene systems to
> do serious MD simulations with AMBER.
>
> Instead the two most relevant large scale machines for US academics in the
> 2008 to 2010 timeframe will be Ranger at TACC and the Cray machine at ORNL.
> Since ORNL has not announced what their architecture is actually going to
> consist of, the only known metric is Ranger. This will have 62,976 cores and
> you can expect a large proportion of them to be idle at any one time, at
> least in the first year of operation. Hence the landscape is changing
> rapidly. This machine, I believe, will provide more SUs than the sum of all
> previously allocated SUs in the history of NSF supercomputing. Hence this
> should be the metric by which we measure things. This, coupled with the
> ORNL machine will provide so much computing time that almost every US
> academic who wishes to apply for time will be able to get more SUs than they
> could hope to obtain by building their own in house cluster.
>
> This machine will have 2GB per core of memory, 16 way nodes for 32GB of
> memory per node. So given that the memory limitation will be 2GB per MPI
> task at a worst case and 32GB at the best case (if you run 1 MPI task per
> node) - or just do 1 asynchronous I/O operation per node instead of per MPI
> task, then what are the limitations based on this? Note that this is 64
> times more memory per node than Blue Gene. Without any special modifications
> to code and arrays etc., what is the maximum number of atoms that fits within
> this architecture? I suspect it is significantly more than what fits in the
> paltry 256MB offered by Blue Gene.
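
To put rough numbers on the question above, here is a back-of-the-envelope
sketch in Python. The node sizes are the ones quoted in this message; the
bytes-per-atom figure is purely a hypothetical placeholder, not a measured
pmemd or sander footprint, so only the ratios between the results mean anything.

```python
# Back-of-the-envelope memory arithmetic for the node sizes quoted above.
GB = 1024 ** 3
MB = 1024 ** 2

ranger_per_core = 2 * GB        # 2GB per core (from the message)
ranger_cores_per_node = 16      # 16-way nodes (from the message)
ranger_per_node = ranger_per_core * ranger_cores_per_node

# "64 times more memory per node than Blue Gene" implies this node size.
implied_bgl_per_node = ranger_per_node // 64

# ASSUMPTION: total per-atom footprint of all natom-dimensioned arrays held
# by one MPI task. 1000 bytes is a made-up value for illustration only.
HYPOTHETICAL_BYTES_PER_ATOM = 1000


def max_atoms(memory_bytes, bytes_per_atom=HYPOTHETICAL_BYTES_PER_ATOM):
    """Largest atom count whose per-task arrays fit in memory_bytes."""
    return memory_bytes // bytes_per_atom


print(ranger_per_node // GB)        # 32 (GB per node)
print(implied_bgl_per_node // MB)   # 512 (MB per node)
print(max_atoms(256 * MB))          # 256MB per task
print(max_atoms(ranger_per_core))   # 2GB per task
print(max_atoms(ranger_per_node))   # one task owning a whole node
```

Whatever the true per-atom footprint turns out to be, the ceiling grows by
a factor of 8 going from 256MB to 2GB per task, and by a further factor of 16
if a single task is given a whole node.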
>
> Bear in mind these nodes will have swap as well so will fail significantly
> more gracefully than does Blue Gene.
>
> This is the architecture we need to be aiming at in order to have the
> maximum impact on the maximum number of users at large scale.
>
> On the longer time scale - for Amber 11 we should be aiming at the IBM
> Power 7 Percs system that will be built at NCSA - but this will ultimately
> need a much greater effort involving overhauling the entire MD workflow -
> let's hope we get the Peta Apps grant so we can make a real impact here.
>
> All the best
> Ross
>
> /\
> \/
> |\oss Walker
>
> | HPC Consultant and Staff Scientist |
> | San Diego Supercomputer Center |
> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> | http://www.rosswalker.co.uk | PGP Key available on request |
>
> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
> be read every day, and should not be used for urgent or sensitive issues.
>
>
> ------------------------------
> *From:* owner-amber-developers.scripps.edu [mailto:
> owner-amber-developers.scripps.edu] *On Behalf Of *Robert Duke
> *Sent:* Tuesday, December 04, 2007 20:07
> *To:* amber-developers.scripps.edu
> *Subject:* Re: amber-developers: Fw: How many atoms?
>
> Hi Ross et al :-)
> Thanks to all who made comments. Ross pretty much understands where I am
> coming from here I think (Ross, thanks for the current rundown on NSF
> machine futures too; I probably have more indigestion over BG/L than
> multicore, but I am indeed moderately ill that all these unbalanced
> architectures are being foisted on us). Anyway, my 'expectations' regarding
> memory problems have been set by a couple of recent events: 1) getting
> whacked by memory limitations on BG/L for cellulose out around 2048
> processors (if my memory serves...), and 2) the nature of the work I have
> recently been doing with i/o and really large scaling. All along, I have
> been bothered by the potential of all sorts of data structures dimensioned
> by natom to push us over the edge on memory, and the more sophisticated the
> code gets, the more combinations of maps and lists I use to make things fast
> (so that is 2 * natom every time I do that, or 1.[0-9] * natom if I get a
> bit more clever for some things). The map structures tend to not scale down
> with increasing processor count, so that has been a potential issue. The
> thing that really had me pulling my hair out was expanding async i/o buffer
> space requirements though. The larger the count of async i/o's you "post"
> for later completion (so you can go do other things), the more buffer space
> you need, and in some instances the amount of buffer space per communication
> event does not scale down as well as you might like as the processor count
> goes up. So at 2048 procs on BG/L running cellulose, this is what actually
> bites you. I think I may have gotten around the worst memory problem in the
> new scaling architecture today with minimal performance hit; I'll see over
> the next week or so. But running something big on BG/L would definitely
> require some careful work that I may not have time to complete.
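
The scaling concern described above can be illustrated with a small Python
sketch. It assumes a deliberately simplified two-term memory model, and every
size in it (atom count, bytes per atom) is hypothetical rather than taken from
pmemd: arrays dimensioned by natom are held in full by every task, while the
rest of the data is divided among the tasks.

```python
def per_task_bytes(natom, ntasks, replicated_per_atom, distributed_per_atom):
    """Per-MPI-task memory under a simplified two-term model.

    replicated_per_atom: bytes per atom for arrays dimensioned by natom on
        every task (for example global maps); this term ignores ntasks.
    distributed_per_atom: bytes per atom for data divided among the tasks.
    """
    replicated = natom * replicated_per_atom
    # Ceiling division so a partial share still counts as allocated.
    distributed = -(-natom * distributed_per_atom // ntasks)
    return replicated + distributed


# ASSUMPTIONS: 400,000 atoms, one 4-byte map plus one 4-byte list per atom
# (the "2 * natom" case), and 100 bytes per atom of divisible data.
NATOM = 400_000
for ntasks in (128, 512, 2048):
    total = per_task_bytes(NATOM, ntasks, replicated_per_atom=8,
                           distributed_per_atom=100)
    print(ntasks, total)
```

As the task count rises, the divisible term shrinks toward zero but the
natom-dimensioned term stays fixed, so it sets a floor on per-task memory that
adding processors cannot lower.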
>
> Okay, so it sounds like people would like 1M+ atoms, nuts on BG/L
> implications, so we should head in that direction. The nasty downside is
> that for any memory-limited architecture, we may be setting ourselves up for
> some runtime failures where folks won't understand the failure (the code
> actually does produce a nice error msg for any allocation failure, but that
> will show up in the system stderr rather than mdout, and could get missed,
> and could happen in mid-run as loadbalancing causes changes in memory
> allocation). So we should discuss how we want to specify the new format
> inpcrd. Does leap already handle Darden's amoeba inpcrd format? Do folks
> want something simpler? The advantage to the amoeba format is that both
> pmemd and sander can already read it; they both just need to know to try for
> both amoeba and non-amoeba runs. Then they also need to be able to
> recognize that they are running >999,999 atoms and write the restrt in the
> new format. What is the status of xleap/gleap in terms of Darden's inpcrd
> format? Would it be easy to add the capability to output the new format
> inpcrd for all systems generated by xleap/gleap? I don't want to divert to
> work on this stuff in pmemd immediately, but if folks want to reach a
> consensus on sander and xleap/gleap, then I can wedge the capability into
> pmemd in a little while. Realistically speaking, I think if we expand to
> 100M - 1 capability, we should be covered for the foreseeable future, and that
> is what we have with the current 'new' prmtop; of course the new prmtop and
> new inpcrd actually allow going even higher by specifying a different format
> than i8. The current hard architectural limit is around 134M, caused by
> the size of the image identifier (27 bits; the high bits are reserved for
> other info in the pairlist - also fixable). Of course you better really
> have a 64 bit machine and a bit more than 4 GB/core to handle this sort of
> stuff...
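
The limits mentioned above follow from simple arithmetic, sketched here in
Python. Two things are assumptions rather than facts from the message: that the
999,999 ceiling comes from a six-character integer field, and that the image
identifier occupies the low 27 bits of a 32-bit pairlist word. The helper names
are hypothetical and the real pmemd layout may differ.

```python
def max_fixed_width_int(width):
    """Largest non-negative integer that fits in a Fortran-style Iw field."""
    return 10 ** width - 1


def max_ids(bits):
    """Number of distinct identifiers representable in the given bit count."""
    return 2 ** bits


print(max_fixed_width_int(6))   # 999999
print(max_fixed_width_int(8))   # 99999999, i.e. 100M - 1 with i8
print(max_ids(27))              # 134217728, i.e. about 134M

# ASSUMPTION: low 27 bits hold the image identifier, high 5 bits hold flags.
ID_BITS = 27
ID_MASK = (1 << ID_BITS) - 1


def pack(image_id, flags):
    """Combine an image identifier and flag bits into one integer word."""
    if not 0 <= image_id <= ID_MASK:
        raise ValueError("image identifier does not fit in 27 bits")
    return (flags << ID_BITS) | image_id


def unpack(word):
    """Split a packed word back into (image_id, flags)."""
    return word & ID_MASK, word >> ID_BITS
```

For example, `unpack(pack(1_000_000, 3))` gives back `(1000000, 3)`, while
`pack(2 ** 27, 0)` raises `ValueError`, which is the kind of hard ceiling the
27-bit identifier imposes.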
>
> Regards - Bob
>
> ----- Original Message -----
> *From:* Ross Walker <ross.rosswalker.co.uk>
> *To:* amber-developers.scripps.edu
> *Sent:* Tuesday, December 04, 2007 10:14 PM
> *Subject:* RE: amber-developers: Fw: How many atoms?
>
> My understanding from Bob's email, and Bob can correct me if I am wrong
> here, is that it is a memory consideration. I.e. large systems could use
> significant amounts of memory and it is the work in keeping the memory
> footprint small that is complicated and time consuming.
>
> However, from what I can glean Bob may have expectations for memory that
> are somewhat lower than what will actually be deployed, based on experience
> with Blue Gene. My assertion would be that we try to support > 999,999 atoms
> but in the short term not worry about the memory requirements of such
> calculations. In this way the limiting factor becomes the available memory
> per node and not the underlying file formats. Since Blue Gene is the
> exception rather than the rule in HPC systems I think the problem will be
> much less than Bob is anticipating. It seems crazy to focus effort on
> optimizing for the lowest common denominator especially when 99% of
> available SUs on NSF allocated resources will shortly be non-blue gene type
> architectures.
>
> I am of course neglecting the myriad of complexities involved in terms of
> performance as a function of memory usage etc but at least for Amber 10 it
> would seem to make sense to aim at the types of machines that will be
> generally available to NSF researchers over the next two years and all of
> these will have between 1 and 2GB per core (4GB+ per core if you leave cores
> idle on various nodes) and enough processors to make even Bob run away
> screaming that the apocalypse is coming.
>
> Just my 2c.
>
> All the best
> Ross
>
>
> ------------------------------
> *From:* owner-amber-developers.scripps.edu [mailto:
> owner-amber-developers.scripps.edu] *On Behalf Of *Carlos Simmerling
> *Sent:* Tuesday, December 04, 2007 18:10
> *To:* amber-developers.scripps.edu
> *Subject:* Re: amber-developers: Fw: How many atoms?
>
> it sounded like Bob thinks there IS a cost to doing this.
> My feeling is that if there was no cost, go for it, but if it takes
> away Bob's precious time that he could be using to get this
> stuff up and working for smaller systems, then we should let
> him focus on the sizes that people actually run rather than having
> delays or overall slower code just to support things that none of us
> actually simulate. Sure, it could be great PR, and yes, maybe
> focusing on smaller systems isn't visionary enough, but I think
> there is a lot to be gained by getting better code for more modest
> systems that still have biological relevance, rather than us wasting
> Bob's time on code that none of us need (yet).
> carlos
>
>
> On Dec 4, 2007 8:46 PM, Ken Merz <merz.qtp.ufl.edu> wrote:
>
> > Hi, If it costs us nothing, then why not scale PMEMD beyond 999,999
> > atoms? Someone out there might want to do a 1MM+ atom simulation with the
> > AMBER program suite! Kennie
> >
> > On 4 Dec 2007, at 2:14 PM, Robert Duke wrote:
> >
> > Hello folks!
> > I am working hard on high-scaling pmemd code, and in the course of the
> > work it became clear to me, due to large async i/o buffer and other issues,
> > that going to very high atom counts may require a bunch of extra work,
> > especially on certain platforms (BG/L in particular...). I posed the
> > question below to Dave Case; he suggested I bounce it off the list, so here
> > it is. The crux of the matter is how people feel about having an MD
> > capability in pmemd for systems bigger than 999,999 atoms in the next
> > release. Please respond to the dev list if you have strong feelings in
> > either direction.
> > Thanks much! - Bob
> >
> > ----- Original Message ----- From: "Robert Duke" <rduke.email.unc.edu>
> > To: "David A. Case" <case.scripps.edu>
> > Sent: Tuesday, December 04, 2007 8:45 AM
> > Subject: How many atoms?
> >
> >
> > Hi Dave,
> > Just thought I would pulse you about how strong the desire is to go
> > above 1,000,000 atom systems in the next release. I personally see
> > this as more an advertising issue than real science; it's hard to get good
> > statistics/good science on 100,000 atoms let alone 10,000,000 atoms.
> > However, we do have competition. So the prmtop is not an issue, but the
> > inpcrd format is, and one thing that could be done is to move to supporting
> > the same type of flexible format in the inpcrd as we do in the new-style
> > prmtop. Tom D. has an inpcrd format in amoeba that would probably do
> > the trick; I can easily read this in pmemd but not yet write it (I actually
> > have pulled the code out - left it in the amoeba version of course,
> > but can put it back in as needed). I ask the question now because I am
> > hitting size issues already on BG/L on something like cellulose. Some
> > of this I can fix; some of it really is more appropriately fixed by running
> > on 64 bit memory systems where there actually is a multi-GB physical memory.
> > The problem is particularly bad with some new code I am developing, due
> > to extensive async i/o and requirements for buffers that at least
> > theoretically could be pretty big (up to natom possible; by spending a
> > couple of days writing really complicated code I can actually handle this in
> > small amounts of space with effectively no performance impact - but it is
> > the sort of thing that will be touchy and require additional testing).
> > Anyway, I do want to gauge the desire to move up past 999,999 atoms, and
> > make the point that on something like BG/L, it would actually require a lot
> > more work to be able to run multi-million atom problems (basically got to go
> > back and look at all the allocations, make them dense rather than sparse by
> > doing all indexing through lists, allow for adaptive minimal i/o buffers,
> > etc. etc. - messy stuff, some of it sourcing from having to allocate lots of
> > arrays dimensioned by natom).
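
The idea of making allocations "dense rather than sparse by doing all indexing
through lists" can be sketched in Python as follows. The class and its names
are hypothetical and are not taken from pmemd; the sketch only shows the
general technique of storing values for locally owned atoms and reaching them
through an index list instead of an array dimensioned by natom.

```python
class DenseLocalStore:
    """Holds values only for locally owned atoms (hypothetical helper)."""

    def __init__(self, owned_global_ids):
        # Index list: local index -> global atom id.
        self.local_to_global = list(owned_global_ids)
        # Reverse lookup: global atom id -> local index.
        self.global_to_local = {
            g: i for i, g in enumerate(self.local_to_global)
        }
        # Dense storage sized by the number of owned atoms, not by natom.
        self.values = [0.0] * len(self.local_to_global)

    def set(self, global_id, value):
        self.values[self.global_to_local[global_id]] = value

    def get(self, global_id):
        return self.values[self.global_to_local[global_id]]


# A task that owns 3 atoms out of a multi-million atom system stores
# 3 values instead of one slot per atom in the whole system.
store = DenseLocalStore([5, 900_000, 42])
store.set(900_000, 1.5)
print(store.get(900_000), len(store.values))
```

The trade-off is the one described above: every access goes through an extra
lookup, and the index lists themselves have to be kept consistent as ownership
changes, which is where the additional code and testing effort comes from.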
> > Best Regards - Bob
> >
> >
> >
> > Professor Kenneth M. Merz, Jr.
> > Department of Chemistry
> > Quantum Theory Project
> > 2328 New Physics Building
> > PO Box 118435
> > University of Florida
> > Gainesville, Florida 32611-8435
> >
> > e-mail: merz.qtp.ufl.edu
> > http://www.qtp.ufl.edu/~merz
> >
> > Phone: 352-392-6973
> > FAX: 352-392-8722
> > Cell: 814-360-0376
> >
> >
> >
> >
> >
>
>
> --
> ===================================================================
> Carlos L. Simmerling, Ph.D.
> Associate Professor Phone: (631) 632-1336
> Center for Structural Biology Fax: (631) 632-1555
> CMM Bldg, Room G80
> Stony Brook University E-mail: carlos.simmerling.gmail.com
> Stony Brook, NY 11794-5115 Web: http://comp.chem.sunysb.edu
> ===================================================================
>
>
Received on Sun Dec 09 2007 - 06:07:08 PST