Re: amber-developers: Fw: How many atoms?

From: Robert Duke <rduke.email.unc.edu>
Date: Wed, 5 Dec 2007 10:39:08 -0500

And slowly adding thoughts... Does anyone coordinate with the VMD folks
when we inflict something new on the world? Would other formats be better
on this account?
Best - Bob
----- Original Message -----
From: "Robert Duke" <rduke.email.unc.edu>
To: <amber-developers.scripps.edu>
Sent: Wednesday, December 05, 2007 9:17 AM
Subject: Re: amber-developers: Fw: How many atoms?


> Hi Adrian -
> Sorry, guilt by association, since I know you and Carlos collaborate a bit
> ;-) I am such a curmudgeon I am not sure I believe in results from
> 100,000 atom simulations, let alone 1,000,000+, but I am trying to make
> sure we do things that position us well in the MD community, politics and
> all. I think that whatever we do, it has to be pmemd + sander + the
> leaps. It also dawned on me that this has ptraj implications. Tom C., are
> you in on this with us? We do have to be careful not to whack the
> formats, but that is probably a virtue of the amoeba inpcrd format -
> flexibility. Sander and pmemd could actually be changed to support this
> stuff with zero impact on existing code; unless you input a 1M+ atom inpcrd,
> you will never get out a 1M+ atom restrt. But we do have to make sure Tom
> C. is onboard.
> Best Regards - Bob
>
> ----- Original Message -----
> From: "Adrian Roitberg" <roitberg.qtp.ufl.edu>
> To: <amber-developers.scripps.edu>
> Sent: Wednesday, December 05, 2007 8:58 AM
> Subject: Re: amber-developers: Fw: How many atoms?
>
>
>> Bob, just to make sure, I do not have a BG/L, and I do not want one (I
>> think Carlos did not want SUNY's either, but higher powers intervened).
>>
>> From a science point of view, I do not believe that > 1 M atom
>> simulations are worth a penny, and I am willing to defend that position
>> with numbers. It makes for a nice movie, but NOTHING happens in ~10-20 ns
>> for something that large.
>>
>> However, it is the future and we should plan for it.
>>
>> The question is: if changing to >1 M now (for Amber 10) requires large
>> changes in file formats, would we be ready? I would rather keep the file
>> formats as they are for 10, and have the >1 M version as a beta for us to
>> try out for a while. I would hate to release such a change and then
>> change it again.
>>
>> I presume the same changes in sander are not going to be done for 10, so
>> there will be a disconnect in capabilities/formats between sander and pmemd.
>>
>> just my 3 cents, I feel good today so my opinion costs more ;-)
>>
>> Cheers
>>
>>
>> Robert Duke wrote:
>>> Hi Ross,
>>> Well, that is a lot of processors; all the eggs in two baskets, eh?
>>> Okay, we'll plan on at minimum being able to run 1-10M (maybe more);
>>> may the user beware. Would whoever is responsible for xleap/gleap
>>> (sorry, but I have been bad keeping track of that end of the amber
>>> wilderness) please let me know what is currently supported or what you
>>> will support. Ross, do you have bandwidth to hack capability into
>>> sander? While on the one hand I think the amoeba inpcrd format is
>>> overkill, it does have the virtue of solving all future potential
>>> problems, and as I said before, we can currently read this stuff in an
>>> amoeba context. Thanks to all for input again; thanks to Ross for
>>> actually having a clue about what the funding agencies and supercomputer
>>> centers are doing (on the one hand I like tracking the technology, but
>>> on the other hand I am not fond of the politics). Carlos, Adrian, all
>>> you guys with that really big BG/L out in NY state somewhere, just bear
>>> in mind that multi-million atom simulations on that machine may not be
>>> real smooth ;-) (but hopefully the work I am doing now will have you in a
>>> good position to utilize the beast for reasonable-sized systems).
>>> Best Regards - Bob
>>> ----- Original Message -----
>>> From: Ross Walker
>>> To: amber-developers.scripps.edu
>>> Sent: Wednesday, December 05, 2007 12:51 AM
>>> Subject: RE: amber-developers: Fw: How many atoms?
>>>
>>>
>>> Hi Bob,
>>>
>>> The key thing to remember here is that Blue Gene/L is old technology
>>> and will largely be defunct within the Amber 10 lifespan. Of all the
>>> large scale machines that exist, Blue Gene should be the very last one
>>> that we target. The main advantage of Blue Gene right now is that it
>>> provides easy access to a large number of processors to allow for
>>> testing / debugging. However, I would not envisage anyone asking for
>>> time on Blue Gene systems to do serious MD simulations with AMBER.
>>>
>>> Instead, the two most relevant large scale machines for US academics in
>>> the 2008 to 2010 timeframe will be Ranger at TACC and the Cray machine
>>> at ORNL. Since ORNL has not announced what their architecture will
>>> actually consist of, the only known metric is Ranger. This will
>>> have 62,976 cores and you can expect a large proportion of them to be
>>> idle at any one time, at least in the first year of operation. Hence the
>>> landscape is changing rapidly. This machine, I believe, will provide
>>> more SUs than the sum of all previously allocated SUs in the history of
>>> NSF supercomputing. Hence this should be the metric by which we measure
>>> things. This, coupled with the ORNL machine, will provide so much
>>> computing time that almost every US academic who wishes to apply for
>>> time will be able to get more SUs than they could hope to obtain by
>>> building their own in house cluster.
>>>
>>> This machine will have 2GB of memory per core, with 16-way nodes for
>>> 32GB of memory per node. So given that the memory limitation will be 2GB
>>> per MPI task in the worst case and 32GB in the best case (if you run 1
>>> MPI task per node) - or just do 1 asynchronous I/O operation per node
>>> instead of per MPI task - what are the limitations based on this? Note
>>> that this is 64 times more memory per node than Blue Gene. Without any
>>> special modifications to code and arrays etc., what is the maximum
>>> number of atoms within this architecture? I suspect it is significantly
>>> more than what the paltry 256MB offered by Blue Gene allows.
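
A back-of-the-envelope sketch (Python) of the memory arithmetic above; the
bytes-per-atom cost and the overhead fraction are assumed placeholders for
illustration, not measured pmemd numbers.

    # Rough estimate of how many atoms fit in a given amount of memory per
    # MPI task, for an assumed (hypothetical) per-atom storage cost.
    GiB = 1024 ** 3

    def max_atoms(mem_per_task_gib, bytes_per_atom=500, overhead_fraction=0.25):
        """Atoms that fit after reserving a fraction of memory for
        pairlists, I/O buffers, and other overhead."""
        usable = mem_per_task_gib * GiB * (1.0 - overhead_fraction)
        return int(usable // bytes_per_atom)

    print(max_atoms(2))     # Ranger worst case, 2 GiB/task: ~3.2 million atoms
    print(max_atoms(32))    # one task per 32 GiB node: ~51 million atoms
    print(max_atoms(0.25))  # BG/L-style 256 MiB: ~0.4 million atoms
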
>>>
>>> Bear in mind these nodes will have swap as well, so they will fail
>>> significantly more gracefully than Blue Gene does. This is the
>>> architecture we need to be aiming at in order to have the maximum impact
>>> on the maximum number of users at large scale.
>>>
>>> On the longer time scale - for Amber 11 we should be aiming at the IBM
>>> Power 7 PERCS system that will be built at NCSA - but this will
>>> ultimately need a much greater effort involving overhauling the entire
>>> MD workflow - let's hope we get the PetaApps grant so we can make a real
>>> impact here.
>>>
>>> All the best
>>> Ross
>>> /\
>>> \/
>>> |\oss Walker
>>>
>>> | HPC Consultant and Staff Scientist |
>>> | San Diego Supercomputer Center |
>>> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
>>> | http://www.rosswalker.co.uk | PGP Key available on request |
>>>
>>> Note: Electronic Mail is not secure, has no guarantee of delivery, may
>>> not be read every day, and should not be used for urgent or sensitive
>>> issues.
>>> ----------------------------------------------------------------------------
>>> From: owner-amber-developers.scripps.edu
>>> [mailto:owner-amber-developers.scripps.edu] On Behalf Of Robert Duke
>>> Sent: Tuesday, December 04, 2007 20:07
>>> To: amber-developers.scripps.edu
>>> Subject: Re: amber-developers: Fw: How many atoms?
>>>
>>>
>>> Hi Ross et al :-)
>>> Thanks to all who made comments. Ross pretty much understands where
>>> I am coming from here I think (Ross, thanks for the current rundown on
>>> nsf machine futures too; I probably have more indigestion over BG/L than
>>> multicore, but I am indeed moderately ill that all these unbalanced
>>> architectures are being foisted on us). Anyway, my 'expectations'
>>> regarding memory problems have been set by a couple of recent events:
>>> 1) getting whacked by memory limitations on BG/L for cellulose out
>>> around 2048 processors (if my memory serves...), and 2) the nature of
>>> the work I have recently been doing with i/o and really large scaling.
>>> All along, I have been bothered by the potential of all sorts of data
>>> structures dimensioned by natom to push us over the edge on memory, and
>>> the more sophisticated the code gets, the more combinations of maps and
>>> lists I use to make things fast (so that is 2 * natom every time I do
>>> that, or 1.[0-9] * natom if I get a bit more clever for some things).
>>> The map structures tend to not scale down with increasing processor
>>> count, so that has been a potential issue. The thing that really had me
>>> pulling my hair out was expanding async i/o buffer space requirements
>>> though. The larger the count of async i/o's you "post" for later
>>> completion (so you can go do other things), the more buffer space you
>>> need, and in some instances the amount of buffer space per communication
>>> event does not scale down as well as you might like as the processor
>>> count goes up. So at 2048 procs on BG/L running cellulose, this is what
>>> actually bites you. I think I may have gotten around the worst memory
>>> problem in the new scaling architecture today with minimal performance
>>> hit; I'll see over the next week or so. But running something big on
>>> BG/L would definitely require some careful work that I may not have time
>>> to complete.
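
To illustrate the scaling behavior Bob describes: a natom-sized map that is
replicated on every task contributes a fixed per-rank cost no matter how many
processors you add, while genuinely distributed per-atom data shrinks per
rank. The sizes below are hypothetical, not pmemd's actual data structures.

    # Per-rank memory for a hypothetical 3M-atom system: one replicated
    # natom-sized map (8 bytes/entry) plus distributed per-atom data
    # (400 bytes/atom split across ranks).
    def per_rank_mb(natom, nranks, replicated_bytes=8, distributed_bytes=400):
        replicated = natom * replicated_bytes             # does not shrink with nranks
        distributed = natom * distributed_bytes / nranks  # shrinks as ranks are added
        return (replicated + distributed) / 1e6

    natom = 3_000_000
    for nranks in (256, 1024, 4096):
        print(nranks, round(per_rank_mb(natom, nranks), 1), "MB per rank")
    # The 24 MB replicated term is the floor; a second map ("2 * natom")
    # doubles it, which is what hurts on a 256 MB BG/L core.
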
>>>
>>> Okay, so it sounds like people would like 1M+ atoms, nuts on BG/L
>>> implications, so we should head in that direction. The nasty downside
>>> is that for any memory-limited architecture, we may be setting ourselves
>>> up for some runtime failures where folks won't understand the failure
>>> (the code actually does produce a nice error msg for any allocation
>>> failure, but that will show up in the system stderr rather than mdout,
>>> and could get missed, and could happen in mid-run as load balancing
>>> causes changes in memory allocation). So we should discuss how we want
>>> to specify the new format inpcrd. Does leap already handle Darden's
>>> amoeba inpcrd format? Do folks want something simpler? The advantage
>>> to the amoeba format is that both pmemd and sander can already read it;
>>> they both just need to know to try for both amoeba and non-amoeba runs.
>>> Then they also need to be able to recognize that they are running
>>> >999,999 atoms and write the restrt in the new format. What is the
>>> status of xleap/gleap in terms of Darden's inpcrd format? Would it be
>>> easy to add the capability to output the new format inpcrd for all
>>> systems generated by xleap/gleap? I don't want to divert to work on this
>>> stuff in pmemd immediately, but if folks want to reach a consensus on
>>> sander and xleap/gleap, then I can wedge the capability into pmemd in a
>>> little while. Realistically speaking, I think if we expand to 100M - 1
>>> capability, we should be covered for the foreseeable future, and that is
>>> what we have with the current 'new' prmtop; of course the new prmtop and
>>> new inpcrd actually allow going even higher by specifying a different
>>> format than i8. The current hard architectural limit is around 134M,
>>> caused by the size of the image identifier (27 bits; the high bits are
>>> reserved for other info in the pairlist - also fixable). Of course you
>>> had better really have a 64-bit machine and a bit more than 4 GB/core to
>>> handle this sort of stuff...
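
For reference, the ~134M ceiling is just 2^27 = 134,217,728. A hypothetical
sketch of packing an image index into the low 27 bits of a 32-bit word, with
the high bits left for flags (the real pmemd pairlist encoding is not shown
in this thread):

    IMAGE_BITS = 27
    IMAGE_MASK = (1 << IMAGE_BITS) - 1      # 134,217,727 is the largest index
    print(1 << IMAGE_BITS)                  # 134217728 -> the ~134M limit

    def pack(image_id, flags):
        assert 0 <= image_id <= IMAGE_MASK
        return (flags << IMAGE_BITS) | image_id

    def unpack(word):
        return word & IMAGE_MASK, word >> IMAGE_BITS

    print(unpack(pack(100_000_000, flags=0b101)))   # (100000000, 5)
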
>>>
>>> Regards - Bob
>>> ----- Original Message -----
>>> From: Ross Walker
>>> To: amber-developers.scripps.edu
>>> Sent: Tuesday, December 04, 2007 10:14 PM
>>> Subject: RE: amber-developers: Fw: How many atoms?
>>>
>>>
>>> My understanding from Bob's email, and Bob can correct me if I am
>>> wrong here, is that it is a memory consideration. I.e., large systems
>>> can use significant amounts of memory, and it is the work of keeping
>>> the memory footprint small that is complicated and time consuming.
>>>
>>> However, from what I can glean Bob may have expectations for
>>> memory that are somewhat lower than what will actually be deployed,
>>> based on experience with Blue Gene. My assertion would be that we try to
>>> support > 999,999 atoms but in the short term not worry about the memory
>>> requirements of such calculations. In this way the limiting factor
>>> becomes the available memory per node and not the underlying file
>>> formats. Since Blue Gene is the exception rather than the rule in HPC
>>> systems I think the problem will be much less than Bob is anticipating.
>>> It seems crazy to focus effort on optimizing for the lowest common
>>> denominator, especially when 99% of available SUs on NSF-allocated
>>> resources will shortly be on non-Blue Gene type architectures.
>>>
>>> I am of course neglecting the myriad complexities involved in
>>> terms of performance as a function of memory usage etc., but at least for
>>> Amber 10 it would seem to make sense to aim at the types of machines
>>> that will be generally available to NSF researchers over the next two
>>> years, and all of these will have between 1 and 2GB per core (4GB+ per
>>> core if you leave cores idle on various nodes) and enough processors to
>>> make even Bob run away screaming that the apocalypse is coming.
>>>
>>> Just my 2c.
>>>
>>> All the best
>>> Ross
>>> /\
>>> \/
>>> |\oss Walker
>>>
>>> | HPC Consultant and Staff Scientist |
>>> | San Diego Supercomputer Center |
>>> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
>>> | http://www.rosswalker.co.uk | PGP Key available on request |
>>>
>>> Note: Electronic Mail is not secure, has no guarantee of delivery,
>>> may not be read every day, and should not be used for urgent or
>>> sensitive
>>> issues.
>>> ------------------------------------------------------------------------
>>> From: owner-amber-developers.scripps.edu
>>> [mailto:owner-amber-developers.scripps.edu] On Behalf Of Carlos
>>> Simmerling
>>> Sent: Tuesday, December 04, 2007 18:10
>>> To: amber-developers.scripps.edu
>>> Subject: Re: amber-developers: Fw: How many atoms?
>>>
>>>
>>> it sounded like Bob thinks there IS a cost to doing this.
>>> My feeling is that if there were no cost, go for it, but if it takes
>>> away Bob's precious time that he could be using to get this
>>> stuff up and working for smaller systems, then we should let him focus
>>> on the sizes that people actually run rather than having
>>> delays or overall slower code just to support things that none
>>> of us actually simulate. Sure, it could be great PR, and yes, maybe
>>> focusing on smaller systems isn't visionary enough, but I think there is
>>> a lot to be gained by getting better code for more modest
>>> systems that still have biological relevance, rather than us
>>> wasting Bob's time on code that none of us need (yet).
>>> carlos
>>>
>>>
>>>
>>> On Dec 4, 2007 8:46 PM, Ken Merz <merz.qtp.ufl.edu> wrote:
>>>
>>> Hi, if it costs us nothing then why not scale PMEMD beyond
>>> 999,999 atoms? Someone out there might want to do a 1MM+ atom simulation
>>> with the AMBER program suite! Kennie
>>>
>>>
>>> On 4 Dec 2007, at 2:14 PM, Robert Duke wrote:
>>>
>>>
>>> Hello folks!
>>> I am working hard on high-scaling pmemd code, and in the
>>> course of the work it became clear to me, due to large async i/o buffer
>>> requirements and other issues, that going to very high atom counts may
>>> require a bunch of extra work, especially on certain platforms (BG/L in
>>> particular...). I posed the question below to Dave Case; he suggested I
>>> bounce it off the list, so here it is. The crux of the matter is how
>>> people feel about having an MD capability in pmemd for systems bigger
>>> than 999,999 atoms in the next release. Please respond to the dev list
>>> if you have strong feelings in either direction.
>>> Thanks much! - Bob
>>>
>>>
>>> ----- Original Message -----
>>> From: "Robert Duke" <rduke.email.unc.edu>
>>> To: "David A. Case" <case.scripps.edu>
>>> Sent: Tuesday, December 04, 2007 8:45 AM
>>> Subject: How many atoms?
>>>
>>>
>>>
>>>
>>> Hi Dave,
>>> Just thought I would pulse you about how strong the desire
>>> is to go above 1,000,000 atom systems in the next release. I
>>> personally see this as more an advertising issue than real science; it's
>>> hard to get good statistics/good science on 100,000 atoms let alone
>>> 10,000,000 atoms. However, we do have competition. So the prmtop is not
>>> an issue, but the inpcrd format is, and one thing that could be done is
>>> to move to supporting the same type of flexible format in the inpcrd as
>>> we do in the new-style prmtop. Tom D. has an inpcrd format in amoeba
>>> that would probably do the trick; I can easily read this in pmemd but
>>> not yet write it (I actually have pulled the code out - left it in the
>>> amoeba version of course, but can put it back in as needed). I ask the
>>> question now because I am hitting size issues already on BG/L on
>>> something like cellulose. Some of this I can fix; some of it really is
>>> more appropriately fixed by running on 64 bit memory systems where there
>>> actually is a multi-GB physical memory. The problem is particularly
>>> bad with some new code I am developing, due to extensive async i/o and
>>> requirements for buffers that at least theoretically could be pretty big
>>> (up to natom possible; by spending a couple of days writing really
>>> complicated code I can actually handle this in small amounts of space
>>> with effectively no performance impact - but it is the sort of thing that
>>> will be touchy and require additional testing). Anyway, I do want to
>>> gauge the desire to move up past 999,999 atoms, and make the point that
>>> on something like BG/L, it would actually require a lot more work to be
>>> able to run multi-million atom problems (basically got to go back and
>>> look at all the allocations, make them dense rather than sparse by doing
>>> all indexing through lists, allow for adaptive minimal i/o buffers, etc.
>>> etc. - messy stuff, some of it sourcing from having to allocate lots of
>>> arrays dimensioned by natom).
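
As a sketch of why a self-describing coordinate format removes the fixed-width
atom-count ceiling: a %FLAG/%FORMAT-style header (in the spirit of the
new-style prmtop; the flag name and layout below are hypothetical, not the
actual amoeba inpcrd format) lets the reader take the integer field width from
the file itself instead of hard-coding the width that caps things at 999,999.

    import re

    # Hypothetical self-describing header; NOT the real amoeba inpcrd layout.
    lines = [
        "%FLAG ATOM_COUNT",
        "%FORMAT(i12)",
        "     2500000",
    ]

    m = re.fullmatch(r"%FORMAT\(i(\d+)\)", lines[1].strip(), re.IGNORECASE)
    width = int(m.group(1))         # field width comes from the file
    natom = int(lines[2][:width])   # so 999,999 is no longer a hard limit
    print(width, natom)             # 12 2500000
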
>>> Best Regards - Bob
>>>
>>> Professor Kenneth M. Merz, Jr.
>>> Department of Chemistry
>>> Quantum Theory Project
>>> 2328 New Physics Building
>>> PO Box 118435
>>> University of Florida
>>> Gainesville, Florida 32611-8435
>>>
>>> e-mail: merz.qtp.ufl.edu
>>> http://www.qtp.ufl.edu/~merz
>>> Phone: 352-392-6973
>>> FAX: 352-392-8722
>>> Cell: 814-360-0376
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> ===================================================================
>>> Carlos L. Simmerling, Ph.D.
>>> Associate Professor               Phone: (631) 632-1336
>>> Center for Structural Biology     Fax: (631) 632-1555
>>> CMM Bldg, Room G80                E-mail: carlos.simmerling.gmail.com
>>> Stony Brook University            Web: http://comp.chem.sunysb.edu
>>> Stony Brook, NY 11794-5115
>>>
>>> ===================================================================
>>
>> --
>> Dr. Adrian E. Roitberg
>> Associate Professor
>> Quantum Theory Project and Department of Chemistry
>>
>> University of Florida PHONE 352 392-6972
>> P.O. Box 118435 FAX 352 392-8722
>> Gainesville, FL 32611-8435 Email adrian.qtp.ufl.edu
>> ============================================================================
>>
>> "To announce that there must be no criticism of the president,
>> or that we are to stand by the president right or wrong,
>> is not only unpatriotic and servile, but is morally treasonable
>> to the American public."
>> -- Theodore Roosevelt
>>
>
>
>