Re: amber-developers: Fw: How many atoms?

From: Robert Duke <rduke.email.unc.edu>
Date: Wed, 5 Dec 2007 09:17:28 -0500

Hi Adrian -
Sorry, guilt by association, since I know you and Carlos collaborate a bit
;-) I am such a curmudgeon I am not sure I believe in results from 100,000
atom simulations, let alone 1,000,000+, but I am trying to make sure we do
things that position us well in the MD community, politics and all. I think
that whatever we do, it has to be pmemd + sander + the leaps. It also
dawned on me that this has ptraj implications. Tom C., are you in on this with
us? We do have to be careful to not whack the formats, but that is probably
a virtue of the amoeba inpcrd format - flexibility. Sander and pmemd could
actually be changed to support this stuff with 0 impact on existing code;
unless you input a 1M+ atom inpcrd, you will never get out a 1M+ atom
restrt. But we do have to make sure Tom C. is onboard.
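A minimal sketch of the backward-compatibility rule described here (keep writing the existing restrt unless the input itself is too big for it), assuming a 999,999-atom ceiling for the legacy format; the function name and format labels are illustrative, not actual pmemd routines:

    LEGACY_MAX_ATOMS = 999_999  # assumed ceiling of the fixed-width legacy format

    def choose_restrt_format(natom):
        """Keep the existing format unless the system cannot fit in it."""
        return "legacy-fixed" if natom <= LEGACY_MAX_ATOMS else "new-flexible"

    # Existing workflows are untouched: only a >999,999-atom input ever
    # produces a new-style restrt.
    assert choose_restrt_format(250_000) == "legacy-fixed"
    assert choose_restrt_format(1_500_000) == "new-flexible"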
Best Regards - Bob

----- Original Message -----
From: "Adrian Roitberg" <roitberg.qtp.ufl.edu>
To: <amber-developers.scripps.edu>
Sent: Wednesday, December 05, 2007 8:58 AM
Subject: Re: amber-developers: Fw: How many atoms?


> Bob, just to make sure, I do not have a BG/L, and I do not want one (I
> think Carlos did not want SUNY's either, but higher powers intervened).
>
> From a science point of view, I do not believe that >1 M atom
> simulations are worth a penny, and I am willing to defend that position
> with numbers. It makes for a nice movie, but NOTHING happens in ~10-20 ns
> for something that large.
>
> However, it is the future and we should plan for it.
>
> The question is: if changing to >1 M now (for amber 10) requires large
> changes in file formats, would we be ready? I would rather have the file
> formats as they are for 10, and have the >1 M version as a beta for us to
> try out for a while. I would hate to release such a change and then change
> it again.
>
> I presume the same changes in sander are not going to be done for 10, so
> there will be a disconnect in capabilities/format b/w sander and pmemd
>
> just my 3 cents, I feel good today so my opinion costs more ;-)
>
> Cheers
>
>
> Robert Duke wrote:
>> Hi Ross,
>> Well, that is a lot of processors; all the eggs in two baskets, eh?
>> Okay, we'll plan on the minimum to be able to run 1-10M (maybe more); may
>> the user beware. Would whoever is responsible for xleap/gleap (sorry,
>> but I have been bad at keeping track of that end of the amber wilderness)
>> please let me know what is currently supported or what you will support.
>> Ross, do you have bandwidth to hack capability into sander? While on the
>> one hand I think the amoeba inpcrd format is overkill, it does have the
>> virtue of solving all future potential problems, and as I said before, we
>> can currently read this stuff in an amoeba context. Thanks to all for
>> input again; thanks to Ross for actually having a clue about what the
>> funding agencies and supercomputer centers are doing (on the one hand I
>> like tracking the technology, but on the other hand I am not fond of the
>> politics). Carlos, Adrian, all you guys with that really big BG/L out in
>> NY state somewhere, just bear in mind that multi-million atom simulations
>> on that machine may not be real smooth ;-) (but hopefully the work I am
>> doing now will have you in a good position to utilize the beast for
>> reasonable-sized systems).
>> Best Regards - Bob
>> ----- Original Message -----
>> From: Ross Walker To: amber-developers.scripps.edu Sent: Wednesday,
>> December 05, 2007 12:51 AM
>> Subject: RE: amber-developers: Fw: How many atoms?
>>
>>
>> Hi Bob,
>>
>> The key thing to remember here is that Blue Gene/L is old technology
>> and will largely be defunct within the Amber 10 lifespan. Of all
>> the large-scale machines that exist, Blue Gene should be the very last
>> one that we target. The main advantage of Blue Gene right now is that it
>> provides you easy access to a large number of processors to allow for
>> testing / debugging. However, I would not envisage anyone asking for time
>> on Blue Gene systems to do serious MD simulations with AMBER.
>>
>> Instead, the two most relevant large-scale machines for US academics in
>> the 2008 to 2010 timeframe will be Ranger at TACC and the Cray machine at
>> ORNL. Since ORNL has not announced what their architecture is actually
>> going to consist of, the only metric known is Ranger. This will have
>> 62,976 cores and you can expect a large proportion of them to be idle at
>> any one time, at least in the first year of operation. Hence the
>> landscape is changing rapidly. This machine, I believe, will provide more
>> SUs than the sum of all previously allocated SUs in the history of NSF
>> supercomputing. Hence this should be the metric by which we measure
>> things. This, coupled with the ORNL machine, will provide so much
>> computing time that almost every US academic who wishes to apply for time
>> will be able to get more SUs than they could hope to obtain by building
>> their own in house cluster.
>>
>> This machine will have 2GB of memory per core, with 16-way nodes for 32GB
>> of memory per node. So given that the memory limitation will be 2GB per
>> MPI task in the worst case and 32GB in the best case (if you run 1 MPI
>> thread per node) - or just do 1 asynchronous I/O operation per node
>> instead of per MPI task - then what are the limitations based on this?
>> Note that this is 64 times more memory per node than Blue Gene. Without
>> any special modifications to code and arrays etc., what is the maximum
>> number of atoms within this architecture? I suspect it is significantly
>> more than what the paltry 256MB offered by Blue Gene allows.
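A back-of-the-envelope version of this arithmetic, in Python; the bytes-per-atom figure is purely illustrative and is not pmemd's actual per-task footprint, which depends on the pairlist, maps, and how many arrays are dimensioned by natom:

    GB = 2**30
    BYTES_PER_ATOM = 1024  # assumed per-task cost of natom-dimensioned data (illustrative)

    def max_atoms(mem_per_task_gb, bytes_per_atom=BYTES_PER_ATOM):
        """Crude upper bound on system size for a given per-task memory budget."""
        return int(mem_per_task_gb * GB // bytes_per_atom)

    # Ranger's numbers as quoted above: 2 GB/core worst case, 32 GB/node best case.
    print(f"1 MPI task per core (2 GB):  ~{max_atoms(2):,} atoms")
    print(f"1 MPI task per node (32 GB): ~{max_atoms(32):,} atoms")

Even with the pessimistic 1 kB/atom guess, the 2 GB worst case clears two million atoms, which is the point being made about Ranger versus Blue Gene.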
>>
>> Bear in mind these nodes will have swap as well, so they will fail
>> significantly more gracefully than Blue Gene does. This is the
>> architecture we need to be aiming at in order to have the maximum impact
>> on the maximum number of users at large scale.
>>
>> On the longer time scale - for Amber 11 we should be aiming at the IBM
>> Power 7 PERCS system that will be built at NCSA - but this will
>> ultimately need a much greater effort involving overhauling the entire MD
>> workflow - let's hope we get the PetaApps grant so we can make a real
>> impact here.
>>
>> All the best
>> Ross
>> /\
>> \/
>> |\oss Walker
>>
>> | HPC Consultant and Staff Scientist |
>> | San Diego Supercomputer Center |
>> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
>> | http://www.rosswalker.co.uk | PGP Key available on request |
>>
>> Note: Electronic Mail is not secure, has no guarantee of delivery, may
>> not be read every day, and should not be used for urgent or sensitive
>> issues.
>> ----------------------------------------------------------------------------
>> From: owner-amber-developers.scripps.edu
>> [mailto:owner-amber-developers.scripps.edu] On Behalf Of Robert Duke
>> Sent: Tuesday, December 04, 2007 20:07
>> To: amber-developers.scripps.edu
>> Subject: Re: amber-developers: Fw: How many atoms?
>>
>>
>> Hi Ross et al :-)
>> Thanks to all who made comments. Ross pretty much understands where
>> I am coming from here I think (Ross, thanks for the current rundown on
>> nsf machine futures too; I probably have more indigestion over BG/L than
>> multicore, but I am indeed moderately ill that all these unbalanced
>> architectures are being foisted on us). Anyway, my 'expectations'
>> regarding memory problems have been set by a couple of recent events: 1)
>> getting whacked by memory limitations on BG/L for cellulose out around
>> 2048 processors (if my memory serves...), and 2) the nature of the work I
>> have recently been doing with i/o and really large scaling. All along, I
>> have been bothered by the potential of all sorts of data structures
>> dimensioned by natom to push us over the edge on memory, and the more
>> sophisticated the code gets, the more combinations of maps and lists I
>> use to make things fast (so that is 2 * natom every time I do that, or
>> 1.[0-9] * natom if I get a bit more clever for some things). The map
>> structures tend to not scale down with increasing processor count, so that
>> has been a potential issue. The thing that really had me pulling my hair
>> out was expanding async i/o buffer space requirements, though. The larger
>> the count of async i/o's you "post" for later completion (so you can go do
>> other things), the more buffer space you need, and in some instances the
>> amount of buffer space per communication event does not scale down as well
>> as you might like as the processor count goes up. So at 2048 procs on
>> BG/L running cellulose, this is what actually bites you. I think I may
>> have gotten around the worst memory problem in the new scaling
>> architecture today with a minimal performance hit; I'll see over the next
>> week or so. But running something big on BG/L would definitely require
>> some careful work that I may not have time to complete.
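A rough model of the buffer-space pressure described here; the numbers, the 24-byte entry size, and the imperfect-scaling exponent are invented for illustration, not measurements from pmemd on BG/L:

    def posted_io_buffer_mb(natom, ntasks, n_posted, bytes_per_entry=24, scaling=0.5):
        """Per-task buffer space for n_posted outstanding async i/o operations.

        Perfect scaling would divide natom by ntasks; an exponent < 1 models
        buffer sizes that do not shrink as fast as the task count grows.
        """
        entries_per_buffer = natom / ntasks**scaling
        return n_posted * entries_per_buffer * bytes_per_entry / 2**20

    natom = 400_000  # roughly cellulose-sized (illustrative)
    for ntasks in (256, 1024, 2048):
        print(f"{ntasks:5d} tasks: ~{posted_io_buffer_mb(natom, ntasks, n_posted=8):.1f} MB/task")

With these made-up parameters, going from 256 to 2048 tasks shrinks the per-task buffer by less than 3x while the task count grows 8x, which is the kind of behavior that hurts on a 256MB-per-core machine.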
>>
>> Okay, so it sounds like people would like 1M+ atoms, nuts on BG/L
>> implications, so we should head in that direction. The nasty downside is
>> that for any memory-limited architecture, we may be setting ourselves up
>> for some runtime failures where folks won't understand the failure (the
>> code actually does produce a nice error msg for any allocation failure,
>> but that will show up in the system stderr rather than mdout, and could
>> get missed, and could happen in mid-run as loadbalancing causes changes
>> in memory allocation). So we should discuss how we want to specify the
>> new format inpcrd. Does leap already handle Darden's amoeba inpcrd
>> format? Do folks want something simpler? The advantage to the amoeba
>> format is that both pmemd and sander can already read it; they both just
>> need to know to try for both amoeba and non-amoeba runs. Then they also
>> need to be able to recognize that they are running >999,999 atoms and
>> write the restrt in the new format. What is the status of xleap/gleap in
>> terms of Darden's inpcrd format? Would it be easy to add the capability to
>> output the new format inpcrd for all systems generated by xleap/gleap? I
>> don't want to divert to work on this stuff in pmemd immediately, but if
>> folks want to reach a consensus on sander and xleap/gleap, then I can
>> wedge the capability into pmemd in a little while. Realistically
>> speaking, I think if we expand to a 100M - 1 capability, we should be
>> covered for the foreseeable future, and that is what we have with the
>> current 'new' prmtop; of course the new prmtop and new inpcrd actually
>> allow going even higher by specifying a different format than i8. The
>> current hard architectural limit is around 134M, caused by the size of
>> the image identifier (27 bits; the high bits are reserved for other info
>> in the pairlist - also fixable). Of course you better really have a
>> 64-bit machine and a bit more than 4 GB/core to handle this sort of stuff...
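For concreteness, the ~134M figure is just 2^27, and the packing described above can be sketched as follows; the exact bit layout of pmemd's pairlist entries is not given in this thread, so the flag field here is an assumption:

    IMG_BITS = 27
    IMG_MASK = (1 << IMG_BITS) - 1          # 134,217,727 -> the ~134M hard limit
    print(f"max image index: {IMG_MASK:,}")

    def pack_pair_entry(img_idx, flags):
        """Store an image index in the low 27 bits; high bits carry other info."""
        assert 0 <= img_idx <= IMG_MASK, "image index exceeds the 27-bit limit"
        return (flags << IMG_BITS) | img_idx

    def unpack_pair_entry(entry):
        return entry & IMG_MASK, entry >> IMG_BITS

    entry = pack_pair_entry(123_456_789, flags=0b10)
    print(unpack_pair_entry(entry))         # (123456789, 2)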
>>
>> Regards - Bob
>> ----- Original Message -----
>> From: Ross Walker To: amber-developers.scripps.edu Sent: Tuesday,
>> December 04, 2007 10:14 PM
>> Subject: RE: amber-developers: Fw: How many atoms?
>>
>>
>> My understanding from Bob's email, and Bob can correct me if I am
>> wrong here, is that it is a memory consideration. I.e., large systems
>> could use significant amounts of memory, and it is the work of keeping the
>> memory footprint small that is complicated and time-consuming.
>>
>> However, from what I can glean Bob may have expectations for memory
>> that are somewhat lower than what will actually be deployed, based on
>> experience with Blue Gene. My assertion would be that we try to support >
>> 999,999 atoms but in the short term not worry about the memory
>> requirements of such calculations. In this way the limiting factor
>> becomes the available memory per node and not the underlying file
>> formats. Since Blue Gene is the exception rather than the rule in HPC
>> systems, I think the problem will be much less than Bob is anticipating.
>> It seems crazy to focus effort on optimizing for the lowest common
>> denominator, especially when 99% of available SUs on NSF-allocated
>> resources will shortly be on non-Blue Gene-type architectures.
>>
>> I am of course neglecting the myriad of complexities involved in
>> terms of performance as a function of memory usage etc., but at least for
>> Amber 10 it would seem to make sense to aim at the types of machines that
>> will be generally available to NSF researchers over the next two years,
>> and all of these will have between 1 and 2GB per core (4GB+ per core if
>> you leave cores idle on various nodes) and enough processors to make even
>> Bob run away screaming that the apocalypse is coming.
>>
>> Just my 2c.
>>
>> All the best
>> Ross
>> /\
>> \/
>> |\oss Walker
>>
>> | HPC Consultant and Staff Scientist |
>> | San Diego Supercomputer Center |
>> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
>> | http://www.rosswalker.co.uk | PGP Key available on request |
>>
>> Note: Electronic Mail is not secure, has no guarantee of delivery,
>> may not be read every day, and should not be used for urgent or sensitive
>> issues.
>> ------------------------------------------------------------------------
>> From: owner-amber-developers.scripps.edu
>> [mailto:owner-amber-developers.scripps.edu] On Behalf Of Carlos
>> Simmerling
>> Sent: Tuesday, December 04, 2007 18:10
>> To: amber-developers.scripps.edu
>> Subject: Re: amber-developers: Fw: How many atoms?
>>
>>
>> it sounded like Bob thinks there IS a cost to doing this.
>> My feeling is that if there was no cost, go for it, but if it takes
>> away Bob's precious time that he could be using to get this stuff
>> up and working for smaller systems, then we should let him focus on the
>> sizes that people actually run rather than having
>> delays or overall slower code just to support things that none of
>> us actually simulate. Sure, it could be great PR, and yes, maybe focusing
>> on smaller systems isn't visionary enough, but I think there is a lot to
>> be gained by getting better code for more modest
>> systems that still have biological relevance, rather than us wasting
>> Bob's time on code that none of us need (yet).
>> carlos
>>
>>
>>
>> On Dec 4, 2007 8:46 PM, Ken Merz <merz.qtp.ufl.edu> wrote:
>>
>> Hi, if it costs us nothing then why not scale PMEMD beyond
>> 999,999 atoms? Someone out there might want to do a 1M+ atom simulation
>> with the AMBER program suite! Kennie
>>
>>
>> On 4 Dec 2007, at 2:14 PM, Robert Duke wrote:
>>
>>
>> Hello folks!
>> I am working hard on high-scaling pmemd code, and in the
>> course of the work it became clear to me, due to large async i/o buffer
>> and other issues, that going to very high atom counts may require a bunch
>> of extra work, especially on certain platforms (BG/L in particular...).
>> I posed the question below to Dave Case; he suggested I bounce it off the
>> list, so here it is. The crux of the matter is how people feel about
>> having an MD capability in pmemd for systems bigger than 999,999 atoms in
>> the next release. Please respond to the dev list if you have strong
>> feelings in either direction.
>> Thanks much! - Bob
>>
>>
>> ----- Original Message ----- From: "Robert Duke"
>> <rduke.email.unc.edu>
>> To: "David A. Case" < case.scripps.edu>
>> Sent: Tuesday, December 04, 2007 8:45 AM
>> Subject: How many atoms?
>>
>>
>>
>>
>> Hi Dave,
>> Just thought I would pulse you about how strong the desire
>> is to go above 1,000,000 atom systems in the next release. I personally
>> see this as more an advertising issue than real science; it's hard to get
>> good statistics/good science on 100,000 atoms let alone 10,000,000 atoms.
>> However, we do have competition. So the prmtop is not an issue, but the
>> inpcrd format is, and one thing that could be done is to move to
>> supporting the same type of flexible format in the inpcrd as we do in the
>> new-style prmtop. Tom D. has an inpcrd format in amoeba that would
>> probably do the trick; I can easily read this in pmemd but not yet write
>> it (I actually have pulled the code out - left it in the amoeba version
>> of course, but can put it back in as needed). I ask the question now
>> because I am hitting size issues already on BG/L on something like
>> cellulose. Some of this I can fix; some of it really is more
>> appropriately fixed by running on 64-bit memory systems where there
>> actually is a multi-GB physical memory. The problem is particularly bad
>> with some new code I am developing, due to extensive async i/o and
>> requirements for buffers that at least theoretically could be pretty big
>> (up to natom possible; by spending a couple of days writing really
>> complicated code I can actually handle this in small amounts of space with
>> effectively no performance impact - but it is the sort of thing that will
>> be touchy and require additional testing). Anyway, I do want to gauge
>> the desire to move up past 999,999 atoms, and make the point that on
>> something like BG/L, it would actually require a lot more work to be able
>> to run multi-million atom problems (basically got to go back and look at
>> all the allocations, make them dense rather than sparse by doing all
>> indexing through lists, allow for adaptive minimal i/o buffers, etc.
>> etc. - messy stuff, some of it sourcing from having to allocate lots of
>> arrays dimensioned by natom).
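A hypothetical sketch of the dual-format reading being discussed: fall back to the legacy fixed-format inpcrd unless the file announces a flexible, prmtop-style header. The actual amoeba inpcrd layout is not spelled out in this thread, so the '%' marker test and the fixed 12-character coordinate fields below are assumptions, not a specification:

    def read_coordinates(path):
        """Read a coordinate file, detecting flexible vs. legacy format."""
        with open(path) as fh:
            title = fh.readline().rstrip()
            second = fh.readline()
            if second.lstrip().startswith("%"):
                # Flexible, prmtop-style file: parse its %FLAG/%FORMAT sections
                # (not implemented in this sketch).
                raise NotImplementedError("flexible-format parser goes here")
            natom = int(second.split()[0])      # legacy: atom count on line 2
            values = []
            for line in fh:
                line = line.rstrip("\n")
                for i in range(0, len(line), 12):
                    field = line[i:i + 12].strip()
                    if field:
                        values.append(float(field))
            coords = [tuple(values[3 * i:3 * i + 3]) for i in range(natom)]
            return title, natom, coords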
>> Best Regards - Bob
>>
>> Professor Kenneth M. Merz, Jr.
>> Department of Chemistry
>> Quantum Theory Project
>> 2328 New Physics Building PO Box 118435
>> University of Florida
>> Gainesville, Florida 32611-8435
>>
>>
>> e-mail: merz.qtp.ufl.edu
>> http://www.qtp.ufl.edu/~merz
>> Phone: 352-392-6973
>> FAX: 352-392-8722
>> Cell: 814-360-0376
>>
>> --
>>
>> ===================================================================
>> Carlos L. Simmerling, Ph.D.
>> Associate Professor             Phone: (631) 632-1336
>> Center for Structural Biology   Fax:   (631) 632-1555
>> CMM Bldg, Room G80
>> Stony Brook University          E-mail: carlos.simmerling.gmail.com
>> Stony Brook, NY 11794-5115      Web: http://comp.chem.sunysb.edu
>> ===================================================================
>
> --
> Dr. Adrian E. Roitberg
> Associate Professor
> Quantum Theory Project and Department of Chemistry
>
> University of Florida PHONE 352 392-6972
> P.O. Box 118435 FAX 352 392-8722
> Gainesville, FL 32611-8435 Email adrian.qtp.ufl.edu
> ============================================================================
>
> "To announce that there must be no criticism of the president,
> or that we are to stand by the president right or wrong,
> is not only unpatriotic and servile, but is morally treasonable
> to the American public."
> -- Theodore Roosevelt
>
Received on Sun Dec 09 2007 - 06:07:09 PST