Re: amber-developers: Fw: How many atoms?

From: Robert Duke <rduke.email.unc.edu>
Date: Tue, 4 Dec 2007 23:06:33 -0500

Hi Ross et al :-)
Thanks to all who made comments. I think Ross pretty much understands where I am coming from here (Ross, thanks for the current rundown on NSF machine futures too; I probably have more indigestion over BG/L than over multicore, but I am indeed moderately ill about all these unbalanced architectures being foisted on us). Anyway, my 'expectations' regarding memory problems have been set by a couple of recent events: 1) getting whacked by memory limitations on BG/L for cellulose at around 2048 processors (if my memory serves...), and 2) the nature of the work I have recently been doing with i/o and really large scaling.

All along, I have been bothered by the potential of all sorts of data structures dimensioned by natom to push us over the edge on memory. The more sophisticated the code gets, the more combinations of maps and lists I use to make things fast, and each such combination costs 2 * natom (or 1.x * natom if I get a bit more clever about some things). The map structures tend not to scale down with increasing processor count, so they have been a potential issue.

What really had me pulling my hair out, though, was the growing async i/o buffer space requirement. The more async i/o's you "post" for later completion (so you can go do other things), the more buffer space you need, and in some instances the buffer space per communication event does not scale down as well as you might like as the processor count goes up. So at 2048 procs on BG/L running cellulose, this is what actually bites you. I think I may have gotten around the worst memory problem in the new scaling architecture today with minimal performance hit; I'll see over the next week or so. But running something big on BG/L would definitely require some careful work that I may not have time to complete.
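As a rough illustration of why map structures do not shrink with processor count, the sketch below counts integer words held per rank under two hypothetical indexing layouts: a natom-sized global-to-local map plus a list of owned atoms, versus the list alone. The even atom split and the one-word-per-entry cost are assumptions made for illustration; this is not pmemd's actual data layout.

```python
import math


def per_rank_words(natom: int, nprocs: int) -> tuple[int, int]:
    """Integer words one rank holds under two hypothetical indexing layouts.

    map_plus_list: a natom-sized global-to-local map plus a list of owned atoms.
    list_only:     just the list of owned atoms (lookups go through the list).
    """
    owned = math.ceil(natom / nprocs)  # atoms owned by this rank, assuming an even split
    map_plus_list = natom + owned      # the natom term stays fixed as nprocs grows
    list_only = owned                  # shrinks roughly as 1 / nprocs
    return map_plus_list, list_only


if __name__ == "__main__":
    for nprocs in (64, 512, 2048):
        print(nprocs, per_rank_words(1_000_000, nprocs))
```

With 1,000,000 atoms, the map-plus-list layout stays above 1,000,000 words per rank even at 2048 ranks, while the list-only layout drops to a few hundred words; the price of the list-only layout is a slower lookup than a direct map index.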

Okay, so it sounds like people would like 1M+ atoms, nuts to the BG/L implications, so we should head in that direction. The nasty downside is that on any memory-limited architecture, we may be setting ourselves up for runtime failures that folks won't understand. (The code does produce a nice error message for any allocation failure, but it shows up in the system stderr rather than in mdout, so it could get missed, and it could happen mid-run as load balancing changes memory allocation.)

So we should discuss how we want to specify the new-format inpcrd. Does leap already handle Darden's amoeba inpcrd format? Do folks want something simpler? The advantage of the amoeba format is that both pmemd and sander can already read it; they just need to know to try it for both amoeba and non-amoeba runs. They also need to recognize when they are running >999,999 atoms and write the restrt in the new format. What is the status of xleap/gleap with respect to Darden's inpcrd format? Would it be easy to add the capability to output the new-format inpcrd for all systems generated by xleap/gleap? I don't want to divert to work on this stuff in pmemd immediately, but if folks reach a consensus on sander and xleap/gleap, I can wedge the capability into pmemd in a little while.

Realistically speaking, I think if we expand to a 100M - 1 atom capability (the largest value an i8 field can hold), we should be covered for the foreseeable future, and that is what we have with the current 'new' prmtop; of course, the new prmtop and new inpcrd actually allow going even higher by specifying a format other than i8. The current hard architectural limit is around 134M (2^27), caused by the size of the image identifier (27 bits; the high bits are reserved for other info in the pairlist, which is also fixable). Of course, you had better really have a 64-bit machine and a bit more than 4 GB/core to handle this sort of stuff...
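The limits discussed in this message reduce to simple arithmetic: a w-digit integer field holds at most 10^w - 1, and a b-bit unsigned field holds at most 2^b - 1. The sketch below checks those numbers and shows a restart writer deciding when it must switch formats. The 6-digit legacy width, the 5 flag bits, and the packed-word layout are assumptions for illustration only; they are not pmemd's actual inpcrd reader or pairlist encoding.

```python
# Hypothetical sketch: widths and bit counts follow the thread; the packed-word
# layout (27-bit image id + 5 flag bits in 32 bits) is an assumption.

LEGACY_WIDTH = 6   # assumed legacy natom field: at most 999,999 atoms
NEW_WIDTH = 8      # i8 field in the new-style prmtop
IMG_BITS = 27      # image-identifier bits in the pairlist
FLAG_BITS = 5      # hypothetical: remaining high bits of a 32-bit word
IMG_MASK = (1 << IMG_BITS) - 1


def max_for_width(width: int) -> int:
    """Largest non-negative integer that fits in a Fortran-style Iw field."""
    return 10**width - 1


def needs_new_format(natom: int) -> bool:
    """True when natom no longer fits the assumed legacy field width."""
    return natom > max_for_width(LEGACY_WIDTH)


def pack(image_id: int, flags: int) -> int:
    """Pack an image id into the low 27 bits and flags into the high bits."""
    if not 0 <= image_id <= IMG_MASK:
        raise ValueError(f"image id {image_id} exceeds {IMG_BITS} bits")
    if not 0 <= flags < (1 << FLAG_BITS):
        raise ValueError(f"flags {flags} exceed {FLAG_BITS} bits")
    return (flags << IMG_BITS) | image_id


def unpack(word: int) -> tuple[int, int]:
    """Inverse of pack: returns (image_id, flags)."""
    return word & IMG_MASK, word >> IMG_BITS


if __name__ == "__main__":
    print(max_for_width(LEGACY_WIDTH))  # 999999
    print(max_for_width(NEW_WIDTH))     # 99999999, i.e. "100M - 1"
    print(IMG_MASK)                     # 134217727, i.e. ~134M
    print(needs_new_format(1_000_000))  # True
```

Widening the image field means stealing bits from the flags (or moving to a 64-bit word), which is why the 134M ceiling is described as fixable.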

Regards - Bob
  ----- Original Message -----
  From: Ross Walker
  To: amber-developers.scripps.edu
  Sent: Tuesday, December 04, 2007 10:14 PM
  Subject: RE: amber-developers: Fw: How many atoms?


  My understanding from Bob's email, and Bob can correct me if I am wrong here, is that it is a memory consideration, i.e., large systems could use significant amounts of memory, and it is the work of keeping the memory footprint small that is complicated and time-consuming.

  However, from what I can glean, Bob may have expectations for memory that are somewhat lower than what will actually be deployed, based on his experience with Blue Gene. My assertion would be that we try to support > 999,999 atoms but in the short term not worry about the memory requirements of such calculations. That way the limiting factor becomes the available memory per node and not the underlying file formats. Since Blue Gene is the exception rather than the rule in HPC systems, I think the problem will be much smaller than Bob is anticipating. It seems crazy to focus effort on optimizing for the lowest common denominator, especially when 99% of the available SUs on NSF-allocated resources will shortly be on non-Blue Gene architectures.

  I am of course neglecting the myriad complexities involved in terms of performance as a function of memory usage, etc., but at least for Amber 10 it would seem to make sense to aim at the types of machines that will be generally available to NSF researchers over the next two years. All of these will have between 1 and 2 GB per core (4+ GB per core if you leave cores idle on various nodes) and enough processors to make even Bob run away screaming that the apocalypse is coming.

  Just my 2c.

  All the best
  Ross
  /\
  \/
  |\oss Walker

  | HPC Consultant and Staff Scientist |
  | San Diego Supercomputer Center |
  | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
  | http://www.rosswalker.co.uk | PGP Key available on request |






----------------------------------------------------------------------------
    From: owner-amber-developers.scripps.edu [mailto:owner-amber-developers.scripps.edu] On Behalf Of Carlos Simmerling
    Sent: Tuesday, December 04, 2007 18:10
    To: amber-developers.scripps.edu
    Subject: Re: amber-developers: Fw: How many atoms?


    it sounded like Bob thinks there IS a cost to doing this.
    My feeling is that if there were no cost, go for it, but if it takes
    away Bob's precious time that he could be using to get this
    stuff up and working for smaller systems, then we should let
    him focus on the sizes that people actually run rather than having
    delays or overall slower code just to support things that none of us
    actually simulate. Sure, it could be great PR, and yes, maybe
    focusing on smaller systems isn't visionary enough, but I think
    there is a lot to be gained by getting better code for more modest
    systems that still have biological relevance, rather than us wasting
    Bob's time on code that none of us need (yet).
    carlos



    On Dec 4, 2007 8:46 PM, Ken Merz <merz.qtp.ufl.edu> wrote:

      Hi,
       If it costs us nothing, then why not scale PMEMD beyond 999,999 atoms? Someone out there might want to do 1M+ atom simulations with the AMBER program suite! Kennie


      On 4 Dec 2007, at 2:14 PM, Robert Duke wrote:


        Hello folks!
        I am working hard on high-scaling pmemd code, and in the course of the work it became clear to me, due to large async i/o buffer and other issues, that going to very high atom counts may require a bunch of extra work, especially on certain platforms (BG/L in particular...). I posed the question below to Dave Case; he suggested I bounce it off the list, so here it is. The crux of the matter is how people feel about having an MD capability in pmemd for systems bigger than 999,999 atoms in the next release. Please respond to the dev list if you have strong feelings in either direction.
        Thanks much! - Bob


        ----- Original Message ----- From: "Robert Duke" <rduke.email.unc.edu>
        To: "David A. Case" <case.scripps.edu>
        Sent: Tuesday, December 04, 2007 8:45 AM
        Subject: How many atoms?




          Hi Dave,
          Just thought I would pulse you about how strong the desire is to go above 1,000,000-atom systems in the next release. I personally see this as more an advertising issue than real science; it's hard to get good statistics/good science on 100,000 atoms, let alone 10,000,000 atoms. However, we do have competition. The prmtop is not an issue, but the inpcrd format is, and one thing that could be done is to move to supporting the same type of flexible format in the inpcrd as we do in the new-style prmtop. Tom D. has an inpcrd format in amoeba that would probably do the trick; I can easily read this in pmemd but not yet write it (I actually pulled the code out - left it in the amoeba version, of course, but can put it back in as needed). I ask the question now because I am already hitting size issues on BG/L on something like cellulose. Some of this I can fix; some of it is really more appropriately fixed by running on 64-bit systems that actually have multiple GB of physical memory. The problem is particularly bad with some new code I am developing, due to extensive async i/o and requirements for buffers that, at least theoretically, could be pretty big (up to natom possible; by spending a couple of days writing really complicated code I can actually handle this in small amounts of space with effectively no performance impact, but it is the sort of thing that will be touchy and require additional testing). Anyway, I do want to gauge the desire to move up past 999,999 atoms, and make the point that on something like BG/L, it would actually require a lot more work to be able to run multi-million atom problems (basically, I would have to go back and look at all the allocations, make them dense rather than sparse by doing all indexing through lists, allow for adaptive minimal i/o buffers, etc. - messy stuff, some of it stemming from having to allocate lots of arrays dimensioned by natom).
          Best Regards - Bob
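Bob's phrase "make them dense rather than sparse by doing all indexing through lists" can be illustrated with a small sketch. The class below is hypothetical (pmemd itself is Fortran, and this is not its code): instead of allocating natom slots and indexing them directly by global atom id, it keeps a sorted list of the atom ids a rank owns and finds each slot by binary search, so per-rank memory tracks the owned-atom count rather than natom.

```python
from bisect import bisect_left


class DenseAtomArray:
    """Per-rank storage sized by the atoms a rank owns rather than by natom.

    Hypothetical sketch: a sparse layout would allocate natom slots and index
    them directly by global atom id; here a sorted list of owned ids is kept
    and each slot is found by binary search.
    """

    def __init__(self, owned_atom_ids):
        self.ids = sorted(owned_atom_ids)       # global ids owned by this rank
        self.values = [0.0] * len(self.ids)     # one dense slot per owned atom

    def _slot(self, atom_id: int) -> int:
        i = bisect_left(self.ids, atom_id)      # O(log n_owned) lookup
        if i == len(self.ids) or self.ids[i] != atom_id:
            raise KeyError(atom_id)
        return i

    def __getitem__(self, atom_id: int) -> float:
        return self.values[self._slot(atom_id)]

    def __setitem__(self, atom_id: int, value: float) -> None:
        self.values[self._slot(atom_id)] = value


if __name__ == "__main__":
    arr = DenseAtomArray([900_000, 5, 42])      # three owned atoms out of millions
    arr[900_000] = 1.5
    print(arr[900_000], len(arr.values))        # 1.5 3
```

The design choice is the one the message describes as messy: every access pays a search (or needs a precomputed local index) in exchange for memory that scales down with processor count.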




      Professor Kenneth M. Merz, Jr.
      Department of Chemistry
      Quantum Theory Project
      2328 New Physics Building
      PO Box 118435
      University of Florida
      Gainesville, Florida 32611-8435


      e-mail: merz.qtp.ufl.edu
      http://www.qtp.ufl.edu/~merz


      Phone: 352-392-6973
      FAX: 352-392-8722
      Cell: 814-360-0376











    --
    ===================================================================
    Carlos L. Simmerling, Ph.D.
    Associate Professor Phone: (631) 632-1336
    Center for Structural Biology Fax: (631) 632-1555
    CMM Bldg, Room G80
    Stony Brook University E-mail: carlos.simmerling.gmail.com
    Stony Brook, NY 11794-5115 Web: http://comp.chem.sunysb.edu
    ===================================================================
Received on Wed Dec 05 2007 - 06:07:38 PST