Re: amber-developers: NSF Petascale RFP - Some amusing points.

From: Ken Merz <merz.qtp.ufl.edu>
Date: Mon, 19 Jun 2006 04:02:12 -0400

Ross,
  Do you have the link for the RFP? It should make for interesting
reading. Thank you! Kennie


On Jun 19, 2006, at 12:22 AM, Ross Walker wrote:

>
> Hi All,
>
> I thought some of you might find part of NSF's latest RFP for a
> $300 million petaflop machine to be somewhat amusing... As part of
> the RFP there is a section that lists 3 specific simulations for
> which vendors must provide performance estimates for their proposed
> petaflop machines. This section lists specific targets for each of
> the 3 simulations that the NSF expects the machines to be able to
> run. It says nothing about modifying the codes specifically for the
> new machines. Anyway, here is one I think you might all be
> interested in:
>
> "* A molecular dynamics (MD) simulation of curvature-inducing protein
> BAR domains binding to a charged phospholipid vesicle over 10 ns
> simulation time under periodic boundary conditions. The vesicle, 100
> nm in diameter, should consist of a mixture of
> dioleoylphosphatidylcholine (DOPC) and dioleoylphosphatidylserine
> (DOPS) at a ratio of 2:1. The entire system should consist of 100,000
> lipids and 1000 BAR domains solvated in 30 million water molecules,
> with NaCl also included at a concentration of 0.15 M, for a total
> system size of 100 million atoms. All system components should be
> modeled using the CHARMM27 all-atom empirical force field. The target
> wall-clock time for completion of the model problem using the NAMD MD
> package with the velocity Verlet time-stepping algorithm, Langevin
> dynamics temperature coupling, Nose-Hoover Langevin piston pressure
> control, the Particle Mesh Ewald algorithm with a tolerance of 1.0e-6
> for calculation of electrostatics, a short-range (van der Waals)
> cut-off of 12 Angstroms, and a time step of 0.002 ps, with 64-bit
> floating point (or similar) arithmetic, is 25 hours. The positions,
> velocities, and forces of all the atoms should be saved to disk every
> 500 timesteps."
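>
> Before doing any timing, it is worth pinning down in Python what
> that spec implies (everything here comes straight from the RFP text
> above):
>
> sim_time_ns = 10.0
> dt_ps = 0.002                        # the specified 2 fs timestep
> steps = sim_time_ns * 1000 / dt_ps   # 10 ns / 2 fs
> frames = steps / 500                 # one frame saved every 500 steps
> print(steps)   # 5,000,000 timesteps
> print(frames)  # 10,000 saved frames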
>
> HHHmmm, interesting... Well I tried a kind of back-of-the-envelope
> calculation for this. I set up a test simulation on a 408,000 atom
> system using as close as I could get to the specs of the calculation
> given above in PMEMD (which typically performs about 10 to 15%
> quicker than NAMD in my experience). Here is the input file I used:
>
> equilibration
>  &cntrl
>   nstlim=1000, dt=0.002,
>   es_cutoff=8.0, vdw_cutoff=12.0,
>   ntc=2, ntf=2, tol=0.000001,
>   ntx=5, irest=1, ntpr=500,
>   ntt=3, gamma_ln=2.0,
>   ntb=2, ntp=1, taup=2.0,
>   ntwr=0, ntwx=500, ntwv=-1, ioutfm=1
>  /
>  &ewald
>   dsum_tol=0.000001
>  /
>
> I ran this on a single-cpu 1.7 GHz POWER4 machine, which has a peak
> flop rating of 6.8 GFlops. This calculation (1000 steps of 2 fs =
> 2 ps) took 9602.61 seconds to run, so 10 ns would take 5000 times as
> long: 48,013,050 seconds at 6.8 GFlops. So, assuming we could
> achieve 100% scaling on any number of cpus, getting this calculation
> done in 25 hours would require:
>
> 6.8 GFlops * 48013050 / (25*3600) = 3.627 TFlops.
>
> Now PME scales as N ln N, so scaling from 408,000 atoms up to 100
> million atoms we would expect the computation to grow by a factor of
>
> (N2 ln N2)/(N1 ln N1) = (10^8/408000)*(ln 10^8/ln 408000)
>                       = 245.1 * 1.43 = roughly 350.
>
> Hence we would need 3.627 TFlops * 350 = roughly 1.3 Petaflops,
> assuming 100% perfect scaling....
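>
> Here is the whole estimate as a short Python script, so anyone can
> plug in their own benchmark numbers (the 9602.61 s POWER4 timing
> above is the only measured input; everything else follows from the
> RFP spec):
>
> # Extrapolate the 408,000-atom PMEMD benchmark to the RFP's
> # 100-million-atom, 10 ns target.
> from math import log
>
> peak_gflops = 6.8                  # POWER4 peak
> bench_seconds = 9602.61            # 1000 steps = 2 ps
> seconds_10ns = bench_seconds * (10000.0 / 2.0)   # 5000x more steps
>
> # Sustained flops needed to finish 10 ns in 25 hours at 100% scaling:
> tflops_408k = peak_gflops * seconds_10ns / (25 * 3600) / 1000
>
> # PME is N ln N, so scale the cost by (N2 ln N2)/(N1 ln N1):
> n1, n2 = 408000, 100e6
> ratio = (n2 * log(n2)) / (n1 * log(n1))          # roughly 350
>
> print(tflops_408k)          # roughly 3.6 TFlops for 408k atoms
> print(tflops_408k * ratio)  # roughly 1270 TFlops = 1.3 PFlops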
>
> Whoops!!! Anybody want to volunteer to write the code that does the
> specified calculation in 25 hours on a machine of only 1 petaflop...
> ;-)
>
> Then if you want more laughs you can look at the I/O. They want the
> full coordinates, velocities and forces (why the forces I don't
> know) written every 500 steps. So for 10 ns you would write a total
> of 10,000 frames. Each of the three arrays (coordinates, velocities
> and forces) is 100*10^6 atoms * 3 components * 8 bytes = 2.4 GB, so
> a frame is 3 * 2.4 GB = 6.7 GiB, and the whole run comes to 65.5
> Terabytes.
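>
> In Python, for anyone who wants to check the arithmetic (binary
> units, i.e. GiB/TiB):
>
> atoms = 100e6
> bytes_per_array = atoms * 3 * 8          # x,y,z as 64-bit floats
> bytes_per_frame = 3 * bytes_per_array    # coords + velocities + forces
> frames = 10000                           # every 500 of 5,000,000 steps
> total_bytes = bytes_per_frame * frames
>
> print(bytes_per_frame / 2**30)  # roughly 6.7 GiB per frame
> print(total_bytes / 2**40)      # roughly 65.5 TiB in total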
>
> This is not an inordinate amount, but consider that with NAMD only
> the master thread writes files (I guess we will have to assume that
> a fully distributed I/O implementation can be written), and let's
> allow a generous 5% of the calculation time for writing to disk
> (which, considering we already need better-than-perfect scaling to
> hit our target as it is, is probably over generous ;-) ). Then we
> would have to write 65.5 terabytes in 1.25 hours of master cpu time.
> This equates to a disk bandwidth from the master node alone of 14.9
> GB/sec. Since each write would also require gathering the data to
> the master (an mpi_gatherv or similar), we would also need
> 14.9 GB/sec of bandwidth on the backplane to the master...
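>
> One more line of Python for that bandwidth figure:
>
> total_bytes = 7.2e13                     # the 65.5 TiB from above
> io_seconds = 0.05 * 25 * 3600            # 5% of the 25 hour budget
> print(total_bytes / io_seconds / 2**30)  # roughly 14.9 GiB/sec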
>
> So who wants to volunteer to write the code to tackle this problem???
>
> Have fun...
> Ross
>

Professor Kenneth M. Merz, Jr.
Department of Chemistry
Quantum Theory Project
2328 New Physics Building
PO Box 118435
University of Florida
Gainesville, Florida 32611-8435

e-mail: merz.qtp.ufl.edu
http://www.qtp.ufl.edu/~merz

Phone: 352-392-6973
FAX: 352-392-8722
Cell: 814-360-0376
Received on Wed Jun 21 2006 - 06:07:09 PDT