
From: Ross Walker <ross.rosswalker.co.uk>

Date: Sun, 18 Jun 2006 21:22:37 -0700

Hi All,

I thought some of you might find part of the NSF's latest RFP for a $300 million petaflop machine somewhat amusing... The RFP includes a section listing 3 specific simulations for which vendors must provide performance estimates on their proposed petaflop machines, along with specific targets the NSF expects the machines to meet for each of the 3 simulations. It says nothing about modifying the codes specifically for the new machines. Anyway, here is one I think you might all be interested in:

"* A molecular dynamics (MD) simulation of curvature-inducing protein

BAR domains binding to a charged phospholipid vesicle over 10 ns

simulation time under periodic boundary conditions. The vesicle, 100

nm in diameter, should consist of a mixture of

dioleoylphosphatidylcholine (DOPC) and dioleoylphosphatidylserine

(DOPS) at a ratio of 2:1. The entire system should consist of 100,000

lipids and 1000 BAR domains solvated in 30 million water molecules,

with NaCl also included at a concentration of 0.15 M, for a total

system size of 100 million atoms. All system components should be

modeled using the CHARMM27 all-atom empirical force field. The target

wall-clock time for completion of the model problem using the NAMD MD

package with the velocity Verlet time-stepping algorithm, Langevin

dynamics temperature coupling, Nose-Hoover Langevin piston pressure

control, the Particle Mesh Ewald algorithm with a tolerance of 1.0e-6

for calculation of electrostatics, a short-range (van der Waals)

cut-off of 12 Angstroms, and a time step of 0.002 ps, with 64-bit

floating point (or similar) arithmetic, is 25 hours. The positions,

velocities, and forces of all the atoms should be saved to disk every

500 timesteps."

HHHmmm, interesting... Well, I tried a back-of-the-envelope calculation for this. I set up a test simulation on a 408,000 atom system in PMEMD (which in my experience typically performs about 10 to 15% quicker than NAMD), matching the specs given above as closely as I could. Here is the input file I used:

equilibration
 &cntrl
  nstlim=1000, dt=0.002,
  es_cutoff=8.0, vdw_cutoff=12.0,
  ntc=2, ntf=2, tol=0.000001,
  ntx=5, irest=1, ntpr=500,
  ntt=3, gamma_ln=2.0,
  ntb=2, ntp=1, taup=2.0,
  ntwr=0, ntwx=500, ntwv=-1, ioutfm=1
 /
 &ewald
  dsum_tol=0.000001
 /

I ran this on a single-CPU 1.7 GHz POWER4 machine, which has a peak rating of 6.8 GFlops. The 1000-step (2 ps) benchmark took 9602.61 seconds, so 10 ns, being 5000 times longer, would take 48,013,050 seconds on this machine. Assuming we could achieve 100% scaling on any number of CPUs, getting the calculation done in 25 hours would require:

6.8 GFlops * 48,013,050 / (25*3600) = 3.627 TFlops.
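As a sanity check, that arithmetic can be reproduced in a few lines of Python (all constants are taken straight from the benchmark above):

```python
# Extrapolate the 408,000-atom benchmark to 10 ns and a 25 hour budget.
benchmark_seconds = 9602.61      # wall time for 1000 steps (2 ps at dt = 2 fs)
peak_gflops = 6.8                # single 1.7 GHz POWER4 CPU
runs_to_10ns = 10.0 / 0.002      # 10 ns / 2 ps per benchmark run = 5000 runs

total_seconds = benchmark_seconds * runs_to_10ns          # ~48,013,050 s
required_gflops = peak_gflops * total_seconds / (25 * 3600)
print(required_gflops / 1000.0)  # ~3.63 TFlops, assuming perfect scaling
```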

Now, PME scales as N ln N, so with N = 100*10^6 / 408,000 = 245.1 times more atoms we would expect roughly 245.1 * ln(245.1) ≈ 1350 times more computation. Hence we would need 3.627 TFlops * 1350 = 4.9 PFlops, assuming 100% perfect scaling....
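The same extrapolation in Python. Note that this folds the log factor into the size ratio (ratio * ln ratio); the stricter N ln N ratio, (N2 ln N2)/(N1 ln N1) ≈ 349, would give ~1.3 PFlops, which still overshoots a 1 petaflop machine even with zero scaling losses:

```python
import math

# Scale the 3.627 TFlops estimate up to 100 million atoms using N ln N.
n_small, n_large = 408_000, 100_000_000
ratio = n_large / n_small                    # ~245.1x more atoms
factor = ratio * math.log(ratio)             # ~1350x extra computation
required_pflops = 3.627 * factor / 1000.0    # TFlops -> PFlops
print(round(required_pflops, 1))             # ~4.9 PFlops

# Stricter ratio (N2 ln N2)/(N1 ln N1): ~349x, i.e. ~1.3 PFlops -- still
# more than the machine's entire peak.
strict = (n_large * math.log(n_large)) / (n_small * math.log(n_small))
```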

Whoops!!! Anybody want to volunteer to write the code that does the specified calculation in 25 hours on a machine of only 1 petaflop... ;-)

Then if you want more laughs you can look at the I/O. They want the full coordinates, velocities and forces (why the forces I don't know) written every 500 steps. So for 10 ns you would write a total of 10,000 frames. Each of the three data sets per frame (C, V and F) is 100*10^6 * 3 * 8 bytes, so a frame comes to about 6.7 GB, for a total of 65.5 terabytes.
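The trajectory-volume numbers work out as follows (in binary GiB/TiB, which is what the 6.7 GB and 65.5 TB figures correspond to):

```python
# Size of one saved frame (coordinates, velocities and forces) and the
# total trajectory volume for 10 ns saved every 500 steps.
atoms = 100_000_000
per_quantity = atoms * 3 * 8       # x,y,z as 64-bit floats, one of C/V/F
per_frame = per_quantity * 3       # coordinates + velocities + forces
frames = 10_000                    # 10 ns / (500 steps * 2 fs)
total_bytes = per_frame * frames

print(per_frame / 2**30)           # ~6.7 GiB per frame
print(total_bytes / 2**40)         # ~65.5 TiB in total
```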

This is not an inordinate amount, but consider that with NAMD only the master thread writes files (I guess we will have to assume that a fully distributed I/O implementation can be written). If we allow, say, a generous 5% of the calculation time for writing to disk (which, considering we need about 400% scaling to hit our target as it is, is probably over-generous ;-) ), then we would have to write 65.5 terabytes in 1.25 hours of master CPU time. This equates to a bandwidth to disk, from the master node alone, of 14.9 GB/sec. Since each write would also require an MPI reduce, we would also need 14.9 GB/sec of bandwidth on the backplane to the master...
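And the implied bandwidth, if a serial writer gets 5% of the 25 hour wall clock:

```python
# Disk (and interconnect) bandwidth needed to drain ~65.5 TiB in 1.25 hours.
total_bytes = 100_000_000 * 3 * 8 * 3 * 10_000   # C/V/F frames, 10,000 of them
io_seconds = 0.05 * 25 * 3600                    # 5% of 25 hours = 4500 s
bandwidth_gib_s = total_bytes / io_seconds / 2**30
print(round(bandwidth_gib_s, 1))                 # ~14.9 GiB/s through one node
```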

So who wants to volunteer to write the code to tackle this problem???

Have fun...

Ross

Received on Wed Jun 21 2006 - 06:07:08 PDT
