Hi Guys,
We are benchmarking a cluster of dual-CPU, dual-core Opteron (1.8 GHz) nodes
with GigE and noticed odd scaling behavior. PMEMD scales very well up to the
32-CPU level, which is great. But as soon as we tried the 64-CPU level, the
scaling became notably poor. We initially thought this must be related to
system size, so we tried both the ~27,000-atom and ~239,000-atom systems,
and they behaved the same way. Any hints?
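
To put numbers on the drop, here is a rough sketch that turns the "Master Total CPU time" values from the benchmark message quoted below into speedup and parallel efficiency. It makes some assumptions: the 4-CPU run is used as the baseline (no 1-CPU PMEMD timing was reported), the master CPU time is treated as a stand-in for wall-clock time, and the names `TIMINGS` and `scaling_table` are made up for illustration.

```python
# Master Total CPU time (seconds) from the quoted PMEMD runs, keyed by CPU count.
TIMINGS = {
    "27404 atoms": {4: 740.43, 8: 373.62, 16: 207.80, 32: 109.58, 64: 127.88},
    "238985 atoms": {4: 5966.00, 8: 3029.44, 16: 1569.34, 32: 546.66, 64: 728.11},
}


def scaling_table(times):
    """Return (cpus, speedup, efficiency) rows relative to the smallest run.

    Assumption: the smallest CPU count present is the baseline, so speedup
    and efficiency are relative to that run, not to a true serial run.
    """
    base_cpus = min(times)
    base_time = times[base_cpus]
    rows = []
    for cpus in sorted(times):
        speedup = base_time / times[cpus]
        # Ideal speedup over the baseline is cpus / base_cpus.
        efficiency = speedup * base_cpus / cpus
        rows.append((cpus, speedup, efficiency))
    return rows


if __name__ == "__main__":
    for name, times in TIMINGS.items():
        print(name)
        for cpus, speedup, efficiency in scaling_table(times):
            print(f"  {cpus:3d} CPUs  speedup {speedup:5.2f}  efficiency {efficiency:6.1%}")
```

For the 27,404-atom system this gives an efficiency relative to 4 CPUs of about 84% at 32 CPUs and about 36% at 64 CPUs, which is the drop described above.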
yong
-----Original Message-----
From: choo woo [mailto:koolben3.yahoo.com] 
Sent: Wednesday, May 03, 2006 10:37 AM
To: Yong Duan
Subject: RE: benchmark of Amber 9 on shiraz
I have no idea. When I get some time later, I may look
into the details.
Chun
--- Yong Duan <duan.ucdavis.edu> wrote:
> 
> Chun,
> 
> Why is there a "barrier" at the 32/64-CPU level, regardless of system
> size? The scaling looks pretty good at the 32-CPU level but drops
> significantly at the 64-CPU level. In other words, why do 8 nodes
> work better than 16 nodes?
> 
> yong
> 
> > -----Original Message-----
> > From: choo woo [mailto:koolben3.yahoo.com] 
> > Sent: Wednesday, May 03, 2006 10:25 AM
> > To: Lin, Dawei; Yong Duan; Lewis, Mike; benwu.ucdavis.edu
> > Cc: duan_group.albert.genomecenter.ucdavis.edu
> > Subject: benchmark of Amber 9 on shiraz
> > 
> > 
> > Shiraz performs well!
> > 
> > For the small system, the simulation scales up to only 4 CPUs
> > (16.8 ns per day for the ~800-atom system).
> > The large and very large systems scale up to 32 CPUs
> > (8 ns per day for the ~30,000-atom system; 1.6 ns per day
> > for the ~240,000-atom system).
> > 
> > Chun
> > 
> > 
> > Amber 9
> > 
> > 1.) Small system:
> > protein G
> > 855 atoms 56 residues 10ps
> > 
> > GBSA simulation
> > ifort+MKL
> > 
> > ./2GB1.00/2GB1.00_0001.out     1CPU
> > |    Runmd Time               175.90 (100.0% of Total)
> > 
> > /2GB1.01/2GB1.01_0001.out      4CPUs
> > |    Runmd Time                51.41 (99.75% of Total)
> > 
> > ./2GB1.02/2GB1.02_0001.out     8CPUs
> > |    Runmd Time                46.22 (99.38% of Total)
> > 
> > ./2GB1.03/2GB1.03_0001.out     16CPUs
> > |    Runmd Time                58.57 (98.32% of Total)
> > 
> > 
> > 2.) Large system
> > 
> > 27404 atoms   10ps ca 120 residues + waters
> > 
> > PMEMD, pathscale
> > 
> > ./sh2c.01/sh2c.01_0001.out         4 CPUs
> > |  Master Total CPU time:          740.43 seconds     0.21 hours
> > 
> > ./sh2c.02/sh2c.02_0001.out         8 CPUs
> > |  Master Total CPU time:          373.62 seconds     0.10 hours
> > 
> > ./sh2c.03/sh2c.03_0001.out         16 CPUs
> > |  Master Total CPU time:          207.80 seconds     0.06 hours
> > 
> > ./sh2c.04/sh2c.04_0001.out         32 CPUs
> > |  Master Total CPU time:          109.58 seconds     0.03 hours
> > 
> > ./sh2c.05/sh2c.05_0001.out         64 CPUs
> > |  Master Total CPU time:          127.88 seconds     0.04 hours
> > 
> > 3.) Very large system
> > 238985 atoms 10ps
> > PMEMD  pathf90
> > 
> > ./hist1.01/hist1.01_0001.out      4CPUs
> > |  Master Total CPU time:         5966.00 seconds     1.66 hours
> > 
> > ./hist1.02/hist1.02_0001.out      8CPUs
> > |  Master Total CPU time:         3029.44 seconds     0.84 hours
> > 
> > ./hist1.03/hist1.03_0001.out      16CPUs
> > |  Master Total CPU time:         1569.34 seconds     0.44 hours
> > 
> > ./hist1.04/hist1.04_0001.out      32CPUs
> > |  Master Total CPU time:          546.66 seconds     0.15 hours
> > 
> > ./hist1.05/hist1.05_0001.out      64CPUs
> > |  Master Total CPU time:          728.11 seconds     0.20 hours
> > 
> > 
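
The ns-per-day figures in the summary appear to follow from the 10 ps run length and the reported times. A small sketch of that arithmetic is below. It assumes the reported "Runmd Time" and "Master Total CPU time" can be treated as elapsed seconds for the whole 10 ps run, and the helper name `ns_per_day` is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def ns_per_day(simulated_ps, elapsed_seconds):
    """Convert a run of `simulated_ps` picoseconds taking `elapsed_seconds`
    into simulated nanoseconds per day (assumes time ~ wall-clock)."""
    return simulated_ps / 1000.0 * SECONDS_PER_DAY / elapsed_seconds


# Best runs from the benchmark output above (10 ps each).
print(round(ns_per_day(10, 51.41), 1))   # protein G, 4 CPUs   -> 16.8
print(round(ns_per_day(10, 109.58), 1))  # 27404 atoms, 32 CPUs -> 7.9
print(round(ns_per_day(10, 546.66), 1))  # 238985 atoms, 32 CPUs -> 1.6
```

These agree with the quoted 16.8, ~8 and 1.6 ns per day.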
Received on Thu May 04 2006 - 17:10:38 PDT