amber-developers: FW: benchmark of Amber 9 on shiraz

From: Yong Duan <duan.ucdavis.edu>
Date: Wed, 3 May 2006 10:43:41 -0700

Hi Guys,

We are benchmarking a cluster of dual-core, dual-CPU Opteron (1.8 GHz) nodes with
GigE and noticed odd scaling behavior. PMEMD scales very well up to the
32-CPU level, which is great, but as soon as we tried the 64-CPU level, the
scaling became notably poor. We initially thought this must be related to
system size, but we then tried 23,000-atom and 230,000-atom systems and found
they behaved the same way. Any hints?
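
A quick sanity check of the numbers (speedup and parallel efficiency computed
from the "Master Total CPU time" values Chun reported below; just a
back-of-the-envelope sketch in Python, assuming those CPU times track wall
time):

    # Speedup and parallel efficiency from the reported PMEMD timings
    # (seconds), relative to the 4-CPU runs.
    timings = {
        "27404 atoms":  {4: 740.43, 8: 373.62, 16: 207.80, 32: 109.58, 64: 127.88},
        "238985 atoms": {4: 5966.00, 8: 3029.44, 16: 1569.34, 32: 546.66, 64: 728.11},
    }

    for system, runs in timings.items():
        base_cpus = min(runs)            # use the 4-CPU run as the baseline
        base_time = runs[base_cpus]
        print(system)
        for cpus, seconds in sorted(runs.items()):
            speedup = base_time / seconds
            efficiency = speedup / (cpus / base_cpus)
            print(f"  {cpus:3d} CPUs: {speedup:5.2f}x speedup, {efficiency:6.1%} efficiency")

Both systems show the same pattern: going from 32 to 64 CPUs the runs actually
get slower (109.58 s -> 127.88 s and 546.66 s -> 728.11 s), so the efficiency
collapses at exactly the same point regardless of system size.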

yong

-----Original Message-----
From: choo woo [mailto:koolben3.yahoo.com]
Sent: Wednesday, May 03, 2006 10:37 AM
To: Yong Duan
Subject: RE: benchmark of Amber 9 on shiraz


I have no idea. When I get some time later, I may look into the details.
Chun

--- Yong Duan <duan.ucdavis.edu> wrote:

>
> Chun,
>
> Why is there a "barrier" at the 32/64-CPU level, regardless of system
> size? The scaling looks pretty good at the 32-CPU level but drops
> significantly at the 64-CPU level. In other words, why do 8 nodes work
> better than 16 nodes?
>
> yong
>
> > -----Original Message-----
> > From: choo woo [mailto:koolben3.yahoo.com]
> > Sent: Wednesday, May 03, 2006 10:25 AM
> > To: Lin, Dawei; Yong Duan; Lewis, Mike; benwu.ucdavis.edu
> > Cc: duan_group.albert.genomecenter.ucdavis.edu
> > Subject: benchmark of Amber 9 on shiraz
> >
> >
> > Shiraz performs well!
> >
> > For the small system, the simulation scales up to only 4 CPUs
> > (16.8 ns per day for the ~800-atom system). For the large and very
> > large systems, it scales up to 32 CPUs (8 ns per day for the
> > ~30000-atom system; 1.6 ns per day for the ~240000-atom system).
> >
> > Chun
> >
> >
> > Amber 9
> >
> > 1.) small system:
> > protein G
> > 855 atoms 56 residues 10ps
> >
> > GBSA simulation
> > ifort+MKL
> >
> > ./2GB1.00/2GB1.00_0001.out   1 CPU
> > | Runmd Time       175.90 (100.0% of Total)
> >
> > ./2GB1.01/2GB1.01_0001.out   4 CPUs
> > | Runmd Time        51.41 (99.75% of Total)
> >
> > ./2GB1.02/2GB1.02_0001.out   8 CPUs
> > | Runmd Time        46.22 (99.38% of Total)
> >
> > ./2GB1.03/2GB1.03_0001.out  16 CPUs
> > | Runmd Time        58.57 (98.32% of Total)
> >
> >
> > 2.) Large system
> >
> > 27404 atoms 10ps ca 120 residues + waters
> >
> > PMEMD, pathscale
> >
> > ./sh2c.01/sh2c.01_0001.out   4 CPUs
> > | Master Total CPU time:     740.43 seconds     0.21 hours
> >
> > ./sh2c.02/sh2c.02_0001.out   8 CPUs
> > | Master Total CPU time:     373.62 seconds     0.10 hours
> >
> > ./sh2c.03/sh2c.03_0001.out  16 CPUs
> > | Master Total CPU time:     207.80 seconds     0.06 hours
> >
> > ./sh2c.04/sh2c.04_0001.out  32 CPUs
> > | Master Total CPU time:     109.58 seconds     0.03 hours
> >
> > ./sh2c.05/sh2c.05_0001.out  64 CPUs
> > | Master Total CPU time:     127.88 seconds     0.04 hours
> >
> > 3.) very large system
> > 238985 atoms 10ps
> > PMEMD pathf90
> >
> > ./hist1.01/hist1.01_0001.out   4 CPUs
> > | Master Total CPU time:    5966.00 seconds     1.66 hours
> >
> > ./hist1.02/hist1.02_0001.out   8 CPUs
> > | Master Total CPU time:    3029.44 seconds     0.84 hours
> >
> > ./hist1.03/hist1.03_0001.out  16 CPUs
> > | Master Total CPU time:    1569.34 seconds     0.44 hours
> >
> > ./hist1.04/hist1.04_0001.out  32 CPUs
> > | Master Total CPU time:     546.66 seconds     0.15 hours
> >
> > ./hist1.05/hist1.05_0001.out  64 CPUs
> > | Master Total CPU time:     728.11 seconds     0.20 hours
> >
> >
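For what it's worth, the ns/day figures in Chun's summary are consistent with
the 10 ps runtimes above; the conversion is just the following (a rough
sketch, again treating the reported CPU times as wall time):

    # ns/day throughput implied by the time taken for a 10 ps run.
    def ns_per_day(seconds_for_10ps: float) -> float:
        simulated_ns = 0.010                    # 10 ps
        return simulated_ns * 86400.0 / seconds_for_10ps

    print(ns_per_day(51.41))     # ~16.8 ns/day: 855-atom GB system, 4 CPUs
    print(ns_per_day(109.58))    # ~7.9 ns/day:  27404-atom PME system, 32 CPUs
    print(ns_per_day(546.66))    # ~1.6 ns/day:  238985-atom PME system, 32 CPUs
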
Received on Thu May 04 2006 - 17:10:38 PDT