Re: amber-developers: Some testing result on pmemd scaling and parallel computation

From: Robert Duke <rduke.email.unc.edu>
Date: Mon, 29 Sep 2008 22:03:41 -0400

Thanks Mengjuei,
Nothing too surprising here, but always good to get data from additional
sources.
Best Regards - Bob

----- Original Message -----
From: "Mengjuei Hsieh" <mengjueh.uci.edu>
To: <amber-developers.scripps.edu>
Sent: Monday, September 29, 2008 9:27 PM
Subject: amber-developers: Some testing result on pmemd scaling and parallel
computation


> Here is a recap of the JAC benchmark performance under the different
> parallel options I tested this weekend.
>
> We were exploring a jumbo-frame gigabit ethernet local network (jumbo
> frames are also known as large MTU; mtu=9000 on Linux) to see whether
> it could replace our previous parallel computing setup of connecting
> two machines directly with an ethernet cable (we called these
> "sub-pairs" to reflect the fact that the machines end up grouped in
> pairs). The motivation is obvious: grouping compute nodes in pairs is
> not an efficient way to run or to manage them.
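>
> For reference, enabling jumbo frames on a Linux node is essentially a
> one-line change per interface (the interface name eth0 below is just
> an assumption about the actual hardware):
>
>   ifconfig eth0 mtu 9000        # takes effect immediately, lost on reboot
>   ifconfig eth0 | grep -i mtu   # verify the new MTU
>
> The switch in between also has to support and be configured for jumbo
> frames, otherwise the oversized frames are simply dropped.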
>
> We ran the NetPIPE benchmark to measure the performance of gigabit
> ethernet with and without jumbo frames; the results are consistent
> with general wisdom and with references on the internet and in the
> literature. Based on that, I expected we could utilize more bandwidth
> with jumbo-frame ethernet.
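>
> A minimal sketch of that point-to-point measurement with NetPIPE, in
> case anyone wants to reproduce it (the host name node1 is
> hypothetical):
>
>   ./NPtcp                 # receiver, started on one node (say node1)
>   ./NPtcp -h node1        # transmitter, run on the other node
>
> NetPIPE reports throughput as a function of message size, which is
> where the jumbo-frame gain shows up for large messages; the MPI-level
> test is the same idea with "mpirun -np 2 NPmpi".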
>
> First, I tested the scaling of Amber 9 pmemd with LAM/MPI and MPICH
> over jumbo-frame ethernet. The testing environment was configured as
> follows:
>
> Two identical Dell PowerEdge 1950 machines, each with two Intel Xeon
> 5140 Woodcrest dual-core processors, 4 MB cache, and 2 GB RAM.
> Shared-memory interconnect / MPICH-1.2.6 / LAM-MPI 7.1.4, Intel
> Fortran 90 compiler, Intel MKL.
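>
> For completeness, the TCP runs are launched in the usual MPICH-1
> fashion; this is only a sketch, and the machine file and input file
> names are generic placeholders rather than the exact scripts used:
>
>   # "machines" lists each node twice for the 2+2 case:
>   #   node1
>   #   node1
>   #   node2
>   #   node2
>   mpirun -np 4 -machinefile machines \
>       ./pmemd -O -i mdin -o mdout -p prmtop -c inpcrd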
>
> The parallel performance results are:
> *******************************************************************************
> JAC - NVE ensemble, PME, 23,558 atoms
>
> #procs   nsec/day   scaling, %
>
>   1       0.329       --
>   2       0.628       95   (SMP)
>   4       1.094       83   (SMP)
>   4       0.965       73   (TCP, 1+1+1+1)
>   4       0.819       62   (SMP/TCP, 2+2)
>   8       0.987       37   (SMP/TCP, 4+4)
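>
> For reference, the scaling column is simply the measured throughput
> divided by a linear extrapolation of the single-CPU run:
>
>   scaling % = (nsec/day on N procs) / (N x nsec/day on 1 proc) x 100
>
> e.g. for the 2-CPU SMP run, 0.628 / (2 x 0.329) = 95%.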
>
> This hardly meets the definition of "scaling", so the network traffic
> was also measured, and I found that only about 30% of the available
> bandwidth was used during the inter-node communication. As a side
> note, these runs used at least P4_SOCKBUFSIZE=131072 (MPICH) together
> with net.core.rmem_max=131072 and net.core.wmem_max=131072; similar
> results were observed under LAM-MPI with rpi_tcp_short=131072.
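>
> A sketch of how those settings get applied (assuming root access on
> each node; the LAM-MPI line uses the generic "-ssi key value" syntax,
> so treat it as an illustration rather than the exact command we ran):
>
>   sysctl -w net.core.rmem_max=131072    # kernel socket buffer limits
>   sysctl -w net.core.wmem_max=131072
>   export P4_SOCKBUFSIZE=131072          # MPICH-1 (ch_p4) socket buffer
>   mpirun -ssi rpi_tcp_short 131072 -np 4 ./pmemd ...   # LAM-MPI run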
>
> A further test on the directly connected pairs gave similar
> measurements.
>
> I therefore fell back to benchmarking Amber 8 pmemd, the program we
> originally ran in the sub-pair configuration.
>
> The parallel performance results with Amber 8 pmemd are:
> *******************************************************************************
> JAC - NVE ensemble, PME, 23,558 atoms
>
> #procs   nsec/day   scaling, %
>
>   1       0.203       --
>   2       0.391       96   (SMP)
>   4       0.465       57   (SMP)
>   4       0.457       56   (SMP/TCP, 2+2)
>   8       0.680       42   (SMP/TCP, 4+4)
>
> The less efficient Amber 8 pmemd makes the scaling factor of the 4+4
> CPU run look better, but the absolute performance is definitely not
> better. Similar results were observed on the directly connected pairs.
>
> The exploration then turned to the scaling of Amber 10 pmemd; the
> results are:
> *******************************************************************************
> JAC - NVE ensemble, PME, 23,558 atoms
>
> #procs   nsec/day   scaling, %
>
>   1       0.411       --
>   4       1.329       80   (SMP)
>   8       1.137       35   (SMP/TCP, 4+4)
>
> At this point, all I can say is: don't expect anything too interesting
> from gigabit ethernet performance. This conclusion is consistent with
> the observations from Dr. Duke and Dr. Walker.
>
> A further benchmark was run for Amber 10 pmemd on a dual quad-core
> Intel Xeon E5410 machine (Dell PE1950, 2.3 GHz, 6 MB cache, 2 GB
> RAM):
> *******************************************************************************
> JAC - NVE ensemble, PME, 23,558 atoms (on the same machine, SMP mode)
>
> #procs   nsec/day   scaling, %
>
>   1       0.434       --
>   2       0.815       94
>   4       1.464       84
>   6       1.964       75
>   8       2.274       65
>
> That's all. AMBER 10 pmemd rocks.
>
> Bests,
> --
> Mengjuei
>
Received on Wed Oct 01 2008 - 05:09:18 PDT