Re: amber-developers: Paper on improving Gromacs scaling on ethernet. from Robert Duke on 2007-04-26 (Amber Developers Archive Apr 2007)

From: Robert Duke <rduke.email.unc.edu>
Date: Thu, 26 Apr 2007 18:15:34 -0400

Hi Ross,
I did see the paper; I did not read it to the same level of detail, but
pitched it somewhere in the pile. I think this is okay stuff, but I think the
better thing to do is to try really hard to get people to realize that
InfiniBand is worth it if you are actually trying to get anything done. For
gigabit ethernet I use back-to-back crossover cables between 2 machines and
server nics, and whack the kernel and mpi params for large tcp/ip packets
(all detailed on the amber web site); doing all that can drive gb ethernet
pretty hard. But the data volume associated with pme and a problem of any
size is large, and the possible grief from slow switches or cascaded
switches makes me twitch. I think a popular solution soon will be 8 cores in
1 box - no net; it is probably already there. Lee passed some stuff by me
today with somewhere around 20-40 cores in one box but a gigabit
interconnect, non-unix, and probably other problems too that I am not
recollecting.
Regards - Bob

----- Original Message -----
From: "Ross Walker" <ross.rosswalker.co.uk>
To: <amber-developers.scripps.edu>
Sent: Thursday, April 26, 2007 4:54 PM
Subject: amber-developers: Paper on improving Gromacs scaling on ethernet.

> Hi All,
>
> You might be interested in the following paper, which discusses improving
> the scaling of Gromacs on ethernet beyond 2 nodes, which is the limit if
> you use the defaults:
>
> http://www3.interscience.wiley.com/cgi-bin/abstract/114205207/ABSTRACT
>
> We are seeing the same behaviour with Amber these days - i.e. as soon as
> you try to go beyond 2x2cpu nodes with gigabit ethernet the performance
> just dies. This paper has a number of suggestions that highlight in
> particular how the default settings of modern switches are not
> appropriate... Upon reading it a lot of this, with hindsight :-), is
> obvious... Most switches these days come with QOS and flow control
> settings optimized for a bunch of people in an office browsing the web,
> listening to streaming content and doing Windows-based file sharing. This
> plays havoc with MPI messages, where ALL to ALL communications get blasted
> by packet losses. Essentially the main tips are:
>
> 1) Turn on IEEE 802.3x flow control on the switch and network cards -
> assuming you bought a decent ethernet switch that supports this.
>
> 2) Set the switch to use QOS_PASSTHROUGH_MODE - essentially turning off
> QOS so you can recoup the memory used here as general buffer space.
>
> 3) On 48 port switches only use 36 ports in the form of 9 per 12 port
> block. It seems that most modern switches are constructed out of blocks of
> 12 port subswitches and that the links between subswitches are only
> 10Gbit/s - this limits you to 9 ports per 12 port block.
>
> 4) Use OpenMPI or MPICH2 - or alternatively implement ordered all to all
> communication approaches - this would apply to anything involving all to
> all communication like mpi_allreduce etc... Mostly I don't think we use
> the all to all communicators specifically, at least not for large data
> sizes, but from their conclusions it would appear that you only benefit
> from ordered all to alls or the MPICH2/OpenMPI ordered schemes if you
> implement tip 3 to ensure there is no packet loss within switches.
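
The port arithmetic behind tip 3 can be checked with a few lines of Python.
This is only a sketch under stated assumptions: each port is taken to run at
1 Gbit/s, and the worst-case traffic of the ports in use is required to stay
strictly below the 10 Gbit/s inter-subswitch link. The strict inequality and
the function name usable_ports are illustrative choices, not something taken
from the paper.

```python
def usable_ports(block_ports=12, port_gbit=1.0, uplink_gbit=10.0):
    """Largest number of ports per block whose combined worst-case traffic
    stays strictly below the inter-subswitch link (assumed criterion)."""
    n = block_ports
    # Drop ports until their aggregate bandwidth no longer reaches the uplink.
    while n > 0 and n * port_gbit >= uplink_gbit:
        n -= 1
    return n


blocks = 48 // 12              # a 48 port switch built from 12 port blocks
per_block = usable_ports()     # ports to populate in each block
print(per_block, blocks * per_block)
```

With the default values this reproduces the 9 ports per block and 36 ports
per switch quoted above.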
>
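
To make "ordered all to all" in tip 4 concrete, here is a minimal sketch of
one common ordering, a shifted pairwise exchange in which rank r sends to
rank (r + s) mod N at step s. It is not claimed to be the scheme used in the
paper or inside MPICH2/OpenMPI; the function names are hypothetical and the
code only builds and checks the schedule, it does no actual MPI
communication.

```python
def ordered_alltoall_schedule(n_ranks):
    """Return one list of (sender, receiver) pairs per step.

    At step s (1 <= s < n_ranks) every rank r sends to (r + s) % n_ranks.
    """
    return [[(r, (r + s) % n_ranks) for r in range(n_ranks)]
            for s in range(1, n_ranks)]


def one_message_per_receiver(step, n_ranks):
    """True if every rank receives exactly one message in this step."""
    receivers = sorted(dst for _, dst in step)
    return receivers == list(range(n_ranks))


schedule = ordered_alltoall_schedule(4)
for s, step in enumerate(schedule, start=1):
    print("step", s, step, one_message_per_receiver(step, 4))
```

The point of the ordering is that in every step each rank sends one message
and receives one message, so no single switch port is hit by many senders at
once, which is the situation that overflows switch buffers in an unordered
all to all.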
> Aside from this if you start chaining switches together and get assigned
> processors on different physical switches then all bets are off.
>
> Unfortunately (or maybe fortunately depending on one's perspective) I
> don't have physical access to any ethernet based clusters anymore so I
> can't test these recommendations out with AMBER, but if anyone has access
> to such a cluster and is happy playing with their switch configuration
> they may want to experiment with the above and post feedback to the
> developers list. If this really helps then perhaps we should consider
> putting a short tutorial/overview on the Amber website to benefit others.
>
> All the best
> Ross
>
> /\
> \/
> |\oss Walker
>
> | HPC Consultant and Staff Scientist |
> | San Diego Supercomputer Center |
> | Tel: +1 858 822 0854 | EMail:- ross.rosswalker.co.uk |
> | http://www.rosswalker.co.uk | PGP Key available on request |
>
> Note: Electronic Mail is not secure, has no guarantee of delivery, may not
> be read every day, and should not be used for urgent or sensitive issues.
>
>
>
Received on Sun Apr 29 2007 - 06:07:26 PDT