Re: [AMBER-Developers] pmemd.MPI build broken

From: Benny1 M <benny1.m.tcs.com>
Date: Mon, 7 Mar 2016 20:45:02 +0530

We are currently working to address the problems arising from adding
OpenMP as the default for parallel builds. Since we have tested MPI +
OpenMP only with the gnu and intel compilers, we have enabled OpenMP only
for these two compilers, and only when it is specifically requested with
the -openmp flag.

We are not making these changes on the master branch; we are making them
on the masterintelmerge branch, which was taken from the commit before the
OpenMP changes were reverted.

The configure options will be as follows:

./configure gnu (or intel, pgi, clang, ...)
      - builds code that is entirely serial.

./configure -mpi gnu (or intel, pgi, clang, ...)
      - builds code that is pure MPI.

./configure -mpi -openmp (gnu or intel)
      - builds MPI/OpenMP hybrid code for GB
      - for other compilers GB will be pure MPI
      - Ideally the number of MPI ranks should equal the number of sockets
        and the number of OpenMP threads should equal the cores per socket
        (see the sample launch line after this list). We are working on a
        way to set these suitably, but ideas are welcome.

./configure -mic-native -mpi -openmp intel
      - builds code specifically for KNC (first-generation Xeon Phi)

./configure -MIC2 -mpi -openmp intel
      - builds code specifically for KNL (second-generation Xeon Phi)
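
As a rough illustration of the socket mapping above (the binding options
shown are the Open MPI spellings and will differ for other MPI launchers;
the 2-socket, 8-cores-per-socket node is just an assumed example):

export OMP_NUM_THREADS=8
mpirun -np 2 --map-by socket --bind-to socket \
    pmemd.MPI -O -i mdin -o mdout -p prmtop -c inpcrd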

While there is no option for a pure OpenMP build, if the check in
pmemd.F90 that requires at least 2 ranks is disabled only for MPI + OpenMP
builds, we can run with 1 rank and multiple OpenMP threads.
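
For example (assuming that check is relaxed; the thread count here is just
an illustrative value):

export OMP_NUM_THREADS=16
mpirun -np 1 pmemd.MPI -O -i mdin -o mdout -p prmtop -c inpcrd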

For larger workloads (on the order of the nucleosome and above), the
performance gain from using MPI + OpenMP is on the order of 2-4x. For
smaller workloads that do not have enough atoms to parallelize (myoglobin
and smaller), pure MPI performs better.

regards,
Benny Mathew




From: Jason Swails <jason.swails.gmail.com>
To: AMBER Developers Mailing List <amber-developers.ambermd.org>
Date: 06-03-2016 00:03
Subject: Re: [AMBER-Developers] pmemd.MPI build broken



On Sat, Mar 5, 2016 at 11:25 AM, Ross Walker <ross.rosswalker.co.uk>
wrote:

>
> > On Mar 5, 2016, at 06:29, David A Case <david.case.rutgers.edu> wrote:
> >
> > On Sat, Mar 05, 2016, Jason Swails wrote:
> >>
> >> Also, when I switch to using OpenMPI *without* dragonegg, the linker line
> >> still needs -lgomp to complete successfully, so the build doesn't really
> >> work in general yet.
> >
> > Sounds like it's been tested mostly (only) with mpich and variants.(?) It's
> > surprising that the flavor of MPI library has an impact on the openmp
> > stuff. Maybe I'm misreading something.
> >
> > I've posted my gnu5 + mpich test results to the wiki page: I'm at commit
> > 2d5d9afbc305bfbca01. Build is fine, but I see significant (non-roundoff)
> > regressions.
>
> Can you try
>
> export OMP_NUM_THREADS=2, mpirun -np 2
>
> and see if you get the same errors please.
>
> It might be resource related - e.g. if you have 8 cores and do mpirun -np
> 4 without setting OMP_NUM_THREADS you get 32 threads total for the GB
> cases. (this will be addressed in documentation shortly).
>

This is dangerous and undesirable behavior in my opinion. Adding it to
the documentation is not a fix. For the longest time, ./configure -openmp
was required to get OpenMP parallelism, and MPI-parallelized programs
spawned an MPI thread for every CPU you wanted to use. This behavior has
changed for pmemd, so now if somebody uses a script they used for Amber 14
and earlier with Amber 16, they will get the same answers (once the
regressions are fixed), and it will reportedly use the same number of MPI
threads in the output, but performance will tank while they thrash their
resources. Same thing happens if they replace "sander" by "pmemd" (which
has always been the recommendation to get improved performance except where
features are only supported in one or the other). UI compatibility with
sander has always been a cornerstone of pmemd.
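
Concretely (the 8-core workstation here is just an assumed example), an
Amber 14-era launch line reused unchanged against an OpenMP-enabled
pmemd.MPI looks like

mpirun -np 8 pmemd.MPI -O -i mdin -o mdout -p prmtop -c inpcrd

and with OMP_NUM_THREADS unset, each of the 8 ranks may spawn 8 OpenMP
threads for the GB kernels -- 64 threads fighting over 8 cores.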

Mixed OpenMP-MPI has its place for sure -- MICs and dedicated
supercomputers with many cores per node. But for commodity clusters and
single workstations, I see this as more of an obstacle than a benefit. For
instance -- how do we parallelize on a single workstation now? I would
naively think you would need to do

mpirun -np 1 pmemd.MPI -O -i ...

and let OpenMP parallelize. But no, that doesn't work, because of this in
pmemd.F90:

#ifdef MPI
#ifndef CUDA
  if (numtasks .lt. 2 .and. master) then
    write(mdout, *) &
      'MPI version of PMEMD must be used with 2 or more processors!'
    call mexit(6, 1)
  end if
#endif
#endif /*MPI*/

So how do you do it? Well you can do this:

export OMP_NUM_THREADS=1
mpirun -np 16 pmemd.MPI -O -i ...

Or you would need to do something like

export OMP_NUM_THREADS=8
mpirun -np 2 pmemd.MPI -O -i ...

Which is better? Why? What safeguards do we have in there to avoid people
thrashing their systems? What's the difference on a commodity cluster
(say, parallelizing across ~4-8 nodes with a total of ~60 CPUs) between
pmemd.MPI with and without OpenMP? I profiled pmemd.MPI's GB scaling
several years ago, and I was rather impressed -- despite the allgatherv
every step, I could never hit the ceiling on scaling for a large system.
Of course sander.MPI's GB scaling is quite good as well (not surprisingly,
since it's really the same code). So now that we have all this added
complexity of how to run these simulations "correctly", what's the win in
performance?

IMO, MPI/OpenMP is a specialty mix. You use it when you are trying to
really squeeze out the maximum performance on expensive hardware -- when
you try to tune the right mix of SMP and distributed parallelism on
multi-core supercomputers or harness the capabilities of an Intel MIC. And
it requires a bit of tuning and experimentation/benchmarking to get the
right settings for your desired performance on a specific machine for a
specific system. And for that it's all well and good. But to take
settings that are optimized for these kinds of highly specialized
architectures and make that the default (and *only supported*) behavior on
*all* systems seems like a rather obvious mistake from the typical user's
perspective.

This is speculation, but based on real-world experience. A huge problem
here is that we have never seen this code before (it simply *existing* on
an obscure branch somewhere doesn't count -- without being in master *or*
explicitly asking for testers nobody will touch a volatile branch they know
nothing about). So nobody has any idea how this is going to play out in
the wild, and there's so little time between now and release that I don't
think we could possibly get that answer. (And the actual developer of the
code is unqualified to accurately anticipate challenges typical users will
face, in my experience.) This feels very http://bit.ly/1p7gB68 to me.

The two things I think we should do are:

1) Make OpenMP an optional add-in that you get when you configure with
-openmp -mpi (or with -mic) and make it a separate executable, so people
will only run that code when they know that's precisely what they want to
run.

2) Wait to release it until a wider audience of developers has actually
gotten a chance to use it.

This is a large part of why we institute a code freeze well before
release.




_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers