Re: [AMBER-Developers] pmemd.MPI build broken

From: Rai Nitin <rai.nitin.tcs.com>
Date: Tue, 8 Mar 2016 19:15:27 +0530

We have updated the "intel" branch with the build options given in
Benny's mail.
We have tested GB with the intel & gnu compilers. Compilers that don't
support OpenMP are protected (the OpenMP code paths are disabled for
them).
There may be some more changes related to PME; however, this branch can
be picked up for testing GB now.

Thanks,
Nitin Rai




From: Jason Swails <jason.swails.gmail.com>
To: AMBER Developers Mailing List <amber-developers.ambermd.org>
Date: 07-03-2016 22:19
Subject: Re: [AMBER-Developers] pmemd.MPI build broken



I would suggest branching from the intel branch. I created that branch
with a merge point after the reversion (but without the actual reverted
changes) specifically to make merging *back* to master much easier.
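
For example (the local branch name here is just illustrative):

  git fetch origin
  git checkout -b gb-openmp-fixes origin/intel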

--
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher

> On Mar 7, 2016, at 10:15 AM, Benny1 M <benny1.m.tcs.com> wrote:
>
> We are currently working to address the problems arising from adding
> OpenMP as the default for parallel builds.
> Since we have tested MPI + OpenMP only with the gnu and intel compilers,
> we have enabled OpenMP only for these two compilers, and only when the
> build is explicitly configured with the -openmp flag.
>
> We are not making these changes on the master branch, but on a branch
> named masterintelmerge. This branch is taken from the commit before the
> OpenMP changes were reverted.
>
> The configure options will be as follows:
>
> ./configure gnu (or intel, or PGI, or CLANG ..)
> - builds code that is entirely serial.
>
> ./configure -mpi gnu (or intel, or PGI, or CLANG ..)
> - builds code that is pure MPI.
>
> ./configure -mpi -openmp (gnu or intel)
> - builds MPI / OpenMP hybrid code for GB
> - for other compilers GB will be pure MPI
> - Ideally the number of MPI ranks should equal the number of sockets,
>   and the number of OpenMP threads should equal the cores per socket.
>   We are working on a way to set these automatically, but ideas are
>   welcome (see the sketch after this list).
>
> ./configure -mic-native -mpi -openmp intel
> - builds code specifically for KNC (Knights Corner)
>
> ./configure -MIC2 -mpi -openmp intel
> - builds code specifically for KNL (Knights Landing)
>
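> As a starting point, here is a minimal shell sketch of how those values
> could be picked automatically on a Linux box (assumes lscpu is
> available; the variable and file names are just for illustration):
>
>   # one MPI rank per socket, one OpenMP thread per physical core
>   sockets=$(lscpu | awk '/^Socket\(s\)/ {print $2}')
>   cores_per_socket=$(lscpu | awk '/^Core\(s\) per socket/ {print $4}')
>   export OMP_NUM_THREADS=$cores_per_socket
>   mpirun -np "$sockets" pmemd.MPI -O -i mdin -o mdout
>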
> While there is no option for a pure OpenMP build, if the check in
> pmemd.F90 requiring 2 ranks is disabled for MPI + OpenMP builds, we can
> run with 1 rank and multiple OpenMP threads.
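>
> For example, with that check relaxed, a pure-OpenMP style run would
> look like this (thread count and file names illustrative):
>
>   export OMP_NUM_THREADS=8
>   mpirun -np 1 pmemd.MPI -O -i mdin -o mdout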
>
> For larger workloads (on the order of a nucleosome and above), the
> performance gain from MPI + OpenMP is on the order of 2-4x.
> For smaller workloads that do not have enough atoms to parallelize
> (myoglobin and smaller), pure MPI performs better.
>
> regards,
> Benny Mathew
>
>
>
>
> From: Jason Swails <jason.swails.gmail.com>
> To: AMBER Developers Mailing List <amber-developers.ambermd.org>
> Date: 06-03-2016 00:03
> Subject: Re: [AMBER-Developers] pmemd.MPI build broken
>
>
>
> On Sat, Mar 5, 2016 at 11:25 AM, Ross Walker <ross.rosswalker.co.uk>
> wrote:
>
>>
>>> On Mar 5, 2016, at 06:29, David A Case <david.case.rutgers.edu> wrote:
>>>
>>> On Sat, Mar 05, 2016, Jason Swails wrote:
>>>>
>>>> Also, when I switch to using OpenMPI *without* dragonegg, the linker
>>>> line still needs -lgomp to complete successfully, so the build
>>>> doesn't really work in general yet.
>>>
>>> Sounds like it's been tested mostly (only) with mpich and variants(?).
>>> It's surprising that the flavor of MPI library has an impact on the
>>> openmp stuff. Maybe I'm misreading something.
>>>
>>> I've posted my gnu5 + mpich test results to the wiki page: I'm at
>>> commit 2d5d9afbc305bfbca01. Build is fine, but I see significant
>>> (non-roundoff) regressions.
>>
>> Can you try
>>
>> export OMP_NUM_THREADS=2
>> mpirun -np 2
>>
>> and see if you get the same errors please.
>>
>> It might be resource related - e.g. if you have 8 cores and do mpirun
>> -np 4 without setting OMP_NUM_THREADS, you get 32 threads total for the
>> GB cases. (This will be addressed in documentation shortly.)
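>>
>> To make the arithmetic concrete (hypothetical 8-core box; with
>> OMP_NUM_THREADS unset, each rank defaults to one thread per core):
>>
>>   mpirun -np 4 pmemd.MPI -O -i mdin   # 4 ranks x 8 threads = 32: oversubscribed
>>
>>   export OMP_NUM_THREADS=2
>>   mpirun -np 4 pmemd.MPI -O -i mdin   # 4 ranks x 2 threads = 8: matches the cores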
>
> This is dangerous and undesirable behavior in my opinion. Adding it to
> the documentation is not a fix. For the longest time, ./configure
> -openmp was required to get OpenMP parallelism, and MPI-parallelized
> programs spawned an MPI thread for every CPU you wanted to use. This
> behavior has changed for pmemd, so now if somebody uses a script they
> used for Amber 14 and earlier with Amber 16, they will get the same
> answers (once the regressions are fixed), and the output will report
> the same number of MPI threads, but performance will tank while they
> thrash their resources. The same thing happens if they replace "sander"
> with "pmemd" (which has always been the recommendation for improved
> performance, except where features are only supported in one or the
> other). UI compatibility with sander has always been a cornerstone of
> pmemd.
>
> Mixed OpenMP-MPI has its place for sure -- MICs and dedicated
> supercomputers with many cores per node. But for commodity clusters and
> single workstations, I see this as more of an obstacle than a benefit.
> For instance -- how do we parallelize on a single workstation now? I
> would naively think you would need to do
>
> mpirun -np 1 pmemd.MPI -O -i ...
>
> and let OpenMP parallelize. But no, that doesn't work, because of this
> in pmemd.F90:
>
> #ifdef MPI
> #ifndef CUDA
>   if (numtasks .lt. 2 .and. master) then
>     write(mdout, *) &
>       'MPI version of PMEMD must be used with 2 or more processors!'
>     call mexit(6, 1)
>   end if
> #endif
> #endif /*MPI*/
>
> So how do you do it? Well, you can do this:
>
> export OMP_NUM_THREADS=1
> mpirun -np 16 pmemd.MPI -O -i ...
>
> Or you would need to do something like
>
> export OMP_NUM_THREADS=8
> mpirun -np 2 pmemd.MPI -O -i ...
>
> Which is better? Why? What safeguards do we have in there to avoid
> people
> thrashing their systems? What's the difference on a commodity cluster
> (say, parallelizing across ~4-8 nodes with a total of ~60 CPUs) between
> pmemd.MPI with and without OpenMP? I've profiled pmemd.MPI's GB scaling
> several years ago, and I was rather impressed -- despite the allgatherv
> every step, I could never hit the ceiling on scaling for a large system.
> Of course sander.MPI's GB scaling is quite good as well (not
> surprisingly, since it's really the same code). So now that we have all
> this added complexity of how to run these simulations "correctly",
> what's the win in performance?
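>
> One way to answer that empirically is a sweep over rank/thread splits
> (a sketch for a 16-core machine; the counts and file names are
> assumed):
>
>   for np in 2 4 8 16; do
>     export OMP_NUM_THREADS=$((16 / np))
>     mpirun -np $np pmemd.MPI -O -i mdin -o mdout.np$np
>   done
>
> and then compare the ns/day reported in each mdout.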
>
> IMO, MPI/OpenMP is a specialty mix. You use it when you are trying to
> really squeeze out the maximum performance on expensive hardware -- when
> you try to tune the right mix of SMP and distributed parallelism on
> multi-core supercomputers or harness the capabilities of an Intel MIC.
> And it requires a bit of tuning and experimentation/benchmarking to get
> the right settings for your desired performance on a specific machine
> for a specific system. And for that it's all well and good. But to take
> settings that are optimized for these kinds of highly specialized
> architectures and make them the default (and *only supported*) behavior
> on *all* systems seems like a rather obvious mistake from the typical
> user's perspective.
>
> This is speculation, but based on real-world experience. A huge problem
> here is that we have never seen this code before (it simply *existing*
> on an obscure branch somewhere doesn't count -- without being in master
> *or* explicitly asking for testers, nobody will touch a volatile branch
> they know nothing about). So nobody has any idea how this is going to
> play out in the wild, and there's so little time between now and release
> that I don't think we could possibly get that answer. (And the actual
> developer of the code is unqualified to accurately anticipate the
> challenges typical users will face, in my experience.) This feels very
> http://bit.ly/1p7gB68 to me.
>
> The two things I think we should do are:
>
> 1) Make OpenMP an optional add-in that you get when you configure with
> -openmp -mpi (or with -mic), and make it a separate executable so people
> will only run that code when they know that's precisely what they want
> to run.
>
> 2) Wait to release it until a wider audience of developers has actually
> gotten a chance to use it.
>
> This is a large part of why we institute a code freeze well before
> release.



_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Tue Mar 08 2016 - 06:00:04 PST