I would suggest branching from the intel branch. I created that branch with a merge point after the reversion (but without the actual reverted changes) specifically to make merging *back* to master much easier.
--
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher
> On Mar 7, 2016, at 10:15 AM, Benny1 M <benny1.m.tcs.com> wrote:
>
> We are currently working to address the problems arising from making
> OpenMP the default for parallel builds.
> Since we have tested MPI + OpenMP only with the GNU and Intel compilers,
> we have enabled OpenMP only for those two compilers, and only when the
> build is explicitly configured with the -openmp flag.
>
> We are not making these changes on the master branch; they are going onto
> a branch called masterintelmerge, which is taken from the commit before
> the OpenMP changes were reverted.
>
> The configure options will be as follows:
>
> ./configure gnu (or intel, or PGI, or CLANG ..)
> - builds code that is entirely serial.
>
> ./configure -mpi gnu (or intel, or PGI, or CLANG ..)
> - builds code that is pure MPI.
>
> ./configure -mpi -openmp (gnu or intel)
> - builds MPI/OpenMP hybrid code for GB
> - for other compilers GB will be pure MPI
> - ideally the number of MPI ranks should equal the number of sockets and
>   the number of OpenMP threads should equal the cores per socket. We are
>   working on a way to set these automatically (see the launch sketch after
>   this list); ideas are welcome.
>
> ./configure -mic-native -mpi -openmp intel
> - builds code specifically for KNC
>
> ./configure -MIC2 -mpi -openmp intel
> - builds code specifically for KNL
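>
> As one possible starting point for setting these automatically, here is a
> minimal launch sketch (not an official recipe). It assumes a Linux host
> where lscpu is available, the standard pmemd.MPI binary, and purely
> illustrative input file names:
>
>   # ranks = number of sockets, OpenMP threads = cores per socket
>   sockets=$(lscpu | awk -F: '/^Socket\(s\)/ {gsub(/ /, "", $2); print $2}')
>   cores_per_socket=$(lscpu | awk -F: '/^Core\(s\) per socket/ {gsub(/ /, "", $2); print $2}')
>   export OMP_NUM_THREADS=$cores_per_socket
>   mpirun -np "$sockets" pmemd.MPI -O -i mdin -o mdout -p prmtop -c inpcrd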
>
> While there is no option for a pure OpenMP build, if the two-rank check in
> pmemd.F90 is disabled for MPI + OpenMP builds, we can run with a single
> MPI rank and multiple OpenMP threads.
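>
> For example, assuming that check has been relaxed (the thread count and
> input file names here are purely illustrative):
>
>   export OMP_NUM_THREADS=8   # all parallelism comes from OpenMP threads
>   mpirun -np 1 pmemd.MPI -O -i mdin -o mdout -p prmtop -c inpcrd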
>
> For larger workloads (on the order of the nucleosome and above), the
> performance gain from MPI + OpenMP is on the order of 2-4x.
> For smaller workloads that do not have enough atoms to parallelize
> (myoglobin and below), pure MPI performs better.
>
> regards,
> Benny Mathew
>
>
>
>
> From: Jason Swails <jason.swails.gmail.com>
> To: AMBER Developers Mailing List <amber-developers.ambermd.org>
> Date: 06-03-2016 00:03
> Subject: Re: [AMBER-Developers] pmemd.MPI build broken
>
>
>
> On Sat, Mar 5, 2016 at 11:25 AM, Ross Walker <ross.rosswalker.co.uk>
> wrote:
>
>>
>>> On Mar 5, 2016, at 06:29, David A Case <david.case.rutgers.edu> wrote:
>>>
>>> On Sat, Mar 05, 2016, Jason Swails wrote:
>>>>
>>>> Also, when I switch to using OpenMPI *without* dragonegg, the linker line
>>>> still needs -lgomp to complete successfully, so the build doesn't really
>>>> work in general yet.
>>>
>>> Sounds like it's been tested mostly (only) with mpich and variants(?).
>>> It's surprising that the flavor of MPI library has an impact on the
>>> openmp stuff. Maybe I'm misreading something.
>>>
>>> I've posted my gnu5 + mpich test results to the wiki page: I'm at commit
>>> 2d5d9afbc305bfbca01. Build is fine, but I see significant (non-roundoff)
>>> regressions.
>>
>> Can you try
>>
>>   export OMP_NUM_THREADS=2
>>   mpirun -np 2
>>
>> and see if you get the same errors, please.
>>
>> It might be resource related - e.g. if you have 8 cores and do mpirun -np 4
>> without setting OMP_NUM_THREADS, you get 32 threads total (4 ranks x 8
>> threads each) for the GB cases. (This will be addressed in the
>> documentation shortly.)
>
> This is dangerous and undesirable behavior in my opinion. Adding it to
> the documentation is not a fix. For the longest time, ./configure -openmp
> was required to get OpenMP parallelism, and MPI-parallelized programs
> spawned an MPI thread for every CPU you wanted to use. This behavior has
> changed for pmemd, so now if somebody uses a script they used for Amber 14
> and earlier with Amber 16, they will get the same answers (once the
> regressions are fixed), and it will reportedly use the same number of MPI
> threads in the output, but performance will tank while they thrash their
> resources. The same thing happens if they replace "sander" with "pmemd"
> (which has always been the recommendation for improved performance, except
> where features are only supported in one or the other). UI compatibility
> with sander has always been a cornerstone of pmemd.
>
> Mixed OpenMP-MPI has its place for sure -- MICs and dedicated
> supercomputers with many cores per node. But for commodity clusters and
> single workstations, I see this as more of an obstacle than a benefit. For
> instance -- how do we parallelize on a single workstation now? I would
> naively think you would need to do
>
> mpirun -np 1 pmemd.MPI -O -i ...
>
> and let OpenMP parallelize. But no, that doesn't work, because of this in
> pmemd.F90:
>
> #ifdef MPI
> #ifndef CUDA
>   if (numtasks .lt. 2 .and. master) then
>     write(mdout, *) &
>       'MPI version of PMEMD must be used with 2 or more processors!'
>     call mexit(6, 1)
>   end if
> #endif
> #endif /* MPI */
>
> So how do you do it? Well you can do this:
>
> export OMP_NUM_THREADS=1
> mpirun -np 16 pmemd.MPI -O -i ...
>
> Or you would need to do something like
>
> export OMP_NUM_THREADS=8
> mpirun -np 2 pmemd.MPI -O -i ...
>
> Which is better? Why? What safeguards do we have in there to keep people
> from thrashing their systems? What's the difference on a commodity cluster
> (say, parallelizing across ~4-8 nodes with a total of ~60 CPUs) between
> pmemd.MPI with and without OpenMP? I profiled pmemd.MPI's GB scaling
> several years ago, and I was rather impressed -- despite the allgatherv
> every step, I could never hit the ceiling on scaling for a large system.
> Of course sander.MPI's GB scaling is quite good as well (not surprisingly,
> since it's really the same code). So now that we have all this added
> complexity around how to run these simulations "correctly", what's the win
> in performance?
>
> IMO, MPI/OpenMP is a specialty mix. You use it when you are trying to
> really squeeze out the maximum performance on expensive hardware -- when
> you try to tune the right mix of SMP and distributed parallelism on
> multi-core supercomputers or harness the capabilities of an Intel MIC. And
> it requires a bit of tuning and experimentation/benchmarking to get the
> right settings for your desired performance on a specific machine for a
> specific system. And for that it's all well and good. But to take
> settings that are optimized for these kinds of highly specialized
> architectures and make that the default (and *only supported*) behavior on
> *all* systems seems like a rather obvious mistake from the typical user's
> perspective.
>
> This is speculation, but it is based on real-world experience. A huge
> problem here is that we have never seen this code before (it simply
> *existing* on an obscure branch somewhere doesn't count -- without it being
> in master *or* an explicit call for testers, nobody will touch a volatile
> branch they know nothing about). So nobody has any idea how this is going
> to play out in the wild, and there's so little time between now and release
> that I don't think we could possibly get that answer. (And in my
> experience, the actual developer of the code is unqualified to accurately
> anticipate the challenges typical users will face.) This feels very
> http://bit.ly/1p7gB68 to me.
>
> The two things I think we should do are:
>
> 1) Make OpenMP an optional add-in that you get when you configure with
> -openmp -mpi (or with -mic), and make it a separate executable so people
> will only run that code when they know that's precisely what they want to
> run (see the hypothetical configure lines after this list).
>
> 2) Wait to release it until a wider audience of developers has actually
> gotten a chance to use it.
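>
> For concreteness, something like the following (the hybrid executable name
> is purely illustrative, not an existing build target):
>
>   ./configure -mpi gnu           # -> pmemd.MPI, pure MPI, as before
>   ./configure -openmp -mpi gnu   # -> e.g. pmemd.omp.MPI, hybrid MPI/OpenMP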
>
> This is a large part of why we institute a code freeze well before
> release.
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Mon Mar 07 2016 - 09:00:03 PST