I would suggest branching from the intel branch. I created that branch with a merge point after the reversion (but without the actual reverted changes) specifically to make merging *back* to master much easier.
--
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher
> On Mar 7, 2016, at 10:15 AM, Benny1 M <benny1.m.tcs.com> wrote:
>
> We are currently working to address the problems arising from making
> OpenMP the default for parallel builds.
> Since we have tested MPI + OpenMP only with the GNU and Intel compilers,
> we have enabled OpenMP only for those two compilers, and only when the
> build is explicitly configured with the -openmp flag.
>
> We are not making these changes on the master branch; they are going onto
> a branch called masterintelmerge, which is taken from the commit before
> the OpenMP changes were reverted.
>
> The configure options will be as follows:
>
> ./configure gnu (or intel, or PGI, or CLANG ..)
> - builds code that is entirely serial.
>
> ./configure -mpi gnu (or intel, or PGI, or CLANG ..)
> - builds code that is pure MPI.
>
> ./configure -mpi -openmp (gnu or intel)
> - builds MPI/OpenMP hybrid code for GB
> - for other compilers GB will be pure MPI
> - ideally the number of MPI ranks should equal the number of sockets and
>   the number of OpenMP threads should equal the cores per socket. We are
>   working on a way to set these automatically (see the launch sketch after
>   this list); ideas are welcome.
>
> ./configure -mic-native -mpi -openmp intel
> - builds code specifically for KNC
>
> ./configure -MIC2 -mpi -openmp intel
> - builds code specifically for KNL
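>
> As one possible starting point for setting these automatically, here is a
> minimal launch sketch (not an official recipe). It assumes a Linux host
> where lscpu is available, the standard pmemd.MPI binary, and purely
> illustrative input file names:
>
>   # ranks = number of sockets, OpenMP threads = cores per socket
>   sockets=$(lscpu | awk -F: '/^Socket\(s\)/ {gsub(/ /, "", $2); print $2}')
>   cores_per_socket=$(lscpu | awk -F: '/^Core\(s\) per socket/ {gsub(/ /, "", $2); print $2}')
>   export OMP_NUM_THREADS=$cores_per_socket
>   mpirun -np "$sockets" pmemd.MPI -O -i mdin -o mdout -p prmtop -c inpcrd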
>
> While there is no option for a pure OpenMP build, if the two-rank check in
> pmemd.F90 is disabled for MPI + OpenMP builds, we can run with a single
> MPI rank and multiple OpenMP threads.
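>
> For example, assuming that check has been relaxed (the thread count and
> input file names here are purely illustrative):
>
>   export OMP_NUM_THREADS=8   # all parallelism comes from OpenMP threads
>   mpirun -np 1 pmemd.MPI -O -i mdin -o mdout -p prmtop -c inpcrd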
>
> For larger workloads (on the order of the nucleosome and above), the
> performance gain from MPI + OpenMP is on the order of 2-4x.
> For smaller workloads that do not have enough atoms to parallelize
> (myoglobin and below), pure MPI performs better.
>
> regards,
> Benny Mathew
>
>
>
>
> From: Jason Swails <jason.swails.gmail.com>
> To: AMBER Developers Mailing List <amber-developers.ambermd.org>
> Date: 06-03-2016 00:03
> Subject: Re: [AMBER-Developers] pmemd.MPI build broken
>
>
>
> On Sat, Mar 5, 2016 at 11:25 AM, Ross Walker <ross.rosswalker.co.uk>
> wrote:
>
>>
>>> On Mar 5, 2016, at 06:29, David A Case <david.case.rutgers.edu> wrote:
>>>
>>> On Sat, Mar 05, 2016, Jason Swails wrote:
>>>>
>>>> Also, when I switch to using OpenMPI *without* dragonegg, the linker line
>>>> still needs -lgomp to complete successfully, so the build doesn't really
>>>> work in general yet.
>>>
>>> Sounds like it's been tested mostly (only) with mpich and variants(?).
>>> It's surprising that the flavor of MPI library has an impact on the
>>> openmp stuff. Maybe I'm misreading something.
>>>
>>> I've posted my gnu5 + mpich test results to the wiki page: I'm at commit
>>> 2d5d9afbc305bfbca01. Build is fine, but I see significant (non-roundoff)
>>> regressions.
>>
>> Can you try
>>
>>   export OMP_NUM_THREADS=2
>>   mpirun -np 2
>>
>> and see if you get the same errors, please.
>>
>> It might be resource related - e.g. if you have 8 cores and do mpirun -np 4
>> without setting OMP_NUM_THREADS, you get 32 threads total (4 ranks x 8
>> threads each) for the GB cases. (This will be addressed in the
>> documentation shortly.)
>
> This is dangerous and undesirable behavior in my opinion. Adding it to
> the documentation is not a fix. For the longest time, ./configure -openmp
> was required to get OpenMP parallelism, and MPI-parallelized programs
> spawned an MPI thread for every CPU you wanted to use. This behavior has
> changed for pmemd, so now if somebody uses a script they used for Amber 14
> and earlier with Amber 16, they will get the same answers (once the
> regressions are fixed), and it will reportedly use the same number of MPI
> threads in the output, but performance will tank while they thrash their
> resources. The same thing happens if they replace "sander" with "pmemd"
> (which has always been the recommendation for improved performance, except
> where features are only supported in one or the other). UI compatibility
> with sander has always been a cornerstone of pmemd.
>
> Mixed OpenMP-MPI has its place for sure -- MICs and dedicated
> supercomputers with many cores per node. But for commodity clusters and
> single workstations, I see this as more of an obstacle than a benefit. For
> instance -- how do we parallelize on a single workstation now? I would
> naively think you would need to do
>
> mpirun -np 1 pmemd.MPI -O -i ...
>
> and let OpenMP parallelize. But no, that doesn't work, because of this in
> pmemd.F90:
>
> #ifdef MPI
> #ifndef CUDA
>   if (numtasks .lt. 2 .and. master) then
>     write(mdout, *) &
>       'MPI version of PMEMD must be used with 2 or more processors!'
>     call mexit(6, 1)
>   end if
> #endif
> #endif /* MPI */
>
> So how do you do it? Well you can do this:
>
> export OMP_NUM_THREADS=1
> mpirun -np 16 pmemd.MPI -O -i ...
>
> Or you would need to do something like
>
> export OMP_NUM_THREADS=8
> mpirun -np 2 pmemd.MPI -O -i ...
>
> Which is better? Why? What safeguards do we have in there to keep people
> from thrashing their systems? What's the difference on a commodity cluster
> (say, parallelizing across ~4-8 nodes with a total of ~60 CPUs) between
> pmemd.MPI with and without OpenMP? I profiled pmemd.MPI's GB scaling
> several years ago, and I was rather impressed -- despite the allgatherv
> every step, I could never hit the ceiling on scaling for a large system.
> Of course sander.MPI's GB scaling is quite good as well (not surprisingly,
> since it's really the same code). So now that we have all this added
> complexity around how to run these simulations "correctly", what's the win
> in performance?
>
> IMO, MPI/OpenMP is a specialty mix. You use it when you are trying to
> really squeeze out the maximum performance on expensive hardware -- when
> you try to tune the right mix of SMP and distributed parallelism on
> multi-core supercomputers or harness the capabilities of an Intel MIC. And
> it requires a bit of tuning and experimentation/benchmarking to get the
> right settings for your desired performance on a specific machine for a
> specific system. And for that it's all well and good. But to take
> settings that are optimized for these kinds of highly specialized
> architectures and make that the default (and *only supported*) behavior on
> *all* systems seems like a rather obvious mistake from the typical user's
> perspective.
>
> This is speculation, but it is based on real-world experience. A huge
> problem here is that we have never seen this code before (it simply
> *existing* on an obscure branch somewhere doesn't count -- without it being
> in master *or* an explicit call for testers, nobody will touch a volatile
> branch they know nothing about). So nobody has any idea how this is going
> to play out in the wild, and there's so little time between now and release
> that I don't think we could possibly get that answer. (And in my
> experience, the actual developer of the code is unqualified to accurately
> anticipate the challenges typical users will face.) This feels very
> http://bit.ly/1p7gB68 to me.
>
> The two things I think we should do are:
>
> 1) Make OpenMP an optional add-in that you get when you configure with
> -openmp -mpi (or with -mic), and make it a separate executable so people
> will only run that code when they know that's precisely what they want to
> run (see the hypothetical configure lines after this list).
>
> 2) Wait to release it until a wider audience of developers has actually
> gotten a chance to use it.
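>
> For concreteness, something like the following (the hybrid executable name
> is purely illustrative, not an existing build target):
>
>   ./configure -mpi gnu           # -> pmemd.MPI, pure MPI, as before
>   ./configure -openmp -mpi gnu   # -> e.g. pmemd.omp.MPI, hybrid MPI/OpenMP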
>
> This is a large part of why we institute a code freeze well before
> release.
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Mon Mar 07 2016 - 09:00:03 PST