Re: [AMBER-Developers] pmemd.MPI build broken

From: Daniel Roe <daniel.r.roe.gmail.com>
Date: Sat, 5 Mar 2016 13:30:58 -0700

Not sure if it's related, but the pmemd.cuda.MPI build is also broken, in a
different way (below). This is Ubuntu 14.04, x86_64, GNU 4.8.4, MPICH
3.0.4, CUDA 7.5.

./configure -nofftw3 -nomtkpp -noemil -cuda -mpi gnu

```
mpif90 -DMPI -DBINTRAJ -DPUBFFT -O3 -mtune=native -fopenmp -DCUDA
-DMPI -DMPICH_IGNORE_CXX_SEEK
-I/home/user/disk2/Programs/Amber/GIT/amber/include -c gb_force.F90
gb_force.F90:244.59:

      call gbsa_ene(crd, gbsafrc, pot_ene%surf ,atm_cnt, jj, r2x, belly_atm_cnt
                                                           1
Error: Name 'jj' at (1) is an ambiguous reference to 'jj' from module
'gb_ene_hybrid_mod'
gb_force.F90:285.59:

      call gbsa_ene(crd, gbsafrc, pot_ene%surf, atm_cnt, jj, r2x, belly_atm_cnt
                                                           1
Error: Name 'jj' at (1) is an ambiguous reference to 'jj' from module
'gb_ene_hybrid_mod'
make[4]: *** [gb_force.o] Error 1
```
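
The gfortran error here is the usual symptom of two use-associated modules both
exporting the same public name: gb_force.F90 apparently now pulls in a `jj` from
gb_ene_hybrid_mod on top of a `jj` it already had in scope, so any unqualified
reference to `jj` becomes ambiguous. A minimal sketch of the pattern and the usual
`only:`/rename workaround (module and variable names below are made up for
illustration, not the actual pmemd sources):

```
! sketch.f90 -- hypothetical modules, not the real pmemd code
module gb_plain_mod
  implicit none
  integer :: jj = 0        ! one public module variable named jj
end module gb_plain_mod

module gb_hybrid_mod
  implicit none
  integer :: jj = 0        ! a second public jj with the same name
end module gb_hybrid_mod

program demo
  use gb_plain_mod                       ! brings jj into scope
  use gb_hybrid_mod, only: hyb_jj => jj  ! rename avoids the clash; a bare
                                         ! 'use gb_hybrid_mod' would make every
                                         ! unqualified jj below ambiguous
  implicit none
  jj = 1           ! unambiguously gb_plain_mod's jj
  hyb_jj = 2       ! gb_hybrid_mod's jj under its local name
  print *, jj, hyb_jj
end program demo
```

I'd guess the fix in gb_force.F90 is the same idea - restrict the new
`use gb_ene_hybrid_mod` to the symbols it actually needs, or rename its `jj` -
but someone who knows that module should confirm.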

-Dan

On Sat, Mar 5, 2016 at 1:23 PM, Ross Walker <rosscwalker.gmail.com> wrote:
> TL;DR, but the key issue is that MPI-only is not going to scale going forward. Next-gen chips will be 32 cores, and after that we are talking 72 cores, so we need to take the medicine sooner rather than later. Legacy scripts etc. are just not going to work here, so we need to move away from this ASAP before it really bites us.
> The key thing for the Intel folks in the next few days is to get the ability to compile without the OpenMP option, so that at least we can have the legacy MPI behavior.
> We will see what the code looks like at the end of the week, and then I'll make a decision on whether we reverse it (and have it be an update down the line) or not.
> All the best,
> Ross
>
> -------- Original message --------
> From: Jason Swails <jason.swails.gmail.com>
> Date: 03/05/2016 11:33 (GMT-07:00)
> To: AMBER Developers Mailing List <amber-developers.ambermd.org>
> Subject: Re: [AMBER-Developers] pmemd.MPI build broken
> On Sat, Mar 5, 2016 at 11:25 AM, Ross Walker <ross.rosswalker.co.uk> wrote:
>
>>
>> > On Mar 5, 2016, at 06:29, David A Case <david.case.rutgers.edu> wrote:
>> >
>> > On Sat, Mar 05, 2016, Jason Swails wrote:
>> >>
>> >> Also, when I switch to using OpenMPI *without* dragonegg, the linker line
>> >> still needs -lgomp to complete successfully, so the build doesn't really
>> >> work in general yet.
>> >
>> > Sounds like it's been tested mostly (only) with mpich and variants(?). It's
>> > surprising that the flavor of MPI library has an impact on the OpenMP
>> > stuff. Maybe I'm misreading something.
>> >
>> > I've posted my gnu5 + mpich test results to the wiki page: I'm at commit
>> > 2d5d9afbc305bfbca01. Build is fine, but I see significant (non-roundoff)
>> > regressions.
>>
>> Can you try
>>
>> export OMP_NUM_THREADS=2
>> mpirun -np 2
>>
>> and see if you get the same errors please.
>>
>> It might be resource related - e.g. if you have 8 cores and do mpirun -np 4
>> without setting OMP_NUM_THREADS, you get 32 threads total (4 MPI ranks x 8
>> OpenMP threads each) for the GB cases. (This will be addressed in the
>> documentation shortly.)
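>>
>> A quick way to see what a given launch line actually gives you is a toy
>> hybrid hello-world (a sketch only, not pmemd code): each MPI rank spawns
>> OMP_NUM_THREADS OpenMP threads, so the totals multiply.
>>
>>   ! hybrid_hello.F90 -- illustration only; build with: mpif90 -fopenmp hybrid_hello.F90
>>   program hybrid_hello
>>     use mpi
>>     use omp_lib
>>     implicit none
>>     integer :: ierr, rank, nranks, nthreads
>>     call MPI_Init(ierr)
>>     call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
>>     call MPI_Comm_size(MPI_COMM_WORLD, nranks, ierr)
>>     ! omp_get_max_threads() typically defaults to the core count
>>     ! when OMP_NUM_THREADS is not set
>>     nthreads = omp_get_max_threads()
>>     if (rank == 0) write(*,*) 'MPI ranks:', nranks, ' threads/rank:', nthreads, &
>>                               ' total threads:', nranks * nthreads
>>     call MPI_Finalize(ierr)
>>   end program hybrid_hello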
>>
>
> This is dangerous and undesirable behavior in my opinion. Adding it to
> the documentation is not a fix. For the longest time, ./configure -openmp
> was required to get OpenMP parallelism, and MPI-parallelized programs
> spawned an MPI thread for every CPU you wanted to use. This behavior has
> changed for pmemd, so now if somebody uses a script they used for Amber 14
> and earlier with Amber 16, they will get the same answers (once the
> regressions are fixed), and it will reportedly use the same number of MPI
> threads in the output, but performance will tank while they thrash their
> resources. Same thing happens if they replace "sander" by "pmemd" (which
> has always been the recommendation to get improved performance except where
> features are only supported in one or the other). UI compatibility with
> sander has always been a cornerstone of pmemd.
>
> Mixed OpenMP-MPI has its place for sure -- MICs and dedicated
> supercomputers with many cores per node. But for commodity clusters and
> single workstations, I see this as more of an obstacle than a benefit. For
> instance -- how do we parallelize on a single workstation now? I would
> naively think you would need to do
>
> mpirun -np 1 pmemd.MPI -O -i ...
>
> and let OpenMP parallelize. But no, that doesn't work, because of this check
> near line 176 of pmemd.F90:
>
> #ifdef MPI
> #ifndef CUDA
>     if (numtasks .lt. 2 .and. master) then
>       write(mdout, *) &
>         'MPI version of PMEMD must be used with 2 or more processors!'
>       call mexit(6, 1)
>     end if
> #endif
> #endif /*MPI*/
>
> So how do you do it? Well you can do this:
>
> export OMP_NUM_THREADS=1
> mpirun -np 16 pmemd.MPI -O -i ...
>
> Or you would need to do something like
>
> export OMP_NUM_THREADS=8
> mpirun -np 2 pmemd.MPI -O -i ...
>
> Which is better? Why? What safeguards do we have in there to avoid people
> thrashing their systems? What's the difference on a commodity cluster
> (say, parallelizing across ~4-8 nodes with a total of ~60 CPUs) between
> pmemd.MPI with and without OpenMP? I profiled pmemd.MPI's GB scaling
> several years ago, and I was rather impressed -- despite the allgatherv
> every step, I could never hit the ceiling on scaling for a large system.
> Of course sander.MPI's GB scaling is quite good as well (not surprisingly,
> since it's really the same code). So now that we have all this added
> complexity of how to run these simulations "correctly", what's the win in
> performance?
>
> IMO, MPI/OpenMP is a specialty mix. You use it when you are trying to
> really squeeze out the maximum performance on expensive hardware -- when
> you try to tune the right mix of SMP and distributed parallelism on
> multi-core supercomputers or harness the capabilities of an Intel MIC. And
> it requires a bit of tuning and experimentation/benchmarking to get the
> right settings for your desired performance on a specific machine for a
> specific system. And for that it's all well and good. But to take
> settings that are optimized for these kinds of highly specialized
> architectures and make that the default (and *only supported*) behavior on
> *all* systems seems like a rather obvious mistake from the typical user's
> perspective.
>
> This is speculation, but based on real-world experience. A huge problem
> here is that we have never seen this code before (it simply *existing* on
> an obscure branch somewhere doesn't count -- without being in master *or*
> explicitly asking for testers nobody will touch a volatile branch they know
> nothing about). So nobody has any idea how this is going to play out in
> the wild, and there's so little time between now and release that I don't
> think we could possibly get that answer. (And, in my experience, the actual
> developer of the code is unqualified to accurately anticipate the challenges
> typical users will face.) This feels very http://bit.ly/1p7gB68 to me.
>
> The two things I think we should do are
>
> 1) Make OpenMP an optional add-in that you get when you configure with
> -openmp -mpi (or with -mic), and make it a separate executable so people
> will only run that code when they know that's precisely what they want to
> run.
>
> 2) Wait to release it until a wider audience of developers has actually
> gotten a chance to use it.
>
> This is a large part of why we institute a code freeze well before release.



-- 
-------------------------
Daniel R. Roe, PhD
Department of Medicinal Chemistry
University of Utah
30 South 2000 East, Room 307
Salt Lake City, UT 84112-5820
http://home.chpc.utah.edu/~cheatham/
(801) 587-9652
(801) 585-6208 (Fax)
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Sat Mar 05 2016 - 13:00:03 PST