OpenMP does not come into play for PME on the host, so there is little change
in performance or in the way AMBER is run.
Running PME:

  export I_MPI_PIN_MODE=pm
  export I_MPI_PIN_DOMAIN=auto
  mpirun -np <NSOCKETS * NCORES_PER_SOCKET> \
      $AMBERHOME/bin/pmemd.MPI
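For example, on a hypothetical dual-socket node with 16 cores per socket
(one rank per core, 32 ranks total) this would be:

  export I_MPI_PIN_MODE=pm
  export I_MPI_PIN_DOMAIN=auto
  mpirun -np 32 $AMBERHOME/bin/pmemd.MPI

(The node geometry above is only an assumption for illustration; substitute
your own socket and core counts.)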
On the other hand, MPI communication in the GB workload has been reduced by
using a minimal number of MPI ranks and more OpenMP threads per rank. This
improves single-node performance and also helps scaling across nodes.
The improvement is seen in mid- to large-size workloads such as nucleosome
(~25K atoms) and rubisco (~75K atoms).
Please try Intel MPI + OpenMP and let us know if you run into any issues.
Running GB:

  export I_MPI_PIN_MODE=pm
  export I_MPI_PIN_DOMAIN=auto
  mpirun -np <NSOCKETS> \
      -env OMP_NUM_THREADS=<NCORES_PER_SOCKET * 2> \
      -env KMP_AFFINITY="scatter,granularity=core" \
      -env KMP_STACKSIZE=10M \
      $AMBERHOME/bin/pmemd.MPI
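For example, on the same hypothetical dual-socket node with 16 cores per
socket and hyperthreading enabled, this works out to 2 ranks with 32 threads
each:

  export I_MPI_PIN_MODE=pm
  export I_MPI_PIN_DOMAIN=auto
  mpirun -np 2 \
      -env OMP_NUM_THREADS=32 \
      -env KMP_AFFINITY="scatter,granularity=core" \
      -env KMP_STACKSIZE=10M \
      $AMBERHOME/bin/pmemd.MPI

(Again, the node geometry is assumed for illustration; the factor of 2 on
NCORES_PER_SOCKET presumes hyperthreading is on, otherwise use one thread per
physical core.)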
- Benny
From: Jason Swails <jason.swails.gmail.com>
To: AMBER Developers Mailing List <amber-developers.ambermd.org>
Date: 03-06-2016 07:20
Subject: Re: [AMBER-Developers] AMBER Master Branch
It has something to do with OpenMP too, right? And then you have to be
careful to count total threads as MPI*OMP threads to avoid thrashing. Do
you only see the perf boost with OMP-MPI combo? That was my
understanding...
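(As a concrete check of that arithmetic, assuming the hypothetical 2-socket,
16-cores-per-socket node above with hyperthreading: the GB recipe of 2 MPI
ranks x 32 OMP threads = 64 total threads, which matches the 64 hardware
threads available; anything beyond that oversubscribes and thrashes.)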
Is the exact recipe written down somewhere for how to take full advantage
of this code? Because if Dave is having trouble using it "correctly", our
users are highly unlikely to have better luck.
--
Jason M. Swails
> On Jun 2, 2016, at 8:54 PM, Ross Walker <ross.rosswalker.co.uk> wrote:
>
> Hi Dave,
>
> Performance changes here will be minimal for pre-V4 (Broadwell) hardware,
> and most of the changes are focused on Knights Landing Xeon Phi (to be
> released soon).
>
> All the best
> Ross
>
>> On Jun 2, 2016, at 16:59, David A Case <david.case.rutgers.edu> wrote:
>>
>> On Thu, Jun 02, 2016, Charles Lin wrote:
>>>
>>> So the Intel code has been in master for about a month now. We plan on
>>> releasing the patch within a week.
>>
>> Still not sure when I am supposed to see speedups. I've tried pmemd.MPI
>> runs (using Intel 16.0.3 compilers + MKL + mvapich2) for various
>> systems, up to 64 threads, and see no difference in speed for PME
>> calculations. Have not tried any GB calculations.
>>
>> Is this only expected to speed things up when the -intelmpi (rather than
>> -mpi) flag is set?
>>
>> Do you have specific examples of which systems one should expect
>> speedups for?
>>
>> ...thx...dac
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Fri Jun 03 2016 - 04:00:02 PDT