Re: [AMBER-Developers] AMBER Master Branch

From: Benny1 M <>
Date: Fri, 3 Jun 2016 16:00:19 +0530

OpenMP does not come into play for PME on the host, so there is little
change in performance or in the way AMBER is run.

Running PME:
        export I_MPI_PIN_MODE=pm
        export I_MPI_PIN_DOMAIN=auto
        mpirun -np NSOCKETS * NCORES_PER_SOCKET \

For the GB workload, on the other hand, MPI communication has been reduced
by using a minimal number of MPI ranks and more OpenMP threads. This
improves single-node performance and also helps scaling across nodes.
The improvement is seen in mid- to large-size workloads such as nucleosome
(~25K atoms) and rubisco (~75K atoms).
Try Intel MPI + OpenMP and let us know if you run into any issues.
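
As a rough illustration, the PME and GB recipes in this message differ
mainly in how many MPI ranks they start: one per core for PME, one per
socket for GB. The sketch below only restates that arithmetic. The helper
name mpi_ranks and the example node size are made up for illustration and
are not part of AMBER or Intel MPI.

```shell
#!/bin/sh
# Sketch: MPI rank counts implied by the two recipes in this message.
# mpi_ranks is a hypothetical helper, not part of AMBER or Intel MPI.
mpi_ranks() {
    workload=$1           # "pme" or "gb"
    nsockets=$2           # sockets per node
    ncores_per_socket=$3  # physical cores per socket
    case "$workload" in
        pme) echo $((nsockets * ncores_per_socket)) ;;  # one rank per core
        gb)  echo "$nsockets" ;;                        # one rank per socket
        *)   echo "unknown workload: $workload" >&2; return 1 ;;
    esac
}

# Example: a two-socket node with 18 cores per socket (made-up numbers).
echo "PME ranks: $(mpi_ranks pme 2 18)"
echo "GB ranks:  $(mpi_ranks gb 2 18)"
```

The resulting number is what would be passed to mpirun -np in each case.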

Running GB:
        export I_MPI_PIN_MODE=pm
        export I_MPI_PIN_DOMAIN=auto
        mpirun -np NSOCKETS \
        -env KMP_AFFINITY="scatter,granularity=core" \
        -env KMP_STACKSIZE=10M \

- Benny

From: Jason Swails <>
To: AMBER Developers Mailing List <>
Date: 03-06-2016 07:20
Subject: Re: [AMBER-Developers] AMBER Master Branch

It has something to do with OpenMP too, right? And then you have to be
careful to count total threads as MPI*OMP threads to avoid thrashing. Do
you only see the perf boost with OMP-MPI combo? That was my

Is the exact recipe written down somewhere for how to take full advantage
of this code? Because if Dave is having trouble using it "correctly", our
users are highly unlikely to have better luck.
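
The thread-counting concern above can be sketched as a small pre-flight
check: multiply MPI ranks by OpenMP threads per rank and compare the
product with the cores available on the node. This is only a sketch under
stated assumptions. The helper name fits_on_node is hypothetical, the
thread count is assumed to be whatever the run sets (for example through
OMP_NUM_THREADS), and the numbers are invented.

```shell
#!/bin/sh
# Sketch: guard against oversubscription in a hybrid MPI + OpenMP run.
# fits_on_node is a hypothetical helper; all counts are passed in explicitly.
fits_on_node() {
    ranks=$1             # MPI ranks on the node
    threads_per_rank=$2  # OpenMP threads per rank (e.g. OMP_NUM_THREADS)
    cores=$3             # cores available on the node
    total=$((ranks * threads_per_rank))
    # Succeeds (status 0) only when every thread can have its own core.
    [ "$total" -le "$cores" ]
}

# Example with made-up numbers: 2 ranks x 18 threads on a 36-core node.
if fits_on_node 2 18 36; then
    echo "layout fits on the node"
else
    echo "oversubscribed: reduce ranks or threads" >&2
fi
```

Whether the core count should be physical cores or hardware threads
depends on the machine and is left to the caller here.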

Jason M. Swails
> On Jun 2, 2016, at 8:54 PM, Ross Walker <> wrote:
> Hi Dave,
> Performance changes here will be minimal for pre V4 (Broadwell) hardware
> and most of the changes are focused on Knights Landing Xeon Phi (to be
> released soon).
> All the best
> Ross
>> On Jun 2, 2016, at 16:59, David A Case <> wrote:
>> On Thu, Jun 02, 2016, Charles Lin wrote:
>>> So the Intel code has been in master for about a month now.  We plan on
>>> releasing the patch within a week.
>> Still not sure when I am supposed to see speedups.  I've tried 
>> runs (using Intel 16.0.3 compilers + MKL + mvapich2) for various
>> systems, up to 64
>> threads, and see no difference in speed for PME calculations.  Have not
>> tried any GB calculations.
>> Is this only expected to speedup things when the -intelmpi (rather than 
>> flag is set?
>> Do you have specific examples of what systems one should expect 
>> for?
>> ...thx...dac
>> _______________________________________________
>> AMBER-Developers mailing list
Received on Fri Jun 03 2016 - 04:00:02 PDT