You guys realize that there is no guaranteed short-circuit evaluation in
Fortran, right? So

  if (i == 1 .and. j == 1) then

will generally be slower than

  if (i == 1) then
    if (j == 1) then

and arranging the nested tests so that the most-likely-false one comes first
is optimal.
The point here is not so much the emil case as the fact that the TI if
statement is there in the first place. Adding a third expression just makes
it worse.
And the fact that these are evaluations of an integer array makes it even
worse, since the compiler can't factor them out of the loop here.
And why is there arithmetic inside the if statement?! These things should,
at the absolute least, be booleans!
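Something like this is what I mean -- a toy sketch, with made-up names and
sizes, nothing like the real pair loop: precompute a per-atom logical once,
outside the loops, so the inner test is pure boolean logic rather than
integer array arithmetic.

program sc_mask_sketch
  implicit none
  integer, parameter :: natom = 4
  integer :: ti_sc_lst(natom) = (/ 0, 1, 0, 1 /)  ! 1 = softcore atom (toy data)
  logical :: is_sc(natom)
  integer :: i, j

  is_sc = (ti_sc_lst .ne. 0)   ! hoisted out of the pair loop, evaluated once

  do i = 1, natom - 1
    do j = i + 1, natom
      ! "ti_sc_lst(i)+ti_sc_lst(j) .eq. 1" (exactly one of the pair is
      ! softcore) becomes a single boolean test:
      if (is_sc(i) .neqv. is_sc(j)) then
        print *, 'mixed softcore/regular pair:', i, j
      end if
    end do
  end do
end program sc_mask_sketch

The emil test could be handled the same way -- or, better, hoisted out of
the loop entirely, since emil_sc_lcl is a global switch.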
/fail. :-(
On 2/13/14, 12:25 AM, "Josh Berryman" <the.real.josh.berryman.gmail.com>
wrote:
>>>Sure, CPUs are really good at this sort of thing, and outside the inner
>>>loops I don't care about them. But here, I do.
>
>Absolutely, every time there is an if or a divide inside a loop over N
>a baby seal dies.
>
>I prevailed on Joe to add the emil_sc feature for me, as someone mentioned
>that he already had his fingers in the TI code. I find it immensely useful
>and like it a lot, but yes, ideally there would have been a longer "soak"
>period in which to look over the changes (in parts of the code where I
>would not really have been confident to start the ball rolling myself),
>explore performance and look for optimisations.
>
>I know that it is getting late in the day but it seems to me that
>there is a really obvious fix for this particular extra cost, so no
>need to mess around with compile flags in the name of performance.
>
>If, instead of 0 or 1, we use 0 or 2 (or 1 for emil_sc) as the ti_sc_lst
>flags, then the test goes from:
>
>if ((ti_sc_lst(real_i)+ti_sc_lst(real_j) .eq. 1) .or. &
>    (emil_sc_lcl .eq. 1 .and. ti_sc_lst(real_i)+ti_sc_lst(real_j) .ge. 1))
>
>to:
>
>if (ti_sc_lst(real_i)+ti_sc_lst(real_j) .eq. 2)
>
>which is back to the efficiency that we had before the change. I am happy
>to implement this and commit; after all, the code freeze is tomorrow, not
>today, right?
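>
>In rough sketch form (made-up mask name atom_is_sc; the real setup code
>will differ), the flags would be assigned once at setup:
>
>  logical :: atom_is_sc(natom)          ! made-up per-atom softcore mask
>  integer :: ti_sc_lst(natom), emil_sc_lcl, i
>
>  do i = 1, natom
>    if (atom_is_sc(i)) then
>      if (emil_sc_lcl .eq. 1) then
>        ti_sc_lst(i) = 1                ! emil softcore run
>      else
>        ti_sc_lst(i) = 2                ! ordinary TI softcore run
>      end if
>    else
>      ti_sc_lst(i) = 0
>    end if
>  end do
>
>so the inner-loop test never has to look at the global emil switch at all.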
>
>Josh
>
>On 13/02/2014, Scott Le Grand <varelse2005.gmail.com> wrote:
>> David,
>> When I start seeing global switch tests in inner loops:
>>
>> else if ((ti_sc_lst(real_i)+ti_sc_lst(real_j) .eq. 1) .or. &
>>          (emil_sc_lcl .eq. 1 .and. ti_sc_lst(real_i)+ti_sc_lst(real_j) .ge. 1)) then
>>   if (ti_lst(1,real_i)+ti_lst(1,real_j) .ge. 1) then !TI region 1
>>
>> this has always been a reliable warning sign for me in the past that the
>> overall structure of the code needs reorganization.
>>
>> The GPU code does this sort of thing at compile time, and that's where
>> decisions like this belong, IMO. And it's not any one if/then that is the
>> problem, but rather that these guys tend to reproduce like bunnies once
>> they appear.
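>>
>> For the CPU build the same idea has a cheap runtime analogue: test the
>> global switch once, outside the pair loop, and dispatch to a specialised
>> loop body. Very roughly (sketch only, made-up routine names, nothing to
>> do with how the pmemd kernels are actually laid out):
>>
>>   if (emil_sc_lcl .eq. 1) then
>>     call pair_loop_emil_sc()    ! body specialised for emil softcore
>>   else if (ti_sc_on) then       ! ti_sc_on: made-up logical, any softcore atoms?
>>     call pair_loop_ti_sc()      ! body specialised for plain TI softcore
>>   else
>>     call pair_loop_plain()      ! no softcore test left in the inner loop
>>   end if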
>>
>> Sure, CPUs are really good at this sort of thing, and outside the inner
>> loops I don't care about them. But here, I do.
>>
>> OTOH the more there are, the more I can refactor them out of GPU
>> incarnations and that's free performance beer.
>>
>> That and I'm on the verge of rethinking the GPU code itself because so
>>many
>> assumptions that were valid in 2010 no longer hold.
>>
>> Scott
>>
>> On Wed, Feb 12, 2014 at 6:46 PM, David A Case
>> <case.biomaps.rutgers.edu> wrote:
>>
>>> On Wed, Feb 12, 2014, Jason Swails wrote:
>>> > >
>>> > > What's left in Sander that people crave for in PMEMD?
>>> > >
>>> >
>>> > QM/MM.
>>>
>>> Certainly, QM/MM is in sander but not pmemd. But do people "crave for
>>> it"
>>> to be added to pmemd? The usual argument is that the QM part quickly
>>> becomes
>>> the time bottleneck, so the fact that MM is faster in pmemd becomes
>>> irrelevant -- there is little need (yet) to think about porting QM/MM
>>>to
>>> pmemd. (I could be wrong here.)
>>>
>>> Similar arguments apply for PBSA and RISM: these are both in sander, but
>>> are slow and limit performance. Again, any speed advantage PMEMD has for
>>> the other parts of the calculation is minimal. So there is no hurry to
>>> put PBSA or RISM into pmemd.
>>>
>>> The emap functionality is similar, but maybe not identical, since it is
>>> not
>>> as big a time-hog as QM, PBSA or RISM. So far, I don't think most emap
>>> calculations (and there aren't very many of them, since only a few
>>>people
>>> are using this capability) need highly efficient MM. But there is more
>>> of
>>> an argument here that emap capability in pmemd might be desirable.
>>>
>>> On the other hand, I don't really see Scott's argument that emap (et al.)
>>> somehow makes it hard to keep going on pmemd.cuda development. It's quite
>>> modular, and there is a simple if-statement in mdin to disable this for
>>> GPUs. We've always had a list of things that aren't supported for GPUs;
>>> having a few more entries in that list doesn't seem that bad (to me).
>>> We're learning how to document this better, both in the code and in the
>>> Manual.
>>>
>>> Scott says he wants "more control ... on feature additions that will
>>>run
>>> on the GPU." We *should* establish a better mechanism for suggesting
>>>and
>>> vetting requests for new GPU features. But I don't (yet) see that
>>> another
>>> code fork is needed to accomplish this.
>>>
>>> ...dac
>>>
>>>
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Thu Feb 13 2014 - 08:30:02 PST