Re: [AMBER-Developers] Code review of pmemd.cuda

From: David Cerutti <dscerutti.gmail.com>
Date: Mon, 17 Apr 2017 21:52:00 -0400

We're getting several discussions going at once. First, Silicon Valley
looks like a promising evolution of Big Bang Theory.

Second, I get 96K on Pascal!? Yes!!!! I've been working very hard trying
to figure out how to use 48K, and of course I'll be coding for backwards
compatibility, but that's amazing if they're giving us more SMEM to work
with. The approach I'm looking at would be completely separate from the
current non-bonded inner loop, so your shared memory footprint will not
affect what I've got. That being said, I can see how you're boxed in with
regard to __shared__ and I will be too in the sense that I'll be importing
as many atoms as I can before I hit "the wall." The more atoms I can get
into __shared__ simultaneously, the better, so gemstones could indeed scale
very well going forward.

Third, I AM an emacs user like that idiot, but I'll hit some of the
libraries with vi to see what's really happening. I DO NOT intentionally
put tabs in anything. As long as the spacing changes get taken care of
now, in quick succession with the rest of the revision, it will not make
commit diffs any harder to view. There will be one giant change with a
clear understanding of what happened, and any commits thereafter will be
easy to interpret.

Finally, the lookup table: if texture memory is unreliable, I can fit this
in __shared__. (That's what I had initially planned to do before I learned
what texture memory was.) For a 10A cutoff, I'm looking at one table that
is 800 bytes and another that is 1224 bytes, so less than 2 KB of lookup
tables overall. The first lookup table grows as the square of the cutoff
distance (not good--but it's still only 4k with a very large 16A cutoff),
the second (which is where the coefficients really reside) only grows as
the log of the squared cutoff distance (so the second table is still only
1.7kb for a 16A cutoff). You multiply r2 by 8.0, then do a float ==> int
conversion to get the index to read from the first lookup table, and that
provides you an 8-bit int which tells you the place to look in the second
lookup table, which provides six coefficients for a rational function of r2
that approximates the derivative of erfc(r)/r. The result is accurate to
within a few parts per billion out to 6A; beyond that, the *relative*
accuracy falls merely because the value of erfc(r)/r is nearly zero already.
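
To make the two-level indexing concrete, here is a rough Python sketch of the
lookup path described above. It is not the pmemd.cuda code and makes several
assumptions that are not in the description: the segment boundaries (powers
of two in bin number, chosen only to show logarithmic growth), the form of
the rational function (quadratic over quadratic in r2), single-precision
storage, and placeholder coefficients that simply reproduce the target at
each segment midpoint instead of a real fit. The names build_tables, lookup,
and target are hypothetical.

```python
import math

BINS_PER_A2 = 8.0        # from the text: index = int(r2 * 8.0)
COEFFS_PER_SEGMENT = 6   # from the text: six coefficients per function
BYTES_PER_COEFF = 4      # assumption: single-precision floats


def target(r2):
    """d/dr [erfc(r)/r], expressed as a function of r2 = r*r."""
    r = math.sqrt(r2)
    return -(2.0 / math.sqrt(math.pi)) * math.exp(-r2) / r - math.erfc(r) / r2


def build_tables(cutoff):
    """Return (index_table, coeff_table) for a cutoff given in Angstroms."""
    n_bins = int(cutoff * cutoff * BINS_PER_A2)
    # First table: one 8-bit segment id per r2 bin.
    # Hypothetical segmentation: bin b belongs to segment b.bit_length(),
    # so the segment count grows with log2 of the bin count.
    index_table = bytearray(b.bit_length() for b in range(n_bins))
    n_segments = max(index_table) + 1
    # Second table: six coefficients per segment.
    coeff_table = []
    for seg in range(n_segments):
        lo_bin = 0 if seg == 0 else 1 << (seg - 1)
        hi_bin = min(1 << seg, n_bins)
        mid_r2 = 0.5 * (lo_bin + hi_bin) / BINS_PER_A2
        # Placeholder, NOT a real fit: a constant equal to the target
        # at the segment midpoint (numerator a0, denominator 1).
        coeff_table.append((target(mid_r2), 0.0, 0.0, 1.0, 0.0, 0.0))
    return index_table, coeff_table


def lookup(r2, index_table, coeff_table):
    """Two-step lookup: r2 -> bin -> segment id -> rational function."""
    seg = index_table[int(r2 * BINS_PER_A2)]
    a0, a1, a2, b0, b1, b2 = coeff_table[seg]
    # Assumed form: (a0 + a1*r2 + a2*r2^2) / (b0 + b1*r2 + b2*r2^2)
    return (a0 + r2 * (a1 + r2 * a2)) / (b0 + r2 * (b1 + r2 * b2))


if __name__ == "__main__":
    for cutoff in (10.0, 16.0):
        idx, coef = build_tables(cutoff)
        coef_bytes = len(coef) * COEFFS_PER_SEGMENT * BYTES_PER_COEFF
        print(cutoff, len(idx), "index bytes;", coef_bytes, "coefficient bytes")
```

With one byte per bin, the index table for a 10A cutoff comes to 10*10*8 =
800 bytes, which matches the first figure above. If the coefficients are
4-byte floats (again an assumption), 1224 bytes would correspond to 51 sets
of six coefficients, far more segments than this toy segmentation produces.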

Dave


On Mon, Apr 17, 2017 at 9:14 PM, David Case <david.case.rutgers.edu> wrote:

> On Mon, Apr 17, 2017, Ruxi Qi wrote:
>
> >
> > FYI, trailing whitespace can be easily eliminated by the sed in-place
> > substitution as follows, say to process all .F90 files in batch, do:
> >
> > sed -i "s/\s\+$//g" *.F90
>
> True, but let's not do this just for the sake of doing it, or because
> some integration script or compiler complains about trailing whitespace.
> Large-scale changes like this can make it impossible to track true code
> changes, and for really minimal gain.
>
> Even with Dave's changes, some of which were made to satisfy a rather
> arbitrary 96-character line limit, we have to balance the pain of having
> some long lines with the pain of not being able to compare code to earlier
> versions.
>
> Establishing consistent indentation at the beginning of lines (to make
> code flow and logic clearer) is probably worth the pain in most cases.
> Removing trailing whitespace is often not worth the pain.
>
> ....dac
>
>
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers
>
Received on Mon Apr 17 2017 - 19:00:02 PDT