Yes, I talked with Bob a good deal about the splining he does and I think
I do an equivalent spline (I may have things set even more
aggressively--the PMEMD spline table may be larger than mine with finer
discretization, but that would only work in my favor if it were a speed
issue). I have two modes, as I think Bob does, specifiable in the &cntrl
or perhaps the &ewald namelist, if I recall, which let the user either
use separate force and energy splines or compute forces from the
derivative of a single energy spline. In the latter case the inner loop
does not actually branch; I think it just fills the same spline with the
derivative values and a zero. I forget exactly what it's doing, but
there's certainly no branch in the inner loop for this. Reducing the
spline order to quadratic, which is essentially what the latter method
does, is pretty gruesome on the numerics vis-a-vis the actual energy
you're trying to compute, but it may get you better conservation of a
less accurate measure of the energy, since your force is then strictly
the derivative of your smoothly varying potential energy function. The
default is to just do separate force and energy lookups.
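
To make the two modes concrete, here is a rough sketch in C. It is not
the actual mdgx or PMEMD code; the names (Cubic, SplineTable,
build_table, lookup, free_table), the uniform grid in r, the Hermite
fit, and the stand-in potential u = 1/r are all assumptions made for
illustration. The point is only that the mode is chosen when the table
is filled, so the lookup is the same cubic evaluation either way, and
in derivative mode the force segment is a quadratic stored with a zero
leading coefficient.

```c
#include <assert.h>
#include <stdlib.h>

/* One table segment: value(dx) = a*dx^3 + b*dx^2 + c*dx + d, dx = r - r_i */
typedef struct { double a, b, c, d; } Cubic;

typedef struct {
    double rmin, h;   /* first knot and knot spacing */
    int n;            /* number of segments */
    Cubic *u;         /* energy segments */
    Cubic *f;         /* force-magnitude segments */
} SplineTable;

/* Stand-in pair potential (an assumption): u = 1/r, F = -du/dr = 1/r^2 */
static double u_exact(double r)  { return 1.0 / r; }
static double f_exact(double r)  { return 1.0 / (r * r); }
static double df_exact(double r) { return -2.0 / (r * r * r); }

/* Cubic Hermite segment matching values f0, f1 and slopes d0, d1 over width h */
static Cubic hermite(double f0, double f1, double d0, double d1, double h)
{
    Cubic s;
    s.d = f0;
    s.c = d0;
    s.b = (3.0 * (f1 - f0) / h - 2.0 * d0 - d1) / h;
    s.a = (2.0 * (f0 - f1) / h + d0 + d1) / (h * h);
    return s;
}

/* If force_from_energy is nonzero, the force table holds -d/dr of the
   energy cubic. The mode is decided here, once, not in the lookup. */
SplineTable build_table(double rmin, double rmax, int n, int force_from_energy)
{
    SplineTable t;
    assert(n > 0 && rmin > 0.0 && rmax > rmin);
    t.rmin = rmin;
    t.h = (rmax - rmin) / n;
    t.n = n;
    t.u = malloc(n * sizeof(Cubic));
    t.f = malloc(n * sizeof(Cubic));
    assert(t.u != NULL && t.f != NULL);
    for (int i = 0; i < n; i++) {
        double r0 = rmin + i * t.h, r1 = r0 + t.h;
        t.u[i] = hermite(u_exact(r0), u_exact(r1),
                         -f_exact(r0), -f_exact(r1), t.h);
        if (force_from_energy) {
            /* F = -dU/dr = -(3a*dx^2 + 2b*dx + c): quadratic, zero cubic term */
            t.f[i].a = 0.0;
            t.f[i].b = -3.0 * t.u[i].a;
            t.f[i].c = -2.0 * t.u[i].b;
            t.f[i].d = -t.u[i].c;
        } else {
            t.f[i] = hermite(f_exact(r0), f_exact(r1),
                             df_exact(r0), df_exact(r1), t.h);
        }
    }
    return t;
}

/* Inner-loop lookup, identical for both modes.
   The caller must keep r inside [rmin, rmax). */
void lookup(const SplineTable *t, double r, double *u, double *f)
{
    int i = (int)((r - t->rmin) / t->h);
    double dx = r - (t->rmin + i * t->h);
    const Cubic *su = &t->u[i];
    const Cubic *sf = &t->f[i];
    *u = ((su->a * dx + su->b) * dx + su->c) * dx + su->d;
    *f = ((sf->a * dx + sf->b) * dx + sf->c) * dx + sf->d;
}

void free_table(SplineTable *t)
{
    free(t->u);
    free(t->f);
    t->u = t->f = NULL;
}
```

Calling build_table(1.0, 10.0, 900, 0) gives separate energy and force
fits; passing 1 as the last argument gives the force as the exact
derivative of the tabulated energy, at one order lower accuracy.
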

I use valgrind all the time, but not the cachegrind feature so much.
Perhaps the next generation of processors will fix my problems.

Dave

> Hi,
>
> Ok, I agree that you've tried to lay out the data structures in
> memory in smart ways (although they could still be suboptimal).
>
> Yes, the experiment I suggest must be head-to-head on only equivalent
> codes.
> Note that hardware counter profiling gives information on L1 cache
> misses, prefetching, issued loads, branch prediction, etc.
> Thus, this is much different from gprof profiling.
> The idea would be to compare counter outputs, find a counter that is
> significantly different between the codes, and read the codes to look
> for an obvious cause. Getting up to speed on hardware profiling
> is already more than a day's work. And in practice this simple
> scenario may not materialize; e.g., for charmm and amber I did this,
> but there were too many differences to untangle a simple cause
> and effect. Sadly, I don't have time to volunteer.
>
> One minor point on restrict that you probably already know:
> for a zeroth-order test to determine whether restrict could help,
> be very generous with restrict (at the expense of code correctness);
> it is sometimes very easy to miss a dependency that is preventing
> an optimization. Maybe there are tools available to help, but I'm
> too stale to know.
>
> Oh yeah, valgrind has a cache profiler. I haven't used it, but
> valgrind is great software, so it might be a quick way to enter
> the hardware profiler waters.
>
> scott
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Mon Aug 29 2011 - 21:00:02 PDT