Re: amber-developers: Compile AMBER9 on SGI from Robert Duke on 2005-11-09 (Amber Developers Archive Nov 2005)

From: Robert Duke <rduke.email.unc.edu>
Date: Wed, 9 Nov 2005 06:35:46 -0700

Folks, I'll throw in my two cents, sort of because I can't resist.
Pointers are an absolutely wonderful adjunct to your toolset as a crafter
of algorithms and data structures, allowing you to handle many situations
that you simply couldn't handle otherwise. They basically allow you to 1)
get arbitrary amounts of memory whenever you need them, and 2) create
arbitrary reference patterns that are key to many more elegant data
structures. In fortran 90, however, issue 1) is covered by dynamic
storage allocation in most situations, though not all. Issue 2) can
actually be covered rather cleanly for simpler list structures through the
use of offset arrays.

So typically, one would really only need to use pointers per se in f90/95
when one is confronted with some data structure that actually grows after
its creation, or if one has a more complicated reference structure - say
an actual tree as opposed to a list. Generally these scenarios are
encountered when one actually has dynamic input - say a gui interacting
with a user, an operating system, or network servers for example. We have
very much canned input, and can predict reasonably well what our needs
will be once we start execution.

I do have data structures that must be able to grow in pmemd (the pairlist
is a very simple example), but implementing the pairlist as a linked list
would be ill-informed in regard to both performance and memory
requirements. I slightly overallocate it to begin with, and then keep
track of the end, and reallocate as needed. "As needed" is very
infrequent - the memory requirements typically grow slightly if system
density increases under constant pressure.

Currently, in pmemd 9, the only use I have of pointers is in creating an
opaque handle to an fft object, allowing a clean implementation of an fft
interface to whatever fft implementation you wish. As it turns out this
use is two-pronged. The handle itself is a pointer to a structure that
encapsulates the data structures needed by the fft, and because the
dimension needed for some of the fields is not known prior to creation of
the object, the fields are also pointers. That's the only use of pointers
in the code at present (and this from a guy that lived pointers in
operating systems and networking code through most of the '90s).

Now, it could be argued that even simple lists can be represented more
transparently through the use of structures and pointers than by a
combination of a data array and offset array. I agree, but find the f90
syntax for pointers a bit of a kludge, and also I know that for speed I
will do better with the offset array and data array (the issue here is
repetitive heap allocation calls - those are never cheap, and they also
create memory reference patterns that don't cache well).
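
A minimal sketch of the data array plus offset array idea, combined with
the overallocate-and-regrow strategy described above. It is written in C
for compactness. The struct, the function names, and the growth factors are
all hypothetical; none of this is taken from pmemd.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pairlist: the neighbours of atom i are stored in
   data[offset[i]] .. data[offset[i + 1] - 1]. */
typedef struct {
    int    *data;       /* all neighbour indices, packed contiguously */
    size_t *offset;     /* max_atoms + 1 entries, offset[0] == 0 */
    size_t  n_atoms;    /* atoms whose lists have been closed so far */
    size_t  max_atoms;  /* fixed number of atoms, known up front */
    size_t  used;       /* one past the last used slot ("the end") */
    size_t  capacity;   /* slots currently allocated in data */
} PairList;

/* Allocate with a little slack so that regrowth is rare. 0 on success. */
int pairlist_init(PairList *pl, size_t max_atoms, size_t expected_pairs)
{
    pl->capacity  = expected_pairs + expected_pairs / 8 + 1;
    pl->data      = malloc(pl->capacity * sizeof *pl->data);
    pl->offset    = calloc(max_atoms + 1, sizeof *pl->offset);
    pl->n_atoms   = 0;
    pl->max_atoms = max_atoms;
    pl->used      = 0;
    if (pl->data == NULL || pl->offset == NULL) {
        free(pl->data);
        free(pl->offset);
        pl->data = NULL;
        pl->offset = NULL;
        return -1;
    }
    return 0;
}

/* Append one neighbour to the list of the atom currently being built.
   The whole block is regrown only when the end reaches the capacity. */
int pairlist_push(PairList *pl, int neighbour)
{
    if (pl->used == pl->capacity) {
        size_t new_cap = pl->capacity + pl->capacity / 2 + 1;
        int *p = realloc(pl->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        pl->data = p;
        pl->capacity = new_cap;
    }
    pl->data[pl->used++] = neighbour;
    return 0;
}

/* Close the current atom's list; later pushes belong to the next atom. */
void pairlist_end_atom(PairList *pl)
{
    assert(pl->n_atoms < pl->max_atoms);
    pl->n_atoms++;
    pl->offset[pl->n_atoms] = pl->used;
}

/* Number of neighbours recorded for a closed atom. */
size_t pairlist_count(const PairList *pl, size_t atom)
{
    assert(atom < pl->n_atoms);
    return pl->offset[atom + 1] - pl->offset[atom];
}

/* Pointer to the first neighbour of a closed atom. */
const int *pairlist_neighbours(const PairList *pl, size_t atom)
{
    assert(atom < pl->n_atoms);
    return pl->data + pl->offset[atom];
}

void pairlist_free(PairList *pl)
{
    free(pl->data);
    free(pl->offset);
    pl->data = NULL;
    pl->offset = NULL;
}
```

Compared with a linked list, this makes one heap call per regrowth rather
than one per element, and the neighbours of each atom sit next to each
other in memory.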

So the message here: Pointers are complicated. Pointers allow you to
create data structures that may actually use more memory than otherwise
required. Pointers allow you to create data structures and algorithms
with lower performance due to BOTH alias optimization and
striding/vectorization issues (I actually find ways to screw up
striding/vectorization without using pointers ;-)). So think a bit about
a pointer implementation before jumping on it because it looks whiz-bang.
There are most probably simpler, faster alternatives in the world we live
in (excluding anyone out there rewriting the gui, and with apologies to
anyone who may actually be doing something with more unpredictable inputs
than the stuff I deal with).

Best Regards - Bob Duke

----- Original Message -----
From: Xuebin Qiao <mailto:xbqiao.gmail.com>
To: amber-developers.scripps.edu
Sent: Wednesday, November 09, 2005 6:18 AM
Subject: Re: amber-developers: Compile AMBER9 on SGI

Dear Ross:

Thanks for your comments. I agree with you that allocated pointers and
allocated arrays are the same thing in C/Fortran. As we all know, an array
is implemented via a pointer by the compiler behind the scenes. However,
that is not what I was addressing. The biggest problem is that extensive
use of pointers in the language (not the compiler) may lead to the
so-called "alias optimization" problem. If you are interested, I will show
you an example. :-)
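
A minimal sketch of the kind of aliasing problem referred to here, in C.
This is an illustration only, with hypothetical function names; it is not
the example qxb had in mind. In the first function the compiler has to
assume that a store to out[i] might change *scale, so it cannot keep the
scale factor in a register across the loop. The C99 restrict qualifier is
the programmer's promise that no such overlap exists, which is roughly the
assumption a Fortran compiler may already make for ordinary (non-pointer,
non-target) dummy arguments.

```c
#include <assert.h>
#include <stddef.h>

/* No aliasing promise: out, in and scale may overlap, so *scale has to
   be treated as possibly changed by every store to out[i]. */
void scale_may_alias(double *out, const double *in,
                     const double *scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * *scale;
}

/* restrict promises the three regions do not overlap, so the compiler is
   free to load *scale once and to vectorize the loop. */
void scale_no_alias(double *restrict out, const double *restrict in,
                    const double *restrict scale, size_t n)
{
    /* Cheap sanity check only; it does not catch partial overlap. */
    assert(out != in && out != scale);
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * *scale;
}
```

Calling scale_may_alias(a, a, &a[0], 3) on a = {2, 1, 1} is legal and
yields {4, 4, 4}, because the first store changes the scale factor. That
legal possibility is exactly what blocks the optimization.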

qxb

On 11/9/05, Ross Walker <ross.rosswalker.co.uk> wrote:

Dear Xuebin,

I disagree with the statement below, at least in its most general form. I
use pointers all the time in the QMMM code and they are just as quick as
using allocatable arrays. The problem comes if they are used in a C-like
fashion, which is not what I was advocating.

For single dimension arrays a pointer is fine as the pointer just points
to the beginning of a single linear block of memory that you can traverse
linearly just as you would an allocatable array. Here I see no difference
between a pointer and an allocatable array.

When you move to multidimensional arrays, things become more complicated
since there are essentially two ways of doing things. The first, and
arguably what you should use, is the fortran way of doing things. This has
a single pointer to a linear block of memory that you can just treat as
being multidimensional by knowing the stride size of all dimensions but
the last one. This still allows linear memory traversal.

The second option is to use the 'c' style, which is to allocate a 1-D
array of pointers that contains pointers to the second dimension and so
on, i.e. arrays of pointers. This is a bad idea with regard to
optimisation as it does not guarantee linearity in memory. It also
requires the lookup of an address to get the location of an element in
memory.
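
A minimal sketch of the two layouts, in C, for a rows x cols array of
doubles. All names are hypothetical. Note that C convention is row-major,
so the fixed stride here is cols; Fortran stores arrays column-major, but
the single-block idea is the same.

```c
#include <assert.h>
#include <stdlib.h>

/* Layout 1: one pointer to one linear block. Element (i, j) is located
   by arithmetic alone, and the whole array can be walked linearly. */
double *grid_alloc(size_t rows, size_t cols)
{
    return calloc(rows * cols, sizeof(double));
}

size_t grid_index(size_t i, size_t j, size_t cols)
{
    return i * cols + j;
}

double grid_sum(const double *g, size_t rows, size_t cols)
{
    assert(g != NULL);
    double s = 0.0;
    size_t n = rows * cols;
    for (size_t k = 0; k < n; k++)
        s += g[k];                 /* unit stride through memory */
    return s;
}

/* Layout 2: an array of row pointers, each row allocated separately.
   Rows may land anywhere on the heap, and every access to m[i][j]
   first has to fetch the row address m[i]. */
double **rows_alloc(size_t rows, size_t cols)
{
    double **m = malloc(rows * sizeof *m);
    if (m == NULL)
        return NULL;
    for (size_t i = 0; i < rows; i++) {
        m[i] = calloc(cols, sizeof **m);
        if (m[i] == NULL) {        /* undo the rows allocated so far */
            while (i--)
                free(m[i]);
            free(m);
            return NULL;
        }
    }
    return m;
}

double rows_sum(double **m, size_t rows, size_t cols)
{
    assert(m != NULL);
    double s = 0.0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            s += m[i][j];          /* extra load of m[i] per row */
    return s;
}

void rows_free(double **m, size_t rows)
{
    for (size_t i = 0; i < rows; i++)
        free(m[i]);
    free(m);
}
```

The first layout needs one allocation and one free; the second needs
rows + 1 of each and gives no guarantee that consecutive rows are adjacent
in memory.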

So, as long as we stick to the fortran approach I don't see any problem.
Somebody please correct me if you have evidence that pointers really are
worse performance-wise. To date I have not come across any problems...

With regard to memory errors, I always figure exactly the same thing
happens with allocatable arrays. As long as we all program in a concise
fashion - that is, avoiding things like ****x etc. - then there should
not be any problems. That said, I recommend running your parts of the code
through memory checkers like valgrind on a regular basis. This can uncover
a lot of potential problems caused by array out-of-bounds errors etc.

Just my 2c... I don't want to get into a religious c vs fortran war,
suffice to say things can be done the wrong way in both languages...

All the best
Ross

/\
\/
|\oss Walker

| Department of Molecular Biology TPC15 |
| The Scripps Research Institute |
| Tel: +1 858 784 8889 | EMail:- ross.rosswalker.co.uk |
| http://www.rosswalker.co.uk | PGP Key available on request |

Note: Electronic Mail is not secure, has no guarantee of delivery, may not
be read every day, and should not be used for urgent or sensitive issues.

-- 
... there have been two really clean, 
consistent models of programming so far: 
the C model and the Lisp model. 
These two seem points of high ground, 
with swampy lowlands between them.
                                      --Paul Graham

Received on Wed Apr 05 2006 - 23:49:50 PDT