amber-developers: Trouble at PSC - resolved

From: Robert Duke <rduke.email.unc.edu>
Date: Fri, 12 May 2006 11:12:40 -0400

Folks -
Okay, the PSC folks were pretty responsive on this one. There is an I/O
buffering library from Cray that is in beta, and I have now linked pmemd
against it. You can pick up the new code on bigben.psc.edu at
~rduke/amber9/exe; it is pmemd.iobuf. There is also a script there,
runjob.iobuf, that sets a necessary environment variable; you can adapt it
to your needs. The environment variable is documented in the mail I sent to
PSC (below).

PSC also did some disk reconfiguration which is supposed to help; they also
modified their scratch disk scrubber to look at access rather than create
times when getting rid of files (now, anything not accessed for more than a
week is susceptible). I had complained about this after they blew away some
of my stuff; this makes it a lot more practical to set up a series of runs
and not have the framework wiped out. We may be able to do a bit better yet,
but this is a start.

In the mail inserted below, the first two run times show performance under
heavy disk I/O pressure from other users; the machine was relatively quiet
during the last two runs, so we got baseline times with and without iobuf
enabled. I'll get hacked versions of config to support this stuff out, along
with some documentation, before long. This stuff actually applies to the
Cray XD1 too. Kind of incredible, but Cray is taking care of disk I/O as an
afterthought.

There is an annoying message on the Cray about a small stack limit; I need
to disable my code that generates this too. It turns out Cray compute nodes
effectively have no limits on a bunch of resources, but they don't respond
as expected to getrlimit/setrlimit calls. Just one more little nuisance.
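
For readers unfamiliar with the getrlimit call mentioned above, here is a
minimal Python sketch of querying the stack limit through the standard
`resource` module. It is not the pmemd check itself (that code is not shown
in this mail); the function names `describe_limit` and `stack_limits` are
made up for illustration.

```python
import resource

def describe_limit(value):
    """Render an rlimit value; RLIM_INFINITY means no limit is imposed."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"

def stack_limits():
    """Return the (soft, hard) stack limits as reported by getrlimit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    return describe_limit(soft), describe_limit(hard)

if __name__ == "__main__":
    soft, hard = stack_limits()
    print(f"stack soft limit: {soft}, hard limit: {hard}")
```

On a node that does not respond as expected to these calls, the reported
values may not reflect what the process can actually use, which is why a
warning based on them can be misleading.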

Best Regards - Bob

Embedded mail sent to John Urbanic, who was a big help in all this:

John -
Looks like the iobuf library does the trick. I did 4 runs with the iobuf
lib enabled, and 4 runs of old code without iobuf. The setup times were
somewhat variable, but in a real run they would not be a big issue. Looking
at the nonsetup times for my nvt factor_ix benchmark, 128 proc, 1 trajectory
write per 250 steps (higher than we will do in the future...), 5000 step
runs completed as follows:

old code,                 iobuf-enabled code,
nonsetup wallclock sec    nonsetup wallclock sec

       126                       84
        94                       84
        83                       83    <--- system not very busy...
        83                       83    <--- system not very busy...
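
To make the comparison concrete, the following Python sketch works out the
relative savings from the four pairs of timings above, plus the number of
trajectory writes implied by a 5000 step run with one write per 250 steps.
The numbers are taken from this mail; the variable and function names are
only illustrative.

```python
# Nonsetup wallclock seconds from the table above: (old code, iobuf-enabled code)
RUNS = [(126, 84), (94, 84), (83, 83), (83, 83)]

STEPS = 5000          # steps per run, as stated in the mail
WRITE_INTERVAL = 250  # one trajectory write per 250 steps

def percent_saved(old, new):
    """Percentage reduction in wallclock time relative to the old code."""
    return 100.0 * (old - new) / old

trajectory_writes = STEPS // WRITE_INTERVAL
savings = [round(percent_saved(old, new), 1) for old, new in RUNS]

print(f"trajectory writes per run: {trajectory_writes}")
for (old, new), pct in zip(RUNS, savings):
    print(f"old {old:3d} s -> iobuf {new:3d} s: {pct:.1f}% less wallclock time")
```

That is 20 trajectory writes per run, with savings of roughly 33% and 11% in
the two runs made while the disks were busy and no difference in the two
runs on a quiet system.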

The iobuf runs used a simple setting of the environment variable:
setenv IOBUF_PARAMS '*, %stdin, %stdout'

I expect we could do 1-2% better by tuning buffer sizes a bit more, pointing
at specific files, etc. However, it is also important to keep this simple
for the user.
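
One way to keep this simple for users is a small wrapper that sets
IOBUF_PARAMS before launching the job. The Python sketch below is only an
illustration under that assumption: the helper names are hypothetical, the
command to run is left to the caller, and it is not the runjob.iobuf script.
Only the variable name and its value come from the setting quoted above.

```python
import os
import subprocess

DEFAULT_IOBUF_PARAMS = "*, %stdin, %stdout"  # value quoted in the mail

def iobuf_environment(params=DEFAULT_IOBUF_PARAMS, base=None):
    """Return a copy of the environment with IOBUF_PARAMS set."""
    env = dict(os.environ if base is None else base)
    env["IOBUF_PARAMS"] = params
    return env

def run_with_iobuf(command, params=DEFAULT_IOBUF_PARAMS):
    """Run `command` (a list of arguments) with IOBUF_PARAMS in its environment."""
    return subprocess.run(command, env=iobuf_environment(params), check=True)
```

Because the helper copies the environment instead of modifying it, the
setting affects only the launched job and not the calling process.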

I would recommend that Marcella build a version of pmemd with iobuf enabled,
and we somehow advertise the use of IOBUF_PARAMS; I will advertise this
directly in the amber community and my group. If Marcella just wants to
take my pmemd build, it is out under ~rduke/amber9/exe as pmemd.iobuf. I
still need to disable the stack check to get rid of the annoying message,
but want to test and be sure that indeed there is no stack limit even if you
don't unlimit it.

Thanks much for your help on all this.

Best Regards - Bob
Received on Sun May 14 2006 - 06:07:14 PDT