Re: amber-developers: Trouble at PSC - resolved

From: Robert Duke <rduke.email.unc.edu>
Date: Fri, 12 May 2006 11:23:12 -0400

Folks -
Forgot one other thing. DON'T use bintraj NetCDF files on the XT3/XD1 until
I get a chance for a further fix. It turns out that the C-level base I/O
calls (read/write/open) don't go through the iobuf library, so the code has
to be modified. I am going to see if I can get PSC to do it to their NetCDF
libs; we could also put a conditional in our NetCDF for the Crays if
necessary. As things stand now, bintraj/NetCDF can give you the worst
performance imaginable, probably because of the flush, though I am not sure
(they say Cray I/O is pretty much synchronous without the iobuf lib, but
what I have observed is not perfectly consistent with that, so who knows).
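For anyone poking at the NetCDF side in the meantime, below is a minimal
sketch of the general idea behind such a workaround: staging small C-level
writes in a user-space buffer and only hitting the file system in large
chunks. The macro name, buffer size, and wrapper function are all made up
for illustration; this is not the iobuf interface, just the buffering
technique it provides for Fortran I/O.

/* Hypothetical sketch: batch small write()s into one large write on the
 * Crays, since the C-level read/write/open calls bypass the iobuf library.
 * CRAY_IOBUF_WORKAROUND and buffered_write() are made-up names; this is
 * not the iobuf API.  Assumes all staged writes go to the same fd. */
#include <string.h>
#include <unistd.h>

#define WRITE_BUF_SIZE (1 << 20)          /* 1 MB staging buffer (arbitrary) */

static char   write_buf[WRITE_BUF_SIZE];
static size_t buf_used = 0;

/* Push whatever has accumulated out to the file system in one call. */
static ssize_t flush_buffer(int fd)
{
    ssize_t n = 0;
    if (buf_used > 0) {
        n = write(fd, write_buf, buf_used);
        buf_used = 0;
    }
    return n;
}

/* Accumulate small writes; only issue large writes to the file system. */
static ssize_t buffered_write(int fd, const void *data, size_t len)
{
#ifdef CRAY_IOBUF_WORKAROUND
    if (buf_used + len > WRITE_BUF_SIZE)
        flush_buffer(fd);
    if (len >= WRITE_BUF_SIZE)            /* too large to stage; write directly */
        return write(fd, data, len);
    memcpy(write_buf + buf_used, data, len);
    buf_used += len;
    return (ssize_t)len;
#else
    return write(fd, data, len);          /* everywhere else, plain write() */
#endif
}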
Best Regards - Bob

----- Original Message -----
From: "Robert Duke" <rduke.email.unc.edu>
To: <amber-developers.scripps.edu>
Sent: Friday, May 12, 2006 11:12 AM
Subject: amber-developers: Trouble at PSC - resolved


> Folks -
> Okay, the PSC folks were pretty responsive on this one. There is an I/O
> buffering library from Cray that is in beta, and I have now linked pmemd
> against it. You can pick up the new code on bigben.psc.edu at
> ~rduke/amber9/exe; it is pmemd.iobuf. There is also a script there,
> runjob.iobuf, that sets a necessary environment variable; you can adapt it
> to your needs. The environment variable is documented in the mail I sent
> to PSC (below).
>
> PSC also did some disk reconfiguration that is supposed to help, and they
> modified their scratch disk scrubber to look at access times rather than
> creation times when getting rid of files (now anything not accessed within
> the last week is susceptible). I had complained about this after they blew
> away some of my stuff; the change makes it a lot more practical to set up
> a series of runs and not have the framework wiped out. We may be able to
> do a bit better yet, but this is a start.
>
> In the mail insert below, the first two run times show performance under
> heavy disk I/O pressure from other users; the machine was relatively quiet
> during the last two runs, so we got baseline times both with and without
> iobuf enabled. I'll get hacked versions of configure out to support this
> stuff, along with some documentation, before long. This stuff actually
> applies to the Cray XD1 too. Kind of incredible, but Cray is taking care
> of disk I/O as an afterthought.
>
> There is also an annoying message on the Cray about a small stack limit; I
> need to disable the code of mine that generates it. It turns out that Cray
> compute nodes effectively have no limits on a bunch of resources, but they
> don't respond as expected to getrlimit/setrlimit calls. Just one more
> little nuisance.
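>
> A minimal sketch of the kind of check I want to disable is below. The
> ON_CRAY_COMPUTE_NODE macro and the 8 MB threshold are placeholders made
> up for illustration; the point is just that getrlimit() on the compute
> nodes does not report a meaningful soft stack limit, so the warning is
> bogus there and should be compiled out:
>
> /* Hypothetical sketch of a stack-limit warning that gets compiled out on
>  * the Cray compute nodes, where getrlimit/setrlimit do not behave as
>  * expected.  Macro name and threshold are made up for illustration. */
> #include <stdio.h>
> #include <sys/resource.h>
>
> #define MIN_STACK_BYTES (8UL * 1024 * 1024)  /* warn below ~8 MB (arbitrary) */
>
> static void check_stack_limit(void)
> {
> #ifndef ON_CRAY_COMPUTE_NODE
>     struct rlimit rl;
>
>     if (getrlimit(RLIMIT_STACK, &rl) == 0 &&
>         rl.rlim_cur != RLIM_INFINITY &&
>         rl.rlim_cur < MIN_STACK_BYTES) {
>         fprintf(stderr,
>                 "Warning: soft stack limit is only %lu bytes; "
>                 "consider raising it (e.g. 'limit stacksize unlimited').\n",
>                 (unsigned long)rl.rlim_cur);
>     }
> #endif
> }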
>
> Best Regards - Bob
>
> Embedded mail sent to John Urbanic, who was a big help in all this:
>
> John -
> Looks like the iobuf library does the trick. I did 4 runs with the iobuf
> lib enabled and 4 runs of the old code without iobuf. The setup times were
> somewhat variable, but in a real run they would not be a big issue.
> Looking at the non-setup times for my NVT factor_ix benchmark (128
> processors, 1 trajectory write per 250 steps - higher than we will do in
> the future...), 5000-step runs completed as follows:
>
>     old code,                 iobuf-enabled code,
>     non-setup wallclock sec   non-setup wallclock sec
>
>     126                        84
>      94                        84
>      83                        83   <--- system not very busy...
>      83                        83   <--- system not very busy...
>
> This was done with a simple setting of the environment variable:
>
>     setenv IOBUF_PARAMS '*, %stdin, %stdout'
>
> I expect we could do 1-2% better by tuning buffer sizes a bit more,
> pointing at specific files, etc. However, it is also important to keep
> this simple for the user.
>
> I would recommend that Marcella build a version of pmemd with iobuf
> enabled, and that we somehow advertise the use of IOBUF_PARAMS; I will
> advertise this directly to the AMBER community and my group. If Marcella
> just wants to take my pmemd build, it is out under ~rduke/amber9/exe as
> pmemd.iobuf. I still need to disable the stack check to get rid of the
> annoying message, but I want to test and be sure that there is indeed no
> stack limit even if you don't unlimit it.
>
> Thanks much for your help on all this.
>
> Best Regards - Bob
>
>
>
Received on Sun May 14 2006 - 06:07:14 PDT