Re: amber-developers: Troubles at PSC

From: Robert Duke <rduke.email.unc.edu>
Date: Fri, 5 May 2006 09:48:54 -0400

Okay guys, thanks much for the input. I think, based on that input, that a
very reasonable benchmark mdcrd output frequency is 1 snapshot per psec, and
will modify my benchmarks accordingly. I know this is the high end of the
1-2 psec range, but I typically want to be on the high end when benchmarking
something like this - my goals are to set user expectations and highlight
problems with particular architectures as much as anything else (and you
thought I just wanted to get the highest nsec/day that I could...;-)).
Okay, more seriously, I know I came up with this 0.25 psec metric from
somewhere, but can't find the mail. I thought maybe it was in a
conversation with Yong, but I actually found a nice email from Yong in the
Oct 2003 timeframe pointing at the 1-2 psec range with a justification (good
stuff - justifications):

> It really depends on the events you are interested in.
> 50, 100, 150, 200 steps are a bit excessive.
> I typically save snapshots at ~2ps intervals for 1-10ns simulations.

> Here is the reasoning.
> Water rotation takes roughly ~1-2ps (at 300K). Unless, you are
> interested in monitoring how a few water molecules rotate during the
> simulation, ~1-2ps intervals should give you enough detail.
> For long simulations (>50ns), I save the trajectories at 10-20ps
> intervals.

> yong
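
To relate the step counts Yong mentions (50-200 steps) to the psec intervals discussed here, the conversion can be sketched as below. This is a minimal sketch, not anything from pmemd itself: the 2 fs time step is an assumption (no time step is given in this thread), and the function names are hypothetical. The resulting step count is the kind of value one would use for the coordinate output frequency (`ntwx` in the mdin file).

```python
def steps_per_snapshot(interval_ps, dt_ps=0.002):
    """Number of MD steps between coordinate dumps for a desired interval.

    dt_ps defaults to 0.002 ps (2 fs), which is an assumed time step.
    """
    steps = round(interval_ps / dt_ps)
    if steps < 1:
        raise ValueError("interval is shorter than one time step")
    return steps


def snapshot_interval_ps(steps, dt_ps=0.002):
    """Inverse conversion: simulated time (ps) covered by a given step count."""
    return steps * dt_ps


if __name__ == "__main__":
    for interval in (0.25, 1.0, 2.0, 10.0):
        print(f"{interval:5.2f} ps -> every {steps_per_snapshot(interval)} steps")
    for steps in (50, 100, 150, 200):
        print(f"{steps:4d} steps -> {snapshot_interval_ps(steps):.2f} ps")
```

Under that assumed time step, 50-200 steps corresponds to 0.1-0.4 psec, well below the 1-2 psec range Yong recommends.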

So a couple of comments. Maybe we should give some guidelines in the user
manual about things like this? From the perspective of a guy who primarily
has a mission to make things fast, it is really helpful for me to know what
the real requirements of the applications are. The fact that a user can
dump a coordinate frame every step does not make that a real requirement;
it is an unreasonable use of the software in most instances. When users
don't have beginning guidelines on these things, they are apt to do dumb
things. They will either take snapshots too infrequently, and will have
wasted the entire run, or take them too frequently, and waste disk space and
some portion of their computer time. If we can post reasonable guidelines,
we can help the beginning user use our software more effectively. You guys
have the experience to make the recommendations about application
requirements; I have the experience about how to get a computer to do a
given task in minimum time; let's combine our experiences for everyone's
benefit. To my mind there are two ways to more efficiently solve a problem
on a computer: 1) choose a better algorithm, and 2) do the work that is
actually necessary to solve the problem.
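
To put a rough number on the disk space cost of dumping too often, here is a minimal sketch. It assumes the formatted ASCII mdcrd layout of 10 values per line at 8 characters per value, ignores the one-time title line, and uses hypothetical function names; the 100K atom count matches the benchmark described in this mail, while the run length is only an example.

```python
import math


def mdcrd_frame_bytes(n_atoms, with_box=False):
    """Approximate size of one formatted mdcrd frame in bytes.

    Assumes 8 characters per coordinate, 10 coordinates per line,
    and one newline character per line.
    """
    n_values = 3 * n_atoms
    n_lines = math.ceil(n_values / 10)
    size = n_values * 8 + n_lines
    if with_box:
        size += 3 * 8 + 1  # one extra line holding three box lengths
    return size


def trajectory_bytes(n_atoms, sim_length_ps, interval_ps):
    """Approximate trajectory size for a run of sim_length_ps."""
    n_frames = math.floor(sim_length_ps / interval_ps + 1e-9)
    return n_frames * mdcrd_frame_bytes(n_atoms)


if __name__ == "__main__":
    n_atoms = 100_000
    for interval in (0.25, 1.0, 2.0, 10.0):
        gb = trajectory_bytes(n_atoms, 1000.0, interval) / 1e9
        print(f"1 ns at {interval:5.2f} ps/frame: about {gb:.2f} GB")
```

Under these assumptions a single frame of a 100K atom system is about 2.4 MB, so going from 1 psec to 0.25 psec dumps quadruples the output for the same simulated time.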

Okay, next there is disk performance at PSC. This is a hot-button issue
with me because I have been living with a generally unhelpful attitude about
system performance at PSC for a long time now. When I sent out the first
mail about this problem, I got comments back from Ross indicating that there
are users out there in the HPC world who specifically avoid, or attempt to
avoid, getting allocations at PSC because they are known to run systems that
do an inadequate job of disk i/o. Now, you, as a user, won't typically see
problems of this sort unless you do two things: 1) do a run that pushes the
boundary on disk i/o, and 2) happen to be running when one or more other
users are also pushing those boundaries. You also may not really notice it
if you do long runs and you don't have a really good sense of how long they
should take anyway (long runs will help if the disk overloads are sporadic -
you will only be impacted part of the time). Well, because I am
benchmarking all over the place all the time, I know what to expect. In the
case of this PSC system, it is currently (or at least was 2 days ago) performing
well below expectations, having serious disk stalls impacting as few as four
processes doing 0.25 psec coordinate dumps (that would be coordinate dumps
at a frequency equivalent to 16 processes doing 1 psec coordinate dumps -
but 100K atoms, so that factors in too). The best way to spot this
particular problem is to look at the pmemd logfile (though a long setup time is
also a problem indicator - at the bottom of mdout). What you will see in
the logfile is the master process spending huge amounts of time in runmd
while everyone else is spending the equivalent time spinlocking on MPI i/o,
waiting for the master to get done writing the bloody files. How do you fix
the problem? Well you don't fix it by not bringing the issue up with the
guys that spend our tax dollars running places like PSC.
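
The equivalence mentioned above (four processes at 0.25 psec dumps versus 16 processes at 1 psec dumps) can be read as a statement about dumps per unit of wall-clock time. The sketch below illustrates that reading under a loudly stated assumption: throughput scales linearly with process count, which real runs will only approximate. The per-process throughput value and the function name are hypothetical placeholders, not measurements.

```python
def dumps_per_wall_hour(n_procs, interval_ps, ps_per_hour_per_proc=1.0):
    """Coordinate dumps per wall-clock hour.

    Assumes perfectly linear scaling: simulated ps per hour is
    n_procs * ps_per_hour_per_proc (a hypothetical placeholder rate).
    """
    throughput_ps_per_hour = n_procs * ps_per_hour_per_proc
    return throughput_ps_per_hour / interval_ps


if __name__ == "__main__":
    slow = dumps_per_wall_hour(n_procs=4, interval_ps=0.25)
    fast = dumps_per_wall_hour(n_procs=16, interval_ps=1.0)
    print(f"4 procs at 0.25 ps:  {slow:.1f} dumps per wall-clock hour")
    print(f"16 procs at 1.00 ps: {fast:.1f} dumps per wall-clock hour")
```

With linear scaling the two cases write frames at the same wall-clock rate, so the load on the disk is comparable even though the simulated-time intervals differ by a factor of four.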

Best Regards - Bob

----- Original Message -----
From: "Yong Duan" <duan.ucdavis.edu>
To: <amber-developers.scripps.edu>
Sent: Thursday, May 04, 2006 11:03 PM
Subject: RE: amber-developers: Troubles at PSC


>
> Me 3rd.
> In those early days when ~100ps was the limit, I used to save snapshots at
> 0.25-0.5 ps. But that was in early 90's. In those early days, one ps was a
> big deal that took hours of CPU time to run. I wonder how many now save the
> trajectories that often. We now save the trajectories at typically ~1-5ps
> for GB and ~10-20 ps for PME simulation.
>
> yong
>
>> -----Original Message-----
>> From: owner-amber-developers.scripps.edu
>> [mailto:owner-amber-developers.scripps.edu] On Behalf Of
>> Carlos Simmerling
>> Sent: Thursday, May 04, 2006 6:58 PM
>> To: amber-developers.scripps.edu
>> Subject: Re: amber-developers: Troubles at PSC
>>
>>
>> I haven't written more often than each ps in years except like
>> Dave says for special purposes. Lately we often write at 10ps intervals
>> for long simulations since the longer the runs are the less we are
>> usually interested in ps resolution. The structures just aren't very
>> different, though that is useful when making movies to show the dynamics.
>> carlos
>>
>
>
Received on Sun May 07 2006 - 06:07:05 PDT