Re: [AMBER-Developers] infinite ptraj.MPI, was: First AmberTools release candidate

From: Daniel Roe <daniel.r.roe.gmail.com>
Date: Tue, 16 Mar 2010 22:34:55 -0400

On Tue, Mar 16, 2010 at 9:54 PM, Lachele Foley <lfoley.ccrc.uga.edu> wrote:

> Are you getting slowdown or hangs forever? For me, it never completes --
> or, at least, doesn't complete after 45 minutes on four processors.
> Compared to two seconds, that's close enough to forever for me.
>

Slowdowns only - my tests complete. Here are some timing results over an NFS
filesystem for the first part of the ptraj_comprehensive test case (the part
that uses the ptraj.in input file):
Single processor:
-rwxr-xr-x 1 droe case 2001557 Mar 16 16:12 /home/droe/Amber/CVS/amber11/bin/ptraj
4 seconds.
Timings...

-------------------------------
| Check Input Time |    0.000 |
| Input Time       |    0.000 |
| Output Time      |    0.010 |
| Action Time      |    4.040 |
|------------------|----------|
| Total Time       |    4.050 |
-------------------------------

Pretty consistent - run takes 4 seconds, which agrees with the internal
timings.
2 processors:
-rwxr-xr-x 1 droe case 3098145 Mar 16 15:21 /home/droe/Amber/CVS/amber11/bin/ptraj.MPI
case1
time for 1 loops = 0.000419139862061 seconds
13 seconds.
Timings...

------------------------------------------
| Rank             |        0 |        1 |
|------------------|----------|----------|
| Check Input Time |    0.010 |    0.009 |
| Input Time       |    1.346 |    1.131 |
| Output Time      |    1.013 |    1.176 |
| Action Time      |    2.801 |    2.853 |
|------------------|----------|----------|
| Total Time       |    5.170 |    5.169 |
------------------------------------------

-----------------------------------------------------
|                  |  Average |  Longest |    Total |
|------------------|----------|----------|----------|
| Check Input Time |    0.009 |    0.010 |    0.019 |
| Input Time       |    1.239 |    1.346 |    2.477 |
| Output Time      |    1.094 |    1.176 |    2.188 |
| Action Time      |    2.827 |    2.853 |    5.655 |
|------------------|----------|----------|----------|
| Total Time       |    5.169 |    5.385 |   10.339 |
-----------------------------------------------------

Note that even though the internal timings for the multiprocessor run are
only a little slower (about 5.2 s total per rank versus 4.05 s), the actual
runtime (13 s, printed just before "Timings...") is over twice that, which
implies communication issues. Now take a look at a run on a local disk (I'm
only showing 2 processors - the timings for 1 processor are essentially the
same):
2 processors:
-rwxr-xr-x 1 droe case 3098145 Mar 16 15:21 /home/droe/Amber/CVS/amber11/bin/ptraj.MPI
case1
time for 1 loops = 0.00016713142395 seconds
2 seconds.
Timings...

------------------------------------------
| Rank             |        0 |        1 |
|------------------|----------|----------|
| Check Input Time |    0.000 |    0.001 |
| Input Time       |    0.158 |    0.152 |
| Output Time      |    0.007 |    0.128 |
| Action Time      |    2.183 |    2.069 |
|------------------|----------|----------|
| Total Time       |    2.349 |    2.350 |
------------------------------------------

-----------------------------------------------------
|                  |  Average |  Longest |    Total |
|------------------|----------|----------|----------|
| Check Input Time |    0.001 |    0.001 |    0.001 |
| Input Time       |    0.155 |    0.158 |    0.310 |
| Output Time      |    0.068 |    0.128 |    0.136 |
| Action Time      |    2.126 |    2.183 |    4.252 |
|------------------|----------|----------|----------|
| Total Time       |    2.350 |    2.470 |    4.699 |
-----------------------------------------------------

Now there is a speedup compared to 1 processor.
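
To put rough numbers on the comparison, here is a small Python sketch that
uses only the wall-clock times and per-rank totals quoted in this message.
The variable and function names are mine, and the wall-clock figures are
whole seconds, so treat the results as estimates rather than measurements.

```python
# Wall-clock run times in seconds, as quoted in this message (whole seconds).
serial_wall = 4.0       # 1 processor (NFS; the local-disk time is said to be about the same)
nfs_mpi_wall = 13.0     # 2 processors, NFS
local_mpi_wall = 2.0    # 2 processors, local disk

# Per-rank "Total Time" values from the internal timing tables.
nfs_rank_totals = [5.170, 5.169]
local_rank_totals = [2.349, 2.350]


def unaccounted(wall, rank_totals):
    """Wall-clock time not covered by the slowest rank's internal total."""
    return wall - max(rank_totals)


def speedup(serial, parallel):
    """Ratio of serial to parallel wall-clock time (>1 means faster)."""
    return serial / parallel


nfs_gap = unaccounted(nfs_mpi_wall, nfs_rank_totals)
local_gap = unaccounted(local_mpi_wall, local_rank_totals)
nfs_speedup = speedup(serial_wall, nfs_mpi_wall)
local_speedup = speedup(serial_wall, local_mpi_wall)

print(f"NFS:   {nfs_gap:.2f} s unaccounted, speedup {nfs_speedup:.2f}x")
# The local gap comes out slightly negative only because the wall-clock
# figure is rounded to a whole second.
print(f"Local: {local_gap:.2f} s unaccounted, speedup {local_speedup:.2f}x")
```

On NFS, nearly 8 of the 13 seconds fall outside what the ranks time
internally, whereas on local disk the internal totals account for
essentially the whole run.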


> I'm not sure if you've seen bug 126, but... We've been getting corrupted
> output files. For example, every two million characters or so, the file
> will be missing a couple. Matt is setting up tests to use the different
> mount points (file systems), and will run tomorrow.
>

I wouldn't be surprised if it did turn out to be a FS issue - even simple
NFS mount points can get really wacky sometimes (the same file having
different contents/attributes on different computers, etc.). For now I would
say avoid running ptraj in parallel over any network filesystem.
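
One way to pin down where an output file goes bad might be to compare it
byte-for-byte against a reference written by the serial build on a local
disk. The sketch below is only an illustration: first_mismatch is a
hypothetical helper (not part of ptraj or AmberTools), and it assumes the
serial and parallel runs are supposed to produce identical files.

```python
from typing import Optional


def first_mismatch(path_a: str, path_b: str,
                   chunk_size: int = 1 << 20) -> Optional[int]:
    """Return the byte offset where two files first differ, or None if identical.

    Hypothetical helper for illustration only. If one file is a strict
    prefix of the other, the length of the shorter file is returned.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Locate the exact differing byte inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No differing byte in the common part: one file ended early.
                return offset + min(len(a), len(b))
            if not a:
                # Both files ended at the same point with no difference.
                return None
            offset += len(a)
```

A dropped character shows up as a mismatch at the offset of the first
missing byte, so running this on the same output written to different mount
points would show whether the damage starts at similar positions.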

-Dan
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Tue Mar 16 2010 - 20:00:04 PDT