Re: [AMBER-Developers] infinite ptraj.MPI, was: First AmberTools release candidate

From: Robert Duke <rduke.email.unc.edu>
Date: Wed, 17 Mar 2010 09:35:55 -0400

Hi Lachele,
Sorry, I have not been following all this in detail, but there really should
be no problem with the code itself at the level of pmemd or sander. The
ptraj stuff I don't have a clue about, as I have not looked at it. I still
strongly suspect some system component, be it the system/filesystem itself,
mpi, or the compiler. If you get problems on uniprocessor runs (I don't
remember), well that rules out mpi. If disk access is via nfs, then all
bets are off at various levels. You either need local disk or a serious
parallel file system that you can access via a good interface (so you have
lustre I believe, but I don't know if there is something funky about how it
is connected); I have seen nfs cause serious problems, especially when it
either is on a slow ethernet interface or if it shares the interconnect used
by mpi. It is also possible to configure nfs in ways where the sync between
the program and the disk is looser, and I could envision that causing
problems (I am rusty on nfs, but I have seen funky things occur before
without a lot of effort). On ptraj hanging, well, there are multiple ways
to make this sort of thing happen, but two common ones have to do with 1)
how you configure mpi buffers on your system (buffers too small on the
system and too big in the code can cause really interesting buffer
allocation race conditions which result in hangs; simple mpi has to be
configured for buffer size, and the operating system itself has to also be
configured), and 2) how you synchronize mpi activity in the code. I think
there is the potential for a lot of work to solve the problems you are
having, but I could be overestimating it. I WOULD NOT add netcdf to the
mix at this point in time, or mkl for that matter. The more complex layers
you pile onto this mess, the harder it is going to be to fix. Well, that
was all probably no help at all...
Regards - Bob
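
[Editor's illustration] Bob's first hang scenario, where the buffer space available is smaller than what the code tries to send before anyone receives, can be sketched with a toy model. This is not MPI and not ptraj code: it uses Python threads and bounded queues to stand in for two ranks and their message buffers, and the names `exchange`, `buffer_slots`, and `n_messages` are invented for the example. How a real MPI library buffers sends is implementation- and configuration-specific.

```python
import queue
import threading


def exchange(buffer_slots, n_messages, timeout=0.5):
    """Two toy 'ranks' each send n_messages to the other, then receive.

    buffer_slots (must be >= 1) is how many undelivered messages each
    receiver can hold. Returns True if both ranks finish, False if either
    one stalls until the timeout expires.
    """
    inbox = [queue.Queue(maxsize=buffer_slots), queue.Queue(maxsize=buffer_slots)]
    finished = [False, False]

    def rank(me):
        other = 1 - me
        try:
            # Send everything first: this only works if the buffer can
            # absorb all messages before the peer starts receiving.
            for i in range(n_messages):
                inbox[other].put(i, timeout=timeout)
            # Only now start receiving.
            for _ in range(n_messages):
                inbox[me].get(timeout=timeout)
            finished[me] = True
        except (queue.Full, queue.Empty):
            # In a real blocking program there is no timeout: this is the hang.
            pass

    threads = [threading.Thread(target=rank, args=(r,)) for r in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(finished)
```

With these assumptions, `exchange(4, 2)` completes because each buffer can hold everything that is sent, while `exchange(1, 2)` stalls: both ranks are stuck in their second send and neither reaches its receive loop. The same code therefore passes or hangs depending only on the buffer size, which is the kind of system-dependent behavior described above.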
----- Original Message -----
From: "Lachele Foley" <lfoley.ccrc.uga.edu>
To: "AMBER Developers Mailing List" <amber-developers.ambermd.org>
Sent: Wednesday, March 17, 2010 9:10 AM
Subject: Re: [AMBER-Developers] infinite ptraj.MPI, was: First AmberTools
release candidate


> ??? Neither sander nor pmemd do parallel i/o, as far as I can see.

Exactly... That was one of my first questions to Bob D: do the file writes
do anything fancy like leave pointers hanging mid-file, etc.? He said "just
dumb-bunny appends," which is what makes sense to do.
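
[Editor's illustration] The "dumb-bunny append" pattern can be sketched as follows. This is a minimal Python sketch, not the actual sander or pmemd I/O code (which is Fortran); the function names and the file name `mdout.txt` are invented for the example. Each call opens the file in append mode, writes at the end, and asks the operating system to push the data to disk before returning. On NFS, whether that request is enough depends on how the export and the mount are configured.

```python
import os
import tempfile


def append_record(path, text):
    """Append text to the end of path and flush it to disk before returning."""
    with open(path, "a") as handle:
        handle.write(text)          # always lands at the current end of file
        handle.flush()              # empty Python's own buffer
        os.fsync(handle.fileno())   # ask the OS to write its cache to disk


def demo():
    """Append two records to a scratch file and return the file contents."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "mdout.txt")  # hypothetical output file
        append_record(path, "step 1\n")
        append_record(path, "step 2\n")
        with open(path) as handle:
            return handle.read()
```

Because nothing ever seeks back into the middle of the file, no file position is left dangling between writes; the only state that matters is the end of the file.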

> Does your problem exist with netcdf trajectories?

Haven't tried netcdf. I guess I have a job now for the grad student who
just offered to help... I doubt it will matter a lot, because I've also
seen issues, for example, in my min.o file. But, all information helps.

Regarding Ross's comment, since the filesystem I had the problem on is
Lustre, "a real parallel filesystem", then the ptraj.MPI should be good
there (better, even?), not hung forever. Right?


:-) Lachele
--
B. Lachele Foley, PhD '92,'02
Assistant Research Scientist
Complex Carbohydrate Research Center, UGA
706-542-0263
lfoley.ccrc.uga.edu
----- Original Message -----
From: case [mailto:case.biomaps.rutgers.edu]
To: AMBER Developers Mailing List [mailto:amber-developers.ambermd.org]
Sent: Wed, 17 Mar 2010 07:43:57 -0400
Subject: Re: [AMBER-Developers] infinite ptraj.MPI, was: First AmberTools release candidate
> On Tue, Mar 16, 2010, Lachele Foley wrote:
> >
> > > I wouldn't be surprised if it did turn out to be a FS issue - even
> > > simple NFS mount points can get really wacky sometimes (same file has
> > > different contents/attributes on different computers etc). For now I
> > > would say avoid using ptraj over any network filesystem in parallel.
> >
> > ...or sander, pmemd...
>
> ??? Neither sander nor pmemd do parallel i/o, as far as I can see.
>
> Does your problem exist with netcdf trajectories?
>
> ...dac
>
>
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers
>
Received on Wed Mar 17 2010 - 07:00:03 PDT