Re: [AMBER-Developers] numprocs test and hpmpi from Jason Swails on 2011-02-08 (Amber Developers Archive Feb 2011)

From: Jason Swails <jason.swails.gmail.com>
Date: Tue, 8 Feb 2011 14:20:16 -0500

On Tue, Feb 8, 2011 at 1:36 PM, B. Lachele Foley <lfoley.uga.edu> wrote:

> The computer in Ireland is now behaving well enough that we're starting to
> put AMBER on it again.
>
> There is an issue with HP's MPI and how the tests decide the number of
> processors. Best I can tell, hpmpi's devels didn't consider that someone
> might want to run multiple instances of a serial job, such as "echo" or
> "sleep". But, if you know of another reason, please say.
>
> Here's what the hpmpi version of mpirun does:
>
> [lachele.yeats ~]$ mpirun -np 2 echo "hello"
> hello
> MPI Application rank 1 exited before MPI_Init() with status 0
> mpirun: Broken pipe
>
> ...on openMPI, of course, the behavior is as expected:
>
> [lachele.eliot ~]$ mpirun -np 2 echo "hello"
> hello
> hello
>
> Are there objections to using a mini, parallel variant of the test instead
> of the current one? Currently, the tests use:
>
> numprocs=`${DO_PARALLEL} echo "Testing number of processors" | wc -l`
>
> I'm thinking:
>
> numprocs=`${DO_PARALLEL} numprocs_test | wc -l`
>
> ...where numprocs_test would give equivalent output. Would that be likely
> to break anything else? I've never written a program in parallel, so this
> might be a good one for me to start with. :-) I shouldn't volunteer for
> anything for another week or three, but this seems like a low-priority thing
> to tackle.
>

I think that this is a good idea. Writing this as a program catches errors
that the simple script would miss. For example, if you use an OpenMPI
"mpirun" on a program compiled with MPICH2 (or maybe reverse that), then
you'll basically get independent processes running that don't know how to
communicate, so the size of each global communicator will just be 1. The
current echo-based test won't catch this error.

My proposal is to just write a little C program that initializes MPI and
then prints the size of the global communicator to stdout from the master
process only. This is easily parsed, as in:

numprocs=`${DO_PARALLEL} size_test`

Then this can be tested in a conditional (to make sure it's -gt 1).
This is a much better test than what we have going right now, and we could
also build this little test program along with the parallel test suite in
Amber(Tools) so we don't have to worry about it at test time. Most
importantly, it'll make sure that our MPI is actually set up properly and
head off some errors before they make you think that half of the parallel
tests hit a brick wall.
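
A rough sketch of what that check might look like in a test script. The
helper name check_numprocs and the error messages are made up for
illustration, and size_test is the proposed program, which doesn't exist
yet. The sketch also rejects output that isn't a single integer, since a
mismatched mpirun would presumably have every process print its own "1":

```shell
#!/bin/sh
# Hypothetical helper (not part of the Amber test suite): succeeds only
# if its argument is a single integer greater than 1.
check_numprocs() {
    numprocs=$1
    case "$numprocs" in
        ''|*[!0-9]*)
            # Empty, multi-line, or non-numeric output from the launcher.
            echo "Could not determine the number of MPI processes (got: '$numprocs')" >&2
            return 1
            ;;
    esac
    if [ "$numprocs" -gt 1 ]; then
        return 0
    fi
    echo "MPI reports only $numprocs process; check that mpirun matches the MPI used to build" >&2
    return 1
}

# Intended use (assumes the proposed size_test has been built):
#   numprocs=`${DO_PARALLEL} size_test`
#   check_numprocs "$numprocs" || exit 1
```

Keeping the check in a function means each test script only needs the two
lines shown in the trailing comment.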

It would be a fairly simple program to write, and is effectively a good
"Hello world" MPI program that actually has some use! I think it adds
nothing to the weight of the current repo and is a good addition to our test
suite. Happy coding.

Thanks!
Jason

> :-) Lachele
>
> Dr. B. Lachele Foley
> Complex Carbohydrate Research Center
> The University of Georgia
> Athens, GA USA
> lfoley.uga.edu
> http://glycam.ccrc.uga.edu
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers
>

-- 
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Graduate Student
352-392-4032

Received on Tue Feb 08 2011 - 11:30:04 PST