Multiple processor minimization is very standard; if it does not work, it is
a bug! This is true for both sander and pmemd. But there can be subtle
differences between runs due to rounding errors in the summation of forces
over the network, etc., that produce ever-so-slightly-different results, and
these can grow into huge differences within a very few steps if the input
system has some really steep force gradients somewhere (translation:
typically two atoms too close together, though other conditions are
possible, depending on just how creative you get with leap).

The "problem" with multiple processor runs is basically that addition of
floating point numbers with finite precision is not truly associative: it is
very possible that (a + b) + c != a + (b + c). The problem is reduced on a
single processor, where you get the same result every time because the order
of additions is the same every time (it is network indeterminacy that
changes the order of operations). Also, when all calculations are done on
one processor, more operations stay in the extended precision registers of
the FPU on many chips; the instant a value is actually stored to RAM or
cache, it must be truncated. And because base 2 is used internally to
represent base 10 numbers, the problem is often worse than one would think:
inexact representation of numbers that look exact to the programmer is quite
common.
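
For anyone who wants to see this concretely, here is a minimal standalone
example - plain Fortran, nothing to do with the sander or pmemd source -
that shows both effects:

    program fp_demo
       implicit none
       double precision :: a, b, c
       a = 1.0d16
       b = -1.0d16
       c = 1.0d0
       ! At finite precision the grouping matters: b + c rounds back
       ! to b, so the two sums differ by exactly 1.0.
       print *, '(a + b) + c = ', (a + b) + c   ! prints 1.0
       print *, 'a + (b + c) = ', a + (b + c)   ! prints 0.0
       ! Decimal constants that look exact are often inexact in base 2;
       ! 0.1 has no finite binary representation, so 0.1 * 3 /= 0.3.
       print *, '0.1 * 3 == 0.3? ', (0.1d0 * 3.0d0 == 0.3d0)   ! prints F
    end program fp_demo

A parallel force summation differs from this toy only in that the grouping
changes from run to run rather than between two lines of the same program.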
So multiple processor runs are just as correct as single processor runs;
you just effectively sample phase space a bit more (and who is to say what
the truly correct physical thing to do is - we are talking about rounding
errors far below any real precision we have in our parameters, our
numerical approximations, or the force field model itself: Coulomb's law,
vdW, and all the bonded approximations).

Okay, I have carried on about this sort of thing before, but it seemed
appropriate to bring it up again, since there appears to be some element of
an unstable system here, combined with fear of multiprocessor runs. The
only really new point I am making is that steep force gradients in unstable
minimizations can make things even worse. For systems with reasonable
gradients, I typically see reproducibility between single and
multiprocessor runs out to about 300 steps (with my energy and force spline
Ewald code, I frequently see agreement out to about 400-500 steps).
Best Regards - Bob
----- Original Message -----
From: "Ilyas Yildirim" <yildirim.pas.rochester.edu>
To: <amber-developers.scripps.edu>
Sent: Wednesday, November 12, 2008 2:51 AM
Subject: Re: amber-developers: Extra Points Calculation
> Volodymyr,
>
> I have the most up-to-date version of the runmin.f file. The .diff file
> you sent me has only one change in it, committed by Case.
>
> You said that you reproduced the same error, right? Are you sure that we
> can use more than 2 CPUs in amber9 minimization? Thanks.
>
> On Tue, 11 Nov 2008, Volodymyr Babin wrote:
>
>> Ilyas,
>>
>> this has been fixed in newer versions by the following commits:
>>
>> revision 9.3
>> date: 2006/08/17 23:10:58; author: case; state: Exp; lines: +1 -0
>> initialize gmin to 1 to avoid divide by zero
>>
>> ----------------------------
>> revision 9.8
>> date: 2007/10/24 22:46:00; author: steinbrt; state: Exp; lines:
>> +19 -33
>> TBS: Extended and cleaned up Softcore potentials for TI
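>>
>> Purely as a hypothetical illustration of the first fix (gmin is the only
>> name taken from the commit message; everything else below is made up and
>> is not the actual runmin.f code), the pattern being guarded is division
>> by an accumulator that may never have been assigned:
>>
>> program guard_demo
>>    implicit none
>>    integer :: i, n
>>    double precision :: g(10), gmin, step
>>    n = 0                        ! pathological input: nothing to scan
>>    step = 1.0d0
>>    gmin = 1.0d0                 ! the revision 9.3 fix: default to 1
>>    do i = 1, n                  ! with n == 0 the loop never runs
>>       gmin = min(gmin, abs(g(i)))
>>    end do
>>    step = step / gmin           ! safe even though the loop did no work
>>    print *, step
>> end program guard_demo
>>
>> Without that first assignment, gmin could be zero (or garbage) on the
>> first pass, and the division would fail.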
>>
>> Please find the .diff attached (I did not test it thoroughly, though).
>>
>> Have a great night,
>> Volodymyr
>>
>> On Tue, November 11, 2008 14:38, Ilyas Yildirim wrote:
>> > The test script is as follows:
>> >
>> > --------- runsim -------------
>> > #!/bin/csh -f
>> >
>> > set sander = $AMBERHOME/exe/sander.MPI
>> >
>> > /bin/rm -f min
>> >
>> > cat > min <<EOF
>> > Initial Minimization of solvent + ions
>> > &cntrl
>> > imin = 1,
>> > maxcyc = 10,
>> > ncyc = 5,
>> > ntb = 1,
>> > ntr = 0,
>> > cut = 8.0,
>> > ntpr = 1, ntx = 1,
>> > icfe=1,klambda=6,clambda=0.5
>> > /
>> > EOF
>> >
>> > cat > groups_md <<EOF
>> > -O -i min -p prmtop1 -c inpcrd -o min.out.p1 -r min.rst.p1 -x mdcrd_02.traj.p1
>> > -O -i min -p prmtop2 -c inpcrd -o min.out.p2 -r min.rst.p2 -x mdcrd_02.traj.p2
>> > EOF
>> >
>> > mpiexec -np 4 $sander -ng 2 -groupfile groups_md < /dev/null || goto error
>> > exit 0
>> >
>> > error:
>> > echo "sander exited with an error"
>> > exit 1
>> > ------------------------------------
>> >
>> > The error message is as follows:
>> >
>> > ----
>> > [yildirim.malatya02 ~/l_0.5]# runsim
>> >
>> > Running multisander version of sander amber9
>> > Total processors = 4
>> > Number of groups = 2
>> >
>> > Looping over processors:
>> > WorldRank is the global PE rank
>> > NodeID is the local PE rank in current group
>> >
>> > Group = 0
>> > WorldRank = 0
>> > NodeID = 0
>> >
>> > Group = 1
>> > WorldRank = 2
>> > NodeID = 0
>> >
>> > WorldRank = 1
>> > NodeID = 1
>> >
>> > WorldRank = 3
>> > NodeID = 1
>> >
>> > [cli_1]: [cli_3]: aborting job:
>> > Fatal error in MPI_Bcast: Message truncated, error stack:
>> > MPI_Bcast(784).........................: MPI_Bcast(buf=0xe63888, count=1,
>> > MPI_INTEGER, root=0, comm=0x84000000) failed
>> > MPIR_Bcast(198)........................:
>> > MPIDI_CH3U_Post_data_receive_found(163): Message from rank 0 and tag 2
>> > truncated; 144408 bytes received but buffer size is 4
>> > aborting job:
>> > Fatal error in MPI_Bcast: Message truncated, error stack:
>> > MPI_Bcast(784).........................: MPI_Bcast(buf=0xe63888, count=1,
>> > MPI_INTEGER, root=0, comm=0x84000000) failed
>> > MPIR_Bcast(198)........................:
>> > MPIDI_CH3U_Post_data_receive_found(163): Message from rank 0 and tag 2
>> > truncated; 144408 bytes received but buffer size is 4
>> > rank 3 in job 1 malatya02_48011 caused collective abort of all ranks
>> > exit status of rank 3: return code 1
>> > rank 1 in job 1 malatya02_48011 caused collective abort of all ranks
>> > exit status of rank 1: return code 1
>> > error: label not found.
>> >
>> > --------------
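>> >
>> > For what it is worth, that MPICH2 message is the generic complaint when
>> > the ranks of a communicator disagree about the size of a collective. A
>> > minimal sketch (not sander code; the count 36102 is just 144408 bytes /
>> > 4 bytes per integer, to match the log above) that fails the same way:
>> >
>> > program bcast_mismatch
>> >    implicit none
>> >    include 'mpif.h'
>> >    integer :: ierr, rank, buf(36102)
>> >    call MPI_Init(ierr)
>> >    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
>> >    buf = 0
>> >    if (rank == 0) then
>> >       ! root broadcasts 36102 integers (144408 bytes)...
>> >       call MPI_Bcast(buf, 36102, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
>> >    else
>> >       ! ...but the others only expect one 4-byte integer, so MPICH2
>> >       ! aborts with "Message truncated", exactly as above.
>> >       call MPI_Bcast(buf, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
>> >    end if
>> >    call MPI_Finalize(ierr)
>> > end program bcast_mismatch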
>> >
>> > The output of mpich2version is:
>> >
>> > [yildirim.malatya02 ~/l_0.5]# mpich2version
>> > Version: 1.0.5
>> > Device: ch3:sock
>> > Configure Options: '--prefix=/programs/mpich2'
>> > 'CC=/opt/intel/cce/10.0.026/bin/icc'
>> > 'CXX=/opt/intel/cce/10.0.026/bin/icpc'
>> > CC: /opt/intel/cce/10.0.026/bin/icc
>> > CXX: /opt/intel/cce/10.0.026/bin/icpc
>> > F77: /opt/intel/fce/10.0.026/bin/ifort
>> > F90: ifort
>> >
>> > ---------------
>> >
>> > Thanks.
>> >
>> > On Tue, 11 Nov 2008, Volodymyr Babin wrote:
>> >
>> >> On Tue, November 11, 2008 14:18, Ilyas Yildirim wrote:
>> >> > While the subject has minimization in it, I have a quick question:
>> >> > Is it possible to use more than 2 CPUs in Thermodynamic Integration
>> >> > minimization? I could never use 4 CPUs when I wanted to do
>> >> > minimization with icfe set to either 1 or 2. Everything works fine
>> >> > with 2 CPUs, but not with 4 CPUs in the minimization process. I use
>> >> > mpich2 and amber9.
>> >>
>> >> Could you provide more details on what happens with > 2 cpus?
>> >> An easily runnable test-case that shows the problem would also be
>> >> very helpful.
>> >>
>> >> Have a great day,
>> >> Volodymyr
>> >>
>> >>
>> >
>> > --
>> > Ilyas Yildirim, Ph.D.
>> > ---------------------------------------------------------------
>> > = Hutchison Hall B#10 - Department of Chemistry =
>> > = - University of Rochester =
>> > = 585-275-6766 (office) - =
>> > = http://www.pas.rochester.edu/~yildirim/ =
>> > ---------------------------------------------------------------
>> >
>> >
>>
>
> --
> Ilyas Yildirim, Ph.D.
> ---------------------------------------------------------------
> = Hutchison Hall B#10 - Department of Chemistry =
> = - University of Rochester =
> = 585-275-6766 (office) - =
> = http://www.pas.rochester.edu/~yildirim/ =
> ---------------------------------------------------------------
>
>
Received on Fri Dec 05 2008 - 14:33:25 PST