On Tue, Jul 24, 2012 at 8:15 AM, <steinbrt.rci.rutgers.edu> wrote:
> Hi all,
>
> I have been trying to track down strange SHAKE failures in TI that have
> been reported occasionally on the mailing list (Thanks Jodi, Ross and
> Clara), but always had trouble reproducing them. I now found out a bit
> more about these:
>
> (explanation of the problem follows, my question is below)
>
> The problem is a desynchronization of coordinates on V0 and V1, and the
> hardcoded dac/knut sync every 20 steps does not prevent it. It becomes
> tricky because this problem occurs only when Amber is compiled with the
> Intel Compilers but not with gcc. I could track it down to a point in
> subroutine sc_pscale in softcore.F90 where we have
>
> x(1,i) = x(1,i) + xmolnu - xmol
>
> This is a rigid translation of atom coordinates for pressure scaling; xmol
> and xmolnu are the old and new box dimensions. The two numbers are quite
> similar, so the subtraction involves loss of precision of about five
> digits. That is not a problem per se, but this loss of precision results
> in arbitrary lowest digits when the subtraction is done in intel-compiled
> code.
>
> e.g.
>
> xmolnu 40.3614616595491
> -xmol -40.3635833689616
> should be
> = -0.0021217094125
> but is
> = -0.002121709412506334 on process V0
> and
> = -0.002121709412520545 on process V1
>
> Since the x coordinates are smaller than the box size, this gives roundoff
> errors and results in SHAKE failures in some rare cases.
>
>
> So now my question: What to do about it? Broadcasting the coordinates
> after each step would be quite a performance hit for TI runs. An
> alternative would be to artificially lose more digits:
>
> tmp = dble(int((xmolnu - xmol)*1.0d6))/1.0d6
>
> but that is hardly an elegant solution. The easiest alternative is to tell
> people to avoid the Intel Compilers when doing TI, also not really a good
> fix.
>
> Any ideas on how to deal with this situation?
>
What about forcing synchronization with a bcast from commmaster's master?
It should be fairly cheap given that the master communicator has only 2
processes.
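
A minimal sketch of what such a broadcast could look like, written as a
standalone program rather than as sander code. MPI_COMM_WORLD stands in for
commmaster here, and natom, x and the fake per-rank offset are made up for
illustration. In a real TI run with more than one process per group, the
synchronized coordinates would presumably still have to be passed on within
each group afterwards.

```fortran
program sync_coords_sketch
  use mpi
  implicit none
  integer, parameter :: natom = 4        ! hypothetical system size
  double precision   :: x(3, natom)      ! stand-in for the coordinate array
  integer :: comm, rank, ierr

  call MPI_Init(ierr)
  comm = MPI_COMM_WORLD                  ! assumption: plays the role of commmaster
  call MPI_Comm_rank(comm, rank, ierr)

  ! Fake coordinates that differ in the trailing digits from rank to rank,
  ! mimicking the V0/V1 drift described above.
  x = 1.0d0 + dble(rank) * 1.0d-14

  ! Rank 0 overwrites every other rank's copy, so all ranks continue
  ! from bit-identical coordinates.
  call MPI_Bcast(x, 3*natom, MPI_DOUBLE_PRECISION, 0, comm, ierr)

  print '(a,i0,a,es24.16)', 'rank ', rank, ': x(1,1) = ', x(1,1)

  call MPI_Finalize(ierr)
end program sync_coords_sketch
```

Run on 2 processes, both ranks should print the same value after the
broadcast.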
Another option is to define an 'extra' precision real variable to store the
difference, which should eliminate the arbitrariness of the trailing
decimals. You can use the 'kind' specifier, which is part of the Fortran
95 spec:
real(kind=DIFF_KIND) :: small_diff = 0.0
where you define DIFF_KIND using "selected_real_kind" (
http://gcc.gnu.org/onlinedocs/gcc-4.7.1/gfortran/SELECTED_005fREAL_005fKIND.html)
to obtain the desired precision. My suggestion is to use this locally and
avoid widespread use where possible, since 'roll-your-own' precisions are
risky, but I think warranted in this case.
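
A rough sketch of how this could look, using the numbers from the example
above. The names dp, DIFF_KIND and small_diff, the coordinate value, and the
choice of 30 decimal digits are all assumptions for illustration, not what
softcore.F90 uses. If the compiler has no kind with that precision,
selected_real_kind returns a negative value and the declaration fails at
compile time.

```fortran
program diff_kind_sketch
  implicit none
  ! Ordinary double precision (at least 15 decimal digits).
  integer, parameter :: dp = selected_real_kind(15, 307)
  ! Hypothetical wider kind used only for the difference (at least 30 digits).
  integer, parameter :: DIFF_KIND = selected_real_kind(30)

  real(kind=dp)        :: xmol, xmolnu, x
  real(kind=DIFF_KIND) :: small_diff

  xmolnu = 40.3614616595491_dp
  xmol   = 40.3635833689616_dp
  x      = 12.5_dp                       ! arbitrary coordinate for illustration

  ! Promote both operands before subtracting, then shift the coordinate.
  small_diff = real(xmolnu, DIFF_KIND) - real(xmol, DIFF_KIND)
  x = x + real(small_diff, dp)

  print '(a,i0)',      'DIFF_KIND  = ', DIFF_KIND
  print '(a,es26.18)', 'small_diff = ', small_diff
  print '(a,f20.15)',  'x          = ', x
end program diff_kind_sketch
```

One caveat worth checking: if xmolnu and xmol already differ in their last
bits between V0 and V1 when they reach this line, widening only the
difference will reproduce the same discrepancy, and the wider kind would
have to be carried through the computation of both values.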
Just thoughts.
Good luck,
Jason
--
Jason M. Swails
Quantum Theory Project,
University of Florida
Ph.D. Candidate
352-392-4032
_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Tue Jul 24 2012 - 05:30:03 PDT