
From: David Cerutti <dscerutti.gmail.com>

Date: Tue, 16 May 2017 22:10:05 -0400

There's been a spirited discussion among members of Castle Case and Ross Walker's merry men on the subject of precision in SPFP mode. This is not a discussion of how much precision we absolutely need--that's for GROMACS and D. E. Shaw--rather, it's a discussion of how much precision we currently have, what we can get away with before making that situation any worse, and what we should be steering our users toward.

Here are some useful metrics that I've deduced. These numbers are for the accumulated force on each atom, averaged over all atoms, in the JAC benchmark: a ~62 A cubic box, 9 A cutoff, dsum_tol = 1.0e-5, and a 64-point cubic mesh. I used the mdgx program to run the PME calculations in full double precision and print the forces. I modeled a reduction to 32-bit coordinates by perturbing atoms by up to 1.0e-6 A in the inpcrd file (precision to 1.0e-7) and recomputing the numbers. (In SP modes pmemd.cuda can represent positions to a precision of 1/1048576 A, a number which won't change unless the cutoff gets needlessly large or inadvisably small.) The error due to the coordinate representation is summed over the non-bonded forces (the bonded interactions are done in double precision even in SPFP mode). All results are in kcal/mol-A.

Overall error due to 32-bit coords          : 0.0000095
Error in reciprocal space, default settings : 0.0032053
Error in direct space electrostatics        : 0.0031423
Error due to 9A LJ truncation               : 0.0008458
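To put the coordinate number in context, here is a minimal sketch (mine, not pmemd.cuda code; the coordinate value is made up for illustration) of what a 1/1048576 A fixed-point position implies, and why perturbing the inpcrd coordinates by up to 1.0e-6 A is a fair stand-in for that rounding:

    /* Sketch: quantize a double-precision coordinate to the 2^-20 A
       (1/1048576 A) grid implied by an SP fixed-point position.  The
       worst-case rounding error is half a step, ~4.8e-7 A, so a random
       perturbation of up to 1.0e-6 A brackets it. */
    #include <stdio.h>
    #include <math.h>

    int main()
    {
        const double scale = 1048576.0;            /* 2^20 steps per Angstrom */
        double x  = 23.4567891;                    /* hypothetical coordinate */
        double xq = nearbyint(x * scale) / scale;  /* round to nearest step   */
        printf("original %.9f  quantized %.9f  error %.2e A\n", x, xq, fabs(x - xq));
        printf("maximum rounding error = %.2e A\n", 0.5 / scale);
        return 0;
    }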

There's been a motion to tighten our default dsum_tol to 1.0e-6. This reduces direct space error by a factor of roughly 10, but it increases reciprocal space error unless something is done to improve the precision there (e.g. more grid density, or a longer direct space cutoff, as higher interpolation orders aren't in the cards for now). I think that the coordinate representation is a good metric to bear in mind: 1.0e-5 kcal/mol-A... how many times greater are our other sources of error?
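For anyone reproducing these numbers, recall what dsum_tol controls: the Ewald coefficient beta is chosen so that erfc(beta * cutoff) / cutoff comes in just under dsum_tol at the cutoff, which is why tightening the tolerance pulls error out of the direct sum and pushes work into reciprocal space. A quick bisection sketch of that relationship (the search in the production code differs in its details):

    /* Sketch: find beta such that erfc(beta * cutoff) / cutoff ~= dsum_tol. */
    #include <stdio.h>
    #include <math.h>

    double find_beta(double cutoff, double dsum_tol)
    {
        double lo = 0.0, hi = 10.0;                /* bracket for beta, in 1/A   */
        int i;
        for (i = 0; i < 100; i++) {
            double mid = 0.5 * (lo + hi);
            if (erfc(mid * cutoff) / cutoff > dsum_tol)
                lo = mid;                          /* direct sum still too large */
            else
                hi = mid;                          /* tolerance satisfied        */
        }
        return hi;
    }

    int main()
    {
        printf("beta( 9 A, 1.0e-5) = %.5f 1/A\n", find_beta( 9.0, 1.0e-5));
        printf("beta(10 A, 4.0e-6) = %.5f 1/A\n", find_beta(10.0, 4.0e-6));
        printf("beta(10 A, 1.0e-6) = %.5f 1/A\n", find_beta(10.0, 1.0e-6));
        return 0;
    }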

If we go all the way down to dsum_tol = 1.0e-6 and match that in the reciprocal space part, we'll be matching the precision of the LJ part for a ~11 A cutoff. I'd propose we urge people to use dsum_tol = 4.0e-6 with a 10 A cutoff. This gives us the following errors (note that the direct space elec/LJ errors will be consistent across different simulations, while the reciprocal space error in the JAC benchmark is close to a maximum, because the 62 A cubic box produces grid spacings very near 1 A, the largest spacing that Amber's default grid sizer will accept):

Error due to dsum_tol = 4.0e-6 : 0.0013631 kcal/mol-A
Error due to 10A LJ truncation : 0.0005210 kcal/mol-A
Error in reciprocal space      : 0.0012215 kcal/mol-A
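As an aside on why JAC sits near the worst case for reciprocal space: with a roughly 62 A edge, the smallest FFT-friendly grid that keeps the spacing at or under 1 A is 64 points, which leaves the spacing just shy of the 1 A ceiling. A small sketch of that selection, assuming the sizer restricts dimensions to products of 2, 3, and 5 (the exact factor set in Amber's sizer may differ):

    /* Sketch: pick the smallest FFT-friendly grid dimension giving a
       spacing of at most 1 A, and report the resulting spacing. */
    #include <stdio.h>
    #include <math.h>

    static int fft_friendly(int n)
    {
        const int primes[3] = {2, 3, 5};
        int i;
        for (i = 0; i < 3; i++)
            while (n % primes[i] == 0) n /= primes[i];
        return n == 1;
    }

    int main()
    {
        double box = 62.0;                   /* approximate JAC box edge, A */
        int n = (int)ceil(box / 1.0);        /* need spacing <= 1 A         */
        while (!fft_friendly(n)) n++;
        printf("grid = %d points, spacing = %.3f A\n", n, box / n);
        return 0;
    }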

Another thing that we can do with an understanding of the coordinate representation is tailor the representation of forces as we accumulate them in the non-bonded direct space sum. It would be very helpful to those of us programming pmemd.cuda if we could keep things in a 32-bit representation until we're ready to commit. I'm building a version of pmemd.cuda that accumulates up to 32 direct space forces (all results for one thread in one tile) with 20 bits of precision after the decimal, then recasts the 32-bit accumulators to 64-bit and scales the numbers by another 2^20 to put them in the same units as the other forces. It boosts performance a few percent and saves a lot of __shared__ memory which can be put to other uses. I've estimated the error due to truncating individual forces at 20 bits after the decimal (~1.0e-6 kcal/mol-A each) and accumulating ~180 of them per atom (about what happens with a 10 A cutoff): 7.2e-6 kcal/mol-A, or about the same as the error due to the coordinate representation. In this format I can handle forces up to 2000 kcal/mol-A.
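A device-side sketch of what that accumulation scheme looks like (names and structure are mine for illustration, not the actual pmemd.cuda kernel; it assumes the main 64-bit accumulator carries 40 fractional bits, per the extra 2^20 rescale above):

    /* 20 fractional bits in a 32-bit int: representable range is
       +/- 2^(31-20) = 2048 kcal/mol-A; dropping to 19 bits would give
       +/- 4096 at the cost of roughly double the truncation error per force. */
    #define FSCALE_TILE   1048576.0f         /* 2^20                           */
    #define TILE_TO_MAIN  1048576LL          /* another 2^20 on promotion      */

    __device__ void tile_accumulate(float fx, int *fx_tile)
    {
        /* Round one pairwise force contribution to 2^-20 kcal/mol-A and add
           it into a 32-bit integer held in __shared__ memory. */
        *fx_tile += __float2int_rn(fx * FSCALE_TILE);
    }

    __device__ void tile_commit(int fx_tile, unsigned long long *mainAcc)
    {
        /* Once all (up to 32) contributions for this thread's tile are in,
           widen to 64 bits and rescale so the value lands in the same
           2^40-scaled units as the global fixed-point force accumulator. */
        long long wide = (long long)fx_tile * TILE_TO_MAIN;
        atomicAdd(mainAcc, (unsigned long long)wide);
    }

Holding only a 32-bit value per force component in __shared__ memory until the commit step is where the memory savings come from.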

I'd conclude this by saying "I don't think that any simulation that's stable at all is going to break the 2000 kcal/mol-A ceiling," but it sounds too ominous and will no doubt invite comments of "famous last words..." If really necessary, I'm pretty confident that I can push to 19 bits of precision past the decimal (force truncation then contributes slightly more error than the coordinate rounding, but still far below any of the other sources) and let the accumulators take forces of up to 4000 kcal/mol-A. About one in a million individual forces gets above 64 kcal/mol-A, and larger forces become exponentially less likely the higher up you go.

If anyone else has thoughts, please share.

Dave

_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Tue May 16 2017 - 19:30:03 PDT
