From: Pengzhi <zhangpengzhi1988.gmail.com>
Date: Mon, 2 Apr 2012 21:59:58 -0500
On Mon, Apr 02, 2012, Pengzhi wrote:
>
> I have a system with PBC. The dimension of the cubic box I use is 900 Å *
> 900 Å * 900 Å. When I set the cut-off to be 450 Å or slightly less than that,
> my system ends up with explosion. I have the error message:
>
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
>
> I can avoid this by setting a smaller cutoff (anything smaller than 360 Å
> works). However, if I want to keep as much interactions as possible,
> technically what is up limit of the cut-off in amber10 given size of the
> box?
I've run into the same problem with a normal system.
110 TIP3P waters in a (25 A)^3 box.
Serial sander compiled with gnu 4.6.3 on fedora 15.
All serial tests pass.
The following runs without a segfault:
title
&cntrl
imin=0
nstlim=10000 ! 10 ps at the default dt=0.001
ntt=0 ! 0=constE 3=langevin constT
tempi=300. ! initial temp
temp0=300 ! target temp
! shake
ntc=2 ! =2 water shake
ntf=2 ! =2 don't eval water H-O bonds
! periodic boundaries
ntb=1 ! vacuum=0 constant volume=1 constant pressure=2
! output
cut=8.
! iwrap=0
ntpr=1 ! mdout
ntwx=1 ! mdcrd
/
&ewald
nfft1=30
nfft2=30
nfft3=30
order=6
/
HOWEVER, if I change cut=8 to cut=9, then I get a segfault.
I recompiled with -g and ran it through valgrind.
The problem is in src/sander/nonbond_list.F90
subroutine get_nb_list
...around line ~860
[snip]
do j = nstart,nstop
jtran = tranyz+xtran(j-nstart+1,xtindex)
jj = index1+j-xtran(j-nstart+1,xtindex)*nucgrd1
m1 = nlogrid(jj)
m2 = nhigrid(jj)
if ( m2 >= m1 )then
do m = m1,m2
numlist = numlist+1
!!!! HERE
atmlist(numlist) = m
!!!! AND HERE
itran(numlist)=jtran
end do
end if
end do
[/snip]
atmlist and itran are length natom, but numlist grows beyond this,
causing bad memory writes and a segfault.
The problem appears to be the value of jj.
I created a boolean mask that tests whether or not a jj is iterated over more than once. It is.
It is unclear to me if the problem is with the formula used to compute jj or if the problem is the nstart,nstop pair.
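One way a cell index could be visited more than once is periodic wrap-around: if the scan reaches further than the grid is wide, the wrapped indices repeat. The following is only a toy sketch of that idea in Python, not sander's actual grid logic; `wrapped_cells`, `count_listed`, `reach`, and the cell populations are hypothetical names and values chosen for illustration.

```python
def wrapped_cells(center, reach, ncells):
    """Cell indices visited when scanning center-reach..center+reach
    along one periodic dimension with ncells cells."""
    return [(center + d) % ncells for d in range(-reach, reach + 1)]


def count_listed(visited, atoms_per_cell):
    """Entries appended if every visit appends all atoms of that cell."""
    return sum(atoms_per_cell[c] for c in visited)


# Hypothetical toy system: 10 atoms spread over 3 cells.
atoms_per_cell = [4, 3, 3]
natom = sum(atoms_per_cell)

# Reach of 1 cell: 3 visits, each cell exactly once.
ok = wrapped_cells(center=0, reach=1, ncells=3)
# Reach of 2 cells: 5 visits over only 3 cells, so two cells repeat.
bad = wrapped_cells(center=0, reach=2, ncells=3)

print(ok, count_listed(ok, atoms_per_cell), "entries for", natom, "atoms")
print(bad, count_listed(bad, atoms_per_cell), "entries for", natom, "atoms")
```

In the second case the number of appended entries exceeds natom, which is the same kind of overrun that a fixed-length atmlist would turn into a bad write.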
I also note that the 8 A cut is less than 1/3 the size of the box
and the 9 A cut is more than 1/3 the size of the box.
Both values are less than 1/2 the size of the box.
9 A + 2 A (skinnb) is still less than 1/2 the size of the box.
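The arithmetic behind these comparisons, written out as a small Python check. It assumes a box edge of 25 A and a skinnb of 2 A, as quoted above; `cutoff_checks` is a hypothetical helper name, not anything in sander.

```python
def cutoff_checks(cut, box, skin):
    """Return (cut < box/3, cut + skin < box/2) for a cubic box of edge `box`."""
    return (cut < box / 3.0, cut + skin < box / 2.0)


box = 25.0     # box edge in A (assumed from the test system)
skinnb = 2.0   # list skin in A

for cut in (8.0, 9.0):
    below_third, below_half = cutoff_checks(cut, box, skinnb)
    print(f"cut={cut}: cut < box/3 ({box / 3.0:.2f}) is {below_third}; "
          f"cut + skinnb < box/2 ({box / 2.0:.2f}) is {below_half}")
```

Only the box/3 comparison changes between cut=8 and cut=9; both cutoffs, with or without the skin, stay below box/2.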
The reason why this catches my attention is that, from what I can guess, the code is looping over the faces of a 3x3 grid (?)
I can get sander to run without segfaults if I change
"""
if ( numimg(index) > 0 )then
ncell_lo = nlogrid(index)
ncell_hi = nhigrid(index)
numlist = 0
"""
to read
"""
if ( numimg(index) > 0 )then
ncell_lo = nlogrid(index)
ncell_hi = nhigrid(index)
numlist = 0
seen = .FALSE. !! HERE
"""
where seen is a SIZE(nlogrid) allocatable LOGICAL array,
and I change
"""
do j = nstart,nstop
jtran = tranyz+xtran(j-nstart+1,xtindex)
jj = index1+j-xtran(j-nstart+1,xtindex)*nucgrd1
"""
to read
"""
do j = nstart,nstop
jtran = tranyz+xtran(j-nstart+1,xtindex)
jj = index1+j-xtran(j-nstart+1,xtindex)*nucgrd1
IF ( seen(jj) .AND. m2 >= m1 ) THEN ! HERE
CYCLE ! HERE
ELSE ! HERE
seen(jj) = .TRUE. ! HERE
END IF ! HERE
"""
Although it runs, I can't say with certainty that it is correct.
I seriously doubt that the above "fix" would be correct with sander.MPI
because each process would "see" things differently.
Nor can I say whether or not a similar issue is present elsewhere in the code.
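The `seen` mask described above can be sketched outside sander in Python. This is a simplified illustration under stated assumptions, not the patched Fortran: `build_list` and the toy arrays are hypothetical, atom indices are 1-based to mirror nlogrid/nhigrid, and the sketch skips every revisited cell without reproducing the extra `m2 >= m1` condition.

```python
def build_list(visited, nlogrid, nhigrid, natom):
    """Collect atom indices from the visited cells, taking each cell once."""
    seen = [False] * len(nlogrid)   # one flag per cell, reset per call
    atmlist = []
    for jj in visited:
        if seen[jj]:
            continue                # cell already listed; skip the repeat
        seen[jj] = True
        m1, m2 = nlogrid[jj], nhigrid[jj]
        for m in range(m1, m2 + 1): # empty range when m2 < m1
            atmlist.append(m)
    # With de-duplication the list can no longer outgrow natom.
    assert len(atmlist) <= natom
    return atmlist


# Hypothetical toy grid: 3 cells holding atoms 1-4, 5-7 and 8-10.
nlogrid = [1, 5, 8]
nhigrid = [4, 7, 10]
visited = [1, 2, 0, 1, 2]           # cells 1 and 2 appear twice

atmlist = build_list(visited, nlogrid, nhigrid, natom=10)
print(atmlist)
```

The sketch only shows that the mask bounds the list length. It says nothing about whether dropping the repeated visit, and the translation index that goes with it, gives the right pair list.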
...or maybe I'm running sander wrong - certainly possible because I've never really used it before.
-Tim
Received on Mon Dec 02 2013 - 18:30:03 PST