Re: [AMBER-Developers] [AMBER] Setting the cut-off and size of box

From: Daniel Roe <daniel.r.roe.gmail.com>
Date: Mon, 2 Dec 2013 22:59:56 -0700

Hi,

I was able to reproduce the segfault using code compiled from a recent GIT
pull, using the same system (110 TIP3P waters) but different input. The
error occurs in the same place (nonbond_list.F90:860) but I don't think the
problem is in the calculation of jj, at least not in my case (otherwise
valgrind would complain about nlogrid and nhigrid). The issue appears to be
that atmlist is allocated for total number of atoms (natom), but in some
cases the indexing variable numlist becomes greater than natom (note that
an error is triggered for array itran also, which fits).
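
To make the overflow concrete, here is a toy Python sketch. It is not the
sander code: the 1D cell layout and the name collect_neighbor_atoms are made
up, and the mechanism (a neighbor search that wraps periodically over more
cells than the grid has) is an assumption about one way numlist could
exceed natom, not a diagnosis of nonbond_list.F90.

```python
def collect_neighbor_atoms(cell_atoms, home, reach):
    """Gather atom indices from cells home-reach..home+reach with periodic wrap.

    cell_atoms: list of lists, the atoms in each cell of a toy 1D grid.
    No duplicate check is made, so a cell visited twice is counted twice.
    """
    ncell = len(cell_atoms)
    collected = []
    for offset in range(-reach, reach + 1):
        cell = (home + offset) % ncell  # periodic image of the neighbor cell
        collected.extend(cell_atoms[cell])
    return collected


# 6 atoms spread over 3 cells (hypothetical layout)
cells = [[0, 1], [2, 3], [4, 5]]
natom = sum(len(c) for c in cells)

ok = collect_neighbor_atoms(cells, home=0, reach=1)   # 3 visits, each cell once
bad = collect_neighbor_atoms(cells, home=0, reach=2)  # 5 visits over 3 cells

print(len(ok), len(bad), natom)
```

With reach=1 the collected list has exactly natom entries; with reach=2 two
cells are visited twice and the list is longer than natom, which is the
situation where a buffer allocated for natom entries would be overrun.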

This seems like a case where the cutoff is just set too large for the
system size (which matches the behavior you saw). For example, I was
running my system with a cutoff of 6.0 when it triggered the segfault; when
I lowered the cutoff to 5.7 I didn't get a segfault. Sander tries to do
some checking to prevent problems like this, but there seems to be a grey
area for certain systems (like a small box of water) where certain cutoffs
result in grid parameters that are slightly off, the end result being a
segfault. Note that pmemd typically does a more thorough job of checking
for problems (and the code is in general more efficient), and indeed if I
run a system with pmemd that segfaulted with sander it either finishes
nicely or complains about the parameters and exits gracefully.

All this being said, no 'legal' combo of input/topology/coordinates should
generate a segfault. I think the long-term solution may be adding some
extra checks for unit cell params + cutoff size to sander along the lines
of what is in pmemd (although this may be more difficult than it sounds
since pmemd does PME differently than sander). In the meantime it's
probably best to use pmemd if possible, or to use a smaller cutoff until
your system stabilizes (in NTP ensemble) before switching to NTV/NVE.
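
As a rough pre-flight check before running, something like the Python sketch
below could be used. It is only the generic minimum-image bound for an
orthorhombic box (list cutoff no larger than half the shortest edge), not
what sander or pmemd actually test; cutoff_report is a hypothetical helper,
the 25 A edge length is my reading of Tim's numbers, and skinnb=2.0 is the
value he quotes.

```python
def cutoff_report(box_edges, cut, skinnb=2.0):
    """Compare the list cutoff (cut + skinnb) with an orthorhombic box.

    Returns the list cutoff, its ratio to the shortest box edge, and whether
    it satisfies the minimum-image bound (list cutoff <= half that edge).
    """
    shortest = min(box_edges)
    list_cut = cut + skinnb
    return {
        "list_cut": list_cut,
        "fraction_of_box": list_cut / shortest,
        "within_half_box": list_cut <= 0.5 * shortest,
    }


box = (25.0, 25.0, 25.0)  # assumed edge lengths in Angstrom
for cut in (8.0, 9.0, 13.0):
    print(cut, cutoff_report(box, cut))
```

Note that cut=9 passes this bound for a 25 A box even though it segfaults in
Tim's runs, so a check like this is necessary but clearly not sufficient for
the grey area described above.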

-Dan

PS - Is there a specific reason you are explicitly setting Ewald
parameters? In general the code tries to pick optimal values. Using
nfft1/nfft2/nfft3=30 for a small system and/or order=6 will certainly
slow your calculations down, and I think for not much benefit.


On Mon, Dec 2, 2013 at 7:16 PM, Timothy J Giese <timothyjgiese.gmail.com> wrote:

> From: Pengzhi <zhangpengzhi1988.gmail.com>
> Date: Mon, 2 Apr 2012 21:59:58 -0500
>
> On Mon, Apr 02, 2012, Pengzhi wrote:
> >
> > I have a system with PBC. The dimension of the cubic box I use is 900 Å *
> > 900 Å * 900 Å. When I set the cut-off to be 450 Å or slightly less than
> > that, my system ends up with explosion. I have the error message:
> >
> > forrtl: severe (174): SIGSEGV, segmentation fault occurred
> >
> > I can avoid this by setting a smaller cutoff (anything smaller than 360 Å
> > works). However, if I want to keep as many interactions as possible,
> > technically what is the upper limit of the cut-off in amber10 given the
> > size of the box?
>
>
> I've run into the same problem with a normal system.
> 110 tip3p waters in a 25 A^3 box.
> Serial sander compiled with gnu 4.6.3 on fedora 15.
> All serial tests pass.
>
> The following runs without a segfault:
>
> title
> &cntrl
> imin=0
> nstlim=10000 ! 10 ps at the default dt=0.001 ps
> ntt=0 ! 0=constE 3=langevin constT
> tempi=300. ! initial temp
> temp0=300 ! target temp
> ! shake
> ntc=2 ! =2 water shake
> ntf=2 ! =2 don't eval water H-O bonds
> ! periodic boundaries
> ntb=1 ! vacuum=0 constant volume=1 constant pressure=2
> ! output
> cut=8.
> ! iwrap=0
> ntpr=1 ! mdout
> ntwx=1 ! mdcrd
> /
> &ewald
> nfft1=30
> nfft2=30
> nfft3=30
> order=6
> /
>
> HOWEVER, if I change cut=8 to cut=9, then I get a segfault.
> I recompiled with -g and ran it through valgrind.
> The problem is in src/sander/nonbond_list.F90
> subroutine get_nb_list
> ...around line ~860
>
> [snip]
> do j = nstart,nstop
> jtran = tranyz+xtran(j-nstart+1,xtindex)
> jj = index1+j-xtran(j-nstart+1,xtindex)*nucgrd1
> m1 = nlogrid(jj)
> m2 = nhigrid(jj)
> if ( m2 >= m1 )then
> do m = m1,m2
> numlist = numlist+1
> !!!! HERE
> atmlist(numlist) = m
> !!!! AND HERE
> itran(numlist)=jtran
> end do
> end if
> end do
> [/snip]
>
> atmlist and itran are length natom, but numlist grows beyond this,
> causing bad memory writes and a segfault.
>
> The problem appears to be the value of jj.
> I created a boolean mask that tests whether or not a jj is iterated over
> more than once. It is.
> It is unclear to me if the problem is with the formula used to compute jj
> or if the problem is the nstart,nstop pair.
>
> I also note that the 8 A cut is less than 1/3 the size of the box
> and the 9 A cut is more than 1/3 the size of the box.
> Both values are less than 1/2 the size of the box.
> 9 A + 2 A (skinnb) is still less than 1/2 the size of the box.
> The reason why this catches my attention is that, from what I can guess,
> the code is looping over the faces of a 3x3 grid (?)
>
> I can get sander to run without segfaults if I change
> """
> if ( numimg(index) > 0 )then
> ncell_lo = nlogrid(index)
> ncell_hi = nhigrid(index)
> numlist = 0
> """
> to read
> """
> if ( numimg(index) > 0 )then
> ncell_lo = nlogrid(index)
> ncell_hi = nhigrid(index)
> numlist = 0
> seen = .FALSE. !! HERE
> """
> where seen is a SIZE(nlogrid) allocatable LOGICAL array,
> and I change
> """
> do j = nstart,nstop
> jtran = tranyz+xtran(j-nstart+1,xtindex)
> jj = index1+j-xtran(j-nstart+1,xtindex)*nucgrd1
> """
> to read
> """
> do j = nstart,nstop
> jtran = tranyz+xtran(j-nstart+1,xtindex)
> jj = index1+j-xtran(j-nstart+1,xtindex)*nucgrd1
> IF ( seen(jj) .AND. m2 >= m1 ) THEN ! HERE
> CYCLE ! HERE
> ELSE ! HERE
> seen(jj) = .TRUE. ! HERE
> END IF ! HERE
> """
>
> Although it runs, I can't say with certainty that it is correct.
> I seriously doubt that the above "fix" would be correct with sander.MPI
> because each process would "see" things differently.
> Nor can I say whether or not a similar issue is present elsewhere in the
> code.
> ...or maybe I'm running sander wrong - certainly possible because I've
> never really used it before.
>
> -Tim
>
> _______________________________________________
> AMBER-Developers mailing list
> AMBER-Developers.ambermd.org
> http://lists.ambermd.org/mailman/listinfo/amber-developers
>



-- 
-------------------------
Daniel R. Roe, PhD
Department of Medicinal Chemistry
University of Utah
30 South 2000 East, Room 201
Salt Lake City, UT 84112-5820
http://home.chpc.utah.edu/~cheatham/
(801) 587-9652
(801) 585-6208 (Fax)
Received on Mon Dec 02 2013 - 22:30:03 PST