Hi Kim,
The stack size was indeed the problem. Default stack size in my setup was
10M:
[droe.case1 sander]$ ulimit -a
...
stack size (kbytes, -s) 10240
...
Doubling the stack size to 20M allowed the tests to complete successfully.
So we could potentially add a script called checkStackSize that any tests
that require a large stack could call. I am attaching one that uses the
"ulimit" command (note: not "unlimit"; passing a value of unlimited to the
stack on my system isn't allowed) to check the stack size and increase it if
it is below 20480 kbytes. 1 is returned if this could not be done, 0 if the
stack is OK. It can be called from the necessary EVB test Run scripts like
this:
set stack=`../../checkStackSize.sh`
if ($stack == 1) then
echo "This test requires a larger stack."
exit(0)
endif
It behaves as expected on my Linux machine and on my Cygwin rig (where it
reports failure because Cygwin's stack limit is hard-coded), but I'm not sure
how portable it is. Would anyone be willing to test it?
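For anyone who wants to try this without the attachment, here is a minimal
sketch of the kind of check described above. It is not the attached script:
the function name check_stack and the REQUIRED variable are made up for
illustration, and it assumes bash on a system where "ulimit -s" reports
kbytes.

```shell
#!/bin/bash
# Hypothetical sketch of a stack-size check; NOT the attached script.
# Prints 0 if the soft stack limit is, or can be raised to, at least the
# requested number of kbytes; prints 1 otherwise.

REQUIRED=${1:-20480}   # threshold in kbytes (20480 = 20M), assumed default

check_stack() {
    local required=$1
    local current
    current=$(ulimit -s)    # soft stack limit: a number or "unlimited"

    # An unlimited stack always satisfies the requirement.
    if [ "$current" = "unlimited" ]; then
        echo 0
        return
    fi

    # Already large enough: nothing to do.
    if [ "$current" -ge "$required" ]; then
        echo 0
        return
    fi

    # Try to raise only the soft limit; this fails when the hard limit
    # is lower than the requested value.
    if ulimit -S -s "$required" 2>/dev/null; then
        echo 0
    else
        echo 1
    fi
}

check_stack "$REQUIRED"
```

One caveat with any approach like this: a limit raised inside a script
applies only to that script and its children, not to the calling csh, so the
test executable would have to be launched under the raised limit for the
increase to have any effect.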
Dave, Kim's patch seems to work fine. However, I think it will be incompatible
with the fharm() changes I made in evb_umb.f, so that part of my patch
should be rolled back (I can come up with a patch to do this if you want).
It looks like Kim changed the output of the test cases, so I guess the
previous output of those test cases wasn't correct.
-Dan
On Fri, Apr 2, 2010 at 8:31 AM, Kim F. Wong <kimberlyyellow.gmail.com> wrote:
> Dan,
>
> It may have to do with the default stacksize. On my laptop, I was seeing
> this problem and it goes away if I do "unlimit" before these tests. These
> tests read in ~3X more ab initio data than the other DG-EVB tests. Perhaps
> we can place a "unlimit" within the Run.evb in each of these tests. What do
> you suggest (both for the short-term & long-term)?
>
> -Kim
>
>
> On 4/2/2010 8:22 AM, Daniel Roe wrote:
>
>> The EVB patch seems to work well, but I am still having problems with a few
>> of the tests:
>>
>> cd evb/poh_dbonds_umb_dg_UFF_9DG && ./Run.evb
>> cd evb/poh_dbonds_umb_dg_UFF_9DG_pimd_ld_full && ./Run.evb
>> cd evb/poh_dbonds_umb_dg_UFF_9DG_pimd_nhc_full && ./Run.evb
>> cd evb/poh_dbonds_umb_dg_UFF_9DG_nmpimd_full && ./Run.evb
>> cd evb/poh_dbonds_umb_dg_UFF_9DG_nmpimd_full_TST-freqf && ./Run.evb
>>
>> Previously however I was having issues with these tests that Mark was not
>> seeing. Does anybody else have these tests fail with an MPI_abort? I've had
>> it happen to me with both gnu and intel compilers (2 versions, 10 and 11) as
>> well as 2 different MPICH2 versions.
>>
>> -Dan
>>
>> On Thu, Apr 1, 2010 at 10:04 PM, Kim F. Wong <kimberlyyellow.gmail.com> wrote:
>>
>>> Dan,
>>>
>>> Thanks for your help. I made a patch (see attached) earlier today and was
>>> running the tests. Although I've verified that the patch works, I would
>>> appreciate it if you can test it at your end before committing to the RC.
>>>
>>> -Kim
>>>
>>>
>>> On 4/1/2010 6:18 PM, Daniel Roe wrote:
>>>
>>>
>>>
>>>> Hi All,
>>>>
>>>> This is regarding the previously discussed EVB test cases that segfault:
>>>>
>>>> cd evb/malon_dbonds_umb_dg_UFF_3DG_qi_full_2D-PMF && ./Run.evb
>>>> cd evb/malon_dbonds_umb_dg_UFF_3DG_qi_full_corrF && ./Run.evb
>>>>
>>>> I have made some modifications to the code that constitute a partial fix,
>>>> but I can't proceed further without input from EVB people.
>>>>
>>>> These tests can both be protected from segfaults by making the loop at line
>>>> 148 in pimd_force.f that references dmdlm dependent on the value of itimass
>>>> (which is what triggers the init of dmdlm), e.g.
>>>>
>>>> pimd_force.f
>>>> 148c148
>>>> < if( i_qi > 0 ) then
>>>> ---
>>>> > if( i_qi > 0 .and. itimass > 0) then
>>>>
>>>> At this point the tests will run but the output energies don't match at all.
>>>> I was able to find a version of amber10 (from June 2008) that passed both of
>>>> these test cases.
>>>>
>>>> I was able to recover the test results for the 2D-PMF test by modifying
>>>> evb_umb.f, setting the array fharm(:) to zero outside of loops it is
>>>> involved in (the way it was done previously) instead of inside (the way it
>>>> is currently done).
>>>>
>>>> evb_umb.f
>>>> 102c102
>>>> < ! fharm(:) = 0.0d0
>>>> ---
>>>> > fharm(:) = 0.0d0
>>>> 106c106
>>>> < fharm(:) = 0.0d0
>>>> ---
>>>> > ! fharm(:) = 0.0d0
>>>> 191c191
>>>> < ! fharm(:) = 0.0d0
>>>> ---
>>>> > fharm(:) = 0.0d0
>>>> 195c195
>>>> < fharm(:) = 0.0d0
>>>> ---
>>>> > ! fharm(:) = 0.0d0
>>>>
>>>> You can even see that the fharm(:) statements were only commented out and
>>>> not removed - does anyone familiar with the code know why it was changed?
>>>> According to comments in the code it seems to have been changed around Dec.
>>>> 2008. Anyway, when I reverse these changes the 2D-PMF test results match
>>>> (aside from a few diffs that are output format-related). Of course, the test
>>>> case itself could be wrong, but I have no easy way of knowing that.
>>>>
>>>> However, the corrF test still fails by a mile - as far as I can tell the
>>>> likely culprit is with the qi_corrf_les() subroutine in pimd_force.f - much
>>>> of it was changed around March 2009. These changes are far more extensive
>>>> (> 100 lines at least) so I don't feel comfortable rolling them back.
>>>>
>>>> Anyway, I am attaching a patch that makes the changes that I discussed. If
>>>> nothing else it prevents the ugly segfaults.
>>>>
>>>> Someone more familiar with what EVB *should* be doing should definitely have
>>>> a close look at all of these changes.
>>>>
>>>> -Dan
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> AMBER-Developers mailing list
>>>> AMBER-Developers.ambermd.org
>>>> http://lists.ambermd.org/mailman/listinfo/amber-developers
--
-------------------------
Daniel R. Roe
Postdoctoral Associate
SAS - Chemistry & Chemical Biology
610 Taylor Road
Piscataway, NJ 08854
Received on Fri Apr 02 2010 - 06:30:03 PDT