Date: Sat, 09 Dec 2006 14:03:13 +0000
From: Mark Williamson <.......>
MIME-Version: 1.0
To: amber-developers.scripps.edu
Subject: Formatting bug in AMBER9 sander:qm2_load_params_and_allocate()
Dear All,
Recently I ran a QMMM calculation using AMBER9 (up to patch level 24)
on a system that, in retrospect, had an incorrect charge/spin state, and
obtained a generic, cryptic crash message :)
forrtl: severe (60): infinite format loop, unit 6, file /some-user-path/stage5_qmmm.out
Image PC Routine Line Source
sander 0000000000A7F90A Unknown Unknown Unknown
sander 0000000000A7F1EA Unknown Unknown Unknown
sander 0000000000A4CBBC Unknown Unknown Unknown
sander 0000000000A0A7C1 Unknown Unknown Unknown
sander 0000000000A0AD21 Unknown Unknown Unknown
sander 0000000000A3AEAB Unknown Unknown Unknown
sander 0000000000A39C03 Unknown Unknown Unknown
sander 0000000000650AB9 Unknown Unknown Unknown
sander 00000000006002C4 Unknown Unknown Unknown
sander 00000000006DA98E Unknown Unknown Unknown
sander 00000000004E6BA9 Unknown Unknown Unknown
sander 00000000004AE7D4 Unknown Unknown Unknown
sander 00000000004AA75E Unknown Unknown Unknown
sander 000000000040029E Unknown Unknown Unknown
sander 0000000000A87DCE Unknown Unknown Unknown
sander 00000000004001AA Unknown Unknown Unknown
My first port of call was to recompile with ifort using debugging
symbols and run sander under a debugger. This did not yield much useful
information, perhaps due to my inability to use idb properly.
Anyway, I'm running Fedora Core 5 on my desktop, which has gfortran (gcc
version 4.1.1 20060525 (Red Hat 4.1.1-1)), and I had a hunch that
gfortran might give a better description of the problem and/or rule out
a compiler-specific problem. After recompiling with gfortran (configure
gfortran), a very useful error message was returned, even without -g:
At line 3541 of file _qm2_load_params_and_allocate.f
Fortran runtime error: Insufficient data descriptors in format after reversion
Ok, looking at this region of _qm2_load_params_and_allocate.f, it became
clear that the problem:
a) was in a region of code that is rarely explored in "program phase
space";
b) was actually a formatting issue in the message printed when the number
of electrons does not match the QM region's spin state:
...
write(6,'(''QMMM: System specified with odd number of electrons ''('',i5,''))') nelectrons
...
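The missing reasoning step: once the doubled apostrophes are undone, the format above is ('QMMM: System specified with odd number of electrons '(',i5,')), so the i5 sits inside a quoted literal and the format contains no data edit descriptor at all. In Fortran, when output list items remain at the format's final right parenthesis, a new record is started and format control reverts to the last top-level parenthesised group (or to the start of the format if there is none). Here that group holds only the literal ',i5,', so nelectrons can never be written: ifort reports an "infinite format loop", gfortran "Insufficient data descriptors in format after reversion". The sketch below is a simplified, hypothetical Python model of that rule (only literals, Iw and unrepeated groups are modelled), applied to the original format and to the one in the patch.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Lit:      # character string edit descriptor, e.g. 'QMMM: ...'
    text: str


@dataclass
class Int:      # Iw data edit descriptor, e.g. i5
    width: int


@dataclass
class Group:    # parenthesised group (repeat counts are not modelled)
    items: list


Item = Union[Lit, Int, Group]


def flatten(items):
    """Yield the edit descriptors in order, descending into groups."""
    for it in items:
        if isinstance(it, Group):
            yield from flatten(it.items)
        else:
            yield it


def has_data_descriptor(items) -> bool:
    return any(isinstance(it, Int) for it in flatten(items))


def formatted_write(fmt: List[Item], values) -> List[str]:
    """Return the records a formatted WRITE emits under this simplified model."""
    pending = list(values)
    groups = [i for i, it in enumerate(fmt) if isinstance(it, Group)]
    # Reversion restarts at the last top-level group, or at the start if none.
    revert_at = groups[-1] if groups else 0
    records, line, items = [], "", fmt
    while True:
        for it in flatten(items):
            if isinstance(it, Lit):
                line += it.text
            elif not pending:   # data descriptor but no list items left: stop
                records.append(line)
                return records
            else:
                s = str(pending.pop(0))
                # Iw prints asterisks when the value does not fit in w columns.
                line += s.rjust(it.width) if len(s) <= it.width else "*" * it.width
        records.append(line)    # the final right parenthesis ends the record
        if not pending:
            return records
        items, line = fmt[revert_at:], ""
        if not has_data_descriptor(items):
            # ifort: "infinite format loop"; gfortran reports the message below.
            raise RuntimeError("Insufficient data descriptors in format after reversion")


ELECTRONS = "QMMM: System specified with odd number of electrons "
# '(''...electrons ''('',i5,''))'  ->  ('...electrons '(',i5,'))
BROKEN = [Lit(ELECTRONS), Group([Lit(",i5,")])]
# '(''...electrons ''(i5)'' '')'   ->  ('...electrons '(i5)' ')
PATCHED = [Lit(ELECTRONS), Group([Int(5)]), Lit(" ")]

if __name__ == "__main__":
    print(formatted_write(PATCHED, [7]))   # 7 is an arbitrary example value
    try:
        formatted_write(BROKEN, [7])
    except RuntimeError as err:
        print("BROKEN:", err)
```

The model deliberately mirrors only the reversion rule; real compilers also handle repeat counts, other descriptors and record limits.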
Ok, changing the thread of this story a little: I was able to prepare a
test case that reproduces the problem and, from it, a possible patch:
Example Case
============
mkdir /tmp/testcase
cd /tmp/testcase
wget http://www.rosswalker.co.uk/tutorials/amber_workshop/Tutorial_Six_amber9/files/min_qmmm.in
wget http://www.rosswalker.co.uk/tutorials/amber_workshop/Tutorial_Six_amber9/files/NMA.prmtop
wget http://www.rosswalker.co.uk/tutorials/amber_workshop/Tutorial_Six_amber9/files/NMA.inpcrd
**Change qmcharge=0 to qmcharge=1 in min_qmmm.in to trigger the crash**
$AMBERHOME/exe/sander -i min_qmmm.in -p NMA.prmtop -c NMA.inpcrd
Observe an error message similar to the one shown at the start of this email.
===========
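The example case can also be scripted. The sketch below is a hypothetical helper (script and function names are illustrative, not part of AMBER). It assumes that the 2006 tutorial URLs above are still reachable, that min_qmmm.in contains a qmcharge=0 setting as described, and that $AMBERHOME points at the AMBER9 build under test. It downloads the three files, switches the QM charge to +1, and runs sander with the same arguments as above.

```python
import os
import re
import subprocess
import sys
import urllib.request
from pathlib import Path

# Tutorial files used in the example case (2006 URLs; they may no longer resolve).
BASE_URL = ("http://www.rosswalker.co.uk/tutorials/amber_workshop/"
            "Tutorial_Six_amber9/files/")
FILES = ["min_qmmm.in", "NMA.prmtop", "NMA.inpcrd"]


def set_qmcharge(mdin_text: str, charge: int) -> str:
    """Return the mdin text with its qmcharge setting replaced by `charge`."""
    # Namelist names are case-insensitive, so QMCHARGE is matched as well.
    new_text, count = re.subn(r"qmcharge\s*=\s*-?\d+", f"qmcharge={charge}",
                              mdin_text, flags=re.IGNORECASE)
    if count == 0:
        raise ValueError("no qmcharge setting found in mdin")
    return new_text


def reproduce(workdir: str) -> int:
    """Download the inputs, set qmcharge=1, run sander and return its exit status."""
    wd = Path(workdir)
    wd.mkdir(parents=True, exist_ok=True)
    for name in FILES:
        urllib.request.urlretrieve(BASE_URL + name, str(wd / name))
    mdin = wd / "min_qmmm.in"
    mdin.write_text(set_qmcharge(mdin.read_text(), 1))
    sander = Path(os.environ["AMBERHOME"]) / "exe" / "sander"
    # Same command line as the example case; the runtime error goes to the terminal.
    result = subprocess.run(
        [str(sander), "-i", "min_qmmm.in", "-p", "NMA.prmtop", "-c", "NMA.inpcrd"],
        cwd=str(wd))
    return result.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(reproduce(sys.argv[1]))
```

Usage, assuming the script is saved as repro_qmcharge.py: `python repro_qmcharge.py /tmp/testcase`. Only the text substitution is independent of the network and of an AMBER installation, so that is the part worth checking in isolation.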
I made changes that fixed the issue for me and generated a patch:
diff -urN amber-9.24 amber-9.25 > bugfix_25.diff
This yielded the following patch:
diff -urN amber-9.24/src/sander/qm2_load_params_and_allocate.f amber-9.25/src/sander/qm2_load_params_and_allocate.f
--- amber-9.24/src/sander/qm2_load_params_and_allocate.f 2006-04-04 00:35:55.000000000 +0100
+++ amber-9.25/src/sander/qm2_load_params_and_allocate.f 2006-12-07 23:21:41.000000000 +0000
@@ -150,8 +150,8 @@
 ! Make sure we have an even number of electrons
 if((nelectrons/2)*2 /= nelectrons) THEN
 if (qmmm_mpi%master.and.qmmm_struct%qm_mm_first_call) then
- write(6,'(''QMMM: System specified with odd number of electrons ''('',i5,''))') nelectrons
- write(6,'(''QMMM: but odd spin ('',i3,''). Correct error and re-run calculation.'')') qmmm_nml%spin
+ write(6,'(''QMMM: System specified with odd number of electrons ''(i5)'' '')') nelectrons
+ write(6,'(''QMMM: but odd spin ''(i3)''. Correct error and re-run calculation.'')') qmmm_nml%spin
 end if
 stop
 end if
@@ -159,8 +159,8 @@
 ! Make sure we have an odd number of electrons.`
 if((nelectrons/2)*2 == nelectrons) then
 if (qmmm_mpi%master.and.qmmm_struct%qm_mm_first_call) then
- write(6,'(''QMMM: System specified with odd number of electrons ('',i5,'')'')') nelectrons
- write(6,'(''QMMM: but odd spin ('',i3,''). Correct error and re-run calculation.'')') qmmm_nml%spin
+ write(6,'(''QMMM: System specified with even number of electrons ''(i5)'' '')') nelectrons
+ write(6,'(''QMMM: but even spin ''(i3)''. Correct error and re-run calculation.'')') qmmm_nml%spin
 end if
 stop
 end if
(If the formatting has been mangled by email, a raw copy is available at
http://dumb.ch.ic.ac.uk/~mjw99/tmp/bugfix_25.diff )
Please examine this patch critically; it may not be correct. I am not
competent in Fortran, and the patch may have side effects on other
platforms. There may also be other errors of a similar nature in that
region.
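To look for other errors of a similar nature, a rough scanner like the one below could help. It is a hypothetical helper, not part of AMBER, and it makes simplifying assumptions: it only recognises single-line write(unit,'(...)') list statements with an apostrophe-delimited format (no & continuation lines, no fmt= keyword form), and it treats I, F, E, ES, EN, D, G, A, L, B, O and Z as the data edit descriptors. It flags a statement that has an output list but no data edit descriptor outside the quoted literals, which is the pattern behind the crash above. Applied to the original statements in the patch, it flags only the first "odd number of electrons" line; the other original write statements already have i5/i3 outside the quotes, so for those lines the patch is a wording (odd to even) or stylistic change rather than a format fix.

```python
import re
import sys
from pathlib import Path

# Data edit descriptors recognised by this rough check (optionally preceded by
# a kP scale factor and a repeat count), e.g. i5, 2I3, F10.4, 1PE12.5, ES14.6, A.
DATA_DESC = re.compile(
    r"^(?:[+-]?\d*P)?\d*(?:ES|EN|I|F|E|D|G|A|L|B|O|Z)(?:\d+(?:\.\d+)?(?:E\d+)?)?$",
    re.IGNORECASE)

# Single-line write(unit,'<format>') <output list>; the format is an
# apostrophe-delimited Fortran literal in which '' stands for one apostrophe.
WRITE_STMT = re.compile(r"write\s*\(\s*[^,]+,\s*'((?:[^']|'')*)'\s*\)\s*(.*)$",
                        re.IGNORECASE)


def strip_literals(fmt: str) -> str:
    """Replace each quoted character-string edit descriptor with a comma."""
    out, i, n = [], 0, len(fmt)
    while i < n:
        ch = fmt[i]
        if ch in "'\"":
            i += 1
            while i < n:
                if fmt[i] == ch:
                    if i + 1 < n and fmt[i + 1] == ch:
                        i += 2          # doubled quote inside the literal
                        continue
                    break
                i += 1
            i += 1                      # step past the closing quote
            out.append(",")
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def format_has_data_descriptor(fmt: str) -> bool:
    """True if some item outside the quoted literals is a data edit descriptor."""
    tokens = re.split(r"[(),/:]", strip_literals(fmt))
    return any(DATA_DESC.match(re.sub(r"\s+", "", tok)) for tok in tokens)


def suspicious_writes(source: str):
    """Yield (line number, statement) for writes whose output list nothing can consume."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = WRITE_STMT.search(line)
        if not match:
            continue
        fmt = match.group(1).replace("''", "'")   # undo Fortran quote doubling
        output_list = match.group(2).strip()
        if output_list and not format_has_data_descriptor(fmt):
            yield lineno, line.strip()


if __name__ == "__main__" and len(sys.argv) > 1:
    for path in sys.argv[1:]:
        for lineno, stmt in suspicious_writes(Path(path).read_text()):
            print(f"{path}:{lineno}: {stmt}")
```

Usage, assuming the script is saved as check_formats.py: `python check_formats.py _qm2_load_params_and_allocate.f`. A heuristic like this can miss cases (continuation lines, Hollerith or extension descriptors), so treat its output as a list of places to inspect by hand.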
Ok, now for a change of gear.
My "breakthrough" above came when I compiled with gfortran. It would be
really great if gfortran could mature to the point where it produces a
build of AMBER9 that passes all the test cases. I want to present some
arguments for why I think this is a good idea, and a method to help get
there. The following may read as a discourse with political undertones,
but its motivation is purely pragmatic. I should also state that I'm not
an AMBER developer, just a user who has used it for a while.
Over the past two years or so, I've seen a general pattern on the list
with users who are new to AMBER. Most of them are using a PC (i386) with
some free Linux distro, and they generally come from a non-UNIX
background. Since AMBER 8 and later are written in Fortran 90, the first
point where many people come unstuck is obtaining an F90 compiler such as
ifort or pgf90. Installing such compilers can be problematic in itself,
and things get much worse if, say, there is a bug in the compiler itself.
Another issue I've seen is that a newer version of a third-party compiler
introduces bugs and, even worse, the older version that worked becomes
hard to obtain. These situations are hard: during on-list debugging
sessions of such issues, the AMBER developer is essentially devoting
her/his own time to fixing a third party's compiler problem rather than a
problem within AMBER.
I accept the argument that GNU compilers are not as fast as commercial
ones; that's not what I'm getting at. What I'm getting at is how
efficiently one can track down a bug from a given error message. IMHO
GNU compilers are becoming more widespread and better tested, and they
generally give higher-quality error messages when there is an issue.
Better error messages mean less time spent chasing bugs. Moreover, these
messages are distinctive, e.g. "At line 3541 of file
_qm2_load_params_and_allocate.f" compared with ifort's stream of hex
addresses, so when they are posted to the list Google will pick them up,
and a future "victim" of the same bug will have an easier time finding a
solution by searching for that specific error message.
In a perverse way, a compiler that produces slower code but has better
checking and diagnostics can serve as a "pre-parser" for a faster, less
careful compiler. I hope the first part of this email is an example of
this: the bug caused both the ifort and gfortran builds to crash,
gfortran gave enough information to produce a fix, and both builds
benefited.
As time progresses, gfortran will mature and ship with every distro
under the sun, much as gcc does. The package managers in recent Linux
distros are also becoming more sophisticated; with tools like apt and
yum, dependency issues are becoming a thing of the past. A typical user
will be able to install everything he/she needs with a single command,
say "yum install gcc-gfortran", and the package manager will do the rest.
Given that gfortran is becoming ubiquitous in easy-to-use,
package-managed distributions, having AMBER compile "out of the box"
with gfortran would save future new AMBER users a lot of time. And if
there are issues, the error messages should (hopefully) be more useful.
People who want faster code or specific setups can investigate
third-party compilers, but that's another level; generally, people
reaching for that level are likely to have the skills to get there.
So, what steps can be taken for gfortran to mature to a level that is
useful for AMBER? I think it's actually very nearly there, but it still
has a few bugs and missing features. One possible route to flush these
out is to use AMBER9 to debug the current version of gfortran:
1) Examine all the current AMBER9 gfortran test case failures.
2) Obviously, one cannot post AMBER test cases (and code) to the gcc
developers, so one will need to write simple, generic test cases that
reproduce these failures independently of the AMBER code.
3) Post these standalone test cases to gcc's Bugzilla.
4) Hope that someone on the gcc team picks it up :)
5) If a fix is produced, try it and provide feedback.
From my own experiments, and from a previous message about a specific
AMBER9/gfortran bug:
http://dev-archive.ambermd.org/all/0044.html
one key failure is in the noesy test. I had a *very quick* play with
this a while ago, but did not get very far:
Debugging with gdb (sander compiled with the -g option):
$ pwd
/opt/amber-9.17/test/noesy
$ gdb ../../exe/sander
(gdb) break __relax_mat__noeread
(gdb) run -O -i mdin -c 01.ann10.xyz -o noesy.out
(gdb) set var i=5
(gdb) set var j=99
(gdb) n
Single stepping until exit from function _gfortran_st_set_nml_var_dim,
which has no line number information.
__relax_mat__noeread (xx=0xb75cb008, ix=0xb7507008, ih=0x9f54198, _ih=4) at _relax_mat.f:3722
3722 450 write(6,*) 'Namelist reports error in reading noeexp'
The problem seemed to lie within gfortran itself (gdb returns from the
libgfortran routine _gfortran_st_set_nml_var_dim straight to the
"Namelist reports error" branch), and I could go no further. Relatedly,
I recall Prof. Case mentioning in another post that it had something to
do with reading in 2D arrays. As I've said before, I'm no Fortran guru;
what I would really like is a generic test case for this so that I can
take it to the gcc/gfortran Bugzilla.
Best regards,
Mark Williamson
http://dumb.ch.ic.ac.uk/~mjw99/
Received on Wed Dec 13 2006 - 05:22:02 PST