[AMBER-Developers] Comments from Romain Wolf about tleap from David A Case on 2011-11-03 (Amber Developers Archive Nov 2011)

From: David A Case <case.biomaps.rutgers.edu>
Date: Thu, 3 Nov 2011 10:38:47 -0400

I'm forwarding some comments from Romain Wolf relevant to tleap;
I've interpolated some ideas of my own.

As an aside, I hope more of you will look at amberlite (in
$AMBERHOME/AmberTools/amberlite), and in particular at its excellent
documentation. Among other things, this would serve as a great "advanced
tutorial", but we've not been so good yet about advertising that to users.

----- Forwarded message from "Romain M. Wolf" <romain.wolf.gmail.com> -----

General remark on PDB files and how they are treated in AMBER routines:
--------------------------------------------------------------------------
Most PDB files are not perfect. While tleap does a good job on 'perfect'
PDB files, many things can go wrong for PDB files with missing residues,
gaps etc. One major problem I still fight with regularly (and possibly
also described wrongly in my amberlite documentation) is the actual
residue numbering. Whenever you define S-S links explicitly (rather than
relying on an 'automatic' procedure, which often does not work), you must
specify the SG atoms that get linked. But which residue number is correct:
the number in the PDB file or the actual sequential residue number? I always
get confused. Simple example: you have a file starting with residue 1, no
gaps. Now you cap the N-terminus with an ACE. Hence all residue numbers
get incremented by one, and the original CYX residues now have different
numbers too. If the bond command still uses the old numbers, the S-S bond
is not made; because the residues are named CYX, they are still taken from
the library as such and end with an S, not SH. Calculations later proceed
fine, except that the free S atoms are dangling around. So I always check
the tleap output carefully to see if the bonds were made. Everyone should
do this, of course, but many probably don't.
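
The numbering shift described above can be made explicit with a small
script. What follows is a minimal Python sketch, not part of tleap or
AmberTools; the function names residue_table and seq_index_of are
hypothetical. It assumes that tleap numbers residues 1..N in the order in
which they appear in the ATOM/HETATM records, and that the file follows the
fixed-column PDB format.

```python
def residue_table(pdb_lines):
    """Return [(seq_index, chain, resseq, icode, resname), ...] in file order.

    Assumption: the sequential index is simply the order of appearance of
    residues in the ATOM/HETATM records, starting at 1.
    """
    table = []
    last_key = None
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Fixed PDB columns: resName 18-20, chainID 22, resSeq 23-26, iCode 27
        resname = line[17:20].strip()
        chain = line[21]
        resseq = int(line[22:26])
        icode = line[26].strip()
        key = (chain, resseq, icode, resname)
        if key != last_key:
            table.append((len(table) + 1, chain, resseq, icode, resname))
            last_key = key
    return table


def seq_index_of(table, chain, resseq, icode=""):
    """Look up the sequential index for an original PDB residue id."""
    for seq, ch, num, ic, _name in table:
        if (ch, num, ic) == (chain, resseq, icode):
            return seq
    raise KeyError((chain, resseq, icode))
```

With such a table, the sequential index needed in a command like
"bond mol.<i>.SG mol.<j>.SG" can be looked up from the original PDB number
instead of being counted by hand after capping.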

Adding to the general discussion about tleap and sleap, see below:

Things that should be in tleap:
--------------------------------------
* use the actual PDB residue numbers as the reference, regardless of
whether they are sequential, i.e., treat the residue numbers as 'names',
at least for commands that refer to residue numbers;

* if sequential numbering is required for correct functioning
later, automatically generate a correspondence table AmberResNumber
vs. OriginalPDBNumber; people working on many projects with dozens of
crappy PDB files would greatly appreciate such a feature;

[DAC note: Joe Krahn has written a program, add_pdb (source in
$AMBERHOME/AmberTools/src/xray, binary in $AMBERHOME/bin) which puts
all the original PDB residue id information in the prmtop file. One
essentially then has the correspondence table that Romain is talking
about, and programs that read prmtop files could make use of either the
final amber sequential numbering, or the original PDB identification (with
gaps, insertions, etc.).

Of course, few people know that this exists, or write their programs to
look for such data in the prmtop file. And it is not built into tleap.
But if such information became a standard part of building prmtops, then
analysis programs could start making use of it. We might want to improve
on the details of what Joe did, or its implementation, but the basic idea
seems sound.]

* other possibility for bonds made by leap:
  check that the requested bond is actually made and FAIL to make the
  parmtop file if not; (note that in my pytleap, the user does not get
  on-screen feedback from tleap if not looking closely at the log file,
  which some people do, others not...); this may be a special problem
  for people using tleap in scripts, I agree, but it should not be too
  difficult to implement this directly in tleap, and not rely on scripts
  to fish out such warnings and act accordingly; I have more than once
  run an MD on a protein with S-S links and found out too late that the
  disulfides were not made. My fault, I agree, but if this can be avoided,
  it would be nice;

* capping N- and C-termini: maybe this is already implemented in tleap,
  but I have not seen it: allow automatic capping of open N- and
  C-termini (especially at gaps); Ken has a nice routine (in MTK) that
  can cut out the residues around a binding site and cap all open peptide
  bonds; something automatic like this in tleap would be a great help.
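
Until something like the bond check in the list above exists in tleap
itself, it can be approximated from the outside by inspecting the finished
prmtop file. A minimal Python sketch, assuming the standard %FLAG/%FORMAT
prmtop layout, in which BONDS_WITHOUT_HYDROGEN stores triplets
(3*(i-1), 3*(j-1), bond type index) and RESIDUE_POINTER holds the first
atom of each residue. The function names are hypothetical, and residue
numbers here are Amber's sequential ones.

```python
import bisect
import re


def read_flag(prmtop_text, flag):
    """Return the non-blank fixed-width fields stored under %FLAG <flag>."""
    fields, width, active = [], None, False
    for line in prmtop_text.splitlines():
        if line.startswith("%FLAG"):
            active = line.split()[1] == flag
            width = None
        elif active and line.startswith("%FORMAT"):
            # e.g. %FORMAT(20a4) or %FORMAT(10I8): take the field width
            m = re.match(r"%FORMAT\(\s*\d+[a-zA-Z](\d+)", line)
            width = int(m.group(1)) if m else None
        elif active and width and not line.startswith("%"):
            chunks = (line[i:i + width] for i in range(0, len(line), width))
            fields += [c for c in chunks if c.strip()]
    return fields


def sg_bonds(prmtop_text):
    """Return a set of frozenset({res_i, res_j}) for every SG-SG bond."""
    names = [f.strip() for f in read_flag(prmtop_text, "ATOM_NAME")]
    res_ptr = [int(f) for f in read_flag(prmtop_text, "RESIDUE_POINTER")]
    bonds = [int(f) for f in read_flag(prmtop_text, "BONDS_WITHOUT_HYDROGEN")]
    found = set()
    for k in range(0, len(bonds) - 2, 3):
        i = bonds[k] // 3 + 1          # back to 1-based atom numbers
        j = bonds[k + 1] // 3 + 1
        if names[i - 1] == "SG" and names[j - 1] == "SG":
            # residue number = count of residues starting at or before atom
            found.add(frozenset((bisect.bisect_right(res_ptr, i),
                                 bisect.bisect_right(res_ptr, j))))
    return found


def missing_disulfides(prmtop_text, expected_pairs):
    """Return the expected (res_i, res_j) pairs that have no SG-SG bond."""
    present = sg_bonds(prmtop_text)
    return [p for p in expected_pairs if frozenset(p) not in present]
```

A build script could call missing_disulfides() on the prmtop it has just
written and exit with an error when the returned list is not empty, so
that a forgotten disulfide is caught before any MD is run.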

VERY IMPORTANT: a chapter on how to prepare structures BEFORE they get
into tleap. I have tried in my amberlite documentation to give some
advice to newcomers. Not perfect at this point (and maybe riddled with
some errors also). My main advice for now is: Make a very 'clean' PDB
file with all residues (WITHOUT hydrogens), numbered sequentially, put TER
records where needed (e.g., at gaps to avoid 10 Angstrom peptide bonds),
include the ACE and NME residues where needed (and make them part of the
numbering scheme), then decide on special residue names (CYX, HIP, HIE,
HID).
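
Part of this clean-up recipe can be automated. A rough Python sketch under
stated assumptions: fixed-column PDB input, hydrogens recognised by the
element column (falling back to the first letter of the atom name), and a
gap defined as a C-N distance above 2.0 Angstrom between consecutive
residues. The name clean_pdb and the threshold are illustrative choices,
not an existing tool. Caps (ACE/NME) and special residue names (CYX, HIP,
HIE, HID) are not handled and still have to be set by hand.

```python
import math


def _coords(res, atom_name):
    """Coordinates of the named atom in a residue, or None if absent."""
    for atom in res["atoms"]:
        if atom[12:16].strip() == atom_name:
            return tuple(float(atom[c:c + 8]) for c in (30, 38, 46))
    return None


def _gap(prev, cur, max_cn):
    """True if the C(prev)-N(cur) distance is too long for a peptide bond."""
    c, n = _coords(prev, "C"), _coords(cur, "N")
    return c is not None and n is not None and math.dist(c, n) > max_cn


def clean_pdb(lines, max_cn=2.0):
    """Strip hydrogens, renumber residues 1..N, put TER at gaps."""
    residues = []
    for line in lines:
        if line.startswith("TER") and residues:
            residues[-1]["ter"] = True      # keep TER records already there
        if not line.startswith(("ATOM", "HETATM")):
            continue
        name = line[12:16].strip()
        element = line[76:78].strip() or name.lstrip("0123456789")[:1]
        if element.upper() == "H":
            continue
        key = (line[21], line[22:27], line[17:20])  # chain, number+icode, name
        if not residues or residues[-1]["key"] != key or residues[-1]["ter"]:
            residues.append({"key": key, "atoms": [], "ter": False})
        residues[-1]["atoms"].append(line.rstrip("\n"))

    out, serial = [], 0
    for idx, res in enumerate(residues, start=1):
        if idx > 1:
            prev = residues[idx - 2]
            if (prev["ter"] or prev["key"][0] != res["key"][0]
                    or _gap(prev, res, max_cn)):
                out.append("TER")
        for atom in res["atoms"]:
            serial += 1
            # rewrite serial (cols 7-11), resSeq (23-26), blank the iCode (27)
            out.append(f"{atom[:6]}{serial:5d}{atom[11:22]}{idx:4d} {atom[27:]}")
    if residues:
        out.append("TER")
    out.append("END")
    return out
```

Waters and ligands have no backbone C or N atoms, so the distance test
does not separate them; they would still need their own TER records.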

I am willing to help in generating a PDB-to-tleap documentation, depending
on how much time has to go into it.

---regards---romain

_______________________________________________
AMBER-Developers mailing list
AMBER-Developers.ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber-developers
Received on Thu Nov 03 2011 - 08:00:03 PDT