PROFASI  Version 1.5
Regularization: approximating a protein structure


Regularization is the process of identifying the best approximation of a given protein structure that (i) satisfies the constraints of the protein model, such as the bond length and bond angle values imposed by PROFASI, and (ii) is a minimum of the interaction potential.


Although the protein model in PROFASI is an all-atom model, it works under the approximation that the bond lengths and bond angles do not change. Further, these geometrical properties are parameters of the model. The parameters of the energy function are derived assuming certain values for the bond lengths and bond angles. For instance, the $N-C_{\alpha}$ bond for any residue in PROFASI is 1.46 Å. This is an approximation derived from a statistical analysis of structures in the PDB.

A typical protein structure downloaded from the PDB will in general not have an $N-C_{\alpha}$ bond of exactly 1.46 Å, but something close to it. Using the atom coordinates as given in the PDB file will give incorrect energies in PROFASI, because the values of the bond lengths and angles are implicit in the derivation of the parameters of the energy function. We therefore need to find alternative coordinates for the atoms, as close as possible to the ones in the PDB structure, which satisfy the bond length and bond angle constraints. This can be done by minimizing the all-atom RMSD relative to the given protein structure with respect to the degrees of freedom in PROFASI.
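The two quantities discussed above can be illustrated with a minimal sketch. It measures how far a bond in an input structure lies from the 1.46 Å value quoted in the text, and computes an RMSD between two matched sets of atom coordinates. The coordinates and function names are hypothetical, and the RMSD is computed without any superposition of the two structures, which is a simplifying assumption and not necessarily how PROFASI evaluates it.

```python
import math

# N-C_alpha bond length used by PROFASI, as quoted in the text (in Angstrom).
PROFASI_N_CA = 1.46


def distance(a, b):
    """Euclidean distance between two points given as (x, y, z) tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of matched atom positions.

    Assumption: atoms are already paired one to one and no rigid-body
    superposition is performed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = sum(distance(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Hypothetical N and C_alpha positions, standing in for values read from a PDB file.
n_atom = (0.0, 0.0, 0.0)
ca_atom = (1.47, 0.0, 0.0)

# Deviation of the observed bond length from the model value.
deviation = distance(n_atom, ca_atom) - PROFASI_N_CA
```

A small deviation like this one, repeated over every bond and angle, is what makes the raw PDB coordinates incompatible with the fixed geometry assumed by the model.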

But the structure obtained from the RMSD minimization may be of limited value. The structure files in the Protein Data Bank are normally refined using some force field. The precise location of the atoms which leads to the lowest energy in the refining force field will not in general lead to a reasonably low energy in PROFASI's force field. If simulations are to be started using an approximated structure, or the calculated energy is to be compared to energies obtained in simulation, it is necessary to find a structure close to the given structure which is a minimum of PROFASI's force field. A direct minimization of energy after reading in the coordinates from a PDB file generally leads to rapid breaking of a lot of secondary structure. This is because using torsion angles calculated from the Cartesian coordinates in the PDB file to initialize a protein structure often leads to clashes between atoms somewhere in the structure. Clashes have such high energies in PROFASI that moving the clashing atoms apart is seen as a reduction in energy, even if secondary structure is broken in the process. For this reason, the energy minimization is done in many stages with a gradually decreasing RMSD constraint. The program regularize performs this using a mixture of Monte Carlo and conjugate gradient minimization.
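The idea of staged minimization can be sketched on a toy problem. The text does not specify the form of the RMSD constraint, so this sketch assumes a harmonic restraint towards the input structure whose weight is lowered from stage to stage. A single coordinate stands in for all degrees of freedom, and plain gradient descent is swapped in for the Monte Carlo and conjugate gradient steps used by regularize. All names and numbers are hypothetical.

```python
# Toy illustration only: one coordinate x stands in for all degrees of freedom.
X_REF = 0.0  # stand-in for the input (PDB) structure
X_MIN = 2.0  # stand-in for a nearby minimum of the force field


def energy_gradient(x):
    """Gradient of the toy energy E(x) = (x - X_MIN)**2."""
    return 2.0 * (x - X_MIN)


def restraint_gradient(x, weight):
    """Gradient of the assumed restraint weight * (x - X_REF)**2."""
    return 2.0 * weight * (x - X_REF)


def minimize_stage(x, weight, step=0.05, n_steps=200):
    """Minimize energy plus restraint by gradient descent at a fixed weight."""
    for _ in range(n_steps):
        x -= step * (energy_gradient(x) + restraint_gradient(x, weight))
    return x


def staged_minimization(weights=(8.0, 4.0, 2.0, 1.0, 0.0)):
    """Run one minimization per stage, each starting from the previous result."""
    x = X_REF
    trace = []
    for weight in weights:
        x = minimize_stage(x, weight)
        trace.append(x)
    return trace


# Position reached at the end of each stage.
trace = staged_minimization()
```

With a strong restraint the coordinate stays near the reference, and it drifts towards the energy minimum only as the restraint is released. In a real structure, the early stages give clashes a chance to be resolved by small local moves before larger rearrangements are permitted.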

How to regularize structures in PROFASI

Given a PDB file abc.pdb, to find a regularized structure, do the following:

regularize abc.pdb

This will try to find a regularized structure using default values for a set of options. For the supported options controlling the behaviour of the program, see the documentation of regularizer.

Two output files are generated. min_rmsd.xml is the best approximation found by minimizing RMSD alone. min_etot.xml corresponds to a minimum of PROFASI's energy function near the given structure. The output files are in PROFASI's XML structure file format. They can be converted to the PDB format like this:

prf_convert min_etot.xml min_etot.pdb


Both RMSD minimization and energy minimization can fail to produce good approximations, for different reasons. When RMSD minimization fails to reach a satisfactorily low value, it is most often because atoms are labeled differently in PROFASI and in the PDB file. Labeling errors in PDB files are unfortunately not infrequent. The program then tries to bring the wrong atoms close to each other. Less frequently, there are structures in which an amino acid at a given position exhibits bond lengths or bond angles which are relatively far from the approximations (based on average values over the PDB) used in PROFASI. The minimization process then finds a compromise which may not look like a good approximation.

Since the "landscape" of the interaction potential as a function of the degrees of freedom is much rougher than the landscape of RMSD, it is much more likely that the energy minimization process gets trapped in an uninteresting local minimum of high energy. The minimum of energy closest to the RMSD minimized structure is not necessarily interesting, as it may still contain clashes. The lowest possible energy within 2 Å of the RMSD minimum may be the 23427th minimum ranked by RMSD. The "best" structure is therefore a nebulous concept, and the program regularize has the more modest goal of producing "a" minimum of energy, rather than "the" minimum. Quite often the minimum produced is also of low RMSD and energy. Since it uses Monte Carlo as part of its minimization algorithm, one should run it a few times and select one of the output structures. If the protein folds with PROFASI's force field, the full length folding simulation will almost certainly find structures with lower energy than what regularize finds.
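Running the program several times can be scripted. The following is a minimal sketch, assuming that regularize is on the PATH, that it writes min_rmsd.xml and min_etot.xml into the current working directory, and that repeated runs differ in their random numbers. None of these points is confirmed by this page. Each run gets its own directory so that the fixed output file names do not overwrite each other. The function name, the directory layout and the runner parameter are hypothetical.

```python
import subprocess
from pathlib import Path


def regularize_repeatedly(pdb_file, n_runs, out_root="reg_runs", runner=subprocess.run):
    """Run `regularize <pdb_file>` n_runs times, each in its own directory.

    Returns the expected paths of the min_etot.xml files. The runner argument
    exists only so that the command can be replaced by a stand-in when testing.
    """
    pdb_path = Path(pdb_file).resolve()
    expected_outputs = []
    for i in range(n_runs):
        run_dir = Path(out_root) / f"run_{i:02d}"
        run_dir.mkdir(parents=True, exist_ok=True)
        # Assumption: output files are written to the working directory.
        runner(["regularize", str(pdb_path)], cwd=run_dir, check=True)
        expected_outputs.append(run_dir / "min_etot.xml")
    return expected_outputs
```

Calling regularize_repeatedly("abc.pdb", 5) would then leave five candidate structures under reg_runs, from which one can be selected after inspecting their RMSD and energy.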

Regularization also does not always preserve all parts of the given structure. It may be that the structure under consideration is not even a local minimum of PROFASI's energy function, so that the energy minimization process will move away from it. One has to remember that the energy function is a work in progress.


PROFASI: Protein Folding and Aggregation Simulator, Version 1.5
© (2005-2016) Anders Irbäck and Sandipan Mohanty
Documentation generated on Mon Jul 18 2016 using Doxygen version 1.8.2