PROFASI  Version 1.5
Generating native contact lists

The program generate_contact_file can be used to generate a list of contacts found in a structure. Such a file can be used to initialize a prf::ContactMap observable, to monitor the on/off state of native contacts through a run.

Types of contact maps in PROFASI

For the purpose of contact map measurements, a "contact" in PROFASI is between two residues. It can therefore be identified by two integers, the unique serial numbers of the two residues in the population. Several types of contacts are available by default; the examples on this page use HBContact, the hydrogen bond contact.

Typical usage

Given a PDB file abc.pdb, you can find the hydrogen bond contacts in it like this:

  generate_contact_file -ct HBContact -o result.dat abc.pdb

If the PDB file contains the structure of a large protein and you are interested in the internal contacts of a segment between residues 50–100, you can do this:

  generate_contact_file -ct HBContact -o result.dat abc.pdb -sl :A,50,100

In the above, the selection is given differently than in PROFASI programs like mimiqa. To select residues 50 to 100 in chain A of file abc.pdb in mimiqa, we would write abc.pdb:1:A,50,100. The selection syntax for generate_contact_file is in fact the same, but the selection string has to be given separately with the -sl option. Think of the mimiqa filename-selection combination as filename:selection_string. For generate_contact_file, we need only the selection_string part. That would have been 1:A,50,100 above, but if we are happy to use the first model in the PDB file, we can omit the model identifier 1. This explains the string :A,50,100 above.
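
The relationship between the two notations can be sketched in a few lines of Python. This is only an illustration of the string handling described above, not part of PROFASI: the helper names (split_target, drop_default_model, contact_file_args) are hypothetical, and the sketch assumes that the filename itself contains no colon.

```python
def split_target(target: str) -> tuple[str, str]:
    """Split a mimiqa-style 'filename:selection_string' at the first colon."""
    filename, sep, selection = target.partition(":")
    return filename, (selection if sep else "")


def drop_default_model(selection: str) -> str:
    """Omit a leading model identifier '1' (the first model), keeping the colon."""
    model, sep, rest = selection.partition(":")
    if sep and model == "1":
        return ":" + rest
    return selection


def contact_file_args(target: str,
                      contact_type: str = "HBContact",
                      output: str = "result.dat") -> list[str]:
    """Build an argument list shaped like the generate_contact_file examples."""
    filename, selection = split_target(target)
    args = ["generate_contact_file", "-ct", contact_type, "-o", output, filename]
    if selection:
        # The selection goes into its own -sl option, not onto the filename.
        args += ["-sl", drop_default_model(selection)]
    return args


if __name__ == "__main__":
    print(" ".join(contact_file_args("abc.pdb:1:A,50,100")))
```

The argument list is only assembled, not executed, so the sketch can be tried without a PROFASI installation.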

The reason for the separate option -sl for selections is that it can be applied to a bunch of PDB files together. The program then looks for preserved contacts in all those files. Something like this:

  generate_contact_file -ct HBContact -o result.dat abc_*.pdb -sl :A,50,100
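
The text does not spell out what counts as a "preserved" contact. One plausible reading, taken here purely as an assumption, is a contact that is present in every input structure, which amounts to a set intersection. The Python sketch below illustrates that idea on made-up toy data; it is not the program's actual implementation, and the function name is hypothetical.

```python
from typing import Iterable, Set, Tuple

Contact = Tuple[int, int]  # a pair of residue indices


def preserved_contacts(contact_sets: Iterable[Set[Contact]]) -> Set[Contact]:
    """Return the contacts that occur in every one of the given sets.

    Assumption: "preserved" means "present in all structures".
    """
    sets = [set(s) for s in contact_sets]
    if not sets:
        return set()
    return set.intersection(*sets)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    structure_a = {(1, 14), (3, 12), (5, 10)}
    structure_b = {(1, 14), (5, 10), (8, 5)}
    print(sorted(preserved_contacts([structure_a, structure_b])))
```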

The hydrogen bond potential in PROFASI is directional, so the HBContact measure may miss many hydrogen bonds in a PDB file, which has typically been refined with some other force field. To find all hydrogen bonds, it is better to first regularize the structure (see Regularization: approximating a protein structure) and take the resulting energy-minimized structure file as input. The XML output files of the regularizer may be converted to a PDB file with prf_convert if you want to use selections as in the examples above.

Example with output

Let's take a concrete example: the C-terminal hairpin of protein G, and generate the native hydrogen bond contacts. This hairpin has been studied as an excised peptide in experiments and numerous simulations. Download 1GB1.pdb from the PDB. The hairpin consists of residues 41–56 of the only chain in this structure.

First, let's cut out the relevant residues into a new PDB file.

  pdb_slices 1GB1.pdb::A,41,56 -o hairpin.pdb

Check the contents of the file hairpin.pdb. It should only have the atom records for residues 41 through 56. Next, let's regularize this file:

  regularize hairpin.pdb

There should be two output files, "min_etot.xml" and "min_rmsd.xml". We need the first, to get the native hydrogen bonds.

  generate_contact_file -ct HBContact min_etot.xml -o hairpin.nhb

This will generate an output file hairpin.nhb with the following contents:

0  1  14
1  3  12
2  5  10
3  8  5
4  9  5
5  10  5
6  12  3
7  14  1

The first line of the file (not reproduced in the listing above) shows the sequence of the peptide segment. The residue indexes in the bond listing must be interpreted relative to this sequence, and they start from 0. So, the hydrogen bonds identified were 2 each between residue pairs 1(E)-14(T), 3(T)-12(T), 5(D)-10(T) and another between the NH dipole of 8(T) and the CO dipole of 5(D). PROFASI's regularizer uses a little bit of Monte Carlo (MC), so you could run the regularizer a few times and choose the lowest energy regularized state found.
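
To read such a listing in a script, something like the following Python sketch could be used. It assumes, based only on the listing above, that each contact line holds three integers: a running serial number followed by the two 0-based residue indexes. The function names are hypothetical. The offset of 41 comes from the hairpin being residues 41 to 56 of 1GB1.

```python
from collections import Counter

# The listing shown above, copied verbatim.
LISTING = """\
0  1  14
1  3  12
2  5  10
3  8  5
4  9  5
5  10  5
6  12  3
7  14  1
"""

FIRST_PDB_RESIDUE = 41  # the hairpin is residues 41-56 of 1GB1


def parse_contacts(text):
    """Return (i, j) residue-index pairs from lines of three integers.

    Assumed layout: serial number, first residue, second residue.
    Any other line (blank, sequence, comments) is skipped.
    """
    contacts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3 or not all(f.isdigit() for f in fields):
            continue
        _serial, i, j = map(int, fields)
        contacts.append((i, j))
    return contacts


def count_pairs(contacts):
    """Count entries per residue pair, ignoring the order within a pair."""
    return Counter(tuple(sorted(c)) for c in contacts)


def to_pdb_numbers(pair, first=FIRST_PDB_RESIDUE):
    """Translate 0-based segment indexes to residue numbers in 1GB1.pdb."""
    return tuple(first + k for k in pair)


if __name__ == "__main__":
    for pair, n in sorted(count_pairs(parse_contacts(LISTING)).items()):
        print(pair, to_pdb_numbers(pair), n)
```

Grouping the entries by unordered pair is what makes the "2 each" in the description above visible: a pair such as 1 and 14 appears once in each direction.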

Options supported by generate_contact_file

Some of these options have been described above in the examples, and the others have fairly obvious meanings. But we would like to point out that the option --aa_range (or -r) is not the same as the option --selection (or -sl). When you use -sl, you indicate that you want to work with the selected part of the protein in the simulation. When you use -r, you indicate that you want a measurement of a limited part of all native contacts of the simulated system.

For example, let's consider the situation when we want to simulate the whole protein in 1GB1.pdb. We want to monitor only the hydrogen bonds corresponding to the helix (23 (ALA) to 36 (ASP)). The required contacts file will then be generated by

  generate_contact_file -ct HBContact 1GB1_reg.pdb -o 1GB1_hel.nhb -r 22 35

In the above, 1GB1_reg.pdb is obtained by first regularizing 1GB1.pdb and then converting the resulting min_etot.xml to a PDB file. The file 1GB1_hel.nhb will contain only a subset of all native contacts in 1GB1.pdb, but will number them relative to the sequence of the whole chain. If you used the -sl option instead, the same contacts would be generated, but they would be numbered relative to the helix segment alone. If the helix alone is what you want to simulate, fine. But in the present example, where the whole chain is simulated, those indexes would correspond to the wrong residues of the chain.

The residue ranges in the -r option start from 0, while the -sl option makes a selection on the PDB file with numbering as in that file.
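
The two numbering schemes differ only by an offset, as the following Python sketch shows. It assumes that the chain in the PDB file is numbered contiguously and starts at residue 1, which is consistent with the helix 23 to 36 becoming -r 22 35 above; other files may need a different chain_start. Both function names are hypothetical.

```python
def pdb_range_to_r_option(first_res: int, last_res: int,
                          chain_start: int = 1) -> tuple[int, int]:
    """Convert an inclusive PDB residue range to 0-based -r indexes.

    Assumption: residues are numbered contiguously from chain_start.
    """
    if first_res > last_res:
        raise ValueError("first_res must not exceed last_res")
    if first_res < chain_start:
        raise ValueError("range starts before the first residue of the chain")
    return first_res - chain_start, last_res - chain_start


def segment_to_chain_index(seg_index: int, seg_first_res: int,
                           chain_start: int = 1) -> int:
    """Shift an index numbered relative to a -sl segment to whole-chain numbering."""
    return seg_index + (seg_first_res - chain_start)


if __name__ == "__main__":
    lo, hi = pdb_range_to_r_option(23, 36)
    print(f"-r {lo} {hi}")
    # Index 0 of a segment starting at PDB residue 23, in whole-chain numbering:
    print(segment_to_chain_index(0, 23))
```

The second function makes the mismatch described above concrete: a contact file generated with -sl for the helix numbers its first residue 0, while the same residue has index 22 in a simulation of the whole chain.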

See Also
prf::ContactMap, Regularization: approximating a protein structure, Converting between structure file formats, Extracting parts of a PDB file

PROFASI: Protein Folding and Aggregation Simulator, Version 1.5
© (2005-2016) Anders Irbäck and Sandipan Mohanty
Documentation generated on Mon Jul 18 2016 using Doxygen version 1.8.2