PROFASI
Version 1.5
|
Representation for a poly peptide chain. More...
#include <Protein.hh>
Public Member Functions | |
Protein () | |
Default constructor. | |
Protein (std::string aaseq) | |
Constructor taking an amino acid sequence. | |
Protein (const Protein &) | |
Copy constructor. | |
Protein & | operator= (const Protein &) |
Assignment operator. | |
void | translate (const Vector3 &trv) |
Translate molecule. | |
void | rotate (double angl, const Vector3 &axs) |
Rotate molecule about an axis passing through the center of mass. | |
void | reconstruct () |
Recalculate cartessian coordinates from internal. | |
void | reconstruct (int idir, int iaa) |
Reconstruct given a starting point and a direction. | |
void | EnforceBC () |
Bring all coordinates to their appropriate periodic intervals. | |
Initialisation related functions | |
void | clear () |
Clears all arrays and resets scalars to defaults. | |
int | add_aminoacid (prf::OneLetterCode cod) |
Increment amino acid sequence with one residue at the C-terminal. | |
int | add_NTLigand (prf::OneLetterCode cod) |
Add N-terminal capping group. | |
int | add_CTLigand (prf::OneLetterCode cod) |
Add C-terminal capping group. | |
int | set_sequence (const std::vector< prf::OneLetterCode > &gseq) |
Set a new sequence and allocate memory accordingly. | |
void | connect_residues () |
void | init_atoms () |
void | create_backbone () |
void | init_dof () |
Initialize DOF arrays. | |
void | trivial_init () |
All internal coordinates to zero. | |
void | stretched_init () |
Initialize to stretched out state. | |
void | random_init (RandomNumberBase *rangen) |
all internal coordinates to random values | |
void | randomize (RandomNumberBase *rangen) |
void | helical_init () |
ideal alpha helix for backbone, zero for rotamers | |
int | metamorphose (prf_xml::XML_Node *px) |
Adopt the sequence given in XML node. | |
void | setCis (int iaa) |
Set the peptide bond between iaa and iaa+1 to cis. | |
void | charged_end (bool ntcg, bool ctcg) |
Whether or not to use charged chain ends. | |
Accessing constituent properties | |
int | Id () const |
Each protein in the population has a unique integer id. | |
void | Id (int i) |
Assign integer id. | |
std::string | Sequence () const |
Retrieve sequence string. | |
OneLetterCode | Seq (int i) const |
Retreive OneLetterCode of the i'th residue. | |
int | numAminoAcids () const |
Number of amino acids. | |
int | numLigands () const |
Number of a.a.+ capping groups. | |
EndGroup * | NtermLigand () |
Get N terminal capping group. | |
EndGroup * | CtermLigand () |
Get C terminal capping group. | |
AminoAcid * | memberAA (int i) |
Pointer to i'th amino acid. | |
AminoAcid * | AA (int i) |
Pointer to i'th amino acid. | |
Ligand * | first_ligand () |
First ligand in chain. | |
Ligand * | last_ligand () |
Last ligand in chain. | |
int | begin_ligand () const |
Same as first_ligand. | |
int | end_ligand () const |
One past the last ligand. | |
AminoAcid * | first_AA () |
First amino acid in chain. | |
AminoAcid * | last_AA () |
Last amino acid in chain. | |
Ligand * | memberLigand (int i) |
i'th ligand, counting a.a. and capping groups | |
Backbone * | backbone () |
Pointer to the backbone. | |
Atom & | backbone_atom (int i) |
i'th backbone atom | |
const Atom & | backbone_atom (int i) const |
i'th backbone atom | |
int | numberOfAtoms () const |
Number of atoms. | |
int | begin_atom () const |
First atom in terms of unique ids. | |
int | end_atom () const |
One past the last element. | |
int | first_atom () const |
Same as begin. | |
int | last_atom () const |
Last atom. | |
Miscellaneous properties of the molecule | |
double | helix_fraction () |
double | strand_fraction () |
double | EndToEndDistance () const |
End-to-end length. | |
Vector3 | EndToEndVector () const |
End-to-end vector. | |
Vector3 | CenterOfMass () const |
Center of mass. | |
double | BackboneRg2 () const |
Squared backbone rd. of gyration. | |
Accessing and manipulating degrees of freedom | |
double | RamachandranPhi (int i) const |
Access Ramachandran phi angle of the i'th amino acid. | |
double | RamachandranPsi (int i) const |
Access Ramachandran psi angle of the i'th amino acid. | |
void | RamachandranPhi (int i, double xv) |
Assign to the Ramachandran phi angle of the i'th residue. | |
void | RamachandranPsi (int i, double xv) |
Assign to the i'th Ramachandran psi angle. | |
int | n_dof () const |
Number of (slightly redundant) degrees of freedom. | |
int | n_coords () const |
Number of generalised coordinates. | |
int | ConfSize () const |
Size in bytes of the data block in ProFASi binary conf file. | |
double | get_dof (int i) |
Get the value of the i'th true degree of freedom. | |
void | set_dof (int i, double vl) |
Set the value of the i'th true degree of freedom. | |
double | get_coord (int i) |
Get the value of the i'th coordinate. | |
void | set_coord (int i, double vl) |
Set the value of the i'th coordinate. | |
void | get_dof (std::vector< double > &vdof) const |
Retrieve all dof in an array. | |
void | get_coord (std::vector< double > &vcrd) const |
Retrieve all coordinates in an array. | |
void | set_dof (std::vector< double > &vdof) |
Set all dof from an array. | |
void | set_coord (std::vector< double > &vcrd) |
void | set_dof (std::vector< double >::iterator c1, std::vector< double >::iterator c2) |
void | set_coord (std::vector< double >::iterator c1, std::vector< double >::iterator c2) |
int | numBBdof () const |
Total number of backbone DOF. | |
int | numRTdof () const |
Total number of side chain DOF. | |
double | BBdof (int i) const |
Access i'th backbone DOF. | |
double | RTdof (int i) const |
Access i'th side chain DOF. | |
void | BBdof (int i, double val) |
Assign to i'th backbone DOF. | |
int | RTdof (int i, double val) |
Assign to i'th side chain DOF. | |
Ligand * | residue_with_rt_dof (int i) |
The residue which has the i'th side chain DOF. | |
Ligand * | residue_with_bb_dof (int i) |
The residue which has the i'th backbone DOF. | |
Writing structure in various formats | |
void | Write () |
Write down the molecule in plain text, including coordinates etc. | |
void | WritePDB (int &istatm, FILE *fp, char ch_id, int &rsindx) |
Write PDB with the normal order of atoms. | |
void | WritePDB2 (int &istatm, FILE *fp, char ch_id, int &rsindx) |
Write pdb with heavy atoms first for each amino acid. | |
void | Write_XML (FILE *op) |
Write an XML Node in the ProFASi XML structure format. | |
prf_xml::XML_Node * | make_xml_node () |
Retrieve XML_Node object with a map of the molecule. | |
std::string | ConfSignature () |
Signature of the data to be written in a binary conf file. | |
void | WriteConf (FILE *fp) |
Write current state in compressed binary format. | |
void | WriteConf_text (FILE *fp) |
Write bare minimum info about current state in plain text. | |
Reading in the structure | |
void | ReadConf (FILE *fp) |
Read in a block from a binary conf file. | |
void | ReadConf_text (FILE *fp) |
Read in coordinates from an open text file. | |
int | importXYZ (std::list< AtomRecord >::iterator start, std::list< AtomRecord >::iterator end, std::vector< bool > &assignments, int at_ligand=0) |
Copy Cartesian coordinates from a list of AtomRecord objects. | |
int | calc_torsions (std::vector< bool > &specified) |
Calculate all internal DOF from Cartesian coordinates. | |
int | guess_missing_coordinates (std::vector< bool > &specified) |
Guess the coordinates of a few atoms based on other atoms. | |
int | fix_broken_backbone (std::vector< bool > &specified) |
Guess appropriate coordinates for a section of the backbone. | |
int | assign_dof (prf_xml::XML_Node *px, int mismatch_strategy=0) |
Read in dof info from an XML node. | |
The Protein class in ProFASi represents a string of amino acids joined by peptide bonds. For the purposes of this package, the word protein is synonymous with a peptide or a poly-peptide chain. The molecule may contain non-amino acid capping groups (See EndGroup) at one or both ends of the chain.
Geometrically, the molecule is represented as a sequence of short trees atoms connected to a Backbone object. The Protein class provides information about the constituent parts and the degrees of freedom of the protein as well as providing functions for their construction and manipulation.
int Protein::add_aminoacid | ( | prf::OneLetterCode | cod | ) |
This function appends an amino acid of a given type at the end of the amino acid chain, on the C-terminal side. If there are end groups attached to the protein already, this function makes sure to shift the connection of the C-terminal end group appropriately. Sometimes it adjustd the number of atoms in the last amino acid before this> addition (if it had an OXT, it has to lose it). The information about the degrees of freedom and other stuff such as the arrays of hydrophobic and charged residues are updated.
This function changes the number of atoms in the protein. This will in general require reassignment of the UniqueIds of all atoms in the population, and a resizing of the atom coordinate storage. Atom coordinates are not owned by proteins or residues or ligands or even atoms!
int Protein::assign_dof | ( | prf_xml::XML_Node * | px, |
int | mismatch_strategy = 0 |
||
) |
Checks sequence against sequence stored in the XML node. If they match, it extracts the child nodes with tag "group" and assigns them to the appropriate residues. The nodes must have an attribute "index", which stores the serial number for that group in the protein sequence, counting from 0. It is possible to have only a few "group" nodes in the XML file. One may be interested in assigning coordinates to only a small block in the middle, for instace. The entire sequence string is available in the protein node of the XML file. The ligand indices are interpreted with respect to this entire protein sequence.
The optional flag "mismatch_strategy" tells the function what to do in case the sequence in the XML node does not match the sequence of the protein. Value 0 means, quit with an error message. This is the default. A value of 1 means, assign whenever possible.
int Protein::calc_torsions | ( | std::vector< bool > & | specified | ) |
This calculates the torsional degrees of freedom of the protein from the Cartesian coordinates of the atoms.
|
inline |
First, we note that whatever you set here is useless if there is a real end group specified. Second, if there are no end groups, by default, both ends are charged. So, if that is what you want, don't do anything. If you are not using capping groups and yet want neutral chain end/s, use this function to set which ends are uncharged, by passing a "false" for that end.
void Protein::clear | ( | ) |
Deletes all pointers owned by the protein in pointer arrays and then clears the arrays. Stored numbers such as the number of hydrophobic amino acids are zeroed. Essentially it cleans all information stored in a protein object, so that a new allocation process can proceed without trouble.
int Protein::fix_broken_backbone | ( | std::vector< bool > & | specified | ) |
This assumes that X,Y,Z coordinates were read for atoms from somewhere (e.g., PDB file), but there were a few residues where even the backbone atoms were not specified. This function should close the chain by assigning coordinates to the unspecified part of the chain such that the model constraints are satisfied and the coordinates of the atoms which have been specified are not touched.
int Protein::guess_missing_coordinates | ( | std::vector< bool > & | specified | ) |
In full generality, this function should try to guess the coordinates of some atoms if some of the atoms have been assigned proper coordinates. The input argument, a vector<bool> contains a list saying which coordinates are specified, which are not. The indices of the vector are supposed to be UniqueIds of the atoms. So, the size of the vector is the total number of atoms in the population. If, for instance, the position of all non-hydrogen atoms are specified, simple triangulation can be used to fill in the hydrogen coordinates as best as possible. When parts of a side chain are missing but the degrees of freedom can still be determined it is possible to calculate the position of the missing atoms. If the degrees of freedom can not be determined, they can be set to their nominal values so that the side chain would not look ridiculous if it was alone. But if the backbone atoms are missing, currently there is no remedy. So, this is intended to be used when a structure is imported from a PDB file or from a reduced model, where the hydrogen coordinates are not specified, and the side chain atoms may not all be given. A little energy minimisation after this should give a workable model approximation to the imported structure.
int Protein::importXYZ | ( | std::list< AtomRecord >::iterator | start, |
std::list< AtomRecord >::iterator | end, | ||
std::vector< bool > & | assignments, | ||
int | at_ligand = 0 |
||
) |
A typical use of this function would be, to make a selection of some part of a PDB file with a PDBReader object, and then pass the generated list of AtomRecords to the Protein object to give it the same structure. It does not try to adjust the geometry to the model geometry or fill in unspecified coordinates. This function can be called several times to assign coordinates to different parts of the chain.
Think of it as a list copy operation. The protein is like a list of atoms, and it imports entries from another list from within limits specified by the start and end iterators. The residue identifiers in the AtomRecord list are used only to identify blocks meant for one residue, i.e., their numerical values are ignored. The copying commences at a given location along the chain at_ligand
, and successive blocks in the AtomRecord list are assigned to successive residues. The assignments
is used to record which atoms were actually assigned to. It must be pre-initialized with a size equal to the number of atoms in the entire population. For each atom receiving coordinates through this function, the entry corresponding to its unique id is set to "true". The entries for the other atoms are not touched.
|
inline |
end_atom is like the end() function of standard library containers, one past the last element, to be used in loops with a "<" condition for termination. last_atom is really the last atom. Use it with "<=" terminating condition in loops. The integer return values are UniqueIds
int Protein::metamorphose | ( | prf_xml::XML_Node * | px | ) |
The protein checks if there is a sensible sequence given in the XML node. Checks if the sequence is already the same as its own. If it discovers a new sensible sequence in the XML Node, it discards its current sequence and allocates amino acids and capping groups as per the new sequence. The return value is 1 if something changed in the protein. This function does not recover or assign any degrees of freedom from the XML node. For that, use assign_dof().
|
inline |
This is similar to the n_dof function above, but here even the backbone torsion angles kept fixed in ProFASi are counted. For instance, the proline phi angle and the omega angles are kept fixed in ProFASi. So, for the sequence GPA, n_dof would return 9+5+1=15, where as this function would return, 9+9+1=19.
|
inline |
The return value is exactly 3 more than the true number of degrees of freedom for a protein in ProFASi. It is the sum of the number of backbone DOFs, side chain DOFs and 9 numbers to specify the overall position and orientation of the molecule. 6 numbers should be enough for the rigid body coordinates, but in ProFASi, we use 9, and that is the redundancy. The 9 numbers used are the Cartesian coordinates of the N, C_alpha and C' atoms of the first amino acid. Specifying these along with the torsional coordinates fixes the shape, position and orientation of the protein in space. Specifying the center of mass position and three Euler angles would suffice, but that involves more operations whenever these degrees of freedom need to be translated into Cartesian coordinates for all atoms.
void Protein::reconstruct | ( | int | idir, |
int | iaa | ||
) |
reconstruct starting from amino acid iaa, to the C terminus if idir is 0, to the N terminus if idir is 1. The rest of the chain is left unchanged.
int Protein::set_sequence | ( | const std::vector< prf::OneLetterCode > & | gseq | ) |
The end result of this function should be a properly initialized Protein object with a sequence as provided as an argument. If the required sequence differs from the actual sequence, the function "clear()" is called, followed by a sereis of calls to add_aminoacid and add_..._endgroup as required.
void Protein::stretched_init | ( | ) |
Backbone angles are put to (-pi,pi) throughout (with the exception of proline for which the backbone phi angle is fixed) and side chain angles are put to 0.