PROFASI  Version 1.5
Public Member Functions | List of all members
prf::Protein Class Reference

Representation for a poly peptide chain. More...

#include <Protein.hh>

Public Member Functions

 Protein ()
 Default constructor.
 
 Protein (std::string aaseq)
 Constructor taking an amino acid sequence.
 
 Protein (const Protein &)
 Copy constructor.
 
Proteinoperator= (const Protein &)
 Assignment operator.
 
void translate (const Vector3 &trv)
 Translate molecule.
 
void rotate (double angl, const Vector3 &axs)
 Rotate molecule about an axis passing through the center of mass.
 
void reconstruct ()
 Recalculate cartessian coordinates from internal.
 
void reconstruct (int idir, int iaa)
 Reconstruct given a starting point and a direction.
 
void EnforceBC ()
 Bring all coordinates to their appropriate periodic intervals.
 
Initialisation related functions
void clear ()
 Clears all arrays and resets scalars to defaults.
 
int add_aminoacid (prf::OneLetterCode cod)
 Increment amino acid sequence with one residue at the C-terminal.
 
int add_NTLigand (prf::OneLetterCode cod)
 Add N-terminal capping group.
 
int add_CTLigand (prf::OneLetterCode cod)
 Add C-terminal capping group.
 
int set_sequence (const std::vector< prf::OneLetterCode > &gseq)
 Set a new sequence and allocate memory accordingly.
 
void connect_residues ()
 
void init_atoms ()
 
void create_backbone ()
 
void init_dof ()
 Initialize DOF arrays.
 
void trivial_init ()
 All internal coordinates to zero.
 
void stretched_init ()
 Initialize to stretched out state.
 
void random_init (RandomNumberBase *rangen)
 all internal coordinates to random values
 
void randomize (RandomNumberBase *rangen)
 
void helical_init ()
 ideal alpha helix for backbone, zero for rotamers
 
int metamorphose (prf_xml::XML_Node *px)
 Adopt the sequence given in XML node.
 
void setCis (int iaa)
 Set the peptide bond between iaa and iaa+1 to cis.
 
void charged_end (bool ntcg, bool ctcg)
 Whether or not to use charged chain ends.
 
Accessing constituent properties
int Id () const
 Each protein in the population has a unique integer id.
 
void Id (int i)
 Assign integer id.
 
std::string Sequence () const
 Retrieve sequence string.
 
OneLetterCode Seq (int i) const
 Retreive OneLetterCode of the i'th residue.
 
int numAminoAcids () const
 Number of amino acids.
 
int numLigands () const
 Number of a.a.+ capping groups.
 
EndGroup * NtermLigand ()
 Get N terminal capping group.
 
EndGroup * CtermLigand ()
 Get C terminal capping group.
 
AminoAcidmemberAA (int i)
 Pointer to i'th amino acid.
 
AminoAcidAA (int i)
 Pointer to i'th amino acid.
 
Ligandfirst_ligand ()
 First ligand in chain.
 
Ligandlast_ligand ()
 Last ligand in chain.
 
int begin_ligand () const
 Same as first_ligand.
 
int end_ligand () const
 One past the last ligand.
 
AminoAcidfirst_AA ()
 First amino acid in chain.
 
AminoAcidlast_AA ()
 Last amino acid in chain.
 
LigandmemberLigand (int i)
 i'th ligand, counting a.a. and capping groups
 
Backbonebackbone ()
 Pointer to the backbone.
 
Atombackbone_atom (int i)
 i'th backbone atom
 
const Atombackbone_atom (int i) const
 i'th backbone atom
 
int numberOfAtoms () const
 Number of atoms.
 
int begin_atom () const
 First atom in terms of unique ids.
 
int end_atom () const
 One past the last element.
 
int first_atom () const
 Same as begin.
 
int last_atom () const
 Last atom.
 
Miscellaneous properties of the molecule
double helix_fraction ()
 
double strand_fraction ()
 
double EndToEndDistance () const
 End-to-end length.
 
Vector3 EndToEndVector () const
 End-to-end vector.
 
Vector3 CenterOfMass () const
 Center of mass.
 
double BackboneRg2 () const
 Squared backbone rd. of gyration.
 
Accessing and manipulating degrees of freedom
double RamachandranPhi (int i) const
 Access Ramachandran phi angle of the i'th amino acid.
 
double RamachandranPsi (int i) const
 Access Ramachandran psi angle of the i'th amino acid.
 
void RamachandranPhi (int i, double xv)
 Assign to the Ramachandran phi angle of the i'th residue.
 
void RamachandranPsi (int i, double xv)
 Assign to the i'th Ramachandran psi angle.
 
int n_dof () const
 Number of (slightly redundant) degrees of freedom.
 
int n_coords () const
 Number of generalised coordinates.
 
int ConfSize () const
 Size in bytes of the data block in ProFASi binary conf file.
 
double get_dof (int i)
 Get the value of the i'th true degree of freedom.
 
void set_dof (int i, double vl)
 Set the value of the i'th true degree of freedom.
 
double get_coord (int i)
 Get the value of the i'th coordinate.
 
void set_coord (int i, double vl)
 Set the value of the i'th coordinate.
 
void get_dof (std::vector< double > &vdof) const
 Retrieve all dof in an array.
 
void get_coord (std::vector< double > &vcrd) const
 Retrieve all coordinates in an array.
 
void set_dof (std::vector< double > &vdof)
 Set all dof from an array.
 
void set_coord (std::vector< double > &vcrd)
 
void set_dof (std::vector< double >::iterator c1, std::vector< double >::iterator c2)
 
void set_coord (std::vector< double >::iterator c1, std::vector< double >::iterator c2)
 
int numBBdof () const
 Total number of backbone DOF.
 
int numRTdof () const
 Total number of side chain DOF.
 
double BBdof (int i) const
 Access i'th backbone DOF.
 
double RTdof (int i) const
 Access i'th side chain DOF.
 
void BBdof (int i, double val)
 Assign to i'th backbone DOF.
 
int RTdof (int i, double val)
 Assign to i'th side chain DOF.
 
Ligandresidue_with_rt_dof (int i)
 The residue which has the i'th side chain DOF.
 
Ligandresidue_with_bb_dof (int i)
 The residue which has the i'th backbone DOF.
 
Writing structure in various formats
void Write ()
 Write down the molecule in plain text, including coordinates etc.
 
void WritePDB (int &istatm, FILE *fp, char ch_id, int &rsindx)
 Write PDB with the normal order of atoms.
 
void WritePDB2 (int &istatm, FILE *fp, char ch_id, int &rsindx)
 Write pdb with heavy atoms first for each amino acid.
 
void Write_XML (FILE *op)
 Write an XML Node in the ProFASi XML structure format.
 
prf_xml::XML_Nodemake_xml_node ()
 Retrieve XML_Node object with a map of the molecule.
 
std::string ConfSignature ()
 Signature of the data to be written in a binary conf file.
 
void WriteConf (FILE *fp)
 Write current state in compressed binary format.
 
void WriteConf_text (FILE *fp)
 Write bare minimum info about current state in plain text.
 
Reading in the structure
void ReadConf (FILE *fp)
 Read in a block from a binary conf file.
 
void ReadConf_text (FILE *fp)
 Read in coordinates from an open text file.
 
int importXYZ (std::list< AtomRecord >::iterator start, std::list< AtomRecord >::iterator end, std::vector< bool > &assignments, int at_ligand=0)
 Copy Cartesian coordinates from a list of AtomRecord objects.
 
int calc_torsions (std::vector< bool > &specified)
 Calculate all internal DOF from Cartesian coordinates.
 
int guess_missing_coordinates (std::vector< bool > &specified)
 Guess the coordinates of a few atoms based on other atoms.
 
int fix_broken_backbone (std::vector< bool > &specified)
 Guess appropriate coordinates for a section of the backbone.
 
int assign_dof (prf_xml::XML_Node *px, int mismatch_strategy=0)
 Read in dof info from an XML node.
 

Detailed Description

The Protein class in ProFASi represents a string of amino acids joined by peptide bonds. For the purposes of this package, the word protein is synonymous with a peptide or a poly-peptide chain. The molecule may contain non-amino acid capping groups (See EndGroup) at one or both ends of the chain.

Geometrically, the molecule is represented as a sequence of short trees atoms connected to a Backbone object. The Protein class provides information about the constituent parts and the degrees of freedom of the protein as well as providing functions for their construction and manipulation.

Member Function Documentation

int Protein::add_aminoacid ( prf::OneLetterCode  cod)

This function appends an amino acid of a given type at the end of the amino acid chain, on the C-terminal side. If there are end groups attached to the protein already, this function makes sure to shift the connection of the C-terminal end group appropriately. Sometimes it adjustd the number of atoms in the last amino acid before this> addition (if it had an OXT, it has to lose it). The information about the degrees of freedom and other stuff such as the arrays of hydrophobic and charged residues are updated.

This function changes the number of atoms in the protein. This will in general require reassignment of the UniqueIds of all atoms in the population, and a resizing of the atom coordinate storage. Atom coordinates are not owned by proteins or residues or ligands or even atoms!

int Protein::assign_dof ( prf_xml::XML_Node px,
int  mismatch_strategy = 0 
)

Checks sequence against sequence stored in the XML node. If they match, it extracts the child nodes with tag "group" and assigns them to the appropriate residues. The nodes must have an attribute "index", which stores the serial number for that group in the protein sequence, counting from 0. It is possible to have only a few "group" nodes in the XML file. One may be interested in assigning coordinates to only a small block in the middle, for instace. The entire sequence string is available in the protein node of the XML file. The ligand indices are interpreted with respect to this entire protein sequence.

The optional flag "mismatch_strategy" tells the function what to do in case the sequence in the XML node does not match the sequence of the protein. Value 0 means, quit with an error message. This is the default. A value of 1 means, assign whenever possible.

int Protein::calc_torsions ( std::vector< bool > &  specified)

This calculates the torsional degrees of freedom of the protein from the Cartesian coordinates of the atoms.

void prf::Protein::charged_end ( bool  ntcg,
bool  ctcg 
)
inline

First, we note that whatever you set here is useless if there is a real end group specified. Second, if there are no end groups, by default, both ends are charged. So, if that is what you want, don't do anything. If you are not using capping groups and yet want neutral chain end/s, use this function to set which ends are uncharged, by passing a "false" for that end.

void Protein::clear ( )

Deletes all pointers owned by the protein in pointer arrays and then clears the arrays. Stored numbers such as the number of hydrophobic amino acids are zeroed. Essentially it cleans all information stored in a protein object, so that a new allocation process can proceed without trouble.

int Protein::fix_broken_backbone ( std::vector< bool > &  specified)

This assumes that X,Y,Z coordinates were read for atoms from somewhere (e.g., PDB file), but there were a few residues where even the backbone atoms were not specified. This function should close the chain by assigning coordinates to the unspecified part of the chain such that the model constraints are satisfied and the coordinates of the atoms which have been specified are not touched.

Note
At present this is a dummy. No algorithm to fix broken backbone is implemented in reality.
int Protein::guess_missing_coordinates ( std::vector< bool > &  specified)

In full generality, this function should try to guess the coordinates of some atoms if some of the atoms have been assigned proper coordinates. The input argument, a vector<bool> contains a list saying which coordinates are specified, which are not. The indices of the vector are supposed to be UniqueIds of the atoms. So, the size of the vector is the total number of atoms in the population. If, for instance, the position of all non-hydrogen atoms are specified, simple triangulation can be used to fill in the hydrogen coordinates as best as possible. When parts of a side chain are missing but the degrees of freedom can still be determined it is possible to calculate the position of the missing atoms. If the degrees of freedom can not be determined, they can be set to their nominal values so that the side chain would not look ridiculous if it was alone. But if the backbone atoms are missing, currently there is no remedy. So, this is intended to be used when a structure is imported from a PDB file or from a reduced model, where the hydrogen coordinates are not specified, and the side chain atoms may not all be given. A little energy minimisation after this should give a workable model approximation to the imported structure.

int Protein::importXYZ ( std::list< AtomRecord >::iterator  start,
std::list< AtomRecord >::iterator  end,
std::vector< bool > &  assignments,
int  at_ligand = 0 
)

A typical use of this function would be, to make a selection of some part of a PDB file with a PDBReader object, and then pass the generated list of AtomRecords to the Protein object to give it the same structure. It does not try to adjust the geometry to the model geometry or fill in unspecified coordinates. This function can be called several times to assign coordinates to different parts of the chain.

Think of it as a list copy operation. The protein is like a list of atoms, and it imports entries from another list from within limits specified by the start and end iterators. The residue identifiers in the AtomRecord list are used only to identify blocks meant for one residue, i.e., their numerical values are ignored. The copying commences at a given location along the chain at_ligand, and successive blocks in the AtomRecord list are assigned to successive residues. The assignments is used to record which atoms were actually assigned to. It must be pre-initialized with a size equal to the number of atoms in the entire population. For each atom receiving coordinates through this function, the entry corresponding to its unique id is set to "true". The entries for the other atoms are not touched.

int prf::Protein::last_atom ( ) const
inline

end_atom is like the end() function of standard library containers, one past the last element, to be used in loops with a "<" condition for termination. last_atom is really the last atom. Use it with "<=" terminating condition in loops. The integer return values are UniqueIds

int Protein::metamorphose ( prf_xml::XML_Node px)

The protein checks if there is a sensible sequence given in the XML node. Checks if the sequence is already the same as its own. If it discovers a new sensible sequence in the XML Node, it discards its current sequence and allocates amino acids and capping groups as per the new sequence. The return value is 1 if something changed in the protein. This function does not recover or assign any degrees of freedom from the XML node. For that, use assign_dof().

int prf::Protein::n_coords ( ) const
inline

This is similar to the n_dof function above, but here even the backbone torsion angles kept fixed in ProFASi are counted. For instance, the proline phi angle and the omega angles are kept fixed in ProFASi. So, for the sequence GPA, n_dof would return 9+5+1=15, where as this function would return, 9+9+1=19.

int prf::Protein::n_dof ( ) const
inline

The return value is exactly 3 more than the true number of degrees of freedom for a protein in ProFASi. It is the sum of the number of backbone DOFs, side chain DOFs and 9 numbers to specify the overall position and orientation of the molecule. 6 numbers should be enough for the rigid body coordinates, but in ProFASi, we use 9, and that is the redundancy. The 9 numbers used are the Cartesian coordinates of the N, C_alpha and C' atoms of the first amino acid. Specifying these along with the torsional coordinates fixes the shape, position and orientation of the protein in space. Specifying the center of mass position and three Euler angles would suffice, but that involves more operations whenever these degrees of freedom need to be translated into Cartesian coordinates for all atoms.

void Protein::reconstruct ( int  idir,
int  iaa 
)

reconstruct starting from amino acid iaa, to the C terminus if idir is 0, to the N terminus if idir is 1. The rest of the chain is left unchanged.

int Protein::set_sequence ( const std::vector< prf::OneLetterCode > &  gseq)

The end result of this function should be a properly initialized Protein object with a sequence as provided as an argument. If the required sequence differs from the actual sequence, the function "clear()" is called, followed by a sereis of calls to add_aminoacid and add_..._endgroup as required.

void Protein::stretched_init ( )

Backbone angles are put to (-pi,pi) throughout (with the exception of proline for which the backbone phi angle is fixed) and side chain angles are put to 0.


The documentation for this class was generated from the following files:

PROFASI: Protein Folding and Aggregation Simulator, Version 1.5
© (2005-2016) Anders Irbäck and Sandipan Mohanty
Documentation generated on Mon Jul 18 2016 using Doxygen version 1.8.2