PROFASI
Version 1.5
A population of proteins.
#include <Population.hh>
Public Member Functions | |
Population () | |
Default constructor, creates an empty population. | |
void | RandomNumberGenerator (RandomNumberBase *) |
Specify a random number generator. | |
void | Reconstruct () |
Reconstruct population. | |
void | EnforceBC () |
Enforce periodic boundary conditions on all chains. | |
Adding molecules to the system | |
void | clear () |
Clear all chains. | |
int | AddProtein (std::string ntg, std::string sq, std::string ctg, int hwmny=1) |
Add proteins to the population. | |
int | AddProtein (std::list< SelRes > &lst, int hwmny=1) |
Add protein sequences to the population from a PDB file. | |
int | AddProtein (int hwmny, std::string pdbfilename) |
Add chains from a PDB file. | |
int | AddProtein (std::string fullseq, int hwmany=1) |
Add hwmany chains of a sequence described by fullseq. | |
int | assign_sequences (prf_xml::XML_Node *pnode) |
Assign only sequence info from an XML node. | |
void | setCis (int ich, int iaa) |
Set "cis"-peptide-bond between residue iaa and iaa+1 in chain ich. | |
void | charged_ends (bool b1, bool b2) |
Set up whether (un)charged chain ends are to be used. | |
Assigning 3D structure to members of the population | |
int | ImportStructure (std::list< AtomRecord > &rec, std::vector< bool > &assignments, int at_chain=0) |
Import structure from a list of AtomRecords. | |
int | guess_missing_coordinates (std::vector< bool > &assignments) |
Try to infer missing coordinates. | |
Reading in internal coordinates | |
int | Read_XML (prf_xml::XML_Node *pnode) |
Aggressively assign structure from an XML Node. | |
int | assign_structures (prf_xml::XML_Node *pnode) |
Assign coordinates from an XML node. | |
void | ReadConf (FILE *fp) |
Read compressed binary configuration data. | |
void | ReadConf_text (FILE *fp) |
Read raw configuration data in plain text format. | |
Initializing the population | |
void | Initialize (int inittyp=0) |
Allocate memory and create protein objects. | |
int | InitCoord (std::string init_type) |
Initialize coordinates with type specified by a string. | |
int | Init () |
int | re_index () |
int | index_dof () |
int | check_DOF_index () |
Check consistency of the DOF index. | |
bool | initialized () |
void | Randomize () |
Random values to all degrees of freedom, and reconstruct system. | |
void | RandomizeRelConf () |
Randomize leaving internal coordinates untouched. | |
void | RandomizeRelConf (int ich) |
Randomize by moving the chain number ich rigidly. | |
void | RandomizeRelConf (int ich, int jch) |
Randomize by moving the chains from ich to jch rigidly. | |
void | RandomizeIntConf () |
Randomize only the internal coordinates. | |
void | RandomizeIntConf (int ich) |
Randomize only the internal coordinates of chain ich. | |
void | RandomizeIntConf (int ich, int jch) |
Randomize only the internal coordinates of chains ich to jch. | |
Accessing constituents | |
Protein * | Chain (int i) |
Access i'th protein chain through a pointer. | |
Protein * | LongestChain () |
Access the longest sequence in the system by a pointer. | |
Protein * | ShortestChain () |
Access the protein with the shortest sequence in the system. | |
Ligand * | ligand (int i) |
i'th ligand in the system, including all proteins and capping groups. | |
AminoAcid * | amino_acid (int i) |
i'th amino acid in the system, including all protein chains. | |
std::string | PepName (int i) |
Name or sequence of i'th protein chain. | |
int | NSpecies () |
Number of different species of Proteins in the system. | |
Atom | atom (int i) const |
A copy of the i'th atom in the system. | |
AtomKind | SpeciesOf (int i) const |
Atom type information for the i'th atom. | |
int | NumberOfChains () const |
Total number of chains. | |
int | NumberOfResidues () const |
Total number of amino acids in all chains together. | |
int | NumberOfLigands () const |
Total number of ligands in all chains together. | |
int | NumberOfAtoms () const |
Total number of atoms. | |
int | num_grp (int i) const |
Number of residues in the i'th chain. | |
int | chain_start (int ich) |
Global index of first ligand of a chain. | |
int | chain_end (int ich) |
Index of one past the last ligand of i'th chain. | |
std::string | chain_name (int i) const |
Label of the i'th chain. | |
int | index_of_grp (std::string ires, int ich) |
Natural index of group labeled "ires" in the chain ich. | |
std::string | grp_name (int ires, int ic) |
Label of the group with natural index ires. | |
Ligand * | existing_group (int ires, int ic) |
Managing degrees of freedom | |
double | get_dof (size_t i) |
Get the i'th DOF in the system. | |
void | set_dof (size_t i, double vl) |
Set the i'th DOF value. | |
DOF_Info & | get_dof_info (size_t i) |
Get info on DOF with index i in the entire system. | |
DOF_Info & | get_dof_info (size_t ich, size_t i) |
Get info on DOF with index i within chain ich. | |
double | get_dof (DOF_Info &d) |
Get DOF value using a DOF_Info object as key. | |
void | set_dof (DOF_Info &d, double vl) |
Set DOF value using a DOF_Info object as key. | |
void | get_dof (std::vector< double > &dofary) |
Retrieve all "degrees of freedom" in a single array. | |
void | set_dof (std::vector< double > &dorary) |
Set all "degrees of freedom" from a given array. | |
int | n_dof () |
Number of coordinates from which the exact state can be restored. | |
std::vector< DOF_Info > & | dof_map () |
Reference to the map (vector of DOF_Info) of all DOF indexes. | |
int | get_dof_id (std::string dofstr) |
Interpret a string as a DOF identifier. | |
void | set_dof (std::string dofstr, double vl) |
Set DOF by interpreting DOF identifier and value from strings. | |
Writing structure in various formats | |
void | SaveSnapshot (int in_format, std::string flnm, unsigned long ittime, int tindex, double en) |
Write population in XML, pdb, binary or text conf format. | |
void | Write () |
Write down all the proteins in plain text. | |
void | WriteShort () |
Write some information about all the proteins. | |
void | WriteConf (FILE *fp) |
Write into binary configuration file. | |
void | WriteConf_text (FILE *fp) |
void | Write_XML (FILE *op) |
Write population info in an XML format. | |
prf_xml::XML_Node * | make_xml_node () |
Make an XML node object containing information on the population. | |
void | writePDBHeader (FILE *fp, unsigned long itime, int tindex, double entot) |
Write PDB header lines (SEQRES and such lines before ATOM lines) | |
void | writeSequenceInfo (FILE *fp) |
void | WritePDB (FILE *fp) |
Export PDB to file specified by a FILE pointer. | |
void | WritePDB2 (FILE *fp) |
Export PDB with heavy atoms first for each amino acid. | |
int | descriptors (std::list< SelRes > &slc, std::list< AtomDescriptor > &des) |
Return a list of all atom descriptors for a given selection. | |
int | export_descriptors (std::list< AtomDescriptor > &lst) |
Append the PDB Atom descriptor information to the end of the list. | |
int | export_shape (std::vector< int > &vct, Shape &shp) |
Make a Shape object out of the coordinates of specified atoms. | |
Public Member Functions inherited from prf::PopBase | |
int | num_chains () const |
Number of chains. | |
int | get_model () const |
Currently selected model. | |
virtual void | set_model (int i) |
Select model i (makes sense only for PDB files) | |
virtual int | chain_number (std::string chnm) const |
Integer index (starting from 0) of the chain labeled chnm. | |
virtual std::string | str_index (int ires, int ich) |
String index of the chain with natural index ires. | |
int | mk_selection (std::string slcstr, std::list< SelRes > &slc) |
Create a selected list of residues using a selection string slcstr. | |
Population is a collection of one or more Proteins of one or more kinds. It is the class that the conformational updates work on and that the energy terms calculate energies for. Population provides a convenient interface to talk about "the system as a whole".
int prf::Population::AddProtein | ( | std::string | ntg, |
std::string | sq, | ||
std::string | ctg, | ||
int | hwmny = 1 |
||
) |
ntg | Name of the N terminal capping group like "Acetyl" |
sq | The amino acid sequence, like "GEWTYDDATKTFTVTE" |
ctg | Name of the C terminal capping group like "Amide" |
hwmny | Number of chains of the specified kind you want to add. |
It is alright to say "none" for the capping groups. It is also alright to add several copies of one peptide and then several copies of another.
int prf::Population::AddProtein | ( | std::list< SelRes > & | lst, |
int | hwmny = 1 |
||
) |
hwmny | Number of copies of the sequence to be added |
lst | A list of selected residues. The selections should, for instance, come from a PDB file using the mk_selection function in PDBReader. SelRes objects contain chain information, so if more than one chain is detected in the list, more than one chain will be added. If, further, hwmny is greater than 1, each chain in lst will be added hwmny times. |
int prf::Population::AddProtein | ( | int | hwmny, |
std::string | pdbfilename | ||
) |
This is provided only for backward compatibility. The function AddProtein(std::list<SelRes> &lst, int hwmny) above should be preferred.
int prf::Population::AddProtein | ( | std::string | fullseq, |
int | hwmany = 1 |
||
) |
Introduced in version 1.1.0. The sequence description in fullseq includes the N- and C-terminal capping groups if they should be included. By default, the string is interpreted word by word, each word being translated into a residue or a capping group. The words may be single-letter codes, 3-letter codes or full names, as long as they are separated by spaces. The read mode toggles to and from character mode whenever the "*" character is encountered. In character mode, each letter is interpreted as the one-letter symbol for an amino acid. Examples:
fullseq="ALA ALA ALA" means alanine-alanine-alanine
fullseq="ALA *ALA* ALA" means alanine-alanine-leucine-alanine-alanine
This function is useful if there is no good one-letter symbol for a group, like Acetyl, D-proline etc.
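The word/character mode rule described above can be sketched as a small standalone tokenizer. This is only an illustration of the rule as stated in the text, not PROFASI's own parser: the names tokenize_fullseq and check_fullseq_examples are hypothetical, and the sketch stops at splitting the string into tokens without translating them into residues or capping groups.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not part of PROFASI): split a fullseq-style string
// into one token per residue or capping group.
std::vector<std::string> tokenize_fullseq(const std::string &fullseq)
{
    std::vector<std::string> tokens;
    std::string word;        // word being accumulated in word mode
    bool char_mode = false;  // reading starts in word mode

    // Store the pending word, if any, as a finished token.
    auto flush = [&]() {
        if (!word.empty()) {
            tokens.push_back(word);
            word.clear();
        }
    };

    for (char c : fullseq) {
        if (c == '*') {
            flush();                 // "*" ends any pending word ...
            char_mode = !char_mode;  // ... and toggles the read mode
        } else if (std::isspace(static_cast<unsigned char>(c))) {
            flush();                 // whitespace separates words
        } else if (char_mode) {
            tokens.push_back(std::string(1, c));  // one letter, one residue
        } else {
            word.push_back(c);
        }
    }
    flush();
    return tokens;
}

// The two examples from the text, written as checks on the token count.
void check_fullseq_examples()
{
    assert(tokenize_fullseq("ALA ALA ALA").size() == 3);
    assert(tokenize_fullseq("ALA *ALA* ALA").size() == 5);
}
```

With this sketch, a string such as "Acetyl *GEW* Amide" splits into the five tokens Acetyl, G, E, W and Amide, which is the situation the function is meant for: named groups without a one-letter symbol mixed with a compact one-letter stretch.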
int prf::Population::assign_structures | ( | prf_xml::XML_Node * | pnode | ) |
Population can be assigned a structure from an XML node, for instance, as a part of the initialisation. The XML node must have the name population, and it must have a few special child tags. There can be a series of child tags named protein with a node structure like that in ProFASi's XML output structures. In addition, one can make assignments to any degree of freedom, as in this example:
<population>
  <dof_assignments>
    <dof id="::b:25"> 2.38972</dof>
    <dof id="::b:26"> -2.77682</dof>
    ...
  </dof_assignments>
</population>
The DOF id is a string identifier for a degree of freedom. The syntax is described in ProFASi DOF identifier strings.
int prf::Population::chain_start | ( | int | ich | ) |
inline |
Returns the integer index of the first ligand of i'th chain in the vector of all ligands.
void prf::Population::charged_ends | ( | bool | b1, |
bool | b2 | ||
) |
inline |
If this function is called with a false for N or C terminus, any chain for which no explicit end group is specified, gets a "VoidEG" for that end group. If an end group is specified, that is used. Note that "uncharged chain ends" does not mean NH2 at the N-terminus and COOH at the C-terminus. It just means that the terminal amino acids are created just like any other, with no extra atoms.
To use charged chain ends for un-capped sequences, this function does not ever need to be called. That is the default behaviour.
void prf::Population::get_dof | ( | std::vector< double > & | dofary | ) |
The degrees of freedom contain all torsional DOF from all chains. In addition, there is (slightly redundant) information on the rigid body coordinates. 6 DOF per chain would be sufficient, but reconstructing chains from such rigid body coordinates involves more steps than the redundant representation used in PROFASI, where the Cartesian coordinates of the first 3 atoms of every backbone are stored: 9 numbers instead of 6 rigid body coordinates. The array is laid out chain by chain, i.e., the information about each chain appears contiguously.
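The chain-by-chain layout can be illustrated with a little index arithmetic. The sketch below assumes that each chain contributes its 9 stored rigid body numbers plus its torsional DOF as one contiguous block; the order of the two parts inside a block is not stated here, so the sketch only computes block boundaries. The helper names chain_block_offsets and chain_block_size are hypothetical and not part of PROFASI.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// From the text: 9 rigid body numbers per chain (Cartesian coordinates of
// the first 3 backbone atoms) instead of the minimal 6.
constexpr std::size_t kRigidBodyCoords = 9;

// Hypothetical helper: start offset of each chain's block in the flat DOF
// array, followed by one final entry holding the total array length.
// n_torsions[i] is the number of torsional DOF of chain i.
std::vector<std::size_t> chain_block_offsets(const std::vector<std::size_t> &n_torsions)
{
    std::vector<std::size_t> offsets;
    offsets.reserve(n_torsions.size() + 1);
    std::size_t pos = 0;
    for (std::size_t nt : n_torsions) {
        offsets.push_back(pos);        // this chain's block starts here
        pos += kRigidBodyCoords + nt;  // blocks of successive chains are contiguous
    }
    offsets.push_back(pos);            // total number of entries
    return offsets;
}

// Number of array entries belonging to chain ich.
std::size_t chain_block_size(const std::vector<std::size_t> &offsets, std::size_t ich)
{
    assert(ich + 1 < offsets.size());
    return offsets[ich + 1] - offsets[ich];
}
```

For two chains with 10 and 4 torsional DOF, the blocks would start at 0 and 19 and the whole array would hold 32 values. An array passed to the vector overload of set_dof needs that total length, since no size checks are performed there.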
int prf::Population::get_dof_id | ( | std::string | dofstr | ) |
This function maps a ProFASi DOF identifier string to a unique integer global index for that degree of freedom. If the DOF can not be interpreted within the current population, -1 will be returned.
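The -1 return value is easy to overlook when a string-based setter is built on top of the lookup. The following toy sketch shows the calling pattern only. ToyDofTable is a hypothetical stand-in that uses a plain lookup table, whereas the real Population interprets the identifier syntax itself; the identifier strings are the ones from the dof_assignments example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the DOF bookkeeping of a population.
struct ToyDofTable {
    std::map<std::string, int> index_of;  // identifier string -> global index
    std::vector<double> values;           // one value per global index

    // Mirrors the documented convention: -1 if the string cannot be resolved.
    int get_dof_id(const std::string &dofstr) const
    {
        std::map<std::string, int>::const_iterator it = index_of.find(dofstr);
        return it == index_of.end() ? -1 : it->second;
    }

    // Sets the value only if the identifier resolves; reports success.
    bool set_dof(const std::string &dofstr, double vl)
    {
        const int id = get_dof_id(dofstr);
        if (id < 0) return false;  // unknown identifier: leave values untouched
        assert(static_cast<std::size_t>(id) < values.size());
        values[static_cast<std::size_t>(id)] = vl;
        return true;
    }
};
```

Checking for a negative index before using it as an array position is the point of the sketch; how the real class reports a failed string-based assignment is not specified in this section.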
int prf::Population::guess_missing_coordinates | ( | std::vector< bool > & | assignments | ) |
The argument assignments specifies which atoms have been assigned coordinates and which have not. This function tries to guess where the atoms with unknown coordinates should be put. The Population class itself does little of this work: it loops over the chains and invokes the corresponding function for each chain. Guessing unspecified coordinates currently takes into account only the known geometry of protein chains, not the non-bonded interactions.
int prf::Population::ImportStructure | ( | std::list< AtomRecord > & | rec, |
std::vector< bool > & | assignments, | ||
int | at_chain = 0 |
||
) |
Takes a list of AtomRecords, rec, possibly exported by a PDBReader or by the Population at another time.
This function assigns coordinates given in a list of AtomRecords to the atoms in a population. It is useful to think of it as a list copy operation. The population is like a list, and the contents (coordinates) of another list (the list of AtomRecords) are imported. The naming of chains in the list of AtomRecords is used only to separate blocks meant for different chains, i.e., the actual names of the chains are ignored. The chain specified by at_chain (default value = 0) is used as the target of the first chain in the AtomRecord list.
The argument assignments is a pre-allocated array of bool which is used to store which atoms were actually assigned to. It should be initialized elsewhere, so that it has the same size as the number of atoms in the population, and all entries should be initialized to false. Entries corresponding to atoms which receive new coordinates through this function are changed to "true". The other elements of the array are not touched, so that this function can be called many times to assign to different parts of the population. The final values in the assignments array can be used to infer all the atoms which were assigned to.
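The bookkeeping contract of the assignments array can be sketched without any structure data. In the sketch below, mark_assigned is a hypothetical stand-in for one import call that assigns a contiguous range of atoms (a real import need not be contiguous), and unassigned_atoms is a hypothetical helper that lists what is left for a later guessing step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one import call: atoms [first, first + count)
// receive coordinates. Entries outside that range are left untouched.
// Returns how many entries changed from false to true.
int mark_assigned(std::vector<bool> &assignments, std::size_t first, std::size_t count)
{
    assert(first + count <= assignments.size());
    int newly_assigned = 0;
    for (std::size_t i = first; i < first + count; ++i) {
        if (!assignments[i]) ++newly_assigned;
        assignments[i] = true;
    }
    return newly_assigned;
}

// Indices of atoms that no call has assigned to so far.
std::vector<std::size_t> unassigned_atoms(const std::vector<bool> &assignments)
{
    std::vector<std::size_t> missing;
    for (std::size_t i = 0; i < assignments.size(); ++i) {
        if (!assignments[i]) missing.push_back(i);
    }
    return missing;
}
```

Starting from a vector of 10 entries set to false, marking atoms 0 to 3 and then atoms 6 to 9 leaves atoms 4 and 5 unassigned. In a real program the same vector would be sized with NumberOfAtoms(), filled by one or more ImportStructure calls, and then passed on to guess_missing_coordinates.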
int prf::Population::InitCoord | ( | std::string | init_type | ) |
Introduced in version 1.1.0. The argument init_type is a description of the initialization. It can have the following values:
void prf::Population::Initialize | ( | int | inittyp = 0 | ) |
By default, Initialize creates the peptides with random values for all degrees of freedom. That is how they are created, and it is normally the desired starting condition in a simulation.
From version 1.0.1, one can optionally pass an argument "1", to create all proteins in the population in a "stretched out" state. In case there is more than one protein chain in the system, the relative position of chains will still be random. Further options to start from possible crystalline geometries in multi-chain systems are under consideration, and may be provided in the future for different values of the optional argument.
If a totally different starting condition is required, it can be arranged after the call to Initialize. Use the Chain(i) function to get a pointer to one chain. Then initialize each chain in whichever way you want. Finally, if you wish, randomize the relative locations of different chains with the RandomizeRelConf series of functions.
|
virtual |
Overrides num_res function from PopBase.
Reimplemented from prf::PopBase.
int prf::Population::Read_XML | ( | prf_xml::XML_Node * | pnode | ) |
If the XML node contains more chain objects than are currently present in the population, new chains will be added. Then each chain is forced to adopt the sequence of the corresponding chain in the XML node. After this, the internal coordinates are read from the XML node and assigned to the protein chains.
void prf::Population::SaveSnapshot | ( | int | in_format, |
std::string | flnm, | ||
unsigned long | ittime, | ||
int | tindex, | ||
double | en | ||
) |
The state of a population can be written in many formats. There is the PDB format. But there are other formats preserving more information about the configuration. PROFASI has 3 such formats. The text and binary conf formats are trivial records of the degrees of freedom of one chain after the other. All numbers written are "double" values. These values can be read in later by the same population. The binary and text conf formats do not contain information about what chains were present in the population when the configuration was written out.
The preferred format is XML. It is more compact than the PDB format, as the PROFASI XML files contain only the degrees of freedom (torsion angles), like the binary and text conf formats. But unlike those two, the XML format keeps sequence information, and is a self-contained record of the population. A population can be initialized from scratch using such a snapshot.
This function handles writing in all the above mentioned formats.
in_format | 1 means PDB, 2 means XML, 3 means textconf, 4 means binary conf and 0 means write nothing. |
flnm | Name of the snapshot file |
ittime | Some "time info", typically the number of MC sweeps |
tindex | A "temperature index" |
en | Energy |
Note that the Population does not know anything about temperature or energy, and has no concept of any kind of MC time. It is useful to have such info in the snapshots, but such info must be provided to the population from outside. For backward compatibility, we do not write the MC time and temperature index in the text and binary configuration files created with this function.
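Since in_format is a plain int, calling code may read better with named constants. The enum below is a hypothetical convenience that mirrors the integer codes listed above. It is not declared in Population.hh as far as this page shows, and the helper functions are illustrative only.

```cpp
#include <cassert>
#include <string>

// Hypothetical names for the documented in_format codes.
enum SnapshotFormat {
    kWriteNothing = 0,
    kPDB = 1,
    kXML = 2,
    kTextConf = 3,
    kBinaryConf = 4
};

// Convert a raw code to the enum; codes outside 0..4 are not documented.
SnapshotFormat to_snapshot_format(int in_format)
{
    assert(in_format >= 0 && in_format <= 4);
    return static_cast<SnapshotFormat>(in_format);
}

// Human-readable label for log messages.
std::string snapshot_format_name(SnapshotFormat f)
{
    switch (f) {
        case kWriteNothing: return "none";
        case kPDB:          return "PDB";
        case kXML:          return "XML";
        case kTextConf:     return "textconf";
        case kBinaryConf:   return "binary conf";
    }
    return "unknown";
}

// True for the two conf formats, which record only DOF values and carry
// no information about which chains were present.
bool is_conf_format(SnapshotFormat f)
{
    return f == kTextConf || f == kBinaryConf;
}
```

An enum value converts implicitly to int, so a call along the lines of SaveSnapshot(kXML, flnm, ittime, tindex, en) would pass the documented code 2. The is_conf_format check is a reminder that such snapshots are meant to be read back by the same population.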
void prf::Population::set_dof | ( | std::vector< double > & | dorary | ) |
See clarification on "degrees of freedom" in the documentation of get_dof(std::vector<double> &dofary) above. The size of the array has to be correct. No checks are performed.
void prf::Population::Write_XML | ( | FILE * | op | ) |
The XML format contains both sequence and structure information for the chains. The population node only contains the number of chains, and a bunch of child nodes corresponding to the chains.