PROFASI  Version 1.5
prf::Population Class Reference

A population of proteins.

#include <Population.hh>

Inheritance: prf::Population derives from prf::PopBase.

Public Member Functions

 Population ()
 Default constructor, creates an empty population.
void RandomNumberGenerator (RandomNumberBase *)
 Specify a random number generator.
void Reconstruct ()
 Reconstruct population.
void EnforceBC ()
 Enforce periodic boundary conditions on all chains.
Adding molecules to the system
void clear ()
 Clear all chains.
int AddProtein (std::string ntg, std::string sq, std::string ctg, int hwmny=1)
 Add proteins to the population.
int AddProtein (std::list< SelRes > &lst, int hwmny=1)
 Add protein sequences to the population from a PDB file.
int AddProtein (int hwmny, std::string pdbfilename)
 Add chains from a PDB file.
int AddProtein (std::string fullseq, int hwmany=1)
 Add hwmany chains of a sequence described by fullseq.
int assign_sequences (prf_xml::XML_Node *pnode)
 Assign only sequence info from an XML node.
void setCis (int ich, int iaa)
 Set "cis"-peptide-bond between residue iaa and iaa+1 in chain ich.
void charged_ends (bool b1, bool b2)
 Set up whether (un)charged chain ends are to be used.
Assigning 3D structure to members of the population
int ImportStructure (std::list< AtomRecord > &rec, std::vector< bool > &assignments, int at_chain=0)
 Import structure from a list of AtomRecords.
int guess_missing_coordinates (std::vector< bool > &assignments)
 Try to infer missing coordinates.
Reading in internal coordinates
int Read_XML (prf_xml::XML_Node *pnode)
 Aggressively assign structure from an XML Node.
int assign_structures (prf_xml::XML_Node *pnode)
 Assign coordinates from an XML node.
void ReadConf (FILE *fp)
 Read compressed binary configuration data.
void ReadConf_text (FILE *fp)
 Read raw configuration data in plain text format.
Initializing the population
void Initialize (int inittyp=0)
 Allocate memory and create protein objects.
int InitCoord (std::string init_type)
 Initialize coordinates with type specified by a string.
int Init ()
int re_index ()
int index_dof ()
int check_DOF_index ()
 Check consistency of the DOF index.
bool initialized ()
void Randomize ()
 Random values to all degrees of freedom, and reconstruct system.
void RandomizeRelConf ()
 Randomize leaving internal coordinates untouched.
void RandomizeRelConf (int ich)
 Randomize by moving the chain number ich rigidly.
void RandomizeRelConf (int ich, int jch)
 Randomize by moving the chains from ich to jch rigidly.
void RandomizeIntConf ()
 Randomize only the internal coordinates.
void RandomizeIntConf (int ich)
 Randomize only the internal coordinates of chain ich.
void RandomizeIntConf (int ich, int jch)
 Randomize only the internal coordinates of chains ich to jch.
Accessing constituents
Protein * Chain (int i)
 Access i'th protein chain through a pointer.
Protein * LongestChain ()
 Access the longest sequence in the system by a pointer.
Protein * ShortestChain ()
 Access the protein with the shortest sequence in the system.
Ligand * ligand (int i)
 i'th ligand in the system, including all proteins, capping groups.
AminoAcid * amino_acid (int i)
 i'th amino acid in the system, including all protein chains.
std::string PepName (int i)
 Name or sequence of i'th protein chain.
int NSpecies ()
 Number of different species of Proteins in the system.
Atom atom (int i) const
 A copy of the i'th atom in the system.
AtomKind SpeciesOf (int i) const
 Atom type information for the i'th atom.
int NumberOfChains () const
 Total number of chains.
int NumberOfResidues () const
 Total number of amino acids in all chains together.
int NumberOfLigands () const
 Total number of ligands in all chains together.
int NumberOfAtoms () const
 Total number of atoms.
int num_grp (int i) const
 Number of residues in the i'th chain.
int chain_start (int ich)
 Global index of first ligand of a chain.
int chain_end (int ich)
 Index of one past the last ligand of chain ich.
std::string chain_name (int i) const
 Label of the i'th chain.
int index_of_grp (std::string ires, int ich)
 Natural index of group labeled "ires" in the chain ich.
std::string grp_name (int ires, int ic)
 Label of the group with natural index ires.
Ligand * existing_group (int ires, int ic)
Managing degrees of freedom
double get_dof (size_t i)
 Get the i'th DOF in the system.
void set_dof (size_t i, double vl)
 Set the value of the i'th DOF in the system.
DOF_Info & get_dof_info (size_t i)
 Get info on DOF with index i in the entire system.
DOF_Info & get_dof_info (size_t ich, size_t i)
 Get info on DOF with index i within chain ich.
double get_dof (DOF_Info &d)
 Get DOF value using a DOF_Info object as key.
void set_dof (DOF_Info &d, double vl)
 Set DOF value using a DOF_Info object as key.
void get_dof (std::vector< double > &dofary)
 Retrieve all "degrees of freedom" in a single array.
void set_dof (std::vector< double > &dorary)
 Set all "degrees of freedom" from a given array.
int n_dof ()
 Number of coordinates from which the exact state can be restored.
std::vector< DOF_Info > & dof_map ()
 Reference to the map (vector of DOF_Info) of all DOF indexes.
int get_dof_id (std::string dofstr)
 Interpret a string as a DOF identifier.
void set_dof (std::string dofstr, double vl)
 Set DOF by interpreting DOF identifier and value from strings.
Writing structure in various formats
void SaveSnapshot (int in_format, std::string flnm, unsigned long ittime, int tindex, double en)
 Write population in XML, pdb, binary or text conf format.
void Write ()
 Write down all the proteins in plain text.
void WriteShort ()
 Write some information about all the proteins.
void WriteConf (FILE *fp)
 Write into binary configuration file.
void WriteConf_text (FILE *fp)
void Write_XML (FILE *op)
 Write population info in an XML format.
prf_xml::XML_Node * make_xml_node ()
 Make an XML node object containing information on the population.
void writePDBHeader (FILE *fp, unsigned long itime, int tindex, double entot)
 Write PDB header lines (SEQRES and such lines before ATOM lines)
void writeSequenceInfo (FILE *fp)
void WritePDB (FILE *fp)
 Export PDB to file specified by a FILE pointer.
void WritePDB2 (FILE *fp)
 Export PDB with heavy atoms first for each amino acid.
int descriptors (std::list< SelRes > &slc, std::list< AtomDescriptor > &des)
 Return a list of all atom descriptors for a given selection.
int export_descriptors (std::list< AtomDescriptor > &lst)
 Append the PDB Atom descriptor information to the end of the list.
int export_shape (std::vector< int > &vct, Shape &shp)
 Make a Shape object out of the coordinates of specified atoms.
Public Member Functions inherited from prf::PopBase
int num_chains () const
 Number of chains.
int get_model () const
 Currently selected model.
virtual void set_model (int i)
 Select model i (makes sense only for PDB files)
virtual int chain_number (std::string chnm) const
 Integer index (starting from 0) of the chain labeled chnm.
virtual std::string str_index (int ires, int ich)
 String index of the chain with natural index ires.
int mk_selection (std::string slcstr, std::list< SelRes > &slc)
 Create a selected list of residues using a selection string slcstr.

Detailed Description

Population is a collection of one or more Proteins of one or more kinds. This is the class the conformational updates work on. This is the class the energy terms calculate energies for. Population provides a convenient interface to talk about "the system as a whole".

Member Function Documentation

int prf::Population::AddProtein ( std::string ntg,
std::string sq,
std::string ctg,
int hwmny = 1 )

ntg: Name of the N-terminal capping group, like "Acetyl"
sq: The amino acid sequence, like "GEWTYDDATKTFTVTE"
ctg: Name of the C-terminal capping group, like "Amide"
hwmny: Number of chains of the specified kind you want to add.

It is alright to say "none" for the capping groups. It is alright to add several copies of one peptide and then several copies of another.

int prf::Population::AddProtein ( std::list< SelRes > & lst,
int hwmny = 1 )

hwmny: Number of copies of the sequence to be added
lst: A list of selected residues. The selections could, for instance, come from a PDB file using the mk_selection function in PDBReader. SelRes objects contain chain information, so if more than one chain is detected in the list, more than one chain will be added. If, further, hwmny is greater than 1, each chain in lst will be added hwmny times.

int prf::Population::AddProtein ( int hwmny,
std::string pdbfilename )

This is provided only for backward compatibility. The function AddProtein(std::list<SelRes> &lst, int hwmny) above should be preferred.

int prf::Population::AddProtein ( std::string fullseq,
int hwmany = 1 )

Introduced in version 1.1.0. The sequence description in fullseq includes the N- and C-terminal capping groups if they should be included. By default, the string is interpreted word for word, each word being translated into a residue or a capping group. It does not matter whether you use single-letter codes, 3-letter codes or full names in those words, as long as the words are separated by spaces. The read mode toggles into and out of character mode whenever the "*" character is encountered. In character mode, each letter is interpreted as a one-letter symbol for an amino acid. Examples:
fullseq="ALA ALA ALA" means alanine-alanine-alanine
fullseq="ALA *ALA* ALA" means alanine-alanine-leucine-alanine-alanine

This function is useful if there is no good one-letter symbol for a group, like Acetyl, D-proline etc.
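
The word-mode and character-mode rule described for fullseq can be sketched as a small stand-alone tokenizer. This is only an illustration under stated assumptions: split_fullseq is a hypothetical helper, not a PROFASI function, and it only splits the string into tokens without translating them into residues or capping groups.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not part of PROFASI): split a fullseq-style string
// into tokens, following the word-mode / character-mode rule described above.
std::vector<std::string> split_fullseq(const std::string &fullseq)
{
    std::vector<std::string> tokens;
    std::string word;
    bool char_mode = false;  // word mode is the default

    // Store the word collected so far, if any.
    auto flush = [&]() {
        if (!word.empty()) {
            tokens.push_back(word);
            word.clear();
        }
    };

    for (char c : fullseq) {
        if (c == '*') {
            flush();                 // "*" ends any pending word ...
            char_mode = !char_mode;  // ... and toggles the read mode
        } else if (std::isspace(static_cast<unsigned char>(c))) {
            flush();                 // whitespace separates words
        } else if (char_mode) {
            tokens.push_back(std::string(1, c));  // one letter, one token
        } else {
            word += c;               // keep collecting the current word
        }
    }
    flush();
    return tokens;
}
```

With this sketch, "ALA *ALA* ALA" yields the five tokens ALA, A, L, A, ALA, which matches the alanine-alanine-leucine-alanine-alanine reading given in the second example.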

int prf::Population::assign_structures ( prf_xml::XML_Node * pnode)

Population can be assigned a structure from an XML node, for instance, as a part of the initialisation. The XML node must have a name population, and it must have a few special child tags. There could be a series of child tags of name protein with a node structure like in ProFASi's XML output structures. In addition, one can make assignments to any degree of freedom. This example should be clear enough:

      <dof id="::b:25"> 2.38972</dof>
      <dof id="::b:26"> -2.77682</dof>

The DOF id is a string identifier for a degree of freedom. The syntax is described in ProFASi DOF identifier strings.

int prf::Population::chain_start ( int ich)

Returns the integer index of the first ligand of chain ich in the vector of all ligands.

void prf::Population::charged_ends ( bool b1,
bool b2 )

If this function is called with false for the N or C terminus, any chain for which no explicit end group is specified gets a "VoidEG" for that end group. If an end group is specified, that is used. Note that "uncharged chain ends" does not mean NH2 at the N-terminus and COOH at the C-terminus. It just means that the terminal amino acids are created just like any other, with no extra atoms.

To use charged chain ends for un-capped sequences, this function does not ever need to be called. That is the default behaviour.

void prf::Population::get_dof ( std::vector< double > &  dofary)

The degrees of freedom contain all torsional DOF from all chains. In addition, there is (slightly redundant) information on the rigid body coordinates. 6 DOF per chain would be sufficient, but reconstructing chains from such rigid body coordinates involves more steps than the redundant coordinates used in PROFASI, where the Cartesian coordinates of the first 3 atoms of every backbone are stored: 9 rigid body coordinates instead of 6. The layout in the array is the coordinates of one chain followed by those of the next, i.e., information about one chain appears contiguously.
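
The contiguous per-chain layout can be illustrated with a short stand-alone sketch that computes where each chain's block would start in the flat array. It assumes that each chain contributes exactly 9 rigid body coordinates plus its torsional DOF, as described above. The helper chain_block_offsets is hypothetical and not part of PROFASI, and the ordering of values inside a chain's block is not specified here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption taken from the text above: 9 rigid body numbers per chain
// (Cartesian coordinates of the first 3 backbone atoms).
constexpr std::size_t kRigidBodyPerChain = 9;

// Hypothetical helper (not part of PROFASI): start offset of each chain's
// block in the flat DOF array, with the total length appended at the end.
std::vector<std::size_t>
chain_block_offsets(const std::vector<std::size_t> &n_torsions)
{
    std::vector<std::size_t> offsets;
    std::size_t pos = 0;
    for (std::size_t n : n_torsions) {
        offsets.push_back(pos);         // this chain's block starts here
        pos += kRigidBodyPerChain + n;  // skip over the whole block
    }
    offsets.push_back(pos);             // total number of stored values
    return offsets;
}
```

For two chains with 10 and 20 torsional DOF, the blocks would start at 0 and 19, and the array would hold 48 values in total.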

int prf::Population::get_dof_id ( std::string  dofstr)

This function maps a ProFASi DOF identifier string to a unique integer global index for that degree of freedom. If the DOF can not be interpreted within the current population, -1 will be returned.

int prf::Population::guess_missing_coordinates ( std::vector< bool > &  assignments)

The argument assignments specifies which atoms have been assigned coordinates and which have not. This function tries to guess where the atoms with unknown coordinates should be put. Very little of this work happens in the Population class itself: it loops over the chains and invokes the corresponding function for each chain. Guessing unspecified coordinates currently takes into account only the known geometry of protein chains, and not the non-bonded interactions.

int prf::Population::ImportStructure ( std::list< AtomRecord > & rec,
std::vector< bool > & assignments,
int at_chain = 0 )

Takes a list of AtomRecords, rec, possibly exported by a PDBReader or the Population at another time.

This function assigns coordinates given in a list of AtomRecords to the atoms in a population. It is useful to think of it as a list copy operation. The population is like a list, and the contents (coordinates) of another list (list of AtomRecords) are imported. The naming of chains in the list of AtomRecords is used only to separate blocks meant for different chains, i.e., the actual names of the chains are ignored. The chain specified by at_chain (default value = 0) is used as the target of the first chain in the AtomRecord list.

The argument assignments is a pre-allocated array of bool which is used to store which atoms were actually assigned to. It should be initialized elsewhere, so that it has the same size as the number of atoms in the population, and all entries should be initialized to false. Entries corresponding to atoms, which receive new coordinates through this function, are changed to "true". The other elements of the array are not touched, so that this function can be called many times to assign to different parts of the population. The final values in the assignments array can be used to infer all the atoms which were assigned to.
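
The bookkeeping described for the assignments array can be sketched without any PROFASI types. In the stand-alone illustration below, mark_assigned is a hypothetical stand-in for one import call (it is not ImportStructure itself): it sets the flags for a range of atoms to true and leaves all other entries untouched, so repeated calls accumulate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one import call: marks atoms [first, last) as
// assigned and leaves every other entry of the array untouched.
void mark_assigned(std::vector<bool> &assignments,
                   std::size_t first, std::size_t last)
{
    assert(first <= last && last <= assignments.size());
    for (std::size_t i = first; i < last; ++i) {
        assignments[i] = true;
    }
}

// Number of atoms that still have no coordinates after all imports.
std::size_t count_unassigned(const std::vector<bool> &assignments)
{
    return static_cast<std::size_t>(
        std::count(assignments.begin(), assignments.end(), false));
}
```

Starting from an array of 10 entries set to false, marking atoms 0 to 3 and then 6 to 9 leaves atoms 4 and 5 flagged as unassigned. In the real class, such a list of unassigned atoms is the kind of input guess_missing_coordinates expects.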

int prf::Population::InitCoord ( std::string  init_type)

Introduced in version 1.1.0. The argument init_type is a description of the initialization. It can have the following values:

  • "random" : random values to all degrees of freedom
  • "stretched" : stretched chains.
  • "stretched random_rel" : stretched chains, but with random relative positions and orientations.
  • "file:somefile" : read in degrees of freedom from a text configuration file
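
How a caller might distinguish the four documented init_type values can be sketched as follows. InitSpec and parse_init_type are hypothetical names introduced only for this illustration. They are not PROFASI types or functions, and how InitCoord itself treats unrecognized strings is not covered here.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of a parsed init_type string (not a PROFASI type).
struct InitSpec {
    bool random_all = false;  // "random": random values for all DOF
    bool stretched = false;   // "stretched": stretched chains
    bool random_rel = false;  // random relative positions and orientations
    std::string file;         // non-empty for "file:somefile"
};

// Hypothetical helper: classify one of the documented init_type values.
// Strings that match none of them give a default-constructed InitSpec.
InitSpec parse_init_type(const std::string &init_type)
{
    InitSpec spec;
    const std::string prefix = "file:";
    if (init_type.compare(0, prefix.size(), prefix) == 0) {
        spec.file = init_type.substr(prefix.size());  // text after "file:"
    } else if (init_type == "random") {
        spec.random_all = true;
    } else if (init_type == "stretched") {
        spec.stretched = true;
    } else if (init_type == "stretched random_rel") {
        spec.stretched = true;
        spec.random_rel = true;
    }
    return spec;
}
```

For instance, parse_init_type("file:start.tconf") stores "start.tconf" in the file member, where the file name is made up for the example.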
void prf::Population::Initialize ( int  inittyp = 0)

By default, Initialize creates the peptides with random values for all degrees of freedom. That's how they are created, and it is normally the desired starting condition in a simulation.

From version 1.0.1, one can optionally pass an argument "1" to create all proteins in the population in a "stretched out" state. In case there is more than one protein chain in the system, the relative position of chains will still be random. Further options to start from possible crystalline geometries in multi-chain systems are under consideration, and may be provided in the future for different values of the optional argument.

If a totally different starting condition is required, it can be arranged after the call to Initialize. Use the Chain(i) function to get a pointer to one chain. Then initialize each chain in whichever way you want. Finally, if you wish, randomize the relative locations of different chains with the RandomizeRelConf series of functions.

int prf::Population::num_grp ( int  i) const

Overrides num_res function from PopBase.

Reimplemented from prf::PopBase.

int prf::Population::Read_XML ( prf_xml::XML_Node * pnode)

If the XML node contains more chain objects than are currently present in the population, new chains will be added. Then each chain is forced to adopt the sequence of the corresponding chain in the XML node. After this, the internal coordinates are read from the XML node and assigned to the protein chains.

void prf::Population::SaveSnapshot ( int in_format,
std::string flnm,
unsigned long ittime,
int tindex,
double en )

The state of a population can be written in many formats. There is the PDB format. But there are other formats preserving more information about the configuration. PROFASI has 3 such formats. The text and binary conf formats are trivial records of the degrees of freedom of one chain after the other. All numbers written are "double" values. These values can be read in later by the same population. The binary and text conf formats do not contain information about what chains were present in the population when the configuration was written out.

The preferred format is XML. It is more compact than the PDB format, as the PROFASI XML files contain only the degrees of freedom (torsion angles), like the binary and textconf formats. But unlike those two, the XML format keeps sequence information, and is a self-contained record of the population. A population can be initialized from scratch using such a snapshot.

This function handles writing in all the above mentioned formats.

in_format: 1 means PDB, 2 means XML, 3 means textconf, 4 means binary conf and 0 means write nothing.
flnm: Name of the snapshot file
ittime: Some "time info", typically the number of MC sweeps
tindex: A "temperature index"
en: Energy

Note that the Population does not know anything about temperature or energy, and has no concept of any kind of MC time. It is useful to have such info in the snapshots, but such info must be provided to the population from outside. For backward compatibility, we do not write the MC time and temperature index in the text and binary configuration files created with this function.

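
Since in_format is a bare integer, calling code may be easier to read with named constants. The sketch below maps the documented codes to names. SnapshotFormat and format_name are hypothetical and introduced only for this illustration. PROFASI itself takes the plain int.

```cpp
#include <cassert>
#include <string>

// Hypothetical named constants for the in_format codes documented above.
enum class SnapshotFormat : int {
    None = 0,       // write nothing
    PDB = 1,
    XML = 2,
    TextConf = 3,
    BinaryConf = 4
};

// Hypothetical helper: human-readable name for an in_format code.
// Codes outside the documented range 0..4 are reported as "unknown".
std::string format_name(int in_format)
{
    switch (in_format) {
    case static_cast<int>(SnapshotFormat::None):       return "none";
    case static_cast<int>(SnapshotFormat::PDB):        return "PDB";
    case static_cast<int>(SnapshotFormat::XML):        return "XML";
    case static_cast<int>(SnapshotFormat::TextConf):   return "textconf";
    case static_cast<int>(SnapshotFormat::BinaryConf): return "binary conf";
    default:                                           return "unknown";
    }
}
```

A caller could then pass static_cast<int>(SnapshotFormat::XML) as the first argument of SaveSnapshot instead of the literal 2.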
void prf::Population::set_dof ( std::vector< double > &  dorary)

See clarification on "degrees of freedom" in the documentation of get_dof(std::vector<double> &dofary) above. The size of the array has to be correct. No checks are performed.

void prf::Population::Write_XML ( FILE *  op)

The XML format contains both sequence and structure information for the chains. The population node only contains the number of chains, and a bunch of child nodes corresponding to the chains.


PROFASI: Protein Folding and Aggregation Simulator, Version 1.5
© (2005-2016) Anders Irbäck and Sandipan Mohanty
Documentation generated on Mon Jul 18 2016 using Doxygen version 1.8.2