11th Swedish Bioinformatics Workshop
for PhD students and Postdocs
29–30 September 2011.
Dept. of Astronomy and Theoretical Physics and Dept. of Biology, Lund University

Accepted abstracts for oral presentations

  1. Richard Bonneau
    New York University
    Learning hematopoietic differentiation networks in human and mouse integrated data-sets with focus on Th17 differentiation and function
    The talk will focus specifically on an integrated analysis of ChIP-seq of key epigenetic marks and immune-relevant transcription factors, RNA-seq time series following T-cell differentiation, and microarray data spanning several white blood cell lineages. Much of this data has been collected as part of a large consortium investigating diverse aspects of the human and mouse immune systems (NYU, Penn, the Broad/ImmGen, Hudson Alpha, and others). We discuss practical considerations for matching experimental designs to our inference pipeline. Our methods allow us to integrate human and mouse data-sets to both improve the accuracy of our estimation of conserved regulatory modules and highlight species-specific regulatory modules. New developments include: 1) a multiple-species (or comparative) version of the cMonkey biclustering algorithm (used to find conserved and species-specific modules in an integrated mouse-human data-set) and 2) a new inference pipeline that explicitly integrates epigenetic marks, ChIP-seq, genetic perturbations, microarray and RNA-seq data.
  2. Erica Manesso
    Department of Astronomy and Theoretical Physics
    Lund University
    Dynamics inside the hematopoietic hierarchy in adult mice
    Bone marrow hematopoietic stem cells are responsible both for daily preservation of all blood cell types and for repair after hematopoietic injuries. Substantial efforts have been made to identify the hierarchy of progenitors for all blood cell lineages, yet little is known about the dynamics inside the entire hematopoietic tree. To this end, we developed a dynamic model of the hematopoietic hierarchy for the most important and specific lineages of the tree. In normal conditions, compartment sizes, commitment and net division rates were identified in order to meet the daily production of blood cells, that is, circa 3 × 10^8 cells/day. Furthermore, the implementation of ad hoc feedbacks from differentiated cells to progenitors and from progenitors to hematopoietic stem cells allowed the simulation of common injuries, like hemorrhage and irradiation followed by bone marrow transplantation. In detail, the dynamic model was able to reproduce the expected 2-week recovery time from a 10% blood loss as well as the need to transplant myeloid progenitors together with hematopoietic stem cells to protect from anemia and thrombocytopenia following irradiation.
    This work was supported by the Swedish Foundation for Strategic Research (SSF).
  3. Dirk Repsilber
    Genetics and Biometry
    Leibniz Institute for Farm Animal Biology
    Biosignatures from blood: disentangling patterns and cell types in heterogeneous tissue
    Screening for predictive biosignatures using statistical learning in heterogeneous tissues -- with blood as a prominent example -- is hampered by confounding of molecular profiles with cell type proportions. Non-negative matrix hybridization approaches have been proposed for in-silico deconfounding. However, sample variation in molecular profiles is not retained in these approaches. Therefore, statistical learning methods cannot be applied.
    Two different possible ways out of this dilemma are presented, and applied to experimental validation data in blood PBMCs together with FACS count data and profiles from sorted cells.
  4. Christof Winter
    Lund University
    Improving outcome prediction for cancer patients by network-based ranking of marker genes
    Predicting the clinical outcome of cancer patients based on the expression of marker genes in their tumors has received increasing interest in the past decade. Accurate predictors of outcome and response to therapy could be used to personalize and thereby improve therapy. State-of-the-art methods used so far, however, often found marker genes with limited prediction accuracy, limited reproducibility, and unclear biological relevance. To address this problem, we developed a novel computational approach to identify genes prognostic for outcome that couples gene expression measurements from primary tumor samples with a network of known relationships between the genes. Our approach ranks genes according to their prognostic relevance using both expression and network information in a manner similar to Google’s PageRank. We applied this method to gene expression profiles which we obtained from 30 patients with pancreatic cancer, and identified seven candidate marker genes prognostic for outcome. Compared to genes found with state-of-the-art methods such as Pearson correlation of gene expression with survival time, we improve the prediction accuracy by up to 7%. Accuracies were assessed using support vector machine classifiers and Monte Carlo cross-validation. We then validated the prognostic value of our seven candidate markers using immunohistochemistry on an independent set of 412 pancreatic cancer samples. Notably, signatures derived from our candidate markers were independently predictive of outcome and superior to established clinical prognostic factors such as grade, tumor size, and nodal status. The amount of genomic data of individual tumors will grow rapidly in the future. Our algorithm meets the need for powerful computational approaches that will be key to exploiting these data for personalized cancer therapies in clinical practice.
  5. Fredrik Boulund
    Mathematical Sciences, Division of Mathematical Statistics
    Chalmers University of Technology
    A computational pipeline employing hidden Markov models for the identification of novel antibiotic resistance genes
    Fluoroquinolones are an important family of broad-spectrum antibiotics. Bacterial resistance to fluoroquinolones has recently been discovered through a class of mobile resistance genes called qnr. Currently, there are five known classes of plasmid-mediated qnr-genes, though this is believed to be only a fraction of the true number of gene variants. The Qnr proteins are pentapeptide repeat proteins that contain a specific repeating pattern of five amino acid residues in their sequence. Using this unique structural feature, we created a hidden Markov model (HMM) based on the sequences of all currently known plasmid-mediated variants. To enable identification of novel qnr-like genes or classes, we developed a computational pipeline to search large data sets using the model. Performance evaluation of the pipeline accuracy showed that the power for detecting a novel class of genes is more than 99% for input sequences as short as 100 nucleotides. We applied the pipeline to several data sets, both annotated (e.g. NCBI GenBank) and metagenomic sequences produced with high-throughput sequencing technologies (e.g. CAMERA, MetaHIT). In this project, we searched over 470 million sequences, totaling more than 700 gigabytes of amino acid sequences. The method identified all previously known qnr-genes, both plasmid-mediated and chromosomal, as well as several novel candidates from both categories. In addition, we discovered several sequences in the current databases that were misannotated. We also present a refined model incorporating the diversity of the novel qnr-genes for further use in related research.
  6. Anna Johnning
    Institute of Neuroscience and Physiology
    University of Gothenburg
    The genome of an extensively drug-resistant bacterium
    Antibiotics save millions of lives every year and are crucial for fighting disease worldwide. However, their extensive usage also promotes antibiotic resistance – one of the most important challenges for the health care sector. As bacteria can move between environments and resistance genes can be transferred horizontally between bacterial species, it is important to protect the environmental bacteria from excessive antibiotic exposure. We have sequenced the genome of the extensively drug-resistant Ochrobactrum intermedium strain CCUG 57381, an environmental bacterium and opportunistic pathogen. The strain was isolated from a sample taken inside a treatment plant in India receiving wastewater from approximately 90 drug manufacturers. The treated effluent contains pharmaceuticals at up to ten times human therapeutic plasma levels, including several broad-spectrum fluoroquinolone antibiotics. The bacterium was found to be resistant to 36 of 39 tested antibiotics belonging to several different clinically relevant classes of antibiotics. Massively parallel pyrosequencing (454 sequencing) of its genome resulted in an average sixteen-fold coverage. Comparative genomics was used to analyze the genome of the resistant isolate in relation to the already sequenced reference strain O. intermedium LMG 3301T. The analysis revealed structural differences between the strains, including insertions and large deletions. Several non-synonymous point mutations were detected in protein coding sequences, as well as alterations in the ribosomal RNA genes. The quinolone target enzymes, DNA gyrase and topoisomerase IV, showed 9 amino acid changes in the isolate, three of which are known to cause quinolone resistance in Escherichia coli. There was also a considerable number of sequence reads that did not map onto the reference genome, suggesting that the isolate has acquired large regions (total of 1 Mb) of novel genetic material, e.g. plasmids. With these reads, we assembled a number of regions associated with multiresistance containing arrays of resistance genes. The results presented here demonstrate the power of second generation sequencing technology as well as the need for sustainable management of antibiotic waste.
  7. Daniel Larsson
    Cell and Molecular Biology
    Uppsala University
    Initial Stages of Viral Capsid Dissolution Studied by Molecular Simulations
    The low calcium concentration of plant cells is exploited by viruses which use it as the trigger to open up and release the genetic material following viral entry. The nucleic acid can subsequently recruit the replication machinery of the host in order to multiply. In microsecond trajectories of the protein capsid of the Satellite Tobacco Necrosis Virus without the 92 structural calcium ions we observed a significantly increased radial expansion compared to simulations of the capsid with Ca2+. There was a substantial increase in the net flow of water into the capsid upon removal of the ions, passing predominantly between the proteins at the 3-fold symmetry axis. The simulations provide insights into how a virus can dissolve its capsid structure in full atomic detail.
  8. Peter Swain
    University of Edinburgh
    Modelling and measuring stochastic gene expression
  9. Iskra Staneva
    Astronomy and Theoretical Physics
    Lund University
    It works in theory: the binding of peptides to PDZ domains
    PDZ domains are found in proteins that are involved in signaling processes. They enable interactions by binding to linear sequence motifs at the C-termini of other proteins. The PDZ domains can be divided into different groups, depending on what kind of motifs they recognize. A question that naturally follows is whether there are any differences in the peptide binding process itself. We address this by performing all-atom Monte Carlo simulations of representatives from the two major groups. The atoms are subject to an effective force field, mainly modeling hydrogen bonds together with hydrophobic and electrostatic interactions. This enables an extensive sampling of the PDZ domain-peptide conformational space and renders minimum-energy structures very similar to the experimentally determined complexes. Analysis of free-energy landscapes and the temperature dependence of various observables suggests that the binding dynamics, which overall can be described by a two-state model, indeed might be different for the two groups.
  10. Sebastian Rämisch
    Biochemistry & Structural Biology
    Lund University
    Computational Design of Self-assembling Leucine-rich Repeat Proteins
    Leucine-rich repeat containing proteins (LRRs) are ubiquitous protein binders and include many receptor proteins of the innate immune system. Individual repeating units of the human ribonuclease inhibitor were analysed for their ability to form closed ring structures of multiple symmetries by employing the ROSETTA symmetrical docking protocol. One repeating unit was chosen for design of a novel highly symmetrical LRR-protein by iterative cycles of symmetrical docking with C10-symmetry and symmetrical design. Two designed proteins, each comprising half a ring, were successfully expressed, purified, and analysed for self-assembly, surprisingly revealing stable monomeric proteins. Smaller units will be tested to reveal the smallest possible assembling module, and hence gain insight into the possible evolutionary origin of repeat proteins in general.
  11. Yasser Gaber
    Lund University
    Molecular Modeling as a Rational Design Tool of an Esterase
    Rational design of enzymes is an approach to modify enzyme properties based on mechanistic and structural knowledge of the enzyme. It has been greatly enhanced by the availability of molecular simulation software. We have used YASARA molecular modelling software to design mutants of an esterase (PLE) in order to enhance its enzyme activity and enantioselectivity toward a racemic compound (clopidogrel). Molecular modelling of the R and S enantiomers inside the active site of the PLE model was performed to understand the possible geometric hindrances for formation of the enzyme-substrate tetrahedral intermediate. Glu203 in the vicinity of the active site was found to form hydrogen bonds with the catalytic His449 and GGG(X) motif residues. The hydrogen bonding of these residues should be available for the stabilization of the tetrahedral intermediate (Fig. 1). Hence, Glu203 would be a potential site for a site-directed mutation in the PLE gene expressed in Escherichia coli.
  12. Olena Ishchuk
    Biology
    Lund University
    Parallel evolution in yeast
    Saccharomyces yeasts degrade sugars to two-carbon components, in particular ethanol, even in the presence of excess oxygen. This characteristic is called the Crabtree effect and is the background for the 'make-accumulate-consume' life strategy, which in natural habitats helps Saccharomyces yeasts to out-compete other microorganisms. A global promoter rewiring in the Saccharomyces cerevisiae lineage, which occurred around 100 mya, was one of the main molecular events providing the background for evolution of this strategy. Here we show that the Dekkera bruxellensis lineage, which separated from the Saccharomyces yeasts more than 200 mya, also efficiently makes, accumulates and consumes ethanol and acetic acid. Analysis of promoter sequences indicates that both lineages independently underwent a massive loss of a specific cis-regulatory element from dozens of genes associated with respiration, and we show that also in D. bruxellensis this promoter rewiring contributes to the observed Crabtree effect.
    Rozpedowska et al. (2011) Nat Commun. 2:302.
  13. Henk-Jan Joosten
    3DM protein engineering superfamily databases
    A powerful method to gain biological insights into the functioning of a protein is to use data available for the protein (super-)family. The large amounts of different data types can be used to detect correlations that reveal the function of amino acids, but collecting and analyzing the data is difficult and time consuming. Therefore, protein super-family specific databases are needed.
    3DM was developed to automatically build such protein super-family databases. 3DM systems are based on an automatically generated structure-based super-family alignment. Many different data types, such as mutational information (extracted from literature, mutation databases, Swiss-Prot, OMIM, PDB files, etc.), ligand and substrate contacts, protein-protein interactions, SNP data, and data derived from the alignment (e.g. correlated mutations, amino acid conservation, and amino acid distribution), are all connected to each other, to the alignment, and to the protein structures. This connectivity enables detection of hidden correlations, transfer of different data types between family members, and easy visualization of data in the alignment and in all super-family structures.
    3DM was used to change or improve many different enzyme characteristics such as activity, specificity, enantioselectivity, thermostability, etc. It was shown that 3DM can be used to predict deleterious mutations, and this knowledge was used to design “smart libraries” that contain a small number of mutations with a high number of successful constructs. 3DM was used to predict specific double mutants that increased enzyme activity even though each of the single mutations on its own decreased the activity of the enzyme. 3DM was also used to design a strong inhibitor for an enzyme and to understand the difference between inhibiting and activating compounds.
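
As an illustration of the alignment-derived data named in the 3DM abstract (amino acid conservation and correlated mutations), the following is a minimal sketch and not 3DM's actual method, which the abstract does not describe. It assumes Shannon entropy as the conservation measure and mutual information between alignment columns as the correlation measure. The toy alignment and all function names are hypothetical.

```python
import math
from collections import Counter


def column(alignment, i):
    """Return the residues found at position i of every aligned sequence."""
    return [seq[i] for seq in alignment]


def entropy(symbols):
    """Shannon entropy in bits; 0 means the column is fully conserved."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(alignment, i, j):
    """Mutual information between columns i and j (assumed correlation measure).

    MI = H(i) + H(j) - H(i, j); high values suggest the two positions
    tend to mutate together.
    """
    col_i = column(alignment, i)
    col_j = column(alignment, j)
    joint = list(zip(col_i, col_j))
    return entropy(col_i) + entropy(col_j) - entropy(joint)


# Hypothetical toy alignment: columns 0-1 are conserved,
# columns 2 and 4 co-vary, column 3 varies independently of column 2.
alignment = [
    "ACDKE",
    "ACEKD",
    "ACDRE",
    "ACERD",
]

for i in range(len(alignment[0])):
    print(f"column {i}: entropy = {entropy(column(alignment, i)):.2f} bits")

print(f"MI(2, 4) = {mutual_information(alignment, 2, 4):.2f} bits")
print(f"MI(2, 3) = {mutual_information(alignment, 2, 3):.2f} bits")
```

In this toy case the conserved columns have zero entropy, and only the co-varying pair (columns 2 and 4) has non-zero mutual information. Real super-family alignments would additionally need handling of gaps and corrections for small sample sizes and phylogenetic bias, which this sketch leaves out.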