Algorithmic Bioinformatics

Efficient algorithms and space-efficient data structures play an important role in Bioinformatics. Without new, clever, and ingenious algorithmic ideas, the recent challenges posed by the modern Life Sciences could not have been handled. The recent progress in Bioinformatics and Systems Biology therefore relies heavily on these algorithmic concepts. The problems covered by Algorithmic Bioinformatics, and those we have addressed, stem from various areas of the Life Sciences.

Research Team

Range Minimum Queries and Suffix Arrays

The Range Minimum Query (RMQ) problem is to preprocess an array in linear time such that all subsequent queries asking for the position of a minimal element between two specified indices can be answered in constant time. We have developed the first algorithm that never uses more than 2n+o(n) bits and does not rely on rank and select queries or other succinct data structures. The importance of this result is underlined by an application: it simplifies the Enhanced Suffix Array and reduces its space consumption, while retaining all of its capabilities to simulate suffix trees.
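To illustrate what an RMQ returns, here is a minimal Python sketch based on the classic sparse table. It is not the 2n+o(n)-bit scheme described above: it stores O(n log n) machine words. It does, however, offer the same interface, returning the (leftmost) position of a minimum in A[i..j] in constant time. The class name `SparseTableRMQ` is hypothetical.

```python
from typing import List


class SparseTableRMQ:
    """Sparse-table RMQ: O(n log n) words of space, O(1) time per query.

    Illustrates the query semantics only; it is not a succinct scheme.
    Ties are broken towards the leftmost position.
    """

    def __init__(self, a: List[int]) -> None:
        self.a = a
        n = len(a)
        # table[k][i] = leftmost position of a minimum in a[i .. i + 2^k - 1]
        self.table: List[List[int]] = [list(range(n))]
        k = 1
        while (1 << k) <= n:
            prev, half = self.table[k - 1], 1 << (k - 1)
            row = []
            for i in range(n - (1 << k) + 1):
                left, right = prev[i], prev[i + half]
                row.append(left if a[left] <= a[right] else right)
            self.table.append(row)
            k += 1

    def query(self, i: int, j: int) -> int:
        """Leftmost position of a minimum in a[i..j], inclusive, 0 <= i <= j < n."""
        k = (j - i + 1).bit_length() - 1  # largest k with 2^k <= j - i + 1
        left = self.table[k][i]  # block starting at i
        right = self.table[k][j - (1 << k) + 1]  # block ending at j
        return left if self.a[left] <= self.a[right] else right
```

For example, on `[3, 1, 4, 1, 5, 9, 2, 6]` the call `query(2, 5)` returns position 3. In suffix-array-based indexes, such queries over the LCP array give the length of the longest common prefix of any two suffixes, which is a typical use of RMQs in this setting.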

2011

Papers

Johannes Fischer, Volker Heun. Space-Efficient Preprocessing Schemes for Range Minimum Queries on Static Arrays. SIAM Journal on Computing, vol 40, no. 2, pp. 465-492, 2011.

Compressed Range Minimum Queries

It is shown that for compressible input arrays the RMQ information can be compressed as well. In particular, the information for RMQs can be stored within the same entropy bounds that are achieved by the currently best schemes that store the underlying array itself in compressed form while still allowing constant-time access to a logarithmic number of contiguous bits. Evaluations show that the practical space consumption of the non-compressed schemes scales surprisingly well with their theoretical guarantees, and that for compressible input arrays the new compressed schemes can indeed reduce the space, with little or no slowdown in query time.

2008

Papers

Johannes Fischer, Volker Heun, Horst Martin Stühler. Practical Entropy-Bounded Schemes for O(1)-Range Minimum Queries. J.A. Storer, M.W. Marcellin (eds.): Proceedings of the 2008 Data Compression Conference (DCC'08), Snowbird, Utah, U.S.A., March 25-27, 2008, pp. 272-281, IEEE Computer Society, 2008.

Range Median of Minima Queries

A natural extension of RMQ is considered in which, if the minimum in a query interval is not unique, the query should return an approximation of the median position among all positions that attain this minimum. A succinct preprocessing scheme using Dn+o(n) bits in addition to the static input array (for a small constant D) has been developed such that subsequent range median of minima queries can be answered in constant time. This data structure can be built in linear time, with little extra space needed at construction time. Several new combinatorial concepts are introduced, such as Super-Cartesian Trees and Super-Ballot Numbers, which we believe will have other interesting applications in the future.
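The query semantics can be pinned down with a brute-force reference implementation in Python. This sketch scans the whole interval, so each query takes time linear in the interval length rather than constant time, and it returns the exact median position (the lower median when the number of minima is even, which is an assumption of this sketch) instead of an approximation. The function name is hypothetical.

```python
from typing import List


def range_median_of_minima(a: List[int], i: int, j: int) -> int:
    """Brute-force range median of minima query on a[i..j] (inclusive).

    Collects all positions holding the minimum value and returns the
    lower median of them. O(j - i + 1) time per query.
    """
    m = min(a[i : j + 1])
    positions = [p for p in range(i, j + 1) if a[p] == m]
    return positions[(len(positions) - 1) // 2]
```

On `[2, 0, 5, 0, 3, 0, 0, 4]`, the minimum 0 occurs at positions 1, 3, 5 and 6, so a query over the whole array returns position 3, whereas a plain RMQ with leftmost tie-breaking would return position 1.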

2010

Papers

Johannes Fischer, Volker Heun. Range Median of Minima Queries, Super-Cartesian Trees, and Text Indexing. Mirka Miller, Koichi Wada (eds.): Proceedings of the International Workshop on Combinatorial Algorithms (IWOCA'08), September 13-15, 2008, Nagoya, Japan, Texts in Algorithmics, vol 12, pp. 239-252, College Publications, January 2010.

2010

Johannes Fischer, Volker Heun. Finding Range Minima in the Middle: Approximations and Applications. Mathematics in Computer Science, vol 3, no. 1, pp. 17-30, March 2010.

String Mining in Bioinformatics

String Mining

A new algorithmic framework has been developed for answering frequency-related data mining queries on databases of strings in optimal time, i.e., in time linear in the input and output size. The additional space is linear in the input size. This framework can be used to mine frequent strings, emerging strings, and strings that pass other statistical tests, e.g., the χ²-test. The approach uses array-based data structures, whose advantages over dynamic data structures such as trees are good locality behavior and extensibility to secondary memory. Tests on real-world biological data demonstrate that the approach also works well in practice.
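A naive Python sketch of one such frequency-constrained query (emerging substrings): report all substrings contained in at least `min_freq1` strings of a first database and in at most `max_freq2` strings of a second one. It enumerates all substrings explicitly, so it needs quadratic time and space per string and is not the optimal array-based framework described above; the function names and thresholds are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Set


def distinct_substrings(s: str) -> Set[str]:
    """All distinct non-empty substrings of s (quadratically many in general)."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def support(database: Iterable[str]) -> Dict[str, int]:
    """Map each substring to the number of database strings that contain it."""
    counts: Dict[str, int] = {}
    for s in database:
        for sub in distinct_substrings(s):  # each string counts at most once
            counts[sub] = counts.get(sub, 0) + 1
    return counts


def mine(db1: Iterable[str], db2: Iterable[str], min_freq1: int, max_freq2: int) -> List[str]:
    """Substrings in >= min_freq1 strings of db1 and <= max_freq2 strings of db2."""
    c1, c2 = support(db1), support(db2)
    return sorted(s for s, f in c1.items() if f >= min_freq1 and c2.get(s, 0) <= max_freq2)
```

For instance, `mine(["ACGT", "CGTA", "GTAC"], ["AAAA", "CCGG"], 3, 0)` returns `["GT", "T"]`: these substrings occur in every string of the first database and in none of the second.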

2006

Papers

Johannes Fischer, Volker Heun, Stefan Kramer. Optimal String Mining Under Frequency Constraints. J. Fürnkranz, T. Scheffer, M. Spiliopoulou (eds.): Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'06), Berlin, Germany, September 18-22, 2006, Lecture Notes in Computer Science, vol 4213, pp. 139-150, Springer-Verlag, 2006.

Algorithms for Evolutionary Aspects

Hybroscale - Computation of hybridization networks

Hybroscale is a tool developed specifically for research on hybridization networks, including their computation and visualization.

2012

Papers

Celine Scornavacca, Simone Linz, Benjamin Albrecht. A First Step Toward Computing All Hybridization Networks For Two Rooted Binary Phylogenetic Trees. Journal of Computational Biology, vol 19, no. 11, pp. 1227-1242, November 2012.

2011

Papers

Benjamin Albrecht, Celine Scornavacca, Alberto Cenci, Daniel H. Huson. Fast computation of minimum hybridization networks. Bioinformatics, vol 28, no. 2, pp. 191-197, 2011.

Sorting by Prefix Reversals

Sorting by Prefix Reversals, also known as Pancake Flipping, is the problem of transforming a given permutation into the identity permutation with as few operations as possible, where the only allowed operations are reversals of a prefix of the permutation. At the time of this work, the complexity of the problem was still unknown. The first polynomial-time 2-approximation algorithm for this problem has been developed. Empirical tests suggest that its average performance is in fact better than a factor of 2.
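The following Python sketch illustrates the prefix reversal operation together with the breakpoint lower bound: a prefix reversal of length k changes only the adjacency between positions k and k+1, so it removes at most one breakpoint. The sorting routine is the simple greedy "move the largest element into place" strategy with at most 2(n-1) reversals; it is not the 2-approximation algorithm from the paper. Permutations are assumed to contain the values 1..n, and all names are hypothetical.

```python
from typing import List, Tuple


def flip(perm: List[int], k: int) -> List[int]:
    """Prefix reversal: reverse the first k elements of perm."""
    return perm[:k][::-1] + perm[k:]


def breakpoints(perm: List[int]) -> int:
    """Adjacent pairs (with sentinel n+1 appended) whose values differ by more than 1.

    One prefix reversal removes at most one breakpoint, so this is a lower
    bound on the number of prefix reversals needed.
    """
    ext = perm + [len(perm) + 1]
    return sum(1 for x, y in zip(ext, ext[1:]) if abs(x - y) != 1)


def greedy_sort(perm: List[int]) -> Tuple[List[int], List[int]]:
    """Greedy pancake sort: flip the largest misplaced value to the front,
    then flip it into its final position. Returns (sorted perm, flip lengths)."""
    perm = list(perm)
    flips: List[int] = []
    for size in range(len(perm), 1, -1):
        pos = perm.index(size)  # largest value not yet known to be in place
        if pos == size - 1:
            continue  # already in its final position
        if pos != 0:
            perm = flip(perm, pos + 1)  # bring it to the front
            flips.append(pos + 1)
        perm = flip(perm, size)  # move it to position size - 1
        flips.append(size)
    return perm, flips
```

For `[2, 4, 1, 3]`, the breakpoint bound is 4 and the greedy strategy happens to use exactly 4 reversals (`[2, 4, 3, 2]`); in general it can be far from the bound.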

2005

Papers

Johannes Fischer, Simon W. Ginzinger. A 2-Approximation Algorithm for Sorting by Prefix Reversals. Gerth Stølting Brodal, Stefano Leonardi (eds.): Proceedings of the 13th Annual European Symposium on Algorithms (ESA'05), Mallorca, Spain, October 3-6, 2005, Lecture Notes in Computer Science, vol 3669, pp. 415-425, Springer-Verlag, 2005.

2009

Theses

Jeremias Weihmann. Genome-Rearrangements: Sortieren mit erweiterten Transreversals [Genome Rearrangements: Sorting with Extended Transreversals]. Diploma Thesis, LFE Bioinformatik / LMU München, August 2009.

Reconstructing Ultrametric Trees

The reconstruction of an ultrametric tree from a distance matrix is a very frequent subproblem in clustering and in reconstructing evolutionary trees, both of which are common problems in Bioinformatics. In his well-known book, Gusfield presented a very simple, but not time-optimal, recursive algorithm for this problem (see also the errata to Gusfield's book). It has been shown that a simple modification of Gusfield's algorithm yields a time-optimal solution, which is the first simple optimal algorithm for this problem.
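As a point of reference, here is a naive recursive reconstruction in Python. It assumes the input matrix is ultrametric, uses the largest distance within the current leaf set as the label of the node, and splits the leaves into classes whose pairwise distances lie below that value. It can take cubic time in the worst case and is neither Gusfield's algorithm nor the optimal modification; the function name and the nested-tuple tree representation are hypothetical.

```python
from typing import List, Sequence, Tuple, Union

# A tree is either a leaf index or (h, children), where any two leaves in
# different children are at distance h.
Tree = Union[int, Tuple[float, list]]


def build_ultrametric(leaves: List[int], d: Sequence[Sequence[float]]) -> Tree:
    """Naive reconstruction of an ultrametric tree from an ultrametric matrix d."""
    if len(leaves) == 1:
        return leaves[0]
    h = max(d[x][y] for x in leaves for y in leaves)
    groups = []
    remaining = list(leaves)
    while remaining:
        r = remaining[0]
        # For an ultrametric, "distance < h" is an equivalence relation.
        group = [x for x in remaining if x == r or d[r][x] < h]
        groups.append(group)
        remaining = [x for x in remaining if x not in group]
    return (h, [build_ultrametric(g, d) for g in groups])
```

With leaves 0 and 1 at distance 2, leaves 2 and 3 at distance 4, and all other pairs at distance 6, the call `build_ultrametric([0, 1, 2, 3], d)` yields `(6, [(2, [0, 1]), (4, [2, 3])])`.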

2008

Papers

Volker Heun. Analysis of a Modification of Gusfield's Recursive Algorithm for Reconstructing Ultrametric Trees. Information Processing Letters, vol 108, no. 4, pp. 222-225, 2008.

Protein Folding

Combinatorial Protein Folding

The extended cubic lattice is a natural extension of the cubic lattice for the HP model proposed by Dill et al. It overcomes the major drawback of the cubic lattice, its bipartiteness, by adding plane diagonals. In this model, general folding algorithms have been proposed that achieve an approximation ratio of 59/70 for all protein sequences and of 37/42 for a restricted but quite natural subset of HP sequences.

2003

Papers

Volker Heun. Approximate Protein Folding in the HP Side Chain Model on Extended Cubic Lattices. Discrete Applied Mathematics, vol 127, no. 1, pp. 163-177, 2003.

Experimental Protein Structure Determination

Analysis of Chemical Shift Data for NMR Structure Determination

The goal of this project is to achieve a considerable speed-up in the NMR structure solving process. To this end, computational methods are developed that allow models of a protein structure to be created at an experimental stage at which this is not yet possible by hand. These models may then be refined and tested for correctness in the "standard" way. The time savings result from the need for fewer experiments when building the initial model.

2007

Papers

Simon W. Ginzinger, Thomas Gräupl, Volker Heun. SimShiftDB: Chemical-Shift-Based Homology Modeling. Sepp Hochreiter, Roland Wagner (eds.): Proceedings of the First International Conference on Bioinformatics Research and Development (BIRD'07), Berlin, Germany, March 12-14, 2007, Lecture Notes in Bioinformatics, vol 4414, pp. 357-370, Springer, 2007.

2006

Papers

Simon W. Ginzinger, Johannes Fischer. SimShift: Identifying structural similarities from NMR chemical shifts. Bioinformatics, vol 22, no. 4, pp. 460-465, 2006.

Automatic Correction of Inconsistent Chemical Shift Referencing

The construction of a consistent protein chemical shift database is an important step toward making more extensive use of this data in structural studies. Currently available data are frequently contaminated with errors, in particular with respect to chemical shift referencing, which is often either inaccurate or inconsistently annotated. CheckShift is a tool for preprocessing chemical shift data to detect and correct such referencing errors.
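CheckShift's actual procedure is not described here. As a hypothetical illustration of what correcting a referencing error means, the following Python sketch assumes that such an error adds the same constant offset to every shift of one nucleus type in an entry. It estimates that offset as the median difference between observed shifts and expected reference values (for example, residue-specific averages from a database) and subtracts it. Names and numbers are made up for illustration.

```python
from statistics import median
from typing import List


def estimate_offset(observed: List[float], expected: List[float]) -> float:
    """Estimate a constant referencing offset as the median of observed - expected.

    Hypothetical model: a referencing error adds the same offset to every
    shift of one nucleus type; the median is robust to a few outliers.
    """
    return median([o - e for o, e in zip(observed, expected)])


def rereference(observed: List[float], expected: List[float]) -> List[float]:
    """Subtract the estimated offset from every observed shift."""
    offset = estimate_offset(observed, expected)
    return [o - offset for o in observed]
```

Using the median rather than the mean keeps a few genuinely unusual shifts from distorting the estimated offset.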

2007

Papers

Simon W. Ginzinger, Fabian Gerick, Murray Coles, Volker Heun. CheckShift: Automatic Correction of Inconsistent Chemical Shift Referencing. Journal of Biomolecular NMR, vol 39, no. 3, pp. 223-227, September 2007.
