Search
Links and Functions
Language Selection
Breadcrumb Navigation
Main Navigation
Content
Algorithmic Bioinformatics
Efficient algorithms and space-efficient data structures play an
important role in Bioinformatics. Without new, clever, and ingenious
algorithmic ideas the recent challenges from modern Life Sciences could
not been handled. Thus, the recent progress in Bioinformatics and System
Biology relies heavily on these algorithmic concepts. Clearly, the
problems covered by Algorithmic Bioinformatics and which we have
addressed stem from various areas of Life Sciences.
Research Team
Succinct Data Structures in Bioinformatics
Range Minimum Queries and Suffix Arrays
Project Website
The Range-Minimum-Query-Problem (RMQ) is to preprocess an array in
linear time such that all subsequent queries asking for the position of
a minimal element between two specified indices can be obtained in
constant time. The first algorithm that never uses more than 2n+o(n)
bits has been developed, which does not rely on rank- and select-queries
or other succinct data structures. The importance of this result is
stressed by simplifying and reducing the space consumption of the
Enhanced Suffix Array, while retaining all its capabilities to simulate
suffix trees.
Compressed Range Minimum Queries
It is shown that for compressible input arrays the RMQ-information can
be compressed as well. In particular, the information for RMQs can be
stored within the same entropy bounds that are achieved by the currently
best schemes for storing the underlying array itself in compressed form,
while still being able to access a logarithmic number of contiguous bits
in constant time. Evaluations show that the practical space consumptions
of the non-compressed schemes scale surprisingly well with their
theoretical guarantees, and for compressible input arrays the new
compressed schemes can indeed reduce the space, with little or no
slowdown in query time.
2008
Papers
Johannes Fischer , Volker Heun , Horst Martin Stühler .
Practical Entropy-Bounded Schemes for O(1)-Range Minimum Queries .
J.A. Storer , M.W. Marcellin (eds.):
Proceedings of the 2008 Data Compression Conference (DCC'08), Snowbird, Utah, U.S.A., March 25-27, 2008, pp. 272-281, IEEE Computer Society, 2008.
Range Median of Minima Queries
A natural extension of RMQ is considered, where a further constraint is
that if the minimum in a query interval is not unique, then the query
should return an approximation of the median position among all
positions that attain this minimum. A succinct preprocessing scheme
using Dn+o(n) bits in addition to the static input array (for a small
constant D) has been developed, such that subsequent range median of
minima queries can be answered in constant time. This data structure can
be built in linear time, with little extra space needed at construction
time. Several new combinatorial concepts are introduced such as
Super-Cartesian Trees and Super-Ballot Numbers, which are believed to
have other interesting applications in the future.
2010
Papers
Johannes Fischer , Volker Heun .
Range Median of Minima Queries, Super-Cartesian Trees, and Text Indexing .
Mirka Miller , Koichi Wada (eds.):
Proceedings of the International Workshop on Combinatorial Algorithms (IWOCA'08), September 13-15, 2008, Nagoya, Japan, Texts in Algorithmics, vol 12, pp. 239-252, College Publications, January 2010.
String Mining in Bioinformatics
String Mining
Project Website
A new algorithmic framework has been developed for solving
frequency-related data mining queries on databases of strings in optimal
time, i.e., in time linear in the input and the output size. The
additional space is linear in the input size. This framework can be used
to mine frequent strings, emerging strings, and strings that pass other
statistical tests, e.g., the 2-test. The advantages of array-based data
structures (compared with dynamic data structures such as trees) are
good locality behavior and extensibility to secondary memory. The
algorithm is tested on real-world biological data and demonstrates that
the approach also works well in practice.
2006
Papers
Johannes Fischer , Volker Heun , Stefan Kramer .
Optimal String Mining Under Frequency Constraints .
J. Fürnkranz , T. Scheffer , M. Spiliopoulou (eds.):
Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'06), Berlin, Germany, September 18-22, 2006, Lecture Notes in Computer Science, vol 4213, pp. 139-150, Springer-Verlag, 2006.
Algorithms for Evolutionary Aspects
Hybroscale - Computation of hybridization networks
Project Website
Hybroscale is developed specifically for the research of hybridization networks including its computation and visualization.
Sorting by Prefix reversals
Sorting by Prefix Reversals, also known as Pancake Flipping, is the
problem of transforming a given permutation into the identity
permutation, where the only allowed operations are reversals of a prefix
of the permutation. The problem complexity is still unknown. The first
polynomial-time 2-approximation algorithm to solve this problem has been
developed. Empirical tests suggest that the average performance is in
fact better than 2.
2005
Papers
Johannes Fischer , Simon W. Ginzinger .
A 2-Approximation Algorithm for Sorting by Prefix Reversals .
Gerth Stølting Brodal , Stefano Leonardi (eds.):
Proceedings of the 13th Annual European Symposium on Algorithms (ESA'05), Mallorca, Spain, October 3-6, 2005, Lecture Notes in Computer Science, vol 3669, pp. 415-425,, Springer-Verlag, 2005.
Reconstructing Ultrametric Trees
The reconstruction of an ultrametric tree from a distance matrix is a
very frequent subproblem in clustering or reconstructing evolutionary
trees, both are common problems in Bioinformatics. In his famous book,
Gusfield presented a very simple, but not time-optimal recursive
algorithm for this problem (see also errata of Gusfield's book). It has
been shown that a simple modification of Gusfield's algorithm allows a
time-optimal solution, which is the first simple optimal algorithm for
this problem.
Protein Folding
Combinatorial Protein Folding
The extended cubic lattice is a natural extension of the cubic lattice
for the HP model proposed by Dill et al that bypasses its major
drawback, its bipartiteness, by adding plane diagonals. In this model,
general folding algorithms which achieve an approximation ratio of 59/70
for all protein sequences and an approximation ratio of 37/42 for a
restricted but quite natural subset of HP-sequences have been proposed.
Experimental Protein Structure Determination
Analysis of Chemical Shift Data for NMR Structure Determination
Project Website
The goal of this project is to achieve a considerable speed-up in the NMR structure solving process. Therefore computational methods are developed which allow the creation of models for a protein structure at an experimental stage in which this is not possible by hand. These models may then be refined and tested for correctness using the "standard" way. The time saving results from the need for fewer experiments in when building the initial model.
2007
Papers
Simon W. Ginzinger , Thomas Gräupl , Volker Heun .
SimShiftDB: Chemical-Shift-Based Homology Modeling .
Sepp Hochreiter , Roland Wagner (eds.):
Proceedings of the First International Conference on Bioinformatics Research and Development (BIRD'07), Berlin, Germany, March 12-14, Lecture Notes in Bioinformatics, vol 4414, pp. 357-370, Springer, 2007.
2006
Automatic Correction of Inconsistent Chemical Shift Referencing
Project Website
The construction of a consistent protein chemical shift database is an important step toward making more extensive use of this data in structural studies. Nowadays available data is frequently impurified, in particular with respect to chemical shift referencing, which is often either inaccurate or inconsistently annotated. CheckShift is at tool for preprocessing chemical shift data to detect and correct referencing errors.
Service Menu
Footer