Assembly strategies

``If it were easy, it would have been done already.'' (Solomon Short)

Referring to Dear et al. (1998), a ``sequence assembly is essentially a set of contigs, each contig being a multiple alignment of reads''.⁹ Most unfortunately, the underlying problem of string assembly as a variant of the shortest common superstring problem has been shown to be NP-hard by Armen and Stein (1995) so that heuristic algorithms represent the only possibility approach this problem.

According to Chen and Skiena (2000), many assemblers have similar fundamental designs but differ substantially on important engineering issues. This chapter shortly summarises the most important strategies existing as of this writing. It then outlines the global concept developed in this thesis to address weak points in current strategies, that is, to tackle the goal of producing better consensus alignments by correctly discerning repeats set for this thesis.

Subsections

Existing strategies
Developing a new strategy

Bastien Chevreux 2006-05-11