- ...
representations1
- also called fragments (see Myers (1995)) or
readings (reads)
- ... accurately2
- e.g. for new electrophoresis methods
- ...mira3
- which is an acronym for Mimicking
Intelligent Read Assembly
- ... work.4
- The automatic editor is the subject of a
thesis to be presented by Thomas Pfisterer
- ... character.5
- e.g., the
symbol 'W' for an uncertainty between A (adenine) and T (thymine)
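
As a hedged illustration of such ambiguity symbols, the sketch below maps a few IUPAC nucleotide codes to the bases they stand for. The table name `IUPAC_CODES` and the helper `is_compatible` are hypothetical and are not taken from the assembler described in this work.

```python
# Subset of the IUPAC nucleotide ambiguity codes (illustrative, not exhaustive).
IUPAC_CODES = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "W": {"A", "T"},            # weak: A or T
    "S": {"C", "G"},            # strong: C or G
    "R": {"A", "G"},            # purine
    "Y": {"C", "T"},            # pyrimidine
    "N": {"A", "C", "G", "T"},  # any base
}


def is_compatible(symbol_a: str, symbol_b: str) -> bool:
    """Return True if the two symbols could denote the same base."""
    return bool(IUPAC_CODES[symbol_a.upper()] & IUPAC_CODES[symbol_b.upper()])
```

With this table, 'W' is compatible with 'A' and 'T' but not with 'G' or 'C'.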
- ...
orientation6
- the experimentally obtained sequences have a 50% chance
of being in reverse complement orientation
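
A minimal sketch of what "reverse complement orientation" means for a read, assuming plain A/C/G/T/N sequences. The function name `reverse_complement` is illustrative only.

```python
# Translation table for complementing bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]
```

Applying the function twice returns the original sequence, so an assembler can test each read in both orientations without losing information.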
- ... traces7
- Mean length of useful sequences gathered on ABI
3730 machines at The Institute for Genomic Research (TIGR) in 2003; personal
communication from Bill Niermann (Investigator at TIGR) in April 2004
- ... bases.8
- insertions
and deletions are commonly referred to as indels
- ...
reads''.9
- The term contig is derived from
``contiguous sequence''
- ... Green10
- PHRAP is an acronym for PHil's Revised Assembly Program,
see also http://www.phrap.org/
- ...fuguization11
- this refers to
the rather compact genome of the puffer fish (Fugu rubripes) which
is largely devoid of large copy repeats.
- ...
enough.12
- e.g. for new electrophoresis methods
- ...
operations13
- Specialised hardware for this type of operation starts
at approximately EUR 500,000
- ... information14
- which they call ``double-barreled data''
- ... mind'15
- e.g. 'could the
base G at position 235 in read 4 be replaced by an A?' (because the overall
consensus at this position of the other reads suggests this possibility)
- ... task.16
- For
example, quality clipping, sequencing vector and cosmid vector removal can
be controlled by the PREGAP4 environment provided with the GAP4
package (Bonfield et al. (1995b); Staden (1996); Bonfield and Staden (1996)) or by the LUCY
program. Parts of these tasks can also be done with cross_match,
provided by the PHRAP package, or with other packages such as PFP from Paracel
(Paracel (2002a)).
- ...
biosciences17
- see also Myers (1991)
- ... sequence18
- It is
again of no consequence which sequence is in reverse complement direction to
the other, as both will be searched with the reverse complement pattern of
the other one.
- ... ZEBRA19
- ZEBRA is not an acronym;
the algorithm was so named because it produces 'bands' in memory that resemble
the stripes of the African zebra
- ... number20
- which is
$4^k$ for $k$ Ns.
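
The growth of this number can be checked with a small sketch, assuming that each 'N' may stand for any of the four bases A, C, G and T. The generator name `expand_ns` is hypothetical and does not reflect how the algorithm described in this work handles Ns internally.

```python
from itertools import product


def expand_ns(pattern: str):
    """Yield every concrete sequence obtained by replacing each 'N' with A, C, G or T."""
    n_positions = [i for i, base in enumerate(pattern) if base == "N"]
    for combo in product("ACGT", repeat=len(n_positions)):
        chars = list(pattern)
        for pos, base in zip(n_positions, combo):
            chars[pos] = base
        yield "".join(chars)
```

A pattern with two Ns, such as `"ANGN"`, expands to 4 * 4 = 16 sequences, and a pattern without Ns yields only itself.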
- ... sequences.21
- see
also the paper by Pearson (1998) for a review of empirical statistical
estimates for sequence similarity searches
- ... exemplarily.22
- The graphical display of the
resulting bit vectors is vaguely reminiscent of the African zebra, hence the
algorithm's name.
- ...
devised23
- for example see Grice et al. (1997); Chao et al. (1994) for an overview and Chao et al. (1995) for an
application
- ...
mismatches.24
- but perhaps one or several alignments of a base against an
'N'
- ... reads25
- Chimeric
reads, as described in section 2.2.2, must be considered as
garbage.
- ... contigs.26
- Of course, a single read itself cannot be
called a contig. But putting it into the same data structure (a contig
object) as the other, assembled reads is a convenient way to keep
unassembled reads in a database.
- ... projects27
- cosmid, BAC or even whole genome size
- ...
project.28
- for example to close gaps
- ... occurring.29
- e.g. the infamous AG-problem known
from the ABI 373 and 377 machines, where a G preceded by an A is often
indistinct
- ... positions.30
- although chemistry together with the
sequencing direction of a read might play a minor role in the type of errors
generated, this has no real impact on the error distribution itself
- ... threshold31
- working with
relatively high score ratios beginning with 80%
- ... routines.32
- This is one of the more
prominent places where it shows that the EST assembler is a sibling of its
genome counterpart.
- ...mira33
- MIRA: Mimicking Intelligent Read
Assembly
- ...
sequences34
- see Notredame (2002)
for a review of state-of-the-art algorithms, Thompson et al. (1999a) and
Lassmann and Sonnhammer (2002) for an evaluation of some of these tools and
Morgenstern et al. (2003) for the description of a web-based solution
- ...
GAP4/cycle35
- GAP4/cycle is a script performing several GAP4 assemblies
with decreasing strictness
- ... test.36
- PHRAP
uses base qualities for the assembly, MIRA/EdIt can use them if present and
GAP4/cycle does not
- ...
2.0.137
- TraceTuner is from Paracel Inc.
- ... capabilities38
- It is widely
known that the speed of a program depends mainly on the quality of the
algorithms it is based on. However, good compilers can squeeze a considerable
amount of execution speed from optimal algorithms by optimising them at
machine level, sometimes by a factor of 3 or more.
- ...
EGCS39
- which was later promoted to the official GCC, until the new GCC 3
compiler lineage appeared in mid-2001
- ... .o''40
- mind the blank between the asterisk and .o
- ... hierarchy41
- especially into a single-rooted
hierarchy