
Enhancements for the V2.2, V2.4 and V2.6 series

MIRA2 has undergone a major redesign since the 1.x series; most of the code was rewritten from scratch. A whole battery of enhancements was introduced, both in the code and at the algorithmic level. The following list names just a few:


easy-to-use, combinable parameter switches for predefined tasks: quick switches (also called DWIM, "Do-What-I-Mean", switches). Setting just three or so switches (out of a set of about a dozen) is now enough to get good assemblies adapted to different conditions.
overall speedups in many parts thanks to new algorithms (read/read comparison, Smith-Waterman (SW) alignment, pathfinder module, contig build module, read extension)
overall quality improvements: longer contigs with fewer remaining errors; reliable detection and resolution of misassemblies when using clone pair techniques (also called templates or "double-barreled data"); enhanced 'probably true' consensus computation without gaps and with consensus quality files; improved automatic editor when using ABI 373, 377, 3100 and 3700 trace files (MEGABACE should also be ok)
whole-genome assembly supported for genomes of up to 10 megabases (and more on really fast computers with plenty of memory)
EST assembly support: SNP detection; assembly of transcripts by strain according to detected SNP bases; special routines for extreme coverage that allow assembly of gene families with thousands of similar sequences
additional and/or improved input and output formats: FASTA with quality values, gap4 directed assembly, phrap/consed ACE format (output only), GBF and others
assembly options: more than 120 options to fine-tune the assembly, which can now also be loaded from parameter files
data preprocessing routines for cases where external preprocessing programs did not do this work, or did it incorrectly: clipping of potential vector leftovers in sequences, support for 'screened' bases in FASTA files, built-in quality clipping routines, tagging of poly-A or poly-T bases at the ends of EST sequences
enhanced support for clone-end pair sequences and clone template sizes (mate-pairs)
full IUPAC support in input and output files (as well as in internal computations)
support for merging ancillary data from XML trace info files (in NCBI format)
many assembly info files are generated, containing machine- and human-readable statistics, cluster and assembly information
optimised alignments (no more gap base jiggling)
possibility to load "backbone" sequences and use them either as seeds or as references for a mapping assembly
possibility to assemble several closely related strains in one go
support for loading GenBank (gbf/gbff/gbk) files while retaining all features
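To illustrate what the poly-A/poly-T tagging in the preprocessing list involves, here is a minimal Python sketch. It is not MIRA's own routine. The function name `find_poly_tail`, the minimum run length of 12 bases and the exact-match rule are assumptions made for illustration. The sketch also assumes the common orientation where a poly-A run sits at the 3' end of a read and a poly-T run at the 5' start, as is typical for reads primed from an oligo-dT primer.

```python
from typing import Optional, Tuple


def find_poly_tail(seq: str, min_len: int = 12) -> Optional[Tuple[int, int, str]]:
    """Hypothetical sketch: locate a poly-A run at the 3' end or a poly-T run
    at the 5' start of an EST read.

    Returns (start, end, base) as 0-based, end-exclusive coordinates, or None
    if neither run reaches min_len. Only exact runs are detected (assumption);
    real reads with sequencing errors would need mismatch tolerance.
    """
    s = seq.upper()

    # Walk backwards from the 3' end while bases are 'A'.
    i = len(s)
    while i > 0 and s[i - 1] == "A":
        i -= 1
    if len(s) - i >= min_len:
        return (i, len(s), "A")

    # Walk forwards from the 5' start while bases are 'T'.
    j = 0
    while j < len(s) and s[j] == "T":
        j += 1
    if j >= min_len:
        return (0, j, "T")

    return None
```

For example, `find_poly_tail("ACGT" + "A" * 20)` returns `(4, 24, "A")`, and those coordinates could then be attached to the read as a tag. A production routine would also tolerate a few miscalled bases inside the run and low-quality noise after it.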



© 1997-2013 by Bastien Chevreux