
Using MIRA with 454 and Sanger data

MIRA can assemble 454 reads together with Sanger reads in a truly hybrid assembly: unlike other hybrid assemblers, it does not assemble a 454 consensus sequence with the Sanger reads but works on the individual 454 reads themselves. MIRA can also assemble reads against an existing backbone. The following examples show errors in the official Streptococcus pneumoniae TIGR4 genome at GenBank that were discovered using a true hybrid 454 / Sanger sequence assembly with the GenBank genome as backbone. The images are collages of screenshots of the MIRA assembly results, transferred as a project into the gap4 program of the Staden package (note: arrows and ellipses were added with Gimp).

 

The first example is a low-coverage Sanger case (only one trace). The basecall of this trace contains a wrongly called additional base (an overcall). One can see this in the trace (double-click on the trace at that position in the gap4 project), together with the fact that this base got an amazing quality score of 40. This wrongly called base made its way into the official S. pneumoniae TIGR4 genome at GenBank. This would not have happened if 454 data had been available at that time, because none of the 454 reads supports that base.




Low Sanger coverage. The only available Sanger type read has an overcall which is also reflected in the official genome (top line).
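
The reasoning in this first example (a base present in the backbone and in the Sanger read, but in none of the 454 reads) can be sketched as a small check on one alignment column. This is only an illustration and not how MIRA or gap4 work internally: the function name, the `(technology, base)` tuples and the use of `*` as pad character are assumptions made for this sketch.

```python
GAP = "*"  # pad character in the alignment column; an assumption for this sketch


def unsupported_by_454(column, backbone_base):
    """Return True if backbone_base is seen in a Sanger read but in no 454 read.

    column: list of (technology, base) tuples for one alignment position,
            where technology is "sanger" or "454" (hypothetical representation).
    """
    sanger_bases = [base for tech, base in column if tech == "sanger"]
    bases_454 = [base for tech, base in column if tech == "454"]
    if not bases_454:
        # Without 454 coverage there is nothing to compare against.
        return False
    return backbone_base in sanger_bases and backbone_base not in bases_454


# One Sanger read calls an extra "A"; six 454 reads all have a pad there.
overcall_column = [("sanger", "A")] + [("454", GAP)] * 6
# Same position, but here the 454 reads confirm the base.
confirmed_column = [("sanger", "A")] + [("454", "A")] * 6

print(unsupported_by_454(overcall_column, "A"))
print(unsupported_by_454(confirmed_column, "A"))
```

Note that a high quality score on the Sanger base (40 in the example above) does not enter this check at all: the point of the example is that the quality value alone was misleading.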


The second example shows a medium- to high-coverage Sanger case (5 traces), but the reads are all in the same direction and all but one contain an undercall (a base was missed). The 454 reads show the true data. This error, too, made its way into the official S. pneumoniae TIGR4 genome at GenBank.

 




High Sanger coverage, but all reads in the same direction. Additionally, 4 of the 5 reads have missed a base in base-calling; the fifth read has the sequence too near to sequencing vector cut-offs.
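
The second case combines two warning signs: all Sanger reads come from one direction, and their majority call disagrees with the 454 reads. A minimal sketch of such a check is given below. Again, this is not MIRA's algorithm; the function names, the `(technology, direction, base)` tuples and the `*` pad character are assumptions chosen for illustration.

```python
from collections import Counter


def majority(bases):
    """Return the most frequent base in a list, or None for an empty list."""
    return Counter(bases).most_common(1)[0][0] if bases else None


def suspicious_column(column):
    """Flag a column where single-direction Sanger reads disagree with 454 reads.

    column: list of (technology, direction, base) tuples, with technology
            "sanger" or "454" and direction "+" or "-" (hypothetical format).
    """
    sanger = [(direction, base) for tech, direction, base in column if tech == "sanger"]
    bases_454 = [base for tech, _, base in column if tech == "454"]
    if not sanger or not bases_454:
        return False
    one_direction = len({direction for direction, _ in sanger}) == 1
    disagree = majority([base for _, base in sanger]) != majority(bases_454)
    return one_direction and disagree


# Five forward Sanger reads, four of them with a pad (the undercall),
# and eight 454 reads that all show the base.
column = (
    [("sanger", "+", "*")] * 4
    + [("sanger", "+", "G")]
    + [("454", "+", "G")] * 4
    + [("454", "-", "G")] * 4
)
print(suspicious_column(column))
```

A column flagged this way is only a candidate for closer inspection; the traces themselves still need to be looked at, as done in gap4 for the examples here.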




© 1997-2013 by Bastien Chevreux