Date of Award
January 2014
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
Ronald Marsh
Abstract
The advent of next-generation sequencing (NGS) technology has shown unprecedented promise for accurately identifying and quantifying genomic variants in living organisms. For species whose genome sequences are unknown, the first step of RNA sequencing (RNA-Seq) data analysis is to assemble all short reads. De Bruijn graph-based algorithms, such as Oases, are commonly used for short-read assembly because they keep computational complexity manageable. However, de Bruijn graph-based assemblers often produce high error rates when assembling RNA-Seq data. We have developed a novel assembly algorithm that can be used jointly with any other assembly method for RNA-Seq short reads. The proposed method, clustering-based assembly (CBA), is designed to maintain computational and memory efficiency while improving assembly accuracy, as observed in our simulation study. We tested CBA using ERCC RNA-Seq data, simulated data from Chromosome 22, and real human RNA-Seq data. The results showed that our algorithm was more accurate than other de novo methods in terms of short-read mapping rate, recovery rate, and contig mapping rate.
Recommended Citation
Yang, Yi, "A Novel Assembly Algorithm That Optimizes For RNA-Seq Data" (2014). Theses and Dissertations. 1610.
https://commons.und.edu/theses/1610