Modern technologies and algorithms for scaffolding assembled genomes

by Jay Ghurye, Mihai Pop

The computational reconstruction of genome sequences from shotgun sequencing data has been greatly simplified by the advent of sequencing technologies that generate long reads. In the case of relatively small genomes (e.g., bacterial or viral), complete genome sequences can frequently be reconstructed computationally without the need for further experiments. However, large and complex genomes, such as those of most animals and plants, continue to pose significant challenges. In such genomes, assembly software produces incomplete and fragmented reconstructions that require additional experimentally derived information and manual intervention in order to reconstruct individual chromosome arms. Recent technologies originally designed to capture chromatin structure have been shown to effectively complement sequencing data, leading to much more contiguous reconstructions of genomes than previously possible. Here, we survey these technologies and the algorithms used to assemble and analyze large eukaryotic genomes, placed within the historical context of genome scaffolding technologies that have been in existence since the dawn of the genomic era.

Tratto da:
Note sul Copyright: Articles and accompanying materials published by PLOS on the PLOS Sites, unless otherwise indicated, are licensed by the respective authors of such articles for use and distribution by you subject to citation of the original source in accordance with the Creative Commons Attribution (CC BY) license.