Improvement of the assembly of heterozygous genomes of non-model organisms
Conference Poster, Year: 2015


Although the number of non-model organisms being sequenced has increased drastically, the extraction of biological information from these data is hampered by the low quality of the draft assemblies. In particular, the combination of a high level of heterozygosity and short-read sequencing leads to fragmented assemblies and to overestimation of both gene content and genome size. New assemblers have recently been developed to better handle heterozygous data. However, completely re-assembling a genome entails automatic and manual re-annotation tasks that are very costly. We therefore present a novel method to detect and correct false duplications due to heterozygosity (two alleles instead of one consensus sequence) in diploid draft assemblies. In addition, the method is able to relocate and merge supernumerary gene annotations. The method is based on a whole-genome self-alignment (Lastz + AxtChain), which allows the detection of highly similar regions. These can have two origins: allelic regions or duplicated regions. To distinguish between them, three criteria are used: 1/ their location inside scaffolds: contrary to duplications, unmerged haplotypes come from the same locus and must share the same genomic context; 2/ their cumulative read depth (close to the expected one); and 3/ their level of redundancy in the whole assembly. Next, detected pairs of allelic regions need to be merged into a single sequence in the assembly, either by the complete deletion of the redundant scaffolds or by the construction of meta-scaffolds (scaffolds joined together), keeping only the allele present in the longer scaffold of the pair. Genes located on the merged alleles then need to be correctly re-annotated. This is performed using Exonerate and Augustus: the former is used to locate the deleted genes on the remaining allele, and the latter to predict new genes or consensus ones.
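The three criteria can be read as a simple decision rule. The following Python sketch is only an illustration of that reading: the `RegionPair` record, its field names, the depth tolerance and the copy-number cutoff are all hypothetical and are not taken from the poster, which does not state how each criterion is measured or thresholded.

```python
from dataclasses import dataclass


@dataclass
class RegionPair:
    """A pair of highly similar regions from the self-alignment (hypothetical record)."""
    depth_a: float           # mean read depth over the first region
    depth_b: float           # mean read depth over the second region
    copies_in_assembly: int  # number of occurrences of the sequence in the assembly
    shares_context: bool     # True if both regions are compatible with a single locus


def is_allelic(pair, expected_depth, depth_tolerance=0.25, max_copies=2):
    """Return True if the pair looks like two unmerged haplotypes.

    All thresholds are illustrative assumptions, not values from the poster.
    """
    # Criterion 1: unmerged haplotypes must share the same genomic context.
    if not pair.shares_context:
        return False
    # Criterion 2: the cumulative read depth of the two regions should be
    # close to the depth expected for a single, correctly merged locus.
    cumulative = pair.depth_a + pair.depth_b
    if abs(cumulative - expected_depth) > depth_tolerance * expected_depth:
        return False
    # Criterion 3: a sequence that is highly redundant in the assembly is
    # more likely a repeat or duplication than a pair of alleles.
    return pair.copies_in_assembly <= max_copies


# Toy values, invented for illustration only.
candidate = RegionPair(depth_a=28.0, depth_b=31.0, copies_in_assembly=2, shares_context=True)
duplicate = RegionPair(depth_a=60.0, depth_b=58.0, copies_in_assembly=2, shares_context=True)
print(is_allelic(candidate, expected_depth=60.0))
print(is_allelic(duplicate, expected_depth=60.0))
```

In this sketch the three tests are applied as hard filters in sequence; a real implementation might instead combine them into a score.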
We applied this method to a heterozygous wild-type insect genome assembly. This led to a drastic reduction of the genome assembly size (consistent with the expected size estimated by flow cytometry) and to an increase of the N50. Most of the new meta-scaffolds were confirmed by several additional resources: mate pairs, BAC-end sequence mapping and synteny analysis. Moreover, about 80% of the gene predictions located in removed fragments have been either relocated or merged with their complementary allele.
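To see why removing redundant scaffolds and building meta-scaffolds can shrink the assembly while raising the N50, the short Python sketch below computes the N50 under its usual definition (the scaffold length at which the cumulative length of scaffolds, sorted from longest to shortest, reaches half of the total). The scaffold lengths are invented toy values, not results from the poster.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half of the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no scaffold lengths given")


# Toy assembly before correction: two short scaffolds (20 and 10) are
# assumed to be redundant alleles of regions present elsewhere.
before = [50, 40, 30, 30, 20, 20, 10]
# After correction: the redundant scaffolds are deleted and the scaffolds
# of length 40 and 30 are joined into one meta-scaffold of length 70.
after = [70, 50, 30, 20]

print(sum(before), n50(before))  # 200 30
print(sum(after), n50(after))    # 170 50
```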
Main file
genome_informatics_gouin.pdf (636.42 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01231793, version 1 (20-11-2015)


  • HAL Id: hal-01231793, version 1
  • PRODINRA: 463710


Anaïs Gouin, Anthony Bretaudeau, Emmanuelle d'Alençon, Claire Lemaitre, Fabrice Legeai. Improvement of the assembly of heterozygous genomes of non-model organisms. Genome Informatics, Oct 2015, Cold Spring Harbor Laboratory, United States. 2015. ⟨hal-01231793⟩