Annotation of the oak genome sequence and associated bioinformatic resources - INRAE - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement
Conference Papers, Year: 2016

Annotation of the oak genome sequence and associated bioinformatic resources

Thibault Leroy
Antoine Kremer
Hadi Quesneville


The large, complex and highly heterozygous genome of pedunculate oak (Quercus robur) was sequenced using a whole-genome shotgun approach [1]. Roche 454 GS-FLX reads were assembled into contigs, which were then combined with Illumina reads from paired-end and mate-pair libraries and from true synthetic long reads to build 8,827 scaffolds (1.46 Gb in total; N50 = 821 kb). Both haplotypes were merged into a haploid version, and 12 pseudomolecules were established using a high-density linkage map [2] combined with a synteny-based approach using the peach genome sequence.

The structural annotation (transposable elements (TEs), genes, ncRNAs) and the functional annotation of automatically predicted genes rely on four pipelines:
(i) the REPET package [3, 4] was first used to detect, classify and annotate TEs de novo; TEs represent about 50% of the genome;
(ii) EuGene was trained and run to integrate ab initio and similarity-based gene finders, predicting 43,240 genes, including 29,665 high-confidence gene models;
(iii) ncRNAs were predicted using FEELnc (lncRNAs), similarity searches against databases and small RNA-seq data analysis (miRNAs), RNAmmer (rRNAs), tRNAscan-SE (tRNAs) and the Infernal package (other ncRNAs);
(iv) a functional annotation pipeline, based mainly on InterProScan searches for patterns and motifs and on BLAST-based comparative genomics, was run on the 43,240 predicted proteins.
A provisional definition was then assigned to each predicted protein according to the results of the most reliable tools and their occurrence in the oak annotation (D. Goodstein method, personal communication). Here we will present these pipelines and the results of this annotation.

We also set up an integrated genome annotation system dedicated to oak, based on GMOD web interfaces such as WebApollo/JBrowse and InterMine, to make these data available in a user-friendly environment.
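The N50 reported for the assembly summarizes its contiguity: it is the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of this standard definition follows; the scaffold lengths are toy values chosen for illustration, not the oak assembly.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a set of sequence lengths (in bp).

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    covered = 0
    # Walk from the longest scaffold down, accumulating the bases covered.
    for length in sorted(lengths, reverse=True):
        covered += length
        # Integer comparison avoids floating-point rounding at the 50% mark.
        if 2 * covered >= total:
            return length
    raise AssertionError("unreachable: the running total always reaches 50%")


# Toy scaffold lengths in bp (illustrative only, not the oak assembly).
toy_scaffolds = [2_000, 1_500, 800, 500, 200]
print(sum(toy_scaffolds), n50(toy_scaffolds))  # 5000 1500
```

Here the two longest scaffolds (2,000 + 1,500 bp) already cover 3,500 of the 5,000 assembled bases, so the N50 is 1,500 bp. Applied to the 8,827 oak scaffold lengths, the same calculation would yield the reported total size and N50.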
This annotation system has allowed experts to analyze their protein families of interest and to curate and validate gene structures. We will also present the interoperability between these genomic data and the genetic data produced for Quercus (SNPs, linkage maps, QTLs) available in GnpIS [5], an information system for plants. Together, these resources provide a framework for studying the two key evolutionary processes that explain the remarkable diversity of the genus Quercus: local adaptation and speciation.
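Once genes and genetic features such as QTL confidence intervals are positioned on the same pseudomolecules, linking them reduces to an interval-overlap query. The sketch below is a minimal, hypothetical illustration of that idea: the `Feature` class, the sequence names (`chr2`, `chr3`) and all coordinates are made up, it assumes QTL intervals have already been projected from linkage-map positions onto physical coordinates, and it does not describe how GnpIS or InterMine implement such queries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A named interval on a pseudomolecule (1-based, inclusive coordinates)."""
    seqid: str
    start: int
    end: int
    name: str


def features_in_interval(features: list[Feature], region: Feature) -> list[Feature]:
    """Return the features that overlap `region` on the same sequence."""
    return [
        f for f in features
        # Two closed intervals overlap when each starts before the other ends.
        if f.seqid == region.seqid and f.start <= region.end and region.start <= f.end
    ]


# Hypothetical gene models and QTL confidence interval (not real oak data).
genes = [
    Feature("chr2", 1_000, 2_000, "gene_A"),
    Feature("chr2", 5_000, 6_000, "gene_B"),
    Feature("chr2", 7_000, 8_000, "gene_C"),
    Feature("chr3", 1_500, 2_500, "gene_D"),
]
qtl = Feature("chr2", 1_500, 5_500, "qtl_1")
print([g.name for g in features_in_interval(genes, qtl)])  # ['gene_A', 'gene_B']
```

A linear scan keeps the sketch short; with tens of thousands of gene models and many QTLs, an indexed interval structure (or a database query on sorted coordinates) would be the usual choice.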

Dates and versions

hal-02741128, version 1 (03-06-2020)


  • HAL Id: hal-02741128, version 1
  • PRODINRA: 365223


Joelle J. Amselem, Jean Marc Aury, Nicolas Francillonne, Tina Alaeitabar, Corinne da Silva, et al. Annotation of the oak genome sequence and associated bioinformatic resources. IUFRO Genomics and Forest Tree Genetics, Institut National de Recherche Agronomique (INRA). UMR Biodiversité, Gènes et Communautés (1202), May 2016, Arcachon, France. 134 p. ⟨hal-02741128⟩