Annotation of the oak genome sequence and associated bioinformatic resources

The large, complex and highly heterozygous genome of pedunculate oak (Quercus robur) was sequenced using a whole-genome shotgun approach [1]. Roche 454 GS-FLX sequence reads were assembled into contigs and combined with Illumina reads from paired-end and mate-pair libraries and with true synthetic long reads to build a total of 8,827 scaffolds (1.46 Gb total size; N50 = 821 kb). The two haplotypes were merged into a haploid version, and 12 pseudomolecules were established using a high-density linkage map [2] combined with a syntenome approach based on the peach genome sequence.

The structural annotation (transposable elements (TEs), genes, ncRNA) and the functional annotation of automatically predicted genes rely on robust pipelines: (i) the REPET package [3] [4] was first used to detect, classify and annotate TEs de novo; TEs represent about 50% of the genome; (ii) EuGene was trained and run to integrate ab initio and similarity-based gene-finding software, predicting 43,240 genes, including 29,665 high-confidence gene models; (iii) ncRNAs were predicted using FEELnc (lncRNA), similarity searches against databases and small RNA-seq data analysis (miRNA), RNAmmer (rRNA), tRNAscan-SE (tRNA) and the Infernal package (other non-coding RNAs); (iv) a functional annotation pipeline, based mainly on InterProScan to search for patterns and motifs and on BLAST-based comparative genomics, was run on the 43,240 predicted proteins. A provisional definition was assigned to each predicted protein according to the results of the most reliable tools and their occurrence in the oak annotation (D. Goodstein method, personal communication). We will present these pipelines and the results of this annotation.

We also set up an integrated genome annotation system dedicated to oak, based on GMOD web interfaces such as WebApollo/JBrowse and InterMine, to make these data available in a user-friendly environment. This system allowed experts to analyze their protein families of interest and to curate and validate gene structures. We will also present the interoperability between these genomic data and the genetic data produced in Quercus (SNPs, linkage maps, QTLs) available in GnpIS [5], an information system for plants. Together, these resources provide a framework to study the two key evolutionary processes that explain the remarkable diversity found within the Quercus genus: local adaptation and speciation.
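The scaffold N50 quoted in the abstract (821 kb) is a standard contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. As a minimal sketch of how such a value is typically computed, the following uses a toy list of lengths rather than the oak assembly itself; the function name `n50` and the example values are illustrative assumptions, not part of the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in kb (not taken from the oak assembly).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # 70: the two longest scaffolds cover 150 of 300 kb
```

In practice the lengths would be read from the assembly FASTA or its index; the statistic itself depends only on the list of scaffold lengths.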

Domains

Life Sciences [q-bio] / Plant Biology; Mathematics [math]; Computer Science [cs]; Environmental Sciences

https://hal.inrae.fr/hal-02741128

Submitted on: Wednesday, 3 June 2020, 00:39:05

Last modified on: Wednesday, 3 April 2024, 10:20:13

Dates and versions

hal-02741128, version 1 (03-06-2020)

Identifiers

HAL Id: hal-02741128, version 1
PRODINRA: 365223

Cite

Joelle J. Amselem, Jean Marc Aury, Nicolas Francillonne, Tina Alaeitabar, Corinne da Silva, et al. Annotation of the oak genome sequence and associated bioinformatic resources. IUFRO Genomics and Forest Tree Genetics, Institut National de Recherche Agronomique (INRA), UMR Biodiversité, Gènes et Communautés (1202), May 2016, Arcachon, France. 134 p. ⟨hal-02741128⟩
