Conference Poster, Year: 2023

A lncRNA gene-enriched atlas for GRCg7b chicken genome using Ensembl, RefSeq and two FAANG databases

Abstract

With the release of new genome sequences, gene atlases in livestock species are steadily improving. However, genome annotations vary greatly across databases, depending on the data resources and bioinformatics pipelines used. These differences are more pronounced for long non-coding RNAs (lncRNAs) than for protein-coding genes (PCGs) because of the higher tissue and stage specificity of lncRNAs. As previously done in 2020 for the galgal5 and GRCg6a chicken assemblies, we provide a new lncRNA-enriched atlas based on the latest GRCg7b genome assembly. This new chicken gene atlas gathers i) the EMBL-EBI Ensembl/GENCODE database, which integrates GENE-SWitCH data, and NCBI RefSeq, both considered as references; ii) 3 databases from independent projects, in particular from the University of California Davis and Fr-AgENCODE, based on FAANG multi-tissue resources; and iii) NONCODE, a database dedicated to lncRNAs. We characterized the pairwise overlap rate of gene models between databases (maximum of 90% for PCGs and 39% for lncRNAs) and, for each database, calculated the concordance between gene transcription start sites (TSSs) and CAGE peaks from FANTOM (maximum of 60% for PCGs and 40% for lncRNAs). Based on these characteristics, and prioritizing the reference databases, we determined the order in which to gather the databases so as to maximize the quality of the final annotation. In total, relative to the Ensembl and RefSeq catalogues respectively, the atlas grew from 17,007 and 18,010 to 24,102 PCGs and from 11,946 and 5,789 to 44,428 lncRNAs, for a total of 78,323 genes. In addition, we provide a ‘functional’ gene annotation based on 1,400 transcriptomes across 47 tissues, in which 35,257 (79.4%) lncRNAs and 22,468 (93.2%) PCGs have an expression of TPM ≥ 0.1. Each gene of this atlas is provided with ‘genomic’ and ‘functional’ information and the corresponding RefSeq and Ensembl gene identifiers (http://www.fragencode.org).
Project funded by the European Union’s Horizon 2020 research and innovation program under grant agreement N°101000236 and by ANR CE20 under the EFFICACE program.
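
The priority-ordered gathering of databases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Gene` record, the `overlaps` and `gather` helpers, the toy coordinates, and the rule itself (a gene model from a lower-priority database is kept only if it does not overlap, on the same chromosome and strand, any gene already taken from a higher-priority database) are all assumptions made for illustration. The actual atlas may well rely on exon- or transcript-level comparisons.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene model (1-based, inclusive coordinates)."""
    gene_id: str
    source: str
    chrom: str
    strand: str
    start: int
    end: int


def overlaps(a: Gene, b: Gene) -> bool:
    """True if both genes share chromosome and strand and their spans intersect."""
    return (
        a.chrom == b.chrom
        and a.strand == b.strand
        and a.start <= b.end
        and b.start <= a.end
    )


def gather(catalogues: list[list[Gene]]) -> list[Gene]:
    """Merge catalogues given in decreasing priority order.

    Assumed rule: a gene is added only if it overlaps no gene already
    contributed by a higher-priority catalogue. Genes from the same
    catalogue are never filtered against each other.
    """
    atlas: list[Gene] = []
    for genes in catalogues:
        higher_priority = list(atlas)  # snapshot taken before this catalogue
        for gene in genes:
            if not any(overlaps(gene, kept) for kept in higher_priority):
                atlas.append(gene)
    return atlas


# Toy data (invented coordinates, for illustration only).
ensembl = [Gene("E1", "Ensembl", "1", "+", 100, 500)]
refseq = [
    Gene("R1", "RefSeq", "1", "+", 400, 900),  # overlaps E1 on the same strand
    Gene("R2", "RefSeq", "1", "-", 400, 900),  # opposite strand, kept
]
faang = [
    Gene("F1", "FAANG", "2", "+", 10, 50),     # other chromosome, kept
    Gene("F2", "FAANG", "1", "-", 850, 1000),  # overlaps R2 on the same strand
]

atlas = gather([ensembl, refseq, faang])
atlas_ids = [gene.gene_id for gene in atlas]
print(atlas_ids)
```

In this toy setting the order of the input list plays the role of the "order of gathering": placing the reference catalogues first means that their gene models are never displaced by overlapping models from the other databases.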

Dates and versions

hal-04217567, version 1 (25-09-2023)

Identifiers

  • HAL Id: hal-04217567, version 1

Cite

Fabien Degalez, Mathieu Charles, Sylvain Foissac, Zhou Haijuan, Guan Dailu, et al. A lncRNA gene-enriched atlas for GRCg7b chicken genome using Ensembl, RefSeq and two FAANG databases. 74th Annual Meeting of the European Federation of Animal Science (EAAP), Aug 2023, Lyon, France. Wageningen Academic Publishers, Book of abstracts, 29, pp.932, 2023, Book of abstracts of the 74th Annual Meeting of the European Federation of Animal Science. ⟨hal-04217567⟩