Of the fungal wheat pathogen Zymoseptoria tritici using Iso-Seq and RNA-Seq data
Abstract
The genome of the IPO323 strain of the fungal wheat pathogen Zymoseptoria tritici was sequenced and annotated in 2011. Since then, additional IPO323 genome annotations have been released using different ab initio software and RNA-Seq evidence. These annotations displayed many discrepancies, and only a minority of CDS had identical structures across them (n = 3918, 30%). Iso-Seq long-read sequencing delivers full-length transcripts, facilitating gene model prediction. Iso-Seq transcriptomic data, corresponding to 11 biological conditions, were obtained for IPO323. This dataset was used with other evidence (RNA-Seq data and fungal protein sequences from public databases) to generate new ab initio annotations of the IPO323 genome sequence. These new annotations were compared to previous ones to select the best gene models according to transcriptomic and protein evidence using ingeannot, a suite of bioinformatics tools for the annotation, selection, validation, and comparison of gene models. The new annotation corrected many errors (2047) found in previous annotations (CDS fusions, false introns, and missing exons) and added 671 new genes, leading to 13,414 Re-annotated Gene Models (RGMs). Iso-Seq and RNA-Seq data were used to define 5′ and 3′ UTRs for 73% of the genes. Alternative transcripts were displayed by 13% of RGMs, mostly corresponding to intron retention (75%). However, 353 genes displayed alternative transcripts with new combinations of existing exons or with new exons. Long non-coding transcripts (51 lncRNAs) were also identified, as well as dsRNA from two fungal viruses. Most lncRNAs corresponded to antisense transcripts of genes (52%). The 17 lncRNAs up- or down-regulated during infection were enriched in antisense transcripts (70%), suggesting their involvement in the control of gene expression during infection. Overall, Iso-Seq data were very effective for improving the Z. tritici genome annotation. They also provide new insights into its transcriptional landscape.