Cross-species use of neural networks to improve pig genome annotation: a proof of concept
Abstract
Better functional characterization of livestock genomes is a lever for linking genome to phenome. However, data describing gene regulation mechanisms and chromatin state under various experimental conditions are lacking. Predictive biology is a promising way to overcome this bottleneck. Because human and mouse are phylogenetically close to pig, we can assume that their molecular mechanisms are similar. Furthermore, far more data are available for these two species, which is a prerequisite for training powerful deep learning algorithms.
Here, we use neural networks trained on human and murine data to predict gene regulation mechanisms from pig DNA sequences. We focused our analysis on a genomic region known to be associated with production traits in pigs. Because CTCF binding sites are abundant in the genome, we used this protein as an indicator to estimate the accuracy of the predictions. Across the tissues examined, at least half of the observed peaks were predicted. For four reference chromatin marks, correlations between observations and predictions range from 0.5 to 0.8.
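The two accuracy measures mentioned above (the share of observed peaks that are predicted, and the correlation between observed and predicted signal) could be computed along the following lines. This is only an illustrative sketch: the abstract does not state how peak overlap was defined or which correlation coefficient was used, so the any-base overlap rule, the choice of Pearson correlation, the function names and the toy values are all assumptions.

```python
from math import sqrt


def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) share a base."""
    return a[0] < b[1] and b[0] < a[1]


def peak_recall(observed, predicted):
    """Fraction of observed peaks overlapped by at least one predicted peak.

    Assumption: a single shared base counts as a match.
    """
    if not observed:
        raise ValueError("no observed peaks")
    hits = sum(any(overlaps(o, p) for p in predicted) for o in observed)
    return hits / len(observed)


def pearson(x, y):
    """Pearson correlation between two equally long signal tracks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("tracks must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant track")
    return cov / sqrt(var_x * var_y)


# Toy data (hypothetical coordinates and signal values, not from the study).
observed_peaks = [(100, 200), (500, 600), (900, 1000), (1500, 1600)]
predicted_peaks = [(150, 250), (580, 620), (2000, 2100)]
recall = peak_recall(observed_peaks, predicted_peaks)

observed_signal = [1.0, 2.0, 3.0, 4.0]
predicted_signal = [2.0, 4.0, 6.0, 8.0]
r = pearson(observed_signal, predicted_signal)
```

In practice the peaks would come from ChIP-seq peak files and the signal tracks from binned coverage, with one comparison per tissue and per mark; a stricter overlap rule (for example a minimum overlap fraction) would lower the reported recall.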
To conclude, the prediction results for this specific genomic region seem promising. An extended analysis of the whole pig genome will be performed, and the resulting predictions will enrich a database accessible to the scientific community. Fine-tuning the models with data augmentation by orthology may improve predictions. Furthermore, this approach may also help us predict the impact of variants and associate them with phenotypes of interest.
This work has benefited from a government grant managed by the Agence Nationale de la Recherche under the France 2030 program, reference ANR-22-PEAE-0015.