Conference paper. Year: 2024

Advancing multi-environment genomic prediction with explainable deep learning in apple

Carles Quesada-Traver
  • Role: Author
Michaela Jung
  • Role: Author
  • PersonId: 1120437
Steven Yates
  • Role: Author
Maria José Aranzana
  • Role: Author
Walter Guerra
  • Role: Author
Marijn Rymenants
  • Role: Author
Andrea Patocchi
  • Role: Author
  • PersonId: 971182
Bruno Studer
  • Role: Author

Abstract

Multi-environment genomic prediction is a useful tool for plant breeding that helps to estimate the breeding values of genotypes across diverse environments. For accurate prediction, methods must integrate phenotypic, genotypic, and environmental data effectively. The heterogeneous structure of this data makes its analysis challenging, but the modularity of deep learning methods makes them well suited to this complexity. Here, we present an explainable multimodal deep learning method to perform genomic prediction on a multi-year and multi-environment apple REFPOP dataset of eleven quantitative traits. To implement the modelling approach, genotypic data was subjected to feature selection to reduce its dimensionality and improve training performance. Environmental data, in contrast, was processed as daily mean values. To make effective use of the environmental time-series data, our model employed long short-term memory (LSTM) layers, alongside dense layers for the other data inputs. The different data types were processed through separate multi-layer streams within the architecture and concatenated just before the final regression output layer. In a five-fold cross-validation repeated five times, the proposed methodology outperformed its statistical counterparts for three of the eleven traits in the dataset. These traits were harvest date, titratable acidity and red over colour, with increases in predictive ability, measured as Pearson's correlation coefficient r, of 0.05, 0.08 and 0.09, respectively. The remaining eight traits showed performance similar to that of the statistical models used for comparison. We also incorporate an approach to explain the model predictions based on Shapley additive explanations, commonly referred to as SHAP values. Using this approach, we were able to pinpoint the most important genetic variants as well as the relevant time frames during which environmental variables influence trait predictions.
Given the increasing amount of data generated in every field, our results provide a framework to integrate differently structured data and produce accurate and interpretable predictions using deep learning-based multi-environment genomic prediction models.
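
The evaluation protocol named in the abstract, a five-fold cross-validation repeated five times with predictive ability measured as Pearson's r, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration only: the data are synthetic, a one-predictor least-squares fit stands in for the deep learning model, and the function names, the random fold assignment, and the averaging of r across the 25 folds are hypothetical choices rather than the authors' implementation.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def fit_simple_regression(x, y):
    """One-predictor least squares; a placeholder for any prediction model."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    intercept = my - slope * mx
    return lambda value: intercept + slope * value


def repeated_kfold_predictive_ability(x, y, k=5, repeats=5, seed=0):
    """Return one Pearson r per held-out fold (k * repeats values in total)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        # Reshuffle before every repeat so each repeat uses a new partition.
        indices = list(range(len(x)))
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            model = fit_simple_regression(
                [x[i] for i in train_idx], [y[i] for i in train_idx]
            )
            predicted = [model(x[i]) for i in test_idx]
            observed = [y[i] for i in test_idx]
            scores.append(pearson_r(predicted, observed))
    return scores


# Synthetic toy data (hypothetical): one predictor with a noisy linear effect.
data_rng = random.Random(42)
marker_score = [data_rng.gauss(0, 1) for _ in range(100)]
phenotype = [2.0 * m + data_rng.gauss(0, 0.5) for m in marker_score]

scores = repeated_kfold_predictive_ability(marker_score, phenotype)
mean_r = statistics.fmean(scores)
print(f"folds evaluated: {len(scores)}, mean predictive ability r = {mean_r:.2f}")
```

Comparing two models under this scheme means running both on identical fold partitions (the same `seed`) and comparing their mean r. The abstract does not say whether folds were formed over genotypes, environments, or individual records, so that choice is left open here.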

File not deposited

Dates and versions

hal-04717222, version 1 (01-10-2024)

Identifiers

  • HAL Id: hal-04717222, version 1

Cite

Carles Quesada-Traver, Michaela Jung, Steven Yates, Maria José Aranzana, Walter Guerra, et al. Advancing multi-environment genomic prediction with explainable deep learning in apple. 22nd EUCARPIA General Congress, Aug 2024, Leipzig, Germany. ⟨hal-04717222⟩
