Massive spectral data analysis for plant breeding using parSketch-PLSDA method: Discrimination of sunflower genotypes
Abstract
In precision agriculture and plant breeding, the volume of data is growing steadily. These massive datasets are becoming increasingly complex, which makes them difficult to manage and analyse. Optical instruments such as near-infrared (NIR) spectroscopy and hyperspectral imaging are increasingly deployed directly in the field, further enlarging spectral databases. These tools provide rapid, non-destructive measurements that can be used to classify new varieties according to breeding objectives, but processing such massive amounts of spectral data is challenging. In the context of genotype discrimination, we propose to analyse these data with a method called parSketch-PLSDA, which combines an indexing strategy (parSketch) with the reference method for predicting classes from multivariate data, partial least squares discriminant analysis (PLSDA). To this end, a spectral database of 1,300,000 spectra was built from hyperspectral images of leaves from four different sunflower genotypes. ParSketch-PLSDA is compared with PLSDA, and both methods use the same calibration and test sets. The PLSDA prediction model has an average classification error close to 23% across all genotypes. ParSketch-PLSDA clearly outperforms PLSDA, improving prediction quality by 10%. Indeed, the model built with parSketch-PLSDA can take non-linearities in the data into account. These results are encouraging and help anticipate the future bottleneck caused by the large amounts of data generated by phenotyping.
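The abstract does not specify how parSketch and PLSDA are combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that (1) PLSDA is a PLS2 regression of a one-hot class matrix on the spectra, with each spectrum assigned to the class with the highest predicted score; (2) parSketch is used to retrieve the spectra most similar to a query by comparing random-projection sketches (projections onto random ±1 vectors); and (3) a local PLSDA model is then fitted on those neighbours only. Fitting local models is one common way for a linear method to capture non-linear structure. The brute-force neighbour search below is a simplification that stands in for parSketch's grid-based parallel index. All function and parameter names (`fit_plsda`, `predict_plsda`, `sketch`, `predict_local_plsda`, `n_sketch`, `k`, `n_components`) are hypothetical.

```python
import numpy as np

def fit_plsda(X, labels, n_components):
    """PLS-DA sketch: PLS2 regression of a one-hot class matrix on X (SVD form of NIPALS)."""
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)  # one-hot dummy matrix
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        # weight vector = dominant left singular vector of the cross-covariance E^T F
        w = np.linalg.svd(E.T @ F, full_matrices=False)[0][:, 0]
        t = E @ w                          # X scores
        tt = t @ t
        p, c = E.T @ t / tt, F.T @ t / tt  # X and Y loadings
        E, F = E - np.outer(t, p), F - np.outer(t, c)  # deflation
        W.append(w); P.append(p); Q.append(c)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q).T
    B = W @ np.linalg.solve(P.T @ W, Q.T)  # coefficients, shape (n_features, n_classes)
    return classes, B, x_mean, y_mean

def predict_plsda(model, X):
    classes, B, x_mean, y_mean = model
    scores = (X - x_mean) @ B + y_mean
    return classes[np.argmax(scores, axis=1)]  # class with the highest predicted score

def sketch(X, R):
    """Random-projection sketch: project each spectrum onto random +/-1 vectors."""
    return X @ R / np.sqrt(R.shape[1])

def predict_local_plsda(X_train, y_train, X_test, n_sketch=16, k=200, n_components=5, seed=0):
    """Assumed combination: k nearest neighbours in sketch space, then a local PLS-DA."""
    rng = np.random.default_rng(seed)
    R = rng.choice([-1.0, 1.0], size=(X_train.shape[1], n_sketch))
    S_train, S_test = sketch(X_train, R), sketch(X_test, R)
    preds = []
    for s, x in zip(S_test, X_test):
        # brute-force search in sketch space (stand-in for parSketch's grid index)
        idx = np.argsort(np.sum((S_train - s) ** 2, axis=1))[:k]
        if np.unique(y_train[idx]).size == 1:  # all neighbours agree: no model needed
            preds.append(y_train[idx][0])
            continue
        model = fit_plsda(X_train[idx], y_train[idx], n_components)
        preds.append(predict_plsda(model, x[None, :])[0])
    return np.array(preds)
```

Here `n_components` must stay below the number of neighbours `k`, since each local model is fitted on only `k` spectra. A global PLSDA corresponds to calling `fit_plsda` on the whole calibration set, while `predict_local_plsda` refits a small model per test spectrum, trading computation for locality.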