Feature selection based on weighted distance minimization: application to biodegradation process evaluation
Abstract
Infrared spectroscopy provides useful information about biomass composition and has been used extensively in domains such as biology, food science, and pharmaceutical, petrochemical, and agricultural applications. However, not all spectral information is valuable for constructing biomarkers or for fitting regression or classification models; identifying the informative wavenumbers enables better processing and interpretation. The selection of optimal subsets has been addressed through several variable (feature) selection methods, including genetic algorithms. Some of these methods are not suited to large data sets, while others require additional information, such as concentrations, or are difficult to tune. This paper proposes an alternative approach based on a weighted Euclidean distance that is minimized under constraints. We show on real mid-infrared spectra that this constrained nonlinear optimizer identifies the wavenumbers that best highlight the discrimination between the periods of the biodegradation process of lignocellulosic biomass. These results are compared with previous ones obtained with a genetic algorithm.
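To make the central notion concrete, the following minimal Python sketch illustrates what a weighted Euclidean distance between two spectra could look like, and how a zero weight effectively removes a wavenumber from the comparison. It is an illustration only: the function name `weighted_euclidean`, the non-negativity check on the weights, and the toy absorbance values are assumptions introduced here, and the objective function and constraints actually optimized in the paper are not specified in this abstract.

```python
import math


def weighted_euclidean(x, y, w):
    """Weighted Euclidean distance: sqrt(sum_j w_j * (x_j - y_j)**2).

    x, y: two spectra sampled at the same wavenumbers.
    w:    one non-negative weight per wavenumber (assumed constraint).
    """
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    if any(wj < 0 for wj in w):
        raise ValueError("weights must be non-negative")
    return math.sqrt(sum(wj * (xj - yj) ** 2 for xj, yj, wj in zip(x, y, w)))


# Toy absorbances at three hypothetical wavenumbers (made-up values).
early = [0.5, 0.25, 1.0]
late = [0.5, 0.75, 1.0]

# Uniform weights: every wavenumber contributes to the distance.
d_uniform = weighted_euclidean(early, late, [1.0, 1.0, 1.0])

# Only the second wavenumber is kept: the distance is unchanged, because
# that wavenumber carries all of the difference between the two spectra.
d_selected = weighted_euclidean(early, late, [0.0, 1.0, 0.0])

# The second wavenumber is discarded: the two spectra become indistinguishable.
d_discarded = weighted_euclidean(early, late, [1.0, 0.0, 1.0])
```

In this toy setting, a weight vector that concentrates on the second wavenumber preserves the separation between the two spectra, which is the intuition behind reading optimized weights as a feature selection.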