Evaluation of Nonparametric Machine-Learning Algorithms for an Optimal Crop Classification Using Big Data Reduction Strategy
Abstract
Accurate crop classification can support analyses of food security and of environmental and climate change. Most current studies have focused on applying available algorithms to classify the dominant crops on the landscape using a single source of remotely sensed data, because of geoprocessing constraints (e.g., big data access, availability, and processing power). In this research, we compared four classification algorithms, namely the support vector machine (SVM), random forest (RF), classification and regression tree (CART), and backpropagation network (BPN), to select a robust and efficient algorithm able to classify many crop types accurately. We used multiple sources of satellite imagery, Sentinel-1 (S1) and Sentinel-2 (S2), and developed a new crop classification method for a study site in the Bekaa valley, Lebanon, implemented entirely on the Google Earth Engine platform, which minimized those geoprocessing constraints. The algorithms were selected for their popularity, availability, simplicity, similarity, and diversity. In addition, we adopted different strategies, including changing the number of crops. The first strategy was to reduce the number of collected S2 images and then of S1 images; the second was to use S2 images alone and then to combine S2 with S1. The results showed that RF was the most robust algorithm for crop classification, with the highest overall accuracy (OA) (95.4%) and a kappa index of 0.94, followed by BPN, SVM, and CART, respectively. When performance was assessed on major crop types only, such as wheat or potato, CART achieved the highest OA (98%), followed by RF, SVM, and BPN, respectively. Nevertheless, CART failed to classify other, minor crop types. We concluded that RF is the best algorithm for classifying different crop types in the study area using multiple remote sensing data sources.
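For readers unfamiliar with the two accuracy measures reported above, the following is a minimal sketch of how overall accuracy and the kappa index are conventionally computed from a confusion matrix. It is not the code used in this study: the function names and the 3-class matrix are hypothetical and serve only to illustrate the standard definitions, with rows assumed to hold reference labels and columns predicted labels.

```python
from typing import Sequence


def overall_accuracy(matrix: Sequence[Sequence[int]]) -> float:
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix: Sequence[Sequence[int]]) -> float:
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    p_observed = overall_accuracy(matrix)
    row_totals = [sum(matrix[i]) for i in range(n)]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    # Chance agreement expected from the row and column marginals alone.
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical confusion matrix for three crop classes (illustrative values,
# not data from the study). Rows: reference labels, columns: predictions.
example = [
    [50, 2, 1],
    [3, 40, 2],
    [1, 1, 30],
]
oa = overall_accuracy(example)
kappa = cohen_kappa(example)
print(f"OA = {oa:.3f}, kappa = {kappa:.3f}")
```

Because kappa discounts the agreement expected by chance, it is never higher than OA for the same matrix, which is consistent with the pairing of a 95.4% OA with a kappa of 0.94 reported for RF.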