How Machine Learning Can Evaluate The Influence Of Socio-Economic and Climatic Factors in Agricultural Yield: A Case Of Nigeria

Dappa Tamuno-Opubo, Godwin (2023-05-13)

Main Theses


Major international agencies responsible for nutrition are increasingly concerned about global agricultural production. Record population growth has increased worldwide demand for food, and food insecurity has emerged in some populated regions, including Africa. Climate change and climate variability also contribute to global food insecurity. Agricultural policy officials, farmers, and other decision-makers therefore need advanced technologies to develop timely strategies and policies that affect the quality of crop harvests. Machine learning and other powerful analytical techniques made possible by big data technologies have already proven useful in several fields, including biology, finance, and medicine. This study uses a machine learning-based prediction method to forecast the national-level yield of three major crops in Nigeria (cocoa, sesame, and cashew) over the period 1990 to 2020. We used climatic, agricultural yield, and socioeconomic data to help policymakers and farmers anticipate yearly agricultural output in Nigeria. We employed three models: k-nearest neighbors (KNN), a decision tree, and a random forest. We also tuned hyperparameters through cross-validation to improve the models and avoid overfitting. For sesame, the Decision Tree model had the highest test accuracy for socioeconomic and climatic factors combined, at 97.92%, while the KNN model performed best for climatic factors alone, with a test accuracy of 99.71%. The Random Forest model's accuracy was 87.54% for climatic factors alone and 87.64% for socioeconomic and climatic factors combined.
For cocoa, the Decision Tree model had an accuracy of 89.49% for socioeconomic and climatic factors combined and 89.51% for climatic factors alone, while the KNN model had the highest accuracy, 90.71%, for climatic factors alone. The Random Forest model's accuracy was 87.82% for socioeconomic and climatic factors combined and 88.83% for climatic factors alone. For cashew nuts, the KNN model's accuracy was 78.38% for socioeconomic and climatic factors combined and 99.81% for climatic factors alone; the Decision Tree model's was 88.27% and 86.58%, respectively; and the Random Forest model's was 98.50% and 98.75%, respectively. In conclusion, the Random Forest model outperformed the KNN and Decision Tree models across all crop and factor combinations. Our findings indicate that machine learning algorithms can forecast crop yields with reasonable accuracy when socioeconomic and meteorological variables are combined.
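
The abstract mentions tuning hyperparameters through cross-validation. As a rough illustration of that idea only, the sketch below picks the number of neighbors for a k-nearest-neighbors regressor by k-fold cross-validation. It is not the thesis code: the data are synthetic, the feature names (`rain`, `temp`), the candidate values of k, and all function names are hypothetical, and mean absolute error stands in for the accuracy measure, which the abstract does not define.

```python
import random
import statistics


def knn_predict(train_X, train_y, x, k):
    """Predict a value for x as the mean target of its k nearest training rows."""
    # Squared Euclidean distance preserves the neighbor ordering.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), target)
        for row, target in zip(train_X, train_y)
    )
    return statistics.fmean([target for _, target in dists[:k]])


def k_fold_indices(n, n_folds, seed=0):
    """Shuffle the row indices 0..n-1 and split them into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cv_mae(X, y, k, n_folds=5):
    """Mean absolute error of k-NN, averaged over all held-out rows."""
    errors = []
    for held_out in k_fold_indices(len(X), n_folds):
        held = set(held_out)
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(X)) if i not in held]
        for i in held_out:
            errors.append(abs(knn_predict(train_X, train_y, X[i], k) - y[i]))
    return statistics.fmean(errors)


def tune_k(X, y, candidates=(1, 3, 5, 7), n_folds=5):
    """Return the candidate k with the lowest cross-validated error, plus all scores."""
    scores = {k: cv_mae(X, y, k, n_folds) for k in candidates}
    return min(scores, key=scores.get), scores


def make_synthetic_data(n=31, seed=42):
    """Hypothetical stand-in data: one row per year, two scaled climate features."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        rain, temp = rng.uniform(0, 1), rng.uniform(0, 1)
        X.append([rain, temp])
        y.append(2.0 * rain - 1.0 * temp + rng.gauss(0, 0.1))
    return X, y


X, y = make_synthetic_data()
best_k, scores = tune_k(X, y)
print("cross-validated MAE per k:", scores)
print("selected k:", best_k)
```

Because each candidate is scored only on rows held out of training, a setting that merely memorizes the training data (such as k = 1 on noisy targets) tends to be penalized, which is how cross-validation helps guard against overfitting. Libraries such as scikit-learn offer the same search through `GridSearchCV`.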