by Yunus Güral

The high variability and nonlinear relationships between environmental variables (such as temperature, relative humidity, and altitude) in ecological datasets prevent classical statistical models from producing accurate predictions. This study aimed to compare and investigate the performance of AI-based machine learning methods in analyzing complex ecological data structures.
An agricultural dataset containing meteorological and vegetation variables was used as the representative case study. This dataset is based on population observations of Cimbex quadrimaculata in Diyarbakır (Eğil) and Elazığ (Keban) provinces in Türkiye between 2020 and 2022.
Three different modeling approaches (binary classification, multiclass classification, and regression) were applied to the same data. This three-approach design enabled a systematic comparison of model performance, generalizability, and explainability on the same dataset using different definitions of the target variable.
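To make the three-target design concrete, the following minimal sketch derives a binary, a multiclass, and a regression target from one observation series. The study does not report how its targets were defined, so everything here is an assumption for illustration: the helper name `derive_targets`, the idea that the raw observation is a population count, and the class edges `(0, 5)` are all hypothetical.

```python
def derive_targets(counts, class_edges=(0, 5)):
    """Build three target definitions from one list of population counts.

    Hypothetical scheme (not taken from the study):
      - binary:     1 if any individuals were observed, else 0
      - multiclass: number of class edges the count exceeds
                    (0 = none, 1 = low, 2 = high for the default edges)
      - regression: the raw count as a float
    """
    binary = [1 if c > 0 else 0 for c in counts]
    multiclass = [sum(c > edge for edge in class_edges) for c in counts]
    regression = [float(c) for c in counts]
    return binary, multiclass, regression


# Illustrative counts only; these are not data from the study.
example_counts = [0, 3, 12, 0, 7]
binary, multiclass, regression = derive_targets(example_counts)
```

Keeping the feature matrix fixed and varying only the target in this way is what allows the three approaches to be compared on the same data.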
For classification tasks, model performance was evaluated using accuracy, F1 score, and AUC under a stratified 10-fold cross-validation scheme. Regression models, on the other hand, were assessed within a nested cross-validation framework using R², root mean square error (RMSE), and mean absolute error (MAE).
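The evaluation scheme described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the study's pipeline: the data are synthetic, the random forest models, the hyperparameter grid, and the 5-outer/3-inner fold counts for the nested scheme are all placeholders, since the summary does not report them. Only the stratified 10-fold split and the metric names come from the text.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import (
    GridSearchCV,
    KFold,
    StratifiedKFold,
    cross_validate,
)

# --- Classification: stratified 10-fold CV with accuracy, F1, and AUC ---
# Synthetic stand-in for the binary target (assumption; not the study's data).
X_clf, y_clf = make_classification(
    n_samples=200, n_features=6, n_informative=4, random_state=0
)
stratified_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
clf_scores = cross_validate(
    RandomForestClassifier(n_estimators=25, random_state=0),  # placeholder model
    X_clf,
    y_clf,
    cv=stratified_cv,
    scoring=["accuracy", "f1", "roc_auc"],
)

# --- Regression: nested CV with R², RMSE, and MAE ---
# Inner loop tunes hyperparameters; outer loop estimates generalization error.
X_reg, y_reg = make_regression(
    n_samples=200, n_features=6, noise=10.0, random_state=0
)
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)  # assumed fold count
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)  # assumed fold count
search = GridSearchCV(
    RandomForestRegressor(n_estimators=25, random_state=0),  # placeholder model
    param_grid={"max_depth": [3, None]},                     # hypothetical grid
    cv=inner_cv,
    scoring="neg_root_mean_squared_error",
)
reg_scores = cross_validate(
    search,
    X_reg,
    y_reg,
    cv=outer_cv,
    scoring=["r2", "neg_root_mean_squared_error", "neg_mean_absolute_error"],
)

# scikit-learn reports error metrics negated so that higher is always better.
rmse = -reg_scores["test_neg_root_mean_squared_error"]
mae = -reg_scores["test_neg_mean_absolute_error"]
r2 = reg_scores["test_r2"]

print("accuracy:", clf_scores["test_accuracy"].mean())
print("F1:      ", clf_scores["test_f1"].mean())
print("AUC:     ", clf_scores["test_roc_auc"].mean())
print("R2:      ", r2.mean())
print("RMSE:    ", rmse.mean())
print("MAE:     ", mae.mean())
```

For a multiclass target, the `"f1"` and `"roc_auc"` scorers would need averaged variants such as `"f1_macro"` and `"roc_auc_ovr"`. Wrapping the `GridSearchCV` object inside the outer `cross_validate` call is what makes the scheme nested: tuning never sees the outer test fold.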