Evaluation results of all trained machine learning models.
Metrics reported: Accuracy, Precision, Recall, F1 Score, ROC-AUC, and Overfitting Gap.
XGBoost was selected as the best-performing model because it achieved the highest F1 score (0.8636), together with high recall (0.95), a strong ROC-AUC (0.9998), and a very small overfitting gap.
Performance comparison of all trained models
| Model | Accuracy | Precision | Recall | F1 Score | ROC-AUC | Status |
|---|---|---|---|---|---|---|
| XGBoost | 0.999060 | 0.791667 | 0.9500 | 0.863636 | 0.999804 | Best Model |
| Random Forest | 0.999060 | 0.818182 | 0.9000 | 0.857143 | 0.999741 | Strong |
| Extra Trees | 0.998747 | 0.800000 | 0.8000 | 0.800000 | 0.999827 | Good |
| Decision Tree | 0.998433 | 0.727273 | 0.8000 | 0.761905 | 0.899529 | Good |
| Gradient Boosting | 0.997650 | 0.571429 | 1.0000 | 0.727273 | 0.999788 | High Recall |
| KNN | 0.997650 | 0.575758 | 0.9500 | 0.716981 | 0.999493 | Average |
| AdaBoost | 0.994517 | 0.358491 | 0.9500 | 0.520548 | 0.999120 | Weak Precision |
| Naive Bayes | 0.982297 | 0.139535 | 0.9000 | 0.241611 | 0.986445 | Weak |
| Logistic Regression | 0.980887 | 0.135714 | 0.9500 | 0.237500 | 0.997234 | Baseline |
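
The F1 scores in the table can be checked against the precision and recall columns, because F1 is the harmonic mean of the two. The following is a minimal Python sketch. The `rows` dictionary copies three rows from the table, and the helper name `f1_from_precision_recall` is an assumption made for illustration rather than part of the original project.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the comparison table.
rows = {
    "XGBoost": (0.791667, 0.9500, 0.863636),
    "Random Forest": (0.818182, 0.9000, 0.857143),
    "Gradient Boosting": (0.571429, 1.0000, 0.727273),
}

# Recompute F1 for each listed model from its precision and recall.
recomputed = {
    name: f1_from_precision_recall(p, r) for name, (p, r, _) in rows.items()
}

for name, (_, _, reported) in rows.items():
    # Table values are rounded, so small differences in the last digit are expected.
    print(f"{name}: recomputed={recomputed[name]:.4f} reported={reported:.4f}")

# Model with the highest recomputed F1 among the rows listed here.
best_model = max(recomputed, key=recomputed.get)
print(best_model)
```

Ranking by F1 rather than accuracy matters here: every model in the table exceeds 0.98 accuracy, while precision and F1 vary widely.
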
Mean CV F1 Score: 0.999725
Standard Deviation: 0.000169
The low standard deviation indicates that the model's F1 score is consistent across cross-validation folds.
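
As a rough illustration of how a cross-validation summary like this is produced, the sketch below averages per-fold scores and reports their spread. The fold scores and the choice of five folds are hypothetical placeholders, since the source does not list the per-fold results or the number of folds. `pstdev` is used because it matches the population standard deviation that NumPy's `std()` returns by default.

```python
from statistics import mean, pstdev


def summarize_cv(fold_scores):
    """Return (mean, population standard deviation) of per-fold scores."""
    return mean(fold_scores), pstdev(fold_scores)


# Hypothetical per-fold F1 scores, NOT the project's actual fold results.
example_fold_scores = [0.9995, 0.9998, 0.9999, 0.9997, 0.9996]

cv_mean, cv_std = summarize_cv(example_fold_scores)
print(f"Mean CV F1 Score: {cv_mean:.6f}")
print(f"Standard Deviation: {cv_std:.6f}")
```
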
Training Accuracy: 0.9999
Testing Accuracy: 0.9991
Overfitting Gap: 0.0009
The gap between training and testing accuracy is very small, which indicates that the model generalizes well and shows little sign of overfitting.
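
The overfitting gap is the training score minus the testing score. The sketch below recomputes it from the rounded accuracies listed above. With four-decimal inputs the result comes out at roughly 0.0008 rather than 0.0009, so the reported figure was presumably computed from unrounded accuracies. The function name and the `MAX_ACCEPTABLE_GAP` threshold are assumptions for illustration and do not come from the source.

```python
def overfitting_gap(train_score: float, test_score: float) -> float:
    """Training score minus testing score (positive means training is higher)."""
    return train_score - test_score


# Rounded accuracies as reported above.
train_accuracy = 0.9999
test_accuracy = 0.9991

gap = overfitting_gap(train_accuracy, test_accuracy)
print(f"Overfitting Gap: {gap:.4f}")

# Hypothetical threshold for flagging a model; not taken from the source.
MAX_ACCEPTABLE_GAP = 0.01
print("OK" if gap <= MAX_ACCEPTABLE_GAP else "Possible overfitting")
```
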