Data-driven prediction of soccer outcomes using enhanced machine and deep learning techniques
2024; Springer Science+Business Media; Volume: 11; Issue: 1; Language: English
10.1186/s40537-024-01008-2
ISSN: 2196-1115
Authors: Ebenezer Fiifi Emire Atta Mills, Zihui Deng, Zhiyong Zhong, Jinger Li
Topic(s): Sports Performance and Training
Abstract: This paper introduces a novel framework for soccer game prediction using advanced machine learning and deep learning techniques, initially focusing on the Dutch Eredivisie League and later expanding to include the Scottish Premiership and the Belgian Jupiler Pro League. The methodology includes data preprocessing, feature engineering, model training, and testing. Various models are evaluated, including enhanced versions of Logistic Regression, XGBoost, Random Forest, SVM, Naive Bayes, Feedforward Neural Network, and Vanilla Recurrent Neural Network. Unlike existing studies that focus on end-of-game features, this research incorporates real-time features like half-time results and goals for in-game decision-making. Advanced data normalization and sampling methods, such as SVM-SMOTE and Near-Miss, are applied to improve model performance. Models are assessed using accuracy, recall, precision, F1-score, and Area under the ROC Curve. Results indicate that the Feedforward Neural Network excels in predicting game results, while Logistic Regression is best for predicting under and over 2.5 goals. The integration of Random Forest and XGBoost in a voting model consistently achieves the highest accuracy across both prediction tasks. The combined use of data from the three leagues further validates the models' robustness and generalizability. This study demonstrates the potential of machine and deep learning to enhance soccer game predictions through advanced techniques and comprehensive data analysis, making significant contributions to sports analytics.
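As a rough illustration of four of the evaluation measures named in the abstract (accuracy, precision, recall, F1-score), the sketch below computes them for a binary over/under 2.5 goals task. The function name `binary_metrics` and the label lists are hypothetical and made up for illustration; they are not taken from the paper or its data.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts for the chosen positive class
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against a zero denominator
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels (assumption): 1 = over 2.5 goals, 0 = under 2.5 goals
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

For the three-class game-result task (home win, draw, away win), the same counts would typically be computed per class and then averaged; which averaging scheme the paper uses is not stated in the abstract.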