EPL Match Winner Prediction through Machine Learning

Predict the outcome of English Premier League matches using machine learning and historical match data. This project leverages web scraping, data cleaning, feature engineering, and Random Forest classification to make informed predictions.

Project Overview

The project is divided into two main parts: Scraping and Prediction.

data_scrape.ipynb

We scrape match results from the EPL to build a dataset for predicting match winners. The notebook uses the requests library to download data and Beautiful Soup to parse HTML, extracting the relevant tables. The data is then loaded into pandas for cleaning and preparation for the machine learning model.

Workflow recap:

Start with the standings_url for the Premier League
Scrape the table data from the HTML page
Access each team's individual page
Scrape scores and fixtures for each match of that team
Scrape shooting stats for each match of that team
Combine the data frames for each team
Loop through each season and selected teams to consolidate into a single data frame
Merge everything and perform minor data cleanup

ml_predictions.ipynb

This notebook predicts winners for EPL matches using the scraped dataset.

Steps included:

Load match data into pandas and clean it
Apply a Random Forest machine learning model to predict match outcomes
Measure the algorithm's performance and improve model accuracy
Continue iteratively improving the model

Summary of the Prediction process:

Scraped data about Premier League matches
Cleaned the data and prepared it for the ML model
Created a simple ML model with a few predictors and a target
Trained a Random Forest model
Computed accuracy and precision scores
Improved accuracy by generating more predictors using rolling averages
Enhanced precision by analyzing both sides of the matches of team A and team B
Achieved a final model precision of 67.5 percent

Possible next steps:

Scrape more seasons, ideally 10 to 20, to improve predictions
Use additional columns in the dataset to generate more features for the model

Technologies Used

Python 3.8+
Pandas for data manipulation and analysis
Scikit-learn for machine learning
Requests for web scraping
Beautiful Soup for HTML parsing
Jupyter Notebook for interactive development

Repository Structure

epl-ml-match-winner-prediction/
│
├── data_scrape.ipynb # Notebook to scrape EPL match data
├── ml_predictions.ipynb # Notebook for ML model training and evaluation
├── matches_data.csv # Raw match data collected
├── requirements.txt # Python dependencies
└── LICENSE # Project license

Installation and Setup

Clone the repository

git clone https://github.com/andreyestevam/epl-ml-match-winner-prediction.git
cd epl-ml-match-winner-prediction

Install required packages

pip install -r requirements.txt

Run Jupyter Notebooks
You can either: Scrape the data (can be slightly different than the one collected in this repository due to possible website changes):

jupyter notebook data_scrape.ipynb

Or: Train and Evaluate the ML model:

jupyter notebook ml_predictions.ipynb

Model Evaluation

The Random Forest model is evaluated using accuracy and precision metrics. Initial models are simple, but precision improves significantly by adding rolling averages, using more predictors, and considering both teams' historical performance. The final precision achieved, with the existing data in this repository, is 67.5 percent.

Credits / Acknowledgment

This project was inspired by a project-based learning approach. All data cleaning, feature engineering, and machine learning modeling were implemented independently, reflecting hands-on experimentation and practical learning with Python, Pandas, Beautiful Soup, and scikit-learn.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

EPL Match Winner Prediction through Machine Learning

Table of Contents

Project Overview

data_scrape.ipynb

ml_predictions.ipynb

Technologies Used

Repository Structure

Installation and Setup

Model Evaluation

Credits / Acknowledgment

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
LICENSE		LICENSE
README.md		README.md
data_scrape.ipynb		data_scrape.ipynb
matches_data.csv		matches_data.csv
ml_predictions.ipynb		ml_predictions.ipynb
requirements.txt		requirements.txt

License

andreyestevam/epl-ml-match-winner-prediction

Folders and files

Latest commit

History

Repository files navigation

EPL Match Winner Prediction through Machine Learning

Table of Contents

Project Overview

data_scrape.ipynb

ml_predictions.ipynb

Technologies Used

Repository Structure

Installation and Setup

Model Evaluation

Credits / Acknowledgment

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages