NBA Monte Carlo Simulation
Introduction
NBA Monte Carlo Simulation is a Jupyter Notebook application that predicts the results of NBA games. Its potential users range from fantasy league players to sports bettors. The simulation can also support advanced basketball analytics by predicting possible outcomes.
Installation
Before starting the notebook, please make sure that you have fulfilled the requirements (from requirements.txt).
1) Install Python (https://www.python.org/downloads/)
2) Install Jupyter Notebook (https://jupyter.org/install)
3) Install the Python libraries:
   1. Run Jupyter Notebook.
   2. In the window that opens, click New -> Terminal.
   3. In the terminal, run the following commands in order:
      - pip install pandas
      - pip install numpy
      - pip install ipywidgets
4) Go back to the Jupyter menu, click Upload and select the .ipynb file (from this page).
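To confirm that the libraries from step 3 are available before running the notebook, a quick check like the one below can be pasted into a notebook cell. This is only a sketch: the helper name missing_packages is hypothetical and is not part of the application.

```python
import importlib.util

# Libraries named in the installation steps above.
REQUIRED = ["pandas", "numpy", "ipywidgets"]


def missing_packages(names):
    """Return the top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```

If anything is reported as missing, repeat the matching pip install command from step 3.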
Running the application
Application description
After opening the .ipynb file, click "Run" in the menu above. Under the cell, you will see 2 input fields. Please provide the path to the Excel file (from this page) and the sheet name (default is Data_final).
After clicking "Confirm", you will be offered 3 options: get data analysis, simulate one game, or simulate the playoffs. If you choose analysis, some graphs with tables will be generated. For a one-game simulation, you can choose 2 teams and simulate the result for them. For a playoff simulation, you can select the bracket by seeds and simulate up to the NBA Finals.
Pre-Analysis
Based on the final data, there is a direct relationship between PPG (points per game) and MPG (minutes per game). The simulation is based on 2 random variables: scoring rate and missed games. The first is calculated as PPG / MPG together with its standard deviation, and the second is expressed as a percentage. A team's success therefore depends on 2 main aspects: the higher its players' scoring rate and MPG, and the more games those players played in the season, the more likely the team is to win. The second aspect matters because in modern basketball, player injuries and the composition of the rotation play a significant role. This is also where I see room to improve the program: if a rotation player is injured, the substitute should get more chances to enter the floor and more minutes per game.
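The two inputs described above can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: the function names are hypothetical, the sample numbers are made up and not taken from the data file, and computing the percentage as games played out of an 82-game regular season is an assumption.

```python
def scoring_rate(ppg, mpg):
    """Points per minute on the floor: PPG / MPG."""
    if mpg <= 0:
        return 0.0  # a player with no minutes has no scoring rate
    return ppg / mpg


def participation_pct(games_played, season_games=82):
    """Share of the season a player appeared in, as a percentage.

    Assumption: the percentage is games played / games in the season.
    """
    return 100.0 * games_played / season_games


# Hypothetical player line, for illustration only.
rate = scoring_rate(ppg=24.0, mpg=32.0)        # 0.75 points per minute
played = participation_pct(games_played=41)    # 50.0 percent of the season
print(rate, played)
```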
Simulation
As already mentioned, there are 2 random variables: scoring rate (with a standard deviation) and the chance of missing a game (to make games more chaotic, like real ones). First, for each player the program draws a random number between 0 and 100. If the player's game participation statistic is greater than or equal to this number, the player is in the rotation. Then, for each active player, the program calculates a scoring rate by drawing a random sample from a Gaussian distribution. In the application, you also enter the parameter n, the number of times the simulation is repeated. The final result is obtained by averaging the team scores (e.g. for n = 2 with scores 110 and 120, the final score is 115).
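The steps above can be sketched roughly as follows. This is a simplified illustration, not the notebook's implementation. It uses Python's standard random module instead of numpy, and the dictionary keys (played_pct, rate_mean, rate_sd, mpg) are hypothetical names. Two further assumptions are labeled in the comments: a player's points are taken as scoring rate times average minutes, and negative scoring rates are clamped to zero.

```python
import random


def simulate_team_score(players, rng):
    """Simulate one game for one team and return its total points."""
    total = 0.0
    for p in players:
        # Availability draw: a random number between 0 and 100.
        draw = rng.uniform(0, 100)
        if p["played_pct"] < draw:
            continue  # the player misses this game
        # Scoring rate for this game: Gaussian around the season rate.
        # Assumption: a negative draw is clamped to zero.
        rate = max(0.0, rng.gauss(p["rate_mean"], p["rate_sd"]))
        # Assumption: points = scoring rate * average minutes per game.
        total += rate * p["mpg"]
    return total


def average_score(players, n, seed=None):
    """Repeat the simulation n times and average the team scores."""
    rng = random.Random(seed)
    scores = [simulate_team_score(players, rng) for _ in range(n)]
    return sum(scores) / n


# Hypothetical two-player roster, for illustration only.
roster = [
    {"played_pct": 90.0, "rate_mean": 0.75, "rate_sd": 0.10, "mpg": 34.0},
    {"played_pct": 60.0, "rate_mean": 0.40, "rate_sd": 0.15, "mpg": 20.0},
]
print(average_score(roster, n=1000, seed=42))
```

Passing a seed makes a run repeatable, which helps when comparing the effect of different values of n.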
Results
Post-Analysis
After many combinations and many tries, I can confirm what I said in the pre-analysis: the simulation depends heavily on scoring rate and on the possibility of losing key players. The top 5 teams I got as NBA champions are:
- DEN (Denver Nuggets, real-life champions, approx. 60 times)
- LAL (Los Angeles Lakers, real-life Western Conference finalists, 38 times)
- ATL (Atlanta Hawks, lost in the first round in real life, 35 times)
- NY (New York Knicks, lost in the second round in real life, 27 times)
- BOS (Boston Celtics, real-life Eastern Conference finalists, 24 times)
This year's (2022/2023) Playoffs were quite a surprise, because the number 8 seed Miami Heat reached the NBA Finals, yet they never appeared as NBA champions during my simulation testing. Also, the Milwaukee Bucks, who were believed to be highly likely NBA champions, lost in the first round. On the other hand, the Denver Nuggets came out as NBA champions in the simulation far more often than any other team, which suggests that the model can represent reality.
Improvements
From a UI/UX point of view, I tried to separate the logical parts of running the app. Unfortunately, due to lack of time and the limitations of Jupyter Notebook, there was no chance to make a truly user-friendly application for analysis. Of course, users see all the details in the output, but I believe it would be more convenient as, for example, a Flask web application.
From the simulation perspective, the main improvement I see is giving more minutes and opportunities to other players from the rotation. Also, I decided to exclude the minutes-played random variable, because when I ran the simulation with it, the results were very chaotic and unrealistic: one team could score only 50 points and the other more than 200, which, of course, does not make sense. I think some algorithm could limit this variable, because the cause of the problem is that the standard deviation of minutes per game is very large (up to 16 minutes, which at an average scoring rate of 0.43 means about +-7 points per player).
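One possible way to limit the minutes variable is to clamp each Gaussian draw to a band around the player's average. The sketch below is only an idea for future work, not something the notebook does. The function name clipped_minutes and the parameters max_sd and game_length are hypothetical, and the 48-minute cap assumes a regulation game without overtime.

```python
import random


def clipped_minutes(mpg_mean, mpg_sd, rng, max_sd=1.0, game_length=48.0):
    """Draw minutes from a Gaussian, then clamp them to a plausible range."""
    draw = rng.gauss(mpg_mean, mpg_sd)
    # Allowed band: mean +/- max_sd standard deviations, inside [0, game_length].
    low = max(0.0, mpg_mean - max_sd * mpg_sd)
    high = min(game_length, mpg_mean + max_sd * mpg_sd)
    return min(max(draw, low), high)


# Arithmetic from the paragraph above: a 16-minute swing at 0.43 points
# per minute is roughly 7 points for a single player.
swing = 16 * 0.43
print(round(swing, 2))  # 6.88

rng = random.Random(0)
print(clipped_minutes(mpg_mean=30.0, mpg_sd=16.0, rng=rng))
```

A smaller max_sd keeps simulated minutes closer to the season average, at the cost of less game-to-game variation.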
Conclusion
To sum up, I would say that the simulation is reliable and relates to reality. Improvements are possible on both the UI and backend sides, but otherwise the aim of the simulation has been achieved, and the application can be used for real-life predictions.
Files
All the documents can be downloaded here.
Simulation (ipynb):
Data source (xlsx):
Requirements (txt):
Data sources
https://edraft.com/nba/fantasy-basketball/tools/player-consistency/?season=2022-2023&col1=0&col2=0&con=0
https://www.basketball-reference.com/leagues/NBA_2023_per_game.html