NBA game simulation
Introduction
NBA games are hard to predict. Although a favorite clearly has a better chance against an underdog, factors such as injuries, poor shooting, or players fouling out can influence the result of a game. My idea is to forecast game results from individual players' statistics. This can be useful for advanced sports analytics, for simulating a season, and even for betting.
Problem definition and Method
As input data, I used a dataset of box scores for the 2017/2018 season, containing a wide range of per-game statistics for every player. After cleaning the data, I kept only the attributes needed for the simulation: Player, Team, Position, Played Games, Points, Role, Arena, Minutes Played. I chose the Monte Carlo method for the forecast.
Model
As mentioned above, the dataset was cleaned so that only the relevant attributes remain. Some further manipulation was then needed. All calculated fields were divided by arena: Home or Away.
First, I counted how many games each individual player took part in. From this I could calculate the probability of a player missing a game and the probability of being a starter.
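The two probabilities can be sketched in Python as simple frequencies. This is only an illustration, not the spreadsheet's actual formulas: the function name, its parameters, and the exact definitions (missed games over team games, starts over games played) are assumptions.

```python
def availability_probabilities(games_played, games_started, team_games):
    """Return (p_miss, p_start) as simple frequencies for one player and one arena.

    Assumed definitions, not taken from the spreadsheet:
    p_miss  = share of the team's games the player did not appear in
    p_start = share of the player's own appearances in which he started
    """
    if team_games <= 0:
        raise ValueError("team_games must be positive")
    if not 0 <= games_started <= games_played <= team_games:
        raise ValueError("expected games_started <= games_played <= team_games")
    p_miss = 1 - games_played / team_games
    p_start = games_started / games_played if games_played else 0.0
    return p_miss, p_start


# Hypothetical player: 30 appearances and 24 starts in 41 home games.
p_miss, p_start = availability_probabilities(
    games_played=30, games_started=24, team_games=41
)
```

Because all fields are split by arena, the same calculation would be run once on home games and once on away games.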
Then I added a points rate (points per minute) for each player in each game, found the minimum and maximum rates, and determined the minimum and maximum minutes played.
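A minimal Python sketch of this step, assuming each player's games are available as (minutes, points) pairs. The function name and the dictionary keys are hypothetical, and skipping games with zero minutes is an assumption made to avoid division by zero.

```python
def player_ranges(games):
    """Summarise one player's games at one arena.

    games: list of (minutes_played, points) tuples.
    Games with zero minutes are skipped (assumption).
    """
    played = [(minutes, points) for minutes, points in games if minutes > 0]
    if not played:
        raise ValueError("player has no games with minutes played")
    rates = [points / minutes for minutes, points in played]  # points per minute
    minutes_played = [minutes for minutes, _ in played]
    return {
        "min_rate": min(rates),
        "max_rate": max(rates),
        "min_minutes": min(minutes_played),
        "max_minutes": max(minutes_played),
    }


# Hypothetical box-score rows: (minutes, points)
ranges = player_ranges([(30, 15), (20, 14), (0, 0), (36, 27)])
```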
Before simulating results, I created a sheet called "Preprocess", where the actual simulation runs.
When a user chooses two teams (Home and Away), the model determines the starting five, whether each player misses the game, and each player's minutes and points. One important constraint applies: there are only 5 starters and 5 positions.
The starting five is determined by the ratio Role (starter) / Games Played. If several players alternated as starters, one of them is chosen at random.
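One way to express this selection in Python is a weighted random draw per position, where the weight is the player's share of games started. This is a sketch under assumptions: the position labels, the roster layout, and the use of weighting (rather than a plain uniform choice) are illustrative and may differ from the spreadsheet.

```python
import random

POSITIONS = ("PG", "SG", "SF", "PF", "C")  # assumed position labels


def pick_starting_five(roster, rng):
    """Pick one starter per position.

    roster: list of dicts with 'name', 'position' and 'p_start'
            (starts divided by games played).
    rng:    a random.Random instance, passed in for reproducibility.
    """
    starters = {}
    for position in POSITIONS:
        candidates = [
            p for p in roster if p["position"] == position and p["p_start"] > 0
        ]
        if not candidates:
            raise ValueError(f"no starter candidate at {position}")
        weights = [p["p_start"] for p in candidates]
        # Players who alternated as starters compete in a weighted draw.
        starters[position] = rng.choices(candidates, weights=weights, k=1)[0]["name"]
    return starters


# Hypothetical roster: two players shared the PG starting spot.
roster = [
    {"name": "A", "position": "PG", "p_start": 0.6},
    {"name": "B", "position": "PG", "p_start": 0.4},
    {"name": "C", "position": "SG", "p_start": 1.0},
    {"name": "D", "position": "SF", "p_start": 1.0},
    {"name": "E", "position": "PF", "p_start": 1.0},
    {"name": "F", "position": "C", "p_start": 1.0},
    {"name": "G", "position": "C", "p_start": 0.0},
]
starters = pick_starting_five(roster, random.Random(42))
```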
Minutes are drawn at random between a player's minimum and maximum minutes played. Starters' minutes are determined first. There is a limit of 48 minutes per position, so I introduced a coefficient that gives each bench player a share of the remaining time.
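The allocation for a single position might look like the following Python sketch. How the coefficient is computed in the spreadsheet is not stated, so the version here (each bench player's raw random draw divided by the sum of all bench draws) is an assumption, as are the function and variable names.

```python
import random

GAME_MINUTES = 48  # regulation minutes available at each position


def allocate_position_minutes(starter, bench, rng):
    """Distribute 48 minutes at one position.

    starter: (name, min_minutes, max_minutes)
    bench:   list of (name, min_minutes, max_minutes)
    rng:     a random.Random instance
    """
    name, low, high = starter
    # The starter's minutes are drawn first, capped at the game length.
    minutes = {name: min(rng.uniform(low, high), GAME_MINUTES)}
    remaining = GAME_MINUTES - minutes[name]

    # Raw draws for bench players, then rescaled to fit the remaining time.
    raw = {n: rng.uniform(b_low, b_high) for n, b_low, b_high in bench}
    total_raw = sum(raw.values())
    for bench_name, value in raw.items():
        coefficient = value / total_raw if total_raw > 0 else 0.0
        minutes[bench_name] = coefficient * remaining
    return minutes


# Hypothetical center rotation: one starter and two bench players.
minutes = allocate_position_minutes(
    starter=("F", 28, 38),
    bench=[("G", 5, 15), ("H", 3, 10)],
    rng=random.Random(7),
)
```

With this rescaling, the minutes at a position always add up to 48 as long as at least one bench player has a positive draw.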
Finally, points are calculated as a random value between the player's minimum and maximum points rate, multiplied by minutes played.
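As a short Python sketch, assuming the rate is drawn from a uniform distribution (the text only says "random") and using a hypothetical function name:

```python
import random


def simulate_points(min_rate, max_rate, minutes, rng):
    """Draw a points-per-minute rate uniformly and scale it by minutes played."""
    if min_rate > max_rate:
        raise ValueError("min_rate must not exceed max_rate")
    return rng.uniform(min_rate, max_rate) * minutes


# Hypothetical player: 0.5 to 0.75 points per minute, 30 minutes on court.
points = simulate_points(0.5, 0.75, minutes=30, rng=random.Random(1))
```

For this player the simulated total always falls between 15 and 22.5 points.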
Results
Summing the points of all players on each team identifies the winner.
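Repeating the simulated game many times turns single outcomes into an estimated win frequency, which is the usual Monte Carlo reading of the result. The Python sketch below is a simplification with fixed minutes per player; the function names and the tuple layout are assumptions and do not mirror the spreadsheet.

```python
import random


def team_score(players, rng):
    """players: list of (min_rate, max_rate, minutes) tuples."""
    return sum(rng.uniform(low, high) * minutes for low, high, minutes in players)


def home_win_share(home, away, runs, rng):
    """Fraction of simulated games in which the home team outscores the away team."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    home_wins = sum(
        team_score(home, rng) > team_score(away, rng) for _ in range(runs)
    )
    return home_wins / runs


# Hypothetical teams with non-overlapping scoring ranges.
strong_home = [(0.5, 0.7, 48)] * 5
weak_away = [(0.2, 0.4, 48)] * 5
share = home_win_share(strong_home, weak_away, runs=200, rng=random.Random(0))
```

In this artificial example the home team scores at least 120 points and the away team at most 96, so the home team wins every run. With realistic, overlapping ranges the share lies between 0 and 1.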
Conclusions
After many simulations, I concluded that there is a clear home advantage. Some of the results were quite close to the real ones. Starting players have the most influence on the game: the more minutes they play, the higher the chance of winning. This follows from the points rate, which is on average higher for the starting five.
Source and Code
The simulation, including the data source and manipulations, can be found here: https://www.simulace.info/images/MC_NBA_prin03.xlsx (original data source: https://www.kaggle.com/datasets/pablote/nba-enhanced-stats?resource=download)
Comments
Some bugs can occur when running the simulation, mainly when the file is opened in the English version of Excel. On my personal laptop it works perfectly, but on another machine it crashed a couple of times with a #SPILL error. After a few refreshes, you get the result.