“League of Legends” (LoL) is the world’s most watched esport, drawing 30M+ peak viewers to its 2023 World Final.
The dataset from Oracle’s Elixir records every professional game: gold, experience, kills, objectives – all time-stamped.
Central question
If a team holds an assist and experience lead at 10 minutes, how likely are they to win the entire match?
Why you should care:
Item | Value |
---|---|
Raw rows loaded | ≈ 150K player+team rows |
Rows after filtering | 21312 |
Season analysed | 2022 (patch 12.01) |
Relevant columns | goldat10 , opp_goldat10 , xpat10 , opp_xpat10 , golddiffat10 , xpdiffat10 , killsat10 , opp_killsat10 , assistsat10 , opp_assistsat10 , firstblood , result |
Column descriptions
- goldat10 / opp_goldat10: Total team gold at the 10:00 mark.
- xpat10 / opp_xpat10: Cumulative experience points at 10:00.
- golddiffat10 / xpdiffat10: Team minus opponent resource counts (positive = advantage).
- killsat10 / opp_killsat10, assistsat10 / opp_assistsat10: Kill and assist counts at 10:00 for the team and its opponent.
- firstblood: Whether this team secured the first kill.
- result: True = win, False = loss.

Cleaning steps
- Keep rows where position == 'team' or participantid ∈ {100, 200}; this removes per-player duplicates.
- Cast numeric columns to float; invalid strings → NaN.
- Compute golddiffat10 = goldat10 − opp_goldat10, same for XP.
- Add killsdiff10, assistsdiff10 for modelling; cast firstblood → bool.

Head of the cleaned dataframe (5 rows)
gameid | side | result | golddiffat10 | xpdiffat10 | killsdiff10 | assistsdiff10 |
---|---|---|---|---|---|---|
ESPORTSTMNT01_2690210 | Blue | False | 1523 | 137 | 3 | 5 |
ESPORTSTMNT01_2690210 | Red | True | -1523 | -137 | -3 | -5 |
ESPORTSTMNT01_2690219 | Blue | False | -1619 | -1586 | -2 | -2 |
ESPORTSTMNT01_2690219 | Red | True | 1619 | 1586 | 2 | 2 |
ESPORTSTMNT01_2690227 | Blue | True | -103 | 813 | -1 | -1 |
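
The cleaning steps that produce this dataframe can be sketched roughly as follows. This is a minimal illustration, not the author's actual code: the function name `clean_team_rows` and the `NUMERIC_COLS` list are hypothetical, and dropping rows with missing 10-minute stats is an assumption that mirrors the write-up's choice not to impute.

```python
import pandas as pd

# Columns cast to float; names follow the "Relevant columns" list in this write-up.
NUMERIC_COLS = [
    "goldat10", "opp_goldat10", "xpat10", "opp_xpat10",
    "killsat10", "opp_killsat10", "assistsat10", "opp_assistsat10",
]


def clean_team_rows(raw: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per team per game and derive the 10-minute differentials."""
    # Team-level rows only: position == 'team' or participantid in {100, 200}.
    is_team = (raw["position"] == "team") | raw["participantid"].isin([100, 200])
    df = raw.loc[is_team].copy()

    # Cast to float; strings that cannot be parsed become NaN.
    for col in NUMERIC_COLS:
        df[col] = pd.to_numeric(df[col], errors="coerce").astype(float)

    # Assumption: rows with missing early-game stats are dropped, not imputed.
    df = df.dropna(subset=NUMERIC_COLS)

    # Team-minus-opponent differentials (positive = advantage).
    df["golddiffat10"] = df["goldat10"] - df["opp_goldat10"]
    df["xpdiffat10"] = df["xpat10"] - df["opp_xpat10"]
    df["killsdiff10"] = df["killsat10"] - df["opp_killsat10"]
    df["assistsdiff10"] = df["assistsat10"] - df["opp_assistsat10"]

    # Flags stored as 0/1 become booleans; unparseable values become False.
    df["firstblood"] = pd.to_numeric(df["firstblood"], errors="coerce") == 1
    df["result"] = pd.to_numeric(df["result"], errors="coerce") == 1
    return df.reset_index(drop=True)
```

Because each differential is team minus opponent, the two rows of a game mirror each other, which is why the Blue and Red rows in the table above carry opposite signs.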
Gold diff @ 10 min is nearly normal with μ ≈ 0 and σ ≈ 1.8k gold; extreme early snowballs (>5k) are rare.
The diagonal trend shows gold and XP leads move together. Orange (wins) dominate the upper-right quadrant, supporting our hypothesis.
Win-rate by binned gold lead:
gold_bin | Win Rate |
---|---|
(-1885, -942] | 0.25 |
(0, 942] | 0.60 |
(2828, 3771] | 0.92 |
… | … |
Any positive lead of up to about 0.9k gold already corresponds to a win rate of roughly 60%; in the 2.8k to 3.8k bin it reaches 92%, and beyond +5k, victories are almost guaranteed.
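
A table like the one above can be produced by cutting the gold lead into equal-width bins and averaging the outcome within each bin. The sketch below assumes the cleaned dataframe has a numeric `golddiffat10` column and a boolean `result` column; the helper name `win_rate_by_gold_bin` and the default of 10 bins are hypothetical, since the write-up does not state how many bins were used.

```python
import pandas as pd


def win_rate_by_gold_bin(df: pd.DataFrame, n_bins: int = 10) -> pd.Series:
    """Mean of `result` within equal-width bins of `golddiffat10`."""
    # pd.cut splits the observed range into n_bins intervals of equal width.
    bins = pd.cut(df["golddiffat10"], bins=n_bins)
    # observed=True keeps only bins that actually contain games.
    return df.groupby(bins, observed=True)["result"].mean()
```

Equal-width bins keep the interval edges easy to read, at the cost of very few games in the extreme bins, so win rates there should be read with caution.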
We did not impute: missing early-game stats only occur in “remake” or prematurely terminated games and were dropped to avoid skewing results.
Response variable: result, as it is the official match outcome recorded by tournament servers.

Name | Type | Encoding | Rationale |
---|---|---|---|
assistsdiff10 | Quantitative | Standard-scaled | Measures supportive pressure and skirmish tempo. |
xpdiffat10 | Quantitative | Standard-scaled | Captures lane dominance and level advantages. |
There are 0 ordinal and 0 nominal predictors, so no one-hot encoding was required.
StandardScaler ➜ LogisticRegression(max_iter=1000, C=1.0)
Metric | Value |
---|---|
Accuracy | 0.686 |
ROC AUC | 0.750 |
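
The baseline pipeline and its two metrics could be set up along these lines. The scaler and the logistic-regression settings come from the write-up; the function name `fit_baseline`, the 80/20 stratified split, and the random seed are assumptions made only for illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_baseline(X, y, test_size=0.2, seed=42):
    """Fit StandardScaler -> LogisticRegression and score it on a held-out split."""
    # Assumption: a stratified hold-out split; the source only mentions held-out games.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000, C=1.0))
    model.fit(X_train, y_train)

    # Column 1 of predict_proba is the probability of the positive class (win).
    win_prob = model.predict_proba(X_test)[:, 1]
    scores = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "roc_auc": roc_auc_score(y_test, win_prob),
    }
    return model, scores
```

Here `X` would hold the two baseline features (the assist and XP differentials at 10 minutes) and `y` the `result` column. Putting the scaler inside the pipeline means it is fitted on the training split only, which avoids leaking test-set statistics.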
Is the model “good”?
A naïve coin-flip would score 0.50 accuracy and 0.50 AUC. An AUC of 0.75 means the model ranks a randomly chosen win above a randomly chosen loss 3 times out of 4, which is respectable for only two features, but it still misclassifies ~31% of held-out games. The linear decision boundary also assumes additive effects; real games likely exhibit non-linear interactions, so we expect headroom for improvement.
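
The "3 times out of 4" reading of AUC can be checked by hand on a toy example. The helper below is a plain-Python illustration (the name `pairwise_auc` is hypothetical): it counts, over all (win, loss) pairs, how often the win receives the higher score, with ties counted as half.

```python
from itertools import product


def pairwise_auc(y_true, scores):
    """Fraction of (win, loss) pairs in which the win gets the higher score."""
    wins = [s for s, y in zip(scores, y_true) if y]
    losses = [s for s, y in zip(scores, y_true) if not y]
    total = 0.0
    for win_score, loss_score in product(wins, losses):
        if win_score > loss_score:
            total += 1.0   # pair ranked correctly
        elif win_score == loss_score:
            total += 0.5   # tie counts as half
    return total / (len(wins) * len(losses))
```

For labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.6, 0.2]`, three of the four (win, loss) pairs are ordered correctly, so the function returns 0.75, the same kind of value the baseline reports.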
Added feature | Type | Why it should help (reasoned before seeing results) |
---|---|---|
golddiffat10 | Quantitative | Gold directly buys combat stats; even a small 200g lead can secure first dragon or Herald. |
killsdiff10 | Quantitative | A kill grants 300g + tempo (death timer, map pressure) not fully reflected by assists alone. |
firstblood | Nominal (binary) | First Blood awards 400g (a 100g bonus over a standard kill) and often snowballs lanes; encoded as 0/1. |
Numeric features are scaled with StandardScaler, while firstblood is already binary (no further encoding).
Hyperparameter grid: n_estimators {200, 400}, learning_rate {0.03, 0.05, 0.08}, max_depth {2, 3}. Best = 200 trees, lr 0.03, depth 2.

Model | Accuracy | ROC AUC |
---|---|---|
Baseline | 0.686 | 0.750 |
Final | 0.706 | 0.777 |
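
One way to reproduce a search over this grid is sketched below. The write-up does not name the estimator, so `GradientBoostingClassifier` is an assumption chosen because it accepts all three tuned parameters; the function name `tune_final_model`, the `roc_auc` selection metric, and 5-fold cross-validation are likewise illustrative choices, not the author's documented setup.

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

NUMERIC = ["assistsdiff10", "xpdiffat10", "golddiffat10", "killsdiff10"]
BINARY = ["firstblood"]

# Grid values as reported in the write-up.
PARAM_GRID = {
    "model__n_estimators": [200, 400],
    "model__learning_rate": [0.03, 0.05, 0.08],
    "model__max_depth": [2, 3],
}


def tune_final_model(X, y, param_grid=PARAM_GRID, cv=5, seed=42):
    """Grid-search a gradient-boosted classifier on the five final features."""
    preprocess = ColumnTransformer(
        [
            ("scale", StandardScaler(), NUMERIC),  # numeric features are scaled
            ("keep", "passthrough", BINARY),       # firstblood stays 0/1
        ]
    )
    pipe = Pipeline(
        [
            ("prep", preprocess),
            # Assumption: the source says "tree ensembles" but names no estimator.
            ("model", GradientBoostingClassifier(random_state=seed)),
        ]
    )
    # Assumption: candidates are ranked by cross-validated ROC AUC.
    search = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=cv)
    search.fit(X, y)
    return search
```

`X` is expected to be a dataframe with the five columns named above. After fitting, `search.best_params_` holds the selected combination and `search.best_estimator_` the refitted pipeline, which can then be scored on the same held-out games as the baseline.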
Why performance improved (data-generating perspective)
Gold is the universal currency for item power spikes; including golddiffat10 lets the model separate teams that traded kills for farm from those that did not. killsdiff10 captures raw elimination dominance that assists alone do not fully reflect. firstblood encodes an early discrete snowball event worth ~400g that can’t be inferred solely from continuous differentials.
Tree ensembles can exploit interactions such as “XP lead and firstblood” being disproportionately strong, something the linear baseline cannot model without explicit interaction terms. The resulting +0.027 AUC and +0.020 accuracy are consistent with these early-game signals sharpening win-probability estimates.