Numerical Experiments
Backtest Protocol
All experiments use a strict walk-forward protocol: at gameweek \(t\), only GW1–\((t-1)\) data is available to the inference pipeline. No future information leaks into predictions.
- Dataset: vaastav/Fantasy-Premier-League
- Seasons: 2023–24 and 2024–25
- Evaluation window: GW6–38 (33 gameweeks; 5-GW burn-in)
- Player pool: ~600 players per gameweek
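The walk-forward rule above can be sketched as a loop that filters the visible history before every prediction. This is a minimal illustration, not the project's pipeline: `walk_forward`, `mean_of_past`, and the toy `history` dictionary are hypothetical names introduced here.

```python
from typing import Callable, Dict, List, Sequence


def walk_forward(
    gameweeks: Sequence[int],
    history: Dict[int, List[float]],
    predict: Callable[[Dict[int, List[float]]], float],
    burn_in: int = 5,
) -> Dict[int, float]:
    """Return one prediction per evaluated gameweek, using only earlier data."""
    predictions: Dict[int, float] = {}
    for t in gameweeks:
        if t <= burn_in:
            continue  # burn-in gameweeks are never scored
        # Only GW1..(t-1) is visible; GW t and later are filtered out.
        past = {gw: rows for gw, rows in history.items() if gw < t}
        predictions[t] = predict(past)
    return predictions


def mean_of_past(past: Dict[int, List[float]]) -> float:
    """Toy predictor: average of every value that is visible."""
    values = [v for rows in past.values() for v in rows]
    return sum(values) / len(values)


# Toy history: gameweek g holds the single value g.
history = {gw: [float(gw)] for gw in range(1, 39)}
preds = walk_forward(range(1, 39), history, mean_of_past)
print(len(preds), min(preds), max(preds))
```

With a 5-GW burn-in over GW1–38, the loop scores GW6–38, which matches the 33-gameweek evaluation window.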
Oracle Computation
The oracle score at each gameweek is the optimal value of the squad-selection problem with actual realized points in place of forecasts, solved using `TwoLevelILPOptimizer`. Oracle totals: 4560 pts (2023–24) and 4369 pts (2024–25) over 33 GWs.
Results Summary
Pre-DGW-fix baseline numbers
These results were produced before the DGW integration described in Double Gameweeks. Three changes affect the numbers when re-run:
- Oracle scores will increase in DGW gameweeks: `get_actual_points` previously dropped the first fixture row, understating actual points.
- Strategy scores will increase in DGW-heavy weeks: the ILP now correctly values DGW players at `n_fixtures × E[P]`.
- Inference MSE will decrease slightly: the HMM no longer misclassifies DGW totals as extreme single-game events; `points_norm` keeps emissions on a consistent scale.
The qualitative ranking of strategies is not expected to change. Updated numbers will be added after the next full backtest run.
| Strategy | 2023–24 pts | % Oracle | 2024–25 pts | % Oracle |
|---|---|---|---|---|
| Greedy (rolling avg) | 1272 | 27.9% | 1216 | 27.8% |
| ILP + EWMA | 1511 | 33.1% | 1612 | 36.9% |
| ILP + MV-HMM | 1325 | 29.1% | 1380 | 31.6% |
| ILP + Enriched | 1949 | 42.7% | 1791 | 41.0% |
| ILP + Semivar (λ=0.5) | 1881 | 41.3% | 1757 | 40.2% |
| ILP + Blend (λ=0) | 1919 | 42.1% | 1774 | 40.6% |
| ILP + TFT | 1334 | 29.3% | 1352 | 30.9% |
| Lagrangian + Enriched | 1912 | 41.9% | 1749 | 40.0% |
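The "% Oracle" columns are each strategy's season total divided by the oracle total for that season. A small sketch of that arithmetic, using totals copied from this page (the helper names `pct_of_oracle` and `relative_gain` are introduced here for illustration):

```python
# Oracle season totals quoted in the Oracle Computation section.
ORACLE_TOTALS = {"2023-24": 4560, "2024-25": 4369}


def pct_of_oracle(points: float, season: str) -> float:
    """Strategy total as a percentage of the oracle total for that season."""
    return 100.0 * points / ORACLE_TOTALS[season]


def relative_gain(points: float, baseline: float) -> float:
    """Percentage improvement of one strategy total over another."""
    return 100.0 * (points - baseline) / baseline


for name, pts in [("Greedy (rolling avg)", 1272), ("ILP + Enriched", 1949)]:
    print(f"{name}: {pct_of_oracle(pts, '2023-24'):.1f}% of oracle")

# Relative gain of ILP + Enriched over ILP + EWMA in 2024-25.
print(f"{relative_gain(1791, 1612):.1f}%")
```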
Key Findings
Forecast quality determines squad quality. ILP + Enriched outperforms ILP + EWMA by 29.0% in 2023–24 (1949 vs 1511 pts) and 11.1% in 2024–25 (1791 vs 1612 pts). This is the primary empirical evidence that the investment in inference pays off in selection.
TFT paradox. TFT achieves the lowest MAE in both seasons (0.837–0.918) yet is among the weakest ILP inputs, reaching only 29.3% and 30.9% of oracle. Minimizing MAE does not control the rank-error correlations that matter for ILP selection: players at the top of the predicted ranking tend to have correlated errors, so the optimizer selects them together and they underperform together.
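Semi-variance gives a genuine risk-return trade-off. In 2024–25, semivar λ=0.5 reduces the coefficient of variation (CV) from 0.300 to 0.260 (−13%) at a cost of only 34 total points (−1.9%) relative to ILP + Enriched. Mean-variance penalizes upside deviations as well as downside ones, so it penalizes star performers and is not recommended.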
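To make the distinction between the two risk measures concrete, here is a minimal sketch. It assumes population statistics and a fixed target score, neither of which is specified on this page, and the function names and toy score series are illustrative.

```python
from statistics import mean, pstdev, pvariance


def coefficient_of_variation(points):
    """Population standard deviation divided by the mean."""
    return pstdev(points) / mean(points)


def downside_semivariance(points, target):
    """Mean squared shortfall below `target`; scores above it contribute zero."""
    return mean([min(p - target, 0.0) ** 2 for p in points])


steady = [50, 50, 50, 50]
star = [50, 50, 50, 90]  # same floor as `steady`, plus one big haul

# Variance counts the 90-point haul as risk; semi-variance does not.
print(pvariance(steady), pvariance(star))
print(downside_semivariance(steady, 50), downside_semivariance(star, 50))
print(coefficient_of_variation(star))
```

Under these assumptions the haul raises the variance of `star` while its downside semi-variance stays at zero. This is the sense in which a variance penalty discourages high-ceiling players and a semi-variance penalty does not.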
Lagrangian relaxation is viable. Lagrangian + Enriched reaches 97.7–98.1% of full ILP performance with ~50 subgradient iterations, and the integrality gap is below 5% in all tested gameweeks.
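A toy sketch of subgradient-based Lagrangian relaxation, assuming a much simpler problem than the real squad ILP: pick exactly `k` players under a single budget, with the budget constraint dualized. The function name, the toy data, and the diminishing step size are illustrative assumptions; this page does not describe the actual step rule or which constraints are relaxed.

```python
def lagrangian_select(points, costs, budget, k, iters=50, step0=0.1):
    """Dualize the budget constraint and run projected subgradient steps.

    Returns (best feasible value, best upper bound, best feasible pick).
    """
    n = len(points)
    lam = 0.0  # multiplier on the budget constraint, kept >= 0
    best_bound = float("inf")
    best_value, best_pick = float("-inf"), None
    for it in range(iters):
        # Inner problem: with the budget priced in, take the k best reduced scores.
        reduced = [points[i] - lam * costs[i] for i in range(n)]
        pick = sorted(range(n), key=lambda i: reduced[i], reverse=True)[:k]
        cost = sum(costs[i] for i in pick)
        # The dual value at lam is an upper bound on the integer optimum.
        bound = lam * budget + sum(reduced[i] for i in pick)
        best_bound = min(best_bound, bound)
        if cost <= budget:  # the pick is feasible for the original problem
            value = sum(points[i] for i in pick)
            if value > best_value:
                best_value, best_pick = value, sorted(pick)
        # Subgradient of the dual is (budget - cost); step against it, project to >= 0.
        lam = max(0.0, lam - step0 / (1 + it) * (budget - cost))
    return best_value, best_bound, best_pick


# Invented toy data: expected points and prices for six players.
points = [10, 9, 8, 4, 3, 2]
costs = [12, 10, 9, 5, 4, 4]
value, bound, pick = lagrangian_select(points, costs, budget=25, k=3)
print(value, round(bound, 3), pick)
```

The gap between `bound` and `value` brackets the true optimum. In this toy case the feasible pick already attains the integer optimum, so the remaining gap is the duality gap alone.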
Duality Analysis
Shadow prices from the LP relaxation (averaged across 33 GWs):
| Constraint | Mean shadow price | Interpretation |
|---|---|---|
| Budget | ~0.6–0.8 pts/£1m | Marginal value of one additional £1m of budget |
| GK quota | ~0 | GK constraint rarely binds (single GK per format) |
| DEF/MID quota | ~0 | Range constraints: both min and max slack in most GWs |
| FWD quota (max=3) | Occasionally positive | Top FWDs are frequently budget-constrained |
| Team cap (top-6) | Occasionally positive | Arsenal, City, Chelsea caps bind in strong GWs |
The budget constraint has the largest and most consistent shadow price, confirming that capital allocation is the primary driver of squad quality.
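One way to read the table, assuming the LP relaxation has the generic form below. The symbols \(p\), \(c\), \(B\), \(A\), \(b\), and \(\lambda\) are notation introduced here for illustration and are not taken from the implementation:

\[
z^*_{LP}(B) = \max_{0 \le x \le 1} \; p^\top x \quad \text{s.t.} \quad c^\top x \le B, \;\; A x \le b
\]

\[
\lambda_B = \frac{\partial z^*_{LP}}{\partial B}, \qquad z^*_{LP}(B + \Delta B) \approx z^*_{LP}(B) + \lambda_B \, \Delta B \quad \text{for small } \Delta B
\]

\[
\lambda_j \, (b_j - a_j^\top x^*) = 0 \quad \text{for each row } j \text{ of } A x \le b
\]

Under this reading, and where the derivative exists, a budget shadow price of 0.6–0.8 pts/£1m means an extra £0.5m would be worth roughly 0.3–0.4 pts per gameweek in the relaxation. The last line is complementary slackness: a quota with slack has a zero shadow price, which is consistent with the ~0 entries for the GK and DEF/MID rows.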
Integrality Gap
The integrality gap \((z^*_{LP} - z^*_{ILP}) / z^*_{LP}\) is below 5% in all evaluated gameweeks. This tight gap validates three points:
1. The LP relaxation bound is informative.
2. Lagrangian relaxation, whose dual bound is at least as tight as the LP bound, loses little.
3. The feasible integer solution is close to the fractional optimum.
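The gap formula can be checked on a toy single-budget knapsack, where the LP relaxation is solved exactly by filling in order of points per unit cost and the integer optimum is found by brute force. The data and function names are invented for illustration; the real problem also has quota and team-cap constraints, which this sketch omits.

```python
from itertools import combinations


def lp_value(points, costs, budget):
    """LP relaxation of a single-budget knapsack, solved greedily by density."""
    remaining, value = float(budget), 0.0
    order = sorted(zip(points, costs), key=lambda pc: pc[0] / pc[1], reverse=True)
    for p, c in order:
        take = min(1.0, remaining / c)  # fractional picks are allowed in the LP
        value += take * p
        remaining -= take * c
        if remaining <= 0:
            break
    return value


def ilp_value(points, costs, budget):
    """Exact integer optimum by enumerating subsets (fine for a handful of items)."""
    n = len(points)
    best = 0
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if sum(costs[i] for i in subset) <= budget:
                best = max(best, sum(points[i] for i in subset))
    return best


def integrality_gap(z_lp, z_ilp):
    """Relative gap (z_LP - z_ILP) / z_LP for a maximization problem."""
    return (z_lp - z_ilp) / z_lp


points = [10, 9, 8, 4, 3, 2]
costs = [12, 10, 9, 5, 4, 4]
z_lp = lp_value(points, costs, 25)
z_ilp = ilp_value(points, costs, 25)
gap = integrality_gap(z_lp, z_ilp)
print(z_lp, z_ilp, f"{gap:.1%}")
```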