Skip to content

Numerical Experiments

Backtest Protocol

All experiments use a strict walk-forward protocol: at gameweek \(t\), only GW1–\((t-1)\) data is available to the inference pipeline. No future information leaks into predictions.

  • Dataset: vaastav/Fantasy-Premier-League
  • Seasons: 2023–24 and 2024–25
  • Evaluation window: GW6–38 (33 gameweeks; 5-GW burn-in)
  • Player pool: ~600 players per gameweek

Oracle Computation

The oracle score at each gameweek is the solution to:

\[\text{Oracle}(t) = \max_{s,x} \sum_i P^{\text{actual}}_i(t)\, x_i \quad \text{s.t.\ all squad/lineup constraints}\]

solved using TwoLevelILPOptimizer with actual realized points. Oracle totals: 4560 pts (2023–24) and 4369 pts (2024–25) over 33 GWs.

Results Summary

Pre-DGW-fix baseline numbers

These results were produced before the DGW integration described in Double Gameweeks. Three changes affect the numbers when re-run:

  • Oracle scores will increase in DGW gameweeks — get_actual_points previously dropped the first fixture row, understating actual points.
  • Strategy scores will increase in DGW-heavy weeks — the ILP now correctly values DGW players at n_fixtures × E[P].
  • Inference MSE will decrease slightly — the HMM no longer misclassifies DGW totals as extreme single-game events; points_norm keeps emissions on a consistent scale.

The qualitative ranking of strategies is not expected to change. Updated numbers will be added after the next full backtest run.

Strategy 2023–24 pts % Oracle 2024–25 pts % Oracle
Greedy (rolling avg) 1272 27.9% 1216 27.8%
ILP + EWMA 1511 33.1% 1612 36.9%
ILP + MV-HMM 1325 29.1% 1380 31.6%
ILP + Enriched 1949 42.7% 1791 41.0%
ILP + Semivar (λ=0.5) 1881 41.3% 1757 40.2%
ILP + Blend (λ=0) 1919 42.1% 1774 40.6%
ILP + TFT 1334 29.3% 1352 30.9%
Lagrangian + Enriched 1912 41.9% 1749 40.0%

Key Findings

Forecast quality determines squad quality. ILP + Enriched outperforms ILP + EWMA by 29.1% (2023–24) and 11.1% (2024–25). This is the primary empirical confirmation that the inference investment pays off in selection.

TFT paradox. TFT achieves the lowest MAE in both seasons (0.837–0.918) but the worst ILP input. MAE-optimal forecasts do not minimize the structure of rank-error correlations that matter for ILP selection. Players at the top of the predicted ranking tend to have correlated errors — the optimizer selects them together, and they together underperform.

Semi-variance gives a genuine risk-return trade-off. In 2024–25, semivar λ=0.5 reduces CV from 0.300 to 0.260 (−13%) at a cost of only −34 total points (−1.9%). Mean-variance penalizes star performers and is not recommended.

Lagrangian relaxation is viable. 97.7–98.1% of full ILP performance with ~50 subgradient iterations. Integrality gap < 5% in all tested gameweeks.

Duality Analysis

Shadow prices from the LP relaxation (averaged across 33 GWs):

Constraint Mean shadow price Interpretation
Budget ~0.6–0.8 pts/£1m Marginal value of one additional £1m of budget
GK quota ~0 GK constraint rarely binds (single GK per format)
DEF/MID quota ~0 Range constraints: both min and max slack in most GWs
FWD quota (max=3) Occasionally positive Top FWDs are frequently budget-constrained
Team cap (top-6) Occasionally positive Arsenal, City, Chelsea caps bind in strong GWs

The budget constraint has the largest and most consistent shadow price, confirming that capital allocation is the primary driver of squad quality.

Integrality Gap

The integrality gap \((z^*_{LP} - z^*_{ILP}) / z^*_{LP}\) is below 5% in all evaluated gameweeks. This tight gap validates: 1. The LP relaxation bound is informative 2. Lagrangian relaxation (which relaxes to the LP bound) loses little 3. The feasible integer solution is close to the fractional optimum