Architecture¶
Package Map¶
fplx/
├── api/
│ └── interface.py FPLModel orchestrator
│
├── core/
│ ├── player.py Player dataclass
│ ├── squad.py Squad + FullSquad dataclasses
│ └── matchweek.py Gameweek context
│
├── data/
│ ├── loaders.py FPL API client + caching
│ ├── vaastav_loader.py Historical dataset loader (vaastav/Fantasy-Premier-League)
│ ├── double_gameweek.py DGW detection, timeseries aggregation, prediction scaling
│ ├── tft_dataset.py Panel dataset builder for TFT
│ ├── news_collector.py Per-gameweek news snapshot persistence
│ └── schemas.py Pydantic validation schemas
│
├── evaluation/
│ ├── metrics.py InferenceMetrics + OptimizationMetrics accumulators
│
├── inference/
│ ├── hmm.py Scalar HMM: Forward, FB, Viterbi, Baum-Welch
│ ├── kalman.py 1D Kalman Filter with adaptive noise + RTS smoother
│ ├── multivariate_hmm.py Position-specific MV-HMM with diagonal Gaussian emissions
│ ├── enriched.py Feature-rich ridge predictor + semi-variance
│ ├── tft.py TFT forecaster wrapper (quantile predictions)
│ ├── fusion.py Inverse-variance weighting
│ └── pipeline.py Per-player orchestrator with signal injection
│
├── selection/
│ ├── constraints.py FormationConstraints, BudgetConstraint, TeamDiversityConstraint
│ ├── optimizer.py TwoLevelILPOptimizer + GreedyOptimizer
│ ├── lagrangian.py LagrangianOptimizer (subgradient dual ascent on budget constraint)
│ └── base.py BaseOptimizer ABC
│
├── signals/
│ ├── news.py Text → availability / minutes_risk / confidence
│ ├── fixtures.py Fixture difficulty + congestion signals
│ └── stats.py Weighted statistical scoring
│
├── timeseries/
│ ├── transforms.py Rolling, lag, EWMA, trend, consistency
│ └── features.py FeatureEngineer pipeline (40+ features)
│
└── utils/
├── config.py Nested Config with dot-notation access
└── validation.py Data quality checks + imputation
scripts/
├── backtest_season.py Full walk-forward backtest (inference + optimization)
├── train_tft.py TFT training script
└── fetch_live_gw.py Live gameweek deployment (FPL API → squad selection)
Two-Level ILP Architecture¶
The optimizer solves a single joint problem for both squad and lineup:
Level 1: 15-player squad (s_i ∈ {0,1})
├── Budget ≤ £100m (applied to squad, not lineup)
├── Position quotas: 2 GK, 5 DEF, 5 MID, 3 FWD
└── Team diversity: max 3 from any club
Level 2: 11-player lineup (x_i ∈ {0,1}, x_i ≤ s_i)
├── Lineup size = 11
├── 1 GK, 3–5 DEF, 2–5 MID, 1–3 FWD
└── Objective: max Σ (μ̂_i − λ·ρ_i) · x_i
where ρ_i = sqrt(σ̂²_i) (mean-variance) or ρ_i = σ̂⁻_i (semi-variance).
Double Gameweek Handling¶
DGW handling has a single entry point in the data layer:
graph LR
RAW[Per-fixture rows] --> AGG[aggregate_dgw_timeseries\nauto-called in build_player_objects]
AGG -->|one row / GW\npoints_norm| INF[Inference pipeline\nDGW-agnostic]
INF -->|per-fixture E[P], Var| SC[scale_predictions_for_dgw\n× n_fixtures before ILP]
SC --> ILP[Two-Level ILP]
Inference components never see raw multi-row DGW data. The only DGW-aware step after the data layer is the ILP scaling call.
Lazy Initialization¶
All components use the @property pattern — instantiated on first access:
@property
def data_loader(self):
if self._data_loader is None:
self._data_loader = FPLDataLoader(**self.config.get("data_loader", {}))
return self._data_loader
Execution Paths¶
graph TD
FIT["fit()"] --> CHECK{model_type?}
CHECK -->|inference| PIPE["PlayerInferencePipeline per player"]
CHECK -->|baseline / xgboost| LEG["FeatureEngineer + Model.predict()"]
PIPE --> BLEND["Enriched+MV-HMM blend → E[P], Var[P], DR"]
LEG --> EP["expected_points only"]
BLEND --> DGW["DGW scaling"]
EP --> OPT["TwoLevelILPOptimizer.solve()"]
DGW --> OPT
OPT --> SQUAD["FullSquad (15 players + 11-player lineup)"]