News Signals

FPLX takes its news data directly from the FPL API, so no external scraping is required.

Data Source

Every player in the bootstrap-static API response includes:

| Field | Example | Type |
| --- | --- | --- |
| news | "Knee injury - expected back 01 Feb" | str |
| status | "i" | str: a, d, i, s, u, n |
| chance_of_playing_next_round | 25 | int or None |
| chance_of_playing_this_round | 0 | int or None |
| news_added | "2026-01-20T10:00:00Z" | str |

Your existing FPLDataLoader.fetch_bootstrap_data() already fetches this. The NewsCollector extracts and persists it per gameweek.
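To make the shape of this data concrete, here is a minimal sketch of pulling the five fields out of a bootstrap-static payload. It assumes the players live under the top-level "elements" key and that each carries an "id". The helper name extract_news_fields is hypothetical and is not how NewsCollector is implemented; the sample payload is illustrative only.

```python
from __future__ import annotations

from typing import Any

# The five news-related fields listed in the table above.
NEWS_FIELDS = (
    "news",
    "status",
    "chance_of_playing_next_round",
    "chance_of_playing_this_round",
    "news_added",
)


def extract_news_fields(bootstrap: dict[str, Any]) -> dict[int, dict[str, Any]]:
    """Map player id -> news fields, keeping only players with a non-empty news string."""
    flagged: dict[int, dict[str, Any]] = {}
    # Assumption: players are listed under the "elements" key of the response.
    for player in bootstrap.get("elements", []):
        if not player.get("news"):
            continue  # no news text for this player
        flagged[player["id"]] = {field: player.get(field) for field in NEWS_FIELDS}
    return flagged


# Illustrative payload, not real API output.
sample_bootstrap = {
    "elements": [
        {
            "id": 301,
            "news": "Knee injury - expected back 01 Feb",
            "status": "i",
            "chance_of_playing_next_round": 25,
            "chance_of_playing_this_round": 0,
            "news_added": "2026-01-20T10:00:00Z",
        },
        {
            "id": 302,
            "news": "",
            "status": "a",
            "chance_of_playing_next_round": None,
            "chance_of_playing_this_round": None,
            "news_added": None,
        },
    ]
}

flagged = extract_news_fields(sample_bootstrap)
```

Only player 301 ends up in flagged, because player 302 has an empty news string.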

Data Flow

graph LR
    A["bootstrap-static API"] --> B["NewsCollector.collect_from_bootstrap()"]
    B --> C["NewsSnapshot (per player, per GW)"]
    C --> D["snapshot.to_news_signal_input()"]
    D --> E["NewsSignal.generate_signal()"]
    E --> F["pipeline.inject_news()"]
    F --> G["HMM: transition perturbation"]
    F --> H["KF: process noise shock"]

NewsSignal Output

NewsSignal.generate_signal(text) returns:

{
    "availability": 0.0,     # 0.0 (out) to 1.0 (available)
    "minutes_risk": 0.0,     # 0.0 (no risk) to 1.0 (high risk)
    "confidence": 0.9,       # 0.4 (vague) to 0.9 (definitive)
    "adjustment_factor": 0.0  # availability × (1 - minutes_risk)
}

The adjustment_factor is used by the legacy pipeline. The inference pipeline uses all four fields.
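The formula in the comment above, availability × (1 - minutes_risk), can be sketched as a small standalone function. This is an illustration of the arithmetic only, not the library's code; the function name is hypothetical, and clamping the inputs to [0, 1] is an assumption added here for safety.

```python
def adjustment_factor(availability: float, minutes_risk: float) -> float:
    """Return availability × (1 - minutes_risk)."""
    # Assumption: inputs are clamped to [0, 1]; the source only documents the ranges.
    a = min(max(availability, 0.0), 1.0)
    r = min(max(minutes_risk, 0.0), 1.0)
    return a * (1.0 - r)


ruled_out = adjustment_factor(0.0, 0.0)      # 0.0: unavailable players contribute nothing
fit_but_risky = adjustment_factor(1.0, 0.5)  # 0.5: available, but half the minutes at risk
```

A single scalar like this loses the distinction between "probably out" and "available but likely to play fewer minutes", which is one reason a pipeline might prefer to consume the separate fields.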

Perturbation Mapping

The pipeline classifies each signal into a category, then maps to specific perturbations:

| Category | Trigger | HMM Boost | KF Q Multiplier |
| --- | --- | --- | --- |
| Unavailable | "ruled out", status=i | Injured ×10, Slump ×2 | 5.0 |
| Doubtful | "late fitness test", status=d | Injured ×3, Slump ×2 | 2.0 |
| Rotation | "rotation risk", "benched" | Slump ×2, Average ×1.5 | 1.5 |
| Positive | "back in training" | Good ×2, Star ×1.5 | 1.0 |
| Neutral | No news, status=a | No change | 1.0 |
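One way to read this mapping is sketched below. It rests on several assumptions not confirmed by the source: that the HMM has exactly the five states named in the table, in this order; that a boost multiplies the probability of transitioning into a state, after which the row is renormalized; and that the Kalman filter process noise Q is a scalar. All names are hypothetical.

```python
# Assumed state order; the table names these five states but not their ordering.
STATES = ("Injured", "Slump", "Average", "Good", "Star")

# category -> (per-state HMM boosts, KF process-noise multiplier), copied from the table.
PERTURBATIONS = {
    "unavailable": ({"Injured": 10.0, "Slump": 2.0}, 5.0),
    "doubtful": ({"Injured": 3.0, "Slump": 2.0}, 2.0),
    "rotation": ({"Slump": 2.0, "Average": 1.5}, 1.5),
    "positive": ({"Good": 2.0, "Star": 1.5}, 1.0),
    "neutral": ({}, 1.0),
}


def boost_transition_row(row: list[float], category: str) -> list[float]:
    """Scale the destination-state probabilities of one transition row, then renormalize."""
    boosts, _ = PERTURBATIONS[category]
    scaled = [p * boosts.get(state, 1.0) for state, p in zip(STATES, row)]
    total = sum(scaled)
    if total == 0.0:
        return list(row)  # degenerate row: leave unchanged
    return [p / total for p in scaled]


def scale_process_noise(q: float, category: str) -> float:
    """Inflate the KF process noise so the filter reacts faster to a shock."""
    _, multiplier = PERTURBATIONS[category]
    return q * multiplier


row = [0.05, 0.20, 0.50, 0.20, 0.05]
shocked = boost_transition_row(row, "unavailable")
q_shocked = scale_process_noise(0.4, "unavailable")
```

With this row, the probability of moving into Injured rises from 0.05 to roughly 0.30 after renormalization, while a neutral signal leaves both the row and Q unchanged.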

NewsSnapshot Enrichment

NewsSnapshot.to_news_signal_input() combines raw news text with structured fields for richer parsing:

# Raw API data
news_text = "Hamstring injury - expected back 01 Feb"
status = "i"
chance_next = 25  # percent

# Enriched text fed to NewsSignal
# → "Hamstring injury - expected back 01 Feb. Status: injured. 25% chance of playing"

This gives NewsParser more signal than the raw text alone.
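The enrichment step could look roughly like the sketch below, which reproduces the enriched string from the example above. The function name and the status labels are assumptions: only "i" → "injured" appears in the source, and the other labels are guesses at plausible wording. The real to_news_signal_input() may format things differently.

```python
from __future__ import annotations

# Assumption: only the "i" -> "injured" label is shown in the docs; the rest are illustrative.
STATUS_LABELS = {
    "a": "available",
    "d": "doubtful",
    "i": "injured",
    "s": "suspended",
    "u": "unavailable",
    "n": "not available",
}


def enrich_news_text(news: str, status: str, chance_next: int | None) -> str:
    """Append the structured status and chance-of-playing fields to the raw news text."""
    parts = []
    if news:
        parts.append(news.rstrip(". "))  # avoid a doubled full stop when joining
    label = STATUS_LABELS.get(status)
    if label:
        parts.append(f"Status: {label}")
    if chance_next is not None:
        parts.append(f"{chance_next}% chance of playing")
    return ". ".join(parts)


enriched = enrich_news_text("Hamstring injury - expected back 01 Feb", "i", 25)
# enriched == "Hamstring injury - expected back 01 Feb. Status: injured. 25% chance of playing"
```

Folding the structured fields into the text means a keyword-based parser still picks up "injured" even when the free-text news is vague or empty.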

Per-Gameweek Persistence

NewsCollector saves snapshots to ~/.fplx/news/gw{NN}.json. This enables backtesting: replay a full season's news week by week to validate the inference pipeline against historical data.

from fplx.data.news_collector import NewsCollector

collector = NewsCollector(cache_dir="~/.fplx/news")

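# Collect current state
# (bootstrap_data is the bootstrap-static response fetched by FPLDataLoader.fetch_bootstrap_data())
collector.collect_from_bootstrap(bootstrap_data, gameweek=25)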

# Retrieve later (loads from disk)
snapshot = collector.get_player_news(player_id=301, gameweek=25)
flagged = collector.get_players_with_news(gameweek=25)
history = collector.get_player_history(player_id=301)  # all GWs
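For backtesting, the week-by-week replay could be driven by a loop over the persisted files. The sketch below uses only the standard library and assumes nothing about the JSON schema beyond each file being valid JSON. The helper name iter_gameweek_snapshots is hypothetical and is not part of NewsCollector, and the file pattern is inferred from the gw{NN}.json naming above.

```python
import json
import re
from pathlib import Path

# Matches file names such as gw01.json or gw25.json and captures the gameweek number.
GW_FILE = re.compile(r"gw(\d+)\.json$")


def iter_gameweek_snapshots(cache_dir):
    """Yield (gameweek, parsed JSON) pairs in ascending gameweek order."""
    root = Path(cache_dir).expanduser()  # resolve a leading "~" explicitly
    found = []
    for path in root.glob("gw*.json"):
        match = GW_FILE.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    # Sort numerically so gw10 comes after gw02 regardless of zero-padding.
    for gameweek, path in sorted(found):
        with path.open(encoding="utf-8") as handle:
            yield gameweek, json.load(handle)
```

A replay would then iterate over iter_gameweek_snapshots("~/.fplx/news") and feed each week's snapshot into the inference pipeline before scoring that week's predictions.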