News Signals¶

FPLX uses news data from the FPL API itself, no external scraping required.

Data Source¶

Every player in the bootstrap-static API response includes:

Field	Example	Type
`news`	`"Knee injury - expected back 01 Feb"`	`str`
`status`	`"i"`	`str`: `a`, `d`, `i`, `s`, `u`, `n`
`chance_of_playing_next_round`	`25`	`int` or `None`
`chance_of_playing_this_round`	`0`	`int` or `None`
`news_added`	`"2026-01-20T10:00:00Z"`	`str`

Your existing FPLDataLoader.fetch_bootstrap_data() already fetches this. The NewsCollector extracts and persists it per gameweek.

Data Flow¶

graph LR
    A["bootstrap-static API"] --> B["NewsCollector.collect_from_bootstrap()"]
    B --> C["NewsSnapshot (per player, per GW)"]
    C --> D["snapshot.to_news_signal_input()"]
    D --> E["NewsSignal.generate_signal()"]
    E --> F["pipeline.inject_news()"]
    F --> G["HMM: transition perturbation"]
    F --> H["KF: process noise shock"]

NewsSignal Output¶

NewsSignal.generate_signal(text) returns:

{
    "availability": 0.0,     # 0.0 (out) to 1.0 (available)
    "minutes_risk": 0.0,     # 0.0 (no risk) to 1.0 (high risk)
    "confidence": 0.9,       # 0.4 (vague) to 0.9 (definitive)
    "adjustment_factor": 0.0  # availability × (1 - minutes_risk)
}

The adjustment_factor is used by the legacy pipeline. The inference pipeline uses all four fields.

Perturbation Mapping¶

The pipeline classifies each signal into a category, then maps to specific perturbations:

Category	Trigger	HMM Boost	KF Q Multiplier
Unavailable	`"ruled out"`, status=`i`	Injured ×10, Slump ×2	5.0
Doubtful	`"late fitness test"`, status=`d`	Injured ×3, Slump ×2	2.0
Rotation	`"rotation risk"`, `"benched"`	Slump ×2, Average ×1.5	1.5
Positive	`"back in training"`	Good ×2, Star ×1.5	1.0
Neutral	No news, status=`a`	No change	1.0

NewsSnapshot Enrichment¶

NewsSnapshot.to_news_signal_input() combines raw news text with structured fields for richer parsing:

# Raw API data
news_text = "Hamstring injury - expected back 01 Feb"
status = "i"
chance_next = 25  # percent

# Enriched text fed to NewsSignal
# → "Hamstring injury - expected back 01 Feb. Status: injured. 25% chance of playing"

This gives NewsParser more signal than the raw text alone.

Per-Gameweek Persistence¶

NewsCollector saves snapshots to ~/.fplx/news/gw{NN}.json. This enables backtesting: replay a full season's news week by week to validate the inference pipeline against historical data.

from fplx.data.news_collector import NewsCollector

collector = NewsCollector(cache_dir="~/.fplx/news")

# Collect current state
collector.collect_from_bootstrap(bootstrap_data, gameweek=25)

# Retrieve later (loads from disk)
snapshot = collector.get_player_news(player_id=301, gameweek=25)
flagged = collector.get_players_with_news(gameweek=25)
history = collector.get_player_history(player_id=301)  # all GWs