
rolling_cv

Rolling cross-validation for time-series models.

RollingCV

RollingCV(
    initial_train_size: int, test_size: int, step: int = 1
)

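Generates train/test index pairs for rolling cross-validation. A fixed-size training window, followed immediately by a test window, slides forward through the series.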

This is adapted from the MLSP project for time-series validation.

PARAMETER DESCRIPTION
initial_train_size

Size of the training window. Despite the name, every training set has this size: the window slides forward rather than growing.

TYPE: int

test_size

Size of the test set (forecast horizon).

TYPE: int

step

Number of samples by which the training and test windows move forward between consecutive splits.

TYPE: int DEFAULT: 1
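Because the training window has a fixed length and advances by `step` samples, the number of splits follows from the loop condition in `split`: a window start `s` is valid while `s + initial_train_size + test_size <= n_samples`. A minimal sketch of that count; `count_splits` is a hypothetical helper for illustration, not part of `fplx`:

```python
def count_splits(n_samples: int, initial_train_size: int, test_size: int, step: int = 1) -> int:
    """Number of (train, test) pairs RollingCV.split would yield (hypothetical helper)."""
    span = initial_train_size + test_size
    if span > n_samples:
        # split() raises ValueError in this case rather than yielding nothing.
        raise ValueError("initial_train_size + test_size is larger than the number of samples.")
    # Valid window starts are 0, step, 2*step, ... up to n_samples - span.
    return (n_samples - span) // step + 1


print(count_splits(10, initial_train_size=5, test_size=2, step=1))  # 4
print(count_splits(10, initial_train_size=5, test_size=2, step=2))  # 2
```

With `step=1`, consecutive test windows overlap whenever `test_size > 1`; setting `step=test_size` makes them tile the tail of the series without overlap.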

Source code in fplx/models/rolling_cv.py
def __init__(self, initial_train_size: int, test_size: int, step: int = 1):
    if initial_train_size <= 0 or test_size <= 0 or step <= 0:
        raise ValueError(
            "initial_train_size, test_size, and step must be positive integers."
        )
    self.initial_train_size = initial_train_size
    self.test_size = test_size
    self.step = step

split

split(X) -> Generator[tuple[ndarray, ndarray], None, None]

Generate indices to split data into training and test sets.

PARAMETER DESCRIPTION
X

Time series data. Only its length (`len(X)`) is used.

TYPE: array-like

YIELDS DESCRIPTION
train_indices

The training set indices for that split.

TYPE: ndarray

test_indices

The testing set indices for that split.

TYPE: ndarray
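Each yielded pair indexes into the original series, so a per-split evaluation loop can slice the data directly. A minimal sketch, assuming a NumPy array `y` and a naive last-value forecast chosen purely for illustration. The list comprehension stands in for `RollingCV(initial_train_size=5, test_size=2, step=1).split(y)` so the snippet runs without `fplx` installed:

```python
import numpy as np

y = np.arange(10, dtype=float)  # toy series 0.0, 1.0, ..., 9.0

# Stand-in for RollingCV(initial_train_size=5, test_size=2, step=1).split(y):
# a 5-sample training window followed by a 2-sample test window, step 1.
splits = [
    (np.arange(s, s + 5), np.arange(s + 5, s + 7))
    for s in range(0, len(y) - 7 + 1)
]

fold_mae = []
for train_idx, test_idx in splits:
    last_value = y[train_idx][-1]                 # most recent training observation
    forecast = np.full(len(test_idx), last_value)  # naive forecast over the horizon
    fold_mae.append(float(np.mean(np.abs(y[test_idx] - forecast))))

print(fold_mae)  # [1.5, 1.5, 1.5, 1.5]
```

Averaging `fold_mae` gives a single backtest score; keeping the per-fold values shows how error changes as the window moves through the series.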

Source code in fplx/models/rolling_cv.py
def split(self, X) -> Generator[tuple[np.ndarray, np.ndarray], None, None]:
    """
    Generate indices to split data into training and test sets.

    Parameters
    ----------
    X : array-like
        Time series data.

    Yields
    ------
    train_indices : np.ndarray
        The training set indices for that split.
    test_indices : np.ndarray
        The testing set indices for that split.
    """
    n_samples = len(X)
    if self.initial_train_size + self.test_size > n_samples:
        raise ValueError(
            "initial_train_size + test_size is larger than the number of samples."
        )

    train_start = 0
    while train_start + self.initial_train_size + self.test_size <= n_samples:
        train_end = train_start + self.initial_train_size
        test_end = train_end + self.test_size

        train_indices = np.arange(train_start, train_end)
        test_indices = np.arange(train_end, test_end)

        yield train_indices, test_indices

        train_start += self.step
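To see concrete indices, and one consequence of `split` being a generator function, here is a standalone sketch. `rolling_split` is a hypothetical function that mirrors the method's logic for a series of length `n_samples`; it is not part of `fplx`:

```python
import numpy as np


def rolling_split(n_samples, initial_train_size, test_size, step=1):
    """Hypothetical stand-in that mirrors RollingCV.split for a series of length n_samples."""
    if initial_train_size + test_size > n_samples:
        raise ValueError(
            "initial_train_size + test_size is larger than the number of samples."
        )
    train_start = 0
    while train_start + initial_train_size + test_size <= n_samples:
        train_end = train_start + initial_train_size
        yield np.arange(train_start, train_end), np.arange(train_end, train_end + test_size)
        train_start += step


# 8 samples, 4-sample training window, 2-sample test window, step 2.
for train_idx, test_idx in rolling_split(8, initial_train_size=4, test_size=2, step=2):
    print(train_idx.tolist(), test_idx.tolist())
# [0, 1, 2, 3] [4, 5]
# [2, 3, 4, 5] [6, 7]

# The size check runs on the first iteration, not when the generator is created.
gen = rolling_split(3, initial_train_size=5, test_size=5)  # no exception yet
try:
    next(gen)
except ValueError as exc:
    print("raised on first next():", exc)
```

By contrast, `__init__` validates its arguments eagerly, so `RollingCV(0, 1)` fails immediately, while an oversized window passed to `split(X)` only fails once iteration begins.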