findings

Eight findings from building a regime-aware execution system that validates and extends the CTMSTOU reinforcement learning paper.

All findings are reproducible from the live dashboard · Statistical tests: paired t-test, 1000-iteration permutation test, binomial test · Bonferroni-corrected

01

RL fails to exploit regime information

PPO agents trained on the CTMSTOU simulation cannot reliably exploit market regime labels, even when the labels are explicitly included in their state space. Training converges to qualitatively different policies across random seeds and sometimes produces inverted regime sensitivity, executing more aggressively in bear markets than in bull markets. Hard-coded regime conditioning sidesteps an optimization problem that standard policy-gradient methods do not solve reliably.
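To make "hard-coded regime conditioning" concrete, here is a minimal sketch that assumes an Almgren & Chriss style liquidation trajectory. The regime label picks an urgency parameter kappa from a fixed table rather than a learned policy, and the crash state halts trading. The regime names (including "neutral"), the kappa values, and the helper holdings_schedule are hypothetical illustrations, not the system's fitted parameters.

```python
import math

# Hypothetical urgency parameters per regime (illustrative, not fitted values).
# Larger kappa front-loads the selling; kappa must be > 0.
REGIME_KAPPA = {"bullish": 0.8, "neutral": 0.4, "bearish": 0.1}


def holdings_schedule(total_shares: float, n_slices: int, regime: str) -> list[float]:
    """Remaining inventory at each slice boundary j = 0..n_slices.

    Uses the Almgren-Chriss trajectory x_j = X * sinh(kappa*(T - j)) / sinh(kappa*T),
    with kappa looked up from a hard-coded regime table instead of being learned.
    """
    if regime == "crash":
        # Halt-and-wait: keep the full position for this decision window.
        return [float(total_shares)] * (n_slices + 1)
    kappa = REGIME_KAPPA[regime]
    horizon = float(n_slices)
    return [
        total_shares * math.sinh(kappa * (horizon - j)) / math.sinh(kappa * horizon)
        for j in range(n_slices + 1)
    ]


if __name__ == "__main__":
    for name in ("bullish", "bearish", "crash"):
        print(name, [round(x) for x in holdings_schedule(10_000, 5, name)])
```

As kappa approaches zero the trajectory tends to a straight line (TWAP), so a single fixed table can span everything from near-TWAP patience to front-loaded urgency without any policy optimization.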

02

Crash regime requires completely different execution

A 4-state HMM separates the crash regime from the bearish regime; the original paper's 2-state simulation conflates the two and so cannot expose the difference. Crash volatility is 1.3–2× bearish volatility, which calls for halt-and-wait execution rather than patient limit orders. Treating both as a single 'bad' state loses this execution-critical distinction.

1.3–2× vol ratio
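Once each day carries a regime label, the crash-to-bearish volatility ratio can be computed directly from returns. The sketch below is minimal and assumes daily returns already labelled by the 4-state HMM's decoded path, using hypothetical label strings "crash" and "bearish". The helper names are illustrative.

```python
import statistics
from collections import defaultdict


def regime_vols(returns, labels):
    """Sample standard deviation of returns within each labelled regime."""
    grouped = defaultdict(list)
    for ret, label in zip(returns, labels):
        grouped[label].append(ret)
    # statistics.stdev needs at least two observations per regime
    return {label: statistics.stdev(rs) for label, rs in grouped.items() if len(rs) >= 2}


def crash_to_bear_vol_ratio(returns, labels):
    """Ratio of crash-regime volatility to bearish-regime volatility."""
    vols = regime_vols(returns, labels)
    return vols["crash"] / vols["bearish"]


if __name__ == "__main__":
    rets = [0.01, -0.01, 0.01, -0.01, 0.02, -0.02, 0.02, -0.02]
    labs = ["bearish"] * 4 + ["crash"] * 4
    print(round(crash_to_bear_vol_ratio(rets, labs), 6))
```

Because the result is a ratio, annualising both volatilities by the same factor leaves it unchanged.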
03

High-confidence signals outperform; transition zones do not

Roughly 23% of trading days fall in low-confidence transition zones, where regime-conditional execution shows no statistically significant edge over TWAP. Restricting regime-conditional execution to high-confidence signals improves performance. Knowing when not to use the model is as important as building it, which mirrors the RL paper's core insight about state uncertainty. Note: short windows (3 or 6 months) in low-volatility markets may be classified as 100% transitional; in that case the model is correctly flagging regime uncertainty rather than forcing a label.

23% transition days
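The following minimal sketch shows the confidence filter. It assumes per-day HMM posterior probabilities and a hypothetical threshold of 0.7 on the largest posterior; the threshold actually used is not stated here. Days below the threshold are treated as transitional and fall back to TWAP. The function names and regime names are illustrative.

```python
def classify_day(posterior, regime_names, threshold=0.7):
    """Return the most likely regime name if confident, else 'transitional'."""
    best = max(range(len(posterior)), key=posterior.__getitem__)
    return regime_names[best] if posterior[best] >= threshold else "transitional"


def execution_plan(posteriors, regime_names, threshold=0.7):
    """Regime-conditional execution on confident days, TWAP on transitional days."""
    plan = []
    for posterior in posteriors:
        label = classify_day(posterior, regime_names, threshold)
        plan.append("TWAP" if label == "transitional" else f"regime:{label}")
    return plan


def transitional_share(posteriors, regime_names, threshold=0.7):
    """Fraction of days whose largest posterior falls below the threshold."""
    labels = [classify_day(p, regime_names, threshold) for p in posteriors]
    return labels.count("transitional") / len(labels)
```

If a short, calm window has every day below the threshold, transitional_share returns 1.0, which is the 100% transitional case described in the note.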
04

Signal significance confirmed by permutation testing

Shuffling the regime labels 1000 times produces an empirical null distribution of cost savings. The observed cost savings lie above this null distribution, indicating that the regime signal carries real execution information rather than a chance fit to historical data. A Bonferroni correction is applied across the 3 simultaneous hypothesis tests.

1000 permutations
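Here is a minimal sketch of a one-sided label-permutation test with a Bonferroni adjustment. In the real system the statistic would re-run the execution backtest with the shuffled labels. The toy statistic mean_saving_bps and its data (a matching label saves 1 bp, a mismatched one costs 1 bp) are hypothetical stand-ins.

```python
import random


def permutation_test(labels, statistic, n_perm=1000, seed=0):
    """One-sided test: is statistic(labels) larger than under shuffled labels?"""
    rng = random.Random(seed)
    observed = statistic(labels)
    shuffled = list(labels)  # copy so the caller's labels are not mutated
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(statistic(shuffled))
    # Add-one correction keeps the p-value strictly positive.
    n_extreme = sum(1 for value in null if value >= observed)
    return observed, (n_extreme + 1) / (n_perm + 1)


def bonferroni(p_value, n_tests=3):
    """Bonferroni-adjusted p-value for n_tests simultaneous hypotheses."""
    return min(1.0, p_value * n_tests)


# Hypothetical toy data: executing with the day's true regime label saves 1 bp,
# a mismatched label costs 1 bp.
TRUE_REGIMES = [0, 1] * 50


def mean_saving_bps(labels):
    return sum(1.0 if lab == true else -1.0 for lab, true in zip(labels, TRUE_REGIMES)) / len(labels)


if __name__ == "__main__":
    obs, p = permutation_test(TRUE_REGIMES, mean_saving_bps)
    print(obs, p, bonferroni(p))
```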
05

Regime-conditional GARCH reduces volatility forecast error

Fitting a separate GARCH(1,1) model to each regime state yields conditional 5-day volatility forecasts with lower RMSE than a single unconditional GARCH fitted to the full series, measured against 21-day rolling realised volatility. Regimes therefore carry forward-looking volatility information that unconditional models do not capture.

RMSE reduction vs baseline
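The sketch below shows the forecasting step, assuming GARCH(1,1) parameters have already been estimated separately on each regime's returns (for example with a package such as arch). The parameter values in REGIME_PARAMS are hypothetical. Taking the square root of the mean variance over the next 5 days is an assumed definition of the 5-day volatility forecast.

```python
import math

# Hypothetical GARCH(1,1) parameters (omega, alpha, beta) per regime; in practice
# each set would be estimated only on returns from that regime.
REGIME_PARAMS = {
    "bullish": (0.02, 0.05, 0.90),
    "crash": (0.30, 0.15, 0.80),
}


def garch_variance_path(omega, alpha, beta, last_resid, last_var, horizon=5):
    """Expected conditional variances for t+1 .. t+horizon under GARCH(1,1).

    sigma2_{t+1}    = omega + alpha * eps_t**2 + beta * sigma2_t
    E[sigma2_{t+h}] = omega + (alpha + beta) * E[sigma2_{t+h-1}]  for h >= 2
    """
    path = [omega + alpha * last_resid**2 + beta * last_var]
    for _ in range(horizon - 1):
        path.append(omega + (alpha + beta) * path[-1])
    return path


def five_day_vol(regime, last_resid, last_var):
    """Square root of the mean forecast variance over the next 5 days."""
    omega, alpha, beta = REGIME_PARAMS[regime]
    path = garch_variance_path(omega, alpha, beta, last_resid, last_var, horizon=5)
    return math.sqrt(sum(path) / len(path))


def rmse(forecasts, realised):
    """Root mean squared error between forecasts and realised values."""
    return math.sqrt(sum((f - r) ** 2 for f, r in zip(forecasts, realised)) / len(forecasts))
```

To get the comparison described above, compute rmse once for these regime-conditional forecasts and once for a single full-series GARCH, scoring both against the same realised-volatility windows.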
06

PELT changepoint detection leads HMM Viterbi by 1.5 days

The PELT algorithm from the ruptures library detects structural breaks in the return series. Compared with regime switches in the HMM Viterbi path, changepoints are identified a median of 1.5 days earlier, and 56.2% of regime transitions are detected in advance. Earlier detection allows the execution strategy to be adjusted pre-emptively, before the HMM confirms a switch.

+1.5 days median lead
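Here is a minimal sketch of how lead times could be measured. It takes changepoint indices (for example from ruptures' Pelt(...).fit(signal).predict(pen=...), whose final breakpoint is the series length and would be dropped) and the decoded Viterbi path. Several choices are assumptions: matching each HMM switch to the nearest changepoint, the 10-day matching window, and computing the "detected in advance" share over matched transitions only.

```python
import statistics


def viterbi_switches(path):
    """Indices where the decoded HMM state changes."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]


def transition_leads(changepoints, transitions, window=10):
    """Signed lead (days) of the nearest changepoint for each HMM transition.

    Positive lead: the changepoint came before the Viterbi regime switch.
    Transitions with no changepoint within `window` days are skipped.
    """
    leads = []
    for t in transitions:
        nearby = [c for c in changepoints if abs(t - c) <= window]
        if nearby:
            nearest = min(nearby, key=lambda c: abs(t - c))
            leads.append(t - nearest)
    return leads


def summarise(leads):
    """Median lead and fraction of transitions detected in advance."""
    return statistics.median(leads), sum(1 for d in leads if d > 0) / len(leads)
```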
07

Macro fundamentals confirm statistical regimes

FRED macroeconomic data (unemployment, Fed funds rate, CPI, yield curve, VIX, industrial production) are mapped to HMM regime states via logistic regression and correlation analysis. VIX shows ρ = −0.745 against the regime ordering. Crash regimes average a VIX of 24.62 versus 10.32 in bullish regimes, a 2.4× difference. Statistical regimes align with real economic fundamentals.

ρ = −0.745 (VIX)
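The sketch below shows the correlation and per-regime averages. Several details are assumptions, since the source does not specify them: Pearson correlation, the ordinal coding from crash (0) to bullish (3), and the "neutral" label. Under this hypothetical coding, a higher VIX in worse regimes gives a negative ρ.

```python
import statistics
from collections import defaultdict

# Hypothetical ordinal coding from crash (0) to bullish (3).
REGIME_ORDER = {"crash": 0, "bearish": 1, "neutral": 2, "bullish": 3}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def vix_by_regime(vix, labels):
    """Average VIX level within each regime label."""
    grouped = defaultdict(list)
    for level, label in zip(vix, labels):
        grouped[label].append(level)
    return {label: statistics.fmean(levels) for label, levels in grouped.items()}


def vix_regime_correlation(vix, labels):
    """Correlation between VIX and the ordinal regime coding."""
    return pearson([REGIME_ORDER[label] for label in labels], vix)
```

Dividing the crash mean by the bullish mean gives the ratio quoted above (24.62 / 10.32 ≈ 2.4).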
08

BIC covariance selection is asset-dependent

Choosing between full and diagonal covariance HMMs via BIC selects different structures for different asset classes: the covariance structure preferred for equities differs from the one preferred for crypto and bonds. This observable result contradicts the assumption that a single regime-model structure suits all assets.

asset-dependent BIC
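Here is a minimal sketch of the BIC comparison. It assumes the log-likelihood of each fitted model is already available; for instance, hmmlearn's GaussianHMM.score(X) returns the log-likelihood. The parameter count covers start probabilities, transitions, means, and covariances. The feature count and the likelihood values in any example are hypothetical.

```python
import math


def gaussian_hmm_n_params(n_states, n_features, covariance_type):
    """Free parameters of a Gaussian HMM (start probs, transitions, means, covariances)."""
    start = n_states - 1
    transitions = n_states * (n_states - 1)
    means = n_states * n_features
    if covariance_type == "full":
        covars = n_states * n_features * (n_features + 1) // 2
    elif covariance_type == "diag":
        covars = n_states * n_features
    else:
        raise ValueError(f"unsupported covariance_type: {covariance_type}")
    return start + transitions + means + covars


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_covariance(log_liks, n_states, n_features, n_obs):
    """Pick the covariance type with the lowest BIC from {type: log-likelihood}."""
    scores = {
        cov: bic(ll, gaussian_hmm_n_params(n_states, n_features, cov), n_obs)
        for cov, ll in log_liks.items()
    }
    return min(scores, key=scores.get), scores
```

Running this selection separately on each asset's own likelihoods lets the winning covariance structure differ across equities, crypto, and bonds.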

CITATIONS & EXTENSIONS

Primary paper

Optimal Execution with Regime Switching

CTMSTOU framework

RL failure finding directly motivated by this work

Execution model

Optimal Execution of Portfolio Transactions

Almgren & Chriss (2000)

Market impact model extended with regime-dependent parameters

Volatility

ARCH/GARCH models for financial time series

Engle (1982), Bollerslev (1986)

Regime-conditional extension: separate GARCH(1,1) per state

Changepoint

Optimal detection of changepoints with a linear computational cost

Killick, Fearnhead & Eckley (2012)

PELT algorithm implemented via ruptures library

Explainability

A unified approach to interpreting model predictions

Lundberg & Lee (2017)

SHAP KernelExplainer on HMM posterior probabilities
