Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty

2026-03-15T13:09:41Z

This paper studies continuous-time risk-sensitive reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation with the exponential-form objective. The risk-sensitive objective arises either as the agent's risk attitude or as a distributionally robust approach against model uncertainty. Owing to the martingale perspective in Jia and Zhou (J Mach Learn Res 24(161): 1--61, 2023), the risk-sensitive RL problem is shown to be equivalent to ensuring the martingale property of a process involving both the value function and the q-function, augmented by an additional penalty term: the quadratic variation of the value process, which captures the variability of the value-to-go along the trajectory. This characterization allows for the straightforward adaptation of existing RL algorithms developed for non-risk-sensitive scenarios to incorporate risk sensitivity by adding the realized variance of the value process. Additionally, I highlight that the conventional policy gradient representation is inadequate for risk-sensitive problems due to the nonlinear nature of quadratic variation; however, q-learning offers a solution and extends to infinite horizon settings. Finally, I prove the convergence of the proposed algorithm for Merton's investment problem and quantify the impact of the temperature parameter on the behavior of the learning procedure. I also conduct simulation experiments to demonstrate how risk-sensitive RL improves the finite-sample performance in the linear-quadratic control problem.
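
As a rough illustration of the penalty term described above, the sketch below computes the realized quadratic variation of a discretely sampled value process and adds it to the per-step increments of a candidate martingale. The function names, the sign convention on the reward and q-function terms, and the coefficient `eps / 2` on the squared increment are assumptions made for illustration; they are not taken from the paper.

```python
def realized_quadratic_variation(values):
    """Sum of squared increments of a discretely sampled value process."""
    return sum((b - a) ** 2 for a, b in zip(values, values[1:]))


def risk_sensitive_increments(values, rewards, q_values, dt, eps):
    """Per-step increments of a candidate martingale (assumed form).

    increment_k = dJ_k + (r_k - q_k) * dt + (eps / 2) * dJ_k ** 2

    With eps = 0 the squared-increment penalty vanishes and only the
    risk-neutral terms remain. The exact form used in the paper may differ.
    """
    increments = []
    for k in range(len(values) - 1):
        d_j = values[k + 1] - values[k]
        drift = (rewards[k] - q_values[k]) * dt
        increments.append(d_j + drift + 0.5 * eps * d_j ** 2)
    return increments
```

In a learning loop, one would typically drive the sample mean of such increments (or their correlation with test functions) toward zero; that step is omitted here.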

Conditioning on a Volatility Proxy Compresses the Apparent Timescale of Collective Market Correlation

2026-03-14T18:37:40Z

We address the attribution problem for apparent slow collective dynamics: is the observed persistence intrinsic, or inherited from a persistent driver? For the leading eigenvalue fraction $\psi_1=\lambda_{\max}/N$ of S\&P 500 60-day rolling correlation matrices ($237$ stocks, 2004--2023), a VIX-coupled Ornstein--Uhlenbeck model reduces the effective relaxation time from $298$ to $61$ trading days and improves the fit over bare mean reversion by $\Delta$BIC$=109$. On the decomposition sample, an informational residual of $\log(\mathrm{VIX})$ alone retains most of that gain ($\Delta$BIC$=78.6$), whereas a mechanical VIX proxy alone does not improve the fit. Autocorrelation-matched placebo fields fail ($\Delta$BIC$_{\max}=2.7$), disjoint weekly reconstructions still favor the field-coupled model ($\Delta$BIC$=140$--$151$), and six anchored chronological holdouts preserve the out-of-sample advantage. Quiet-regime and field-stripped residual autocorrelation controls show the same collapse of persistence. Stronger hidden-variable extensions remain only partially supported. Within the tested stochastic class, conditioning on the observed VIX proxy absorbs most of the apparent slow dynamics.
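
The two quantities reported above, a relaxation time and a BIC difference, can be illustrated with a minimal sketch. It assumes a discretely sampled Ornstein--Uhlenbeck process whose one-step autoregressive coefficient is `phi = exp(-dt / tau)`, and a sign convention in which a positive BIC difference favors the coupled model. Both conventions, and all function names, are assumptions for illustration rather than details confirmed by the abstract.

```python
import math


def relaxation_time(phi, dt=1.0):
    """Relaxation time tau implied by an AR(1) coefficient phi = exp(-dt / tau)."""
    if not 0.0 < phi < 1.0:
        raise ValueError("phi must lie in (0, 1) for a mean-reverting process")
    return -dt / math.log(phi)


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def delta_bic(ll_base, k_base, ll_coupled, k_coupled, n_obs):
    """BIC(base) - BIC(coupled); positive values favor the coupled model."""
    return bic(ll_base, k_base, n_obs) - bic(ll_coupled, k_coupled, n_obs)
```

Under this convention, a daily coefficient of `math.exp(-1 / 61)` corresponds to a relaxation time of about 61 trading days, and an extra parameter is only worth adding when the log-likelihood gain outweighs the `log(n_obs)` penalty.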

Adaptive Strategies for Pension Fund Management

2026-03-14T18:08:14Z

This paper proposes a simulation-based framework for assessing and improving the performance of a pension fund management scheme. This framework is modular and allows the definition of customized performance metrics that are used to assess and iteratively improve asset and liability management policies. We illustrate our framework with a simple implementation that showcases the power of including adaptable features. We show that it is possible to dissipate longevity and volatility risks by permitting adaptability in asset allocation and payout levels. The numerical results show that by including a small amount of flexibility, there can be a substantial reduction in the cost to run the pension plan as well as a substantial decrease in the probability of defaulting.

Performance-Driven Causal Signal Engineering for Financial Markets under Non-Stationarity

2026-03-13T22:44:26Z

We introduce a performance-driven framework for constructing strictly causal forward-oriented observables in strongly non-stationary time series. The method combines a robustly normalized composite of heterogeneous indicators with a causally computed derivative component, yielding a local phase-leading effect that is amplified near regime transitions while remaining fully causal. A hysteresis-based decision functional maps the observable into discrete system states, with execution delayed by one step to preserve strict temporal ordering. Adaptation is achieved through a walk-forward scheme, in which model parameters are selected using rolling train--validation windows and subsequently applied out-of-sample. In this setting, the validation segment acts as an internal performance screen rather than as a statistical validation set, and no claims of generalization are inferred from it alone. The framework is evaluated on high-frequency financial time series as an experimentally accessible realization of a non-stationary complex system. Under a controlled zero-cost setting, the resulting dynamics exhibit a pronounced risk-reshaping effect, characterized by smoother trajectories and reduced drawdowns relative to direct exposure, and should be interpreted as an upper bound on achievable performance. These results illustrate how causal signal engineering can generate anticipatory structure in non-stationary systems without relying on non-causal information, explicit horizon labeling, or high-capacity predictive models.
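
The hysteresis-based decision step with one-step delayed execution can be sketched as follows. This is a minimal two-state version under stated assumptions: the thresholds `upper` and `lower`, the binary state encoding, and the function names are hypothetical, and the abstract does not specify the actual decision functional.

```python
def hysteresis_states(signal, upper, lower):
    """Map a signal to 0/1 states: switch on above `upper`, off below `lower`.

    Between the two thresholds the previous state is kept, which suppresses
    rapid flipping when the signal hovers near a single cutoff.
    """
    state = 0
    states = []
    for x in signal:
        if state == 0 and x > upper:
            state = 1
        elif state == 1 and x < lower:
            state = 0
        states.append(state)
    return states


def delayed_execution(states, initial=0):
    """Position held at step t is the state decided at step t - 1."""
    if not states:
        return []
    return [initial] + states[:-1]
```

Shifting the decided states by one step means the position at time t depends only on observations up to t - 1, which is one simple way to keep the pipeline strictly causal.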

Feynman-Kac Derivatives Pricing on the Full Forward Curve

2026-03-12T18:52:52Z

This paper introduces a no-arbitrage, Monte Carlo-free approach to pricing path-dependent interest rate derivatives. The Heath-Jarrow-Morton model gives arbitrage-free contingent claims prices but is infinite-dimensional, making traditional numerical methods computationally prohibitive. To make the problem computationally tractable, I cast the stochastic pricing problem as a deterministic partial differential equation (PDE). Finance-Informed Neural Networks (FINNs) solve this PDE directly by minimizing violations of the differential equation and boundary condition, with automatic differentiation efficiently computing the exact derivatives needed to evaluate PDE terms. FINNs achieve pricing accuracy within 0.04 to 0.07 cents per dollar of contract value compared to Monte Carlo benchmarks. Once trained, FINNs price caplets in a few microseconds regardless of dimension, delivering speedups of 300,000 to 4.5 million times over Monte Carlo simulation as the state space discretization of the forward curve grows from 10 to 150 nodes. The major Greeks (theta and curve deltas) come for free, computed automatically during PDE evaluation at zero marginal cost, whereas Monte Carlo requires complete re-simulation for each sensitivity. The framework generalizes naturally beyond caplets to other path-dependent derivatives (caps, swaptions, callable bonds), requiring only boundary condition modifications while retaining the same core PDE structure.

A Learnable Wavelet Transformer for Long-Short Equity Trading and Risk-Adjusted Return Optimization

2026-03-12T00:44:46Z

Learning profitable intraday trading policies from financial time series is challenging due to heavy noise, non-stationarity, and strong cross-sectional dependence among related assets. We propose \emph{WaveLSFormer}, a learnable wavelet-based long-short Transformer that jointly performs multi-scale decomposition and return-oriented decision learning. Unlike standard time-series forecasting that optimizes prediction error and typically requires a separate position-sizing or portfolio-construction step, our model directly outputs a market-neutral long/short portfolio and is trained end-to-end on a trading objective with risk-aware regularization. Specifically, a learnable wavelet front-end generates low-/high-frequency components via an end-to-end trained filter bank, guided by spectral regularizers that encourage stable and well-separated frequency bands. To fuse multi-scale information, we introduce a low-guided high-frequency injection (LGHI) module that refines low-frequency representations with high-frequency cues while controlling training stability. The model outputs a portfolio of long/short positions that is rescaled to satisfy a fixed risk budget and is optimized directly with a trading objective and risk-aware regularization. Extensive experiments on five years of hourly data across six industry groups, evaluated over ten random seeds, demonstrate that WaveLSFormer consistently outperforms MLP, LSTM and Transformer backbones, with and without fixed discrete wavelet front-ends. Averaged over all industries, WaveLSFormer achieves a cumulative overall strategy return of $0.607 \pm 0.045$ and a Sharpe ratio of $2.157 \pm 0.166$, substantially improving both profitability and risk-adjusted returns over the strongest baselines.

Finance-Informed Neural Network: Learning the Geometry of Option Pricing

2026-03-11T21:11:41Z

We propose a Finance-Informed Neural Network (FINN) for option pricing and hedging that integrates financial theory directly into machine learning. Instead of training on observed option prices, FINN is learned through a self-supervised replication objective based on dynamic hedging, ensuring economic consistency by construction. We show theoretically that minimizing replication error recovers the arbitrage-free pricing operator and yields economically meaningful sensitivities. Empirically, FINN accurately recovers classical Black--Scholes prices and performs robustly in stochastic volatility environments, including the Heston model, while remaining stable in settings where analytical solutions are unavailable or unreliable. Fundamental pricing relationships such as put--call parity emerge endogenously. When applied to implied-volatility surface reconstruction, FINN produces surfaces that are consistently closer to observed market-implied volatilities than those obtained from Heston calibrations, indicating superior out-of-sample adaptability and reduced structural bias. Importantly, FINN extends beyond liquid option markets: it can be trained directly on historical spot prices to construct coherent option prices and Greeks for assets with no listed options. More broadly, FINN defines a new paradigm for financial pricing, in which prices are learned from replication and risk-control principles rather than inferred from parametric assumptions or direct supervision on option prices. By reframing option pricing as the learning of a pricing operator rather than the fitting of prices, FINN offers practitioners a practical and scalable tool for pricing, hedging, and risk management across both established and emerging financial markets.

Onflow: a model free, online portfolio allocation algorithm robust to transaction fees

2026-03-11T19:55:11Z

We introduce Onflow, a reinforcement learning method for optimizing portfolio allocation via gradient flows. Our approach dynamically adjusts portfolio allocations to maximize expected log returns while accounting for transaction costs. Using a softmax parameterization, Onflow updates allocations through an ordinary differential equation derived from gradient flow methods. This algorithm belongs to the large class of stochastic optimization procedures; we measure its efficiency by comparing our results to the theoretical values in a log-normal framework and to standard benchmarks from the 'old NYSE' dataset. For log-normal assets with zero transaction costs, Onflow replicates the Markowitz optimal portfolio, achieving the best possible allocation. Numerical experiments on the 'old NYSE' dataset show that Onflow leads to dynamic asset allocation strategies whose performance is: a) comparable to benchmark strategies such as Cover's Universal Portfolio or the ``multiplicative updates'' approach of Helmbold et al. when transaction costs are zero, and b) better than previous procedures when transaction costs are high. Onflow can even remain efficient in regimes where other dynamic allocation techniques no longer work. Onflow is a promising portfolio management strategy that relies solely on observed prices, requiring no assumptions about asset return distributions. This makes it robust against model risk, offering a practical solution for real-world trading strategies.
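
To make the softmax parameterization concrete, the sketch below takes one explicit Euler step of a gradient flow on the log return of a softmax-parameterized portfolio. It is a simplified illustration, not the Onflow update itself: transaction costs are omitted, a single observed vector of price relatives stands in for the expectation, and the names `softmax`, `log_return_gradient`, and `euler_step` are hypothetical.

```python
import math


def softmax(theta):
    """Numerically stable softmax; returns weights that sum to one."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def log_return_gradient(theta, price_relatives):
    """Gradient of log(softmax(theta) . x) with respect to theta.

    For pi = softmax(theta) and growth g = pi . x, the j-th component is
    pi_j * (x_j - g) / g.
    """
    pi = softmax(theta)
    growth = sum(p * x for p, x in zip(pi, price_relatives))
    return [p * (x - growth) / growth for p, x in zip(pi, price_relatives)]


def euler_step(theta, price_relatives, step):
    """One explicit Euler step of the flow d(theta)/dt = gradient."""
    grad = log_return_gradient(theta, price_relatives)
    return [t + step * g for t, g in zip(theta, grad)]
```

Because the update acts on the unconstrained parameters `theta`, the resulting weights stay positive and sum to one without any projection step.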

Risk-Adjusted Harm Scoring for Automated Red Teaming for LLMs in Financial Services

2026-03-11T14:14:13Z

The rapid adoption of large language models (LLMs) in financial services introduces new operational, regulatory, and security risks. Yet most red-teaming benchmarks remain domain-agnostic and fail to capture failure modes specific to regulated BFSI settings, where harmful behavior can be elicited through legally or professionally plausible framing. We propose a risk-aware evaluation framework for LLM security failures in Banking, Financial Services, and Insurance (BFSI), combining a domain-specific taxonomy of financial harms, an automated multi-round red-teaming pipeline, and an ensemble-based judging protocol. We introduce the Risk-Adjusted Harm Score (RAHS), a risk-sensitive metric that goes beyond success rates by quantifying the operational severity of disclosures, accounting for mitigation signals, and leveraging inter-judge agreement. Across diverse models, we find that higher decoding stochasticity and sustained adaptive interaction not only increase jailbreak success, but also drive systematic escalation toward more severe and operationally actionable financial disclosures. These results expose limitations of single-turn, domain-agnostic security evaluation and motivate risk-sensitive assessment under prolonged adversarial pressure for real-world BFSI deployment.

FinReflectKG -- HalluBench: GraphRAG Hallucination Benchmark for Financial Question Answering Systems

2026-03-11T04:37:53Z

As organizations increasingly integrate AI-powered question-answering systems into financial information systems for compliance, risk assessment, and decision support, ensuring the factual accuracy of AI-generated outputs becomes a critical engineering challenge. Current Knowledge Graph (KG)-augmented QA systems lack systematic mechanisms to detect hallucinations, that is, factually incorrect outputs that undermine reliability and user trust. We introduce FinBench-QA-Hallucination, a benchmark for evaluating hallucination detection methods in KG-augmented financial QA over SEC 10-K filings. The dataset contains 755 annotated examples from 300 pages, each labeled for groundedness using a conservative evidence-linkage protocol requiring support from both textual chunks and extracted relational triplets. We evaluate six detection approaches, spanning LLM judges, fine-tuned classifiers, Natural Language Inference (NLI) models, span detectors, and embedding-based methods, under two conditions: with and without KG triplets. Results show that LLM-based judges and embedding approaches achieve the highest performance (F1: 0.82-0.86) under clean conditions. However, most methods degrade significantly when noisy triplets are introduced, with Matthews Correlation Coefficient (MCC) dropping 44-84 percent, while embedding methods remain relatively robust with only 9 percent degradation. Statistical tests (Cochran's Q and McNemar) confirm significant performance differences (p < 0.001). Our findings highlight vulnerabilities in current KG-augmented systems and provide insights for building reliable financial information systems, where hallucinations can lead to regulatory violations and flawed decisions. The benchmark also offers a framework for integrating AI reliability evaluation into information system design across other high-stakes domains such as healthcare, legal, and government.
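
For readers unfamiliar with the metric, the Matthews Correlation Coefficient can be computed from a binary confusion matrix as in the sketch below. The helper `relative_drop_percent` assumes that the reported degradation is a relative drop from the clean to the noisy condition; the abstract does not state this, so treat it as an assumption.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from binary confusion-matrix counts; ranges from -1 to 1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: return 0 when any marginal count is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


def relative_drop_percent(clean, noisy):
    """Relative degradation from `clean` to `noisy`, in percent (assumed definition)."""
    return 100.0 * (clean - noisy) / clean
```

Unlike F1, MCC uses all four cells of the confusion matrix, so it stays informative when grounded and hallucinated examples are imbalanced.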

Uncertainty-Aware Deep Hedging

2026-03-10T18:17:51Z

Deep hedging trains neural networks to manage derivative risk under market frictions, but produces hedge ratios with no measure of model confidence -- a significant barrier to deployment. We introduce uncertainty quantification to the deep hedging framework by training a deep ensemble of five independent LSTM networks under Heston stochastic volatility with proportional transaction costs. The ensemble's disagreement at each time step provides a per-time-step confidence measure that is strongly predictive of hedging performance: the learned strategy outperforms the Black-Scholes delta on approximately 80% of paths when model agreement is high, but on fewer than 20% when disagreement is elevated. We propose a CVaR-optimised blending strategy that combines the ensemble's hedge with the classical Black-Scholes delta, weighted by the level of model uncertainty. The blend improves on the Black-Scholes delta by 35-80 basis points in CVaR across several Heston calibrations, and on the theoretically optimal Whalley-Wilmott strategy by 100-250 basis points, with all improvements statistically significant under paired bootstrap tests. The analysis reveals that ensemble uncertainty is driven primarily by option moneyness rather than volatility, and that the uncertainty-performance relationship inverts under weak leverage -- findings with practical implications for the deployment of machine learning in hedging systems.
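
The idea of weighting the ensemble hedge against the Black-Scholes delta by model disagreement can be sketched as follows. The exponential weight `exp(-disagreement / scale)` and the parameter `scale` are hypothetical choices made for illustration; the paper's blend is described as CVaR-optimised, which this sketch does not attempt. The `cvar` function is a simple empirical tail average over losses.

```python
import math
import statistics


def ensemble_hedge(predictions):
    """Mean hedge ratio and disagreement (population std) across ensemble members."""
    return statistics.fmean(predictions), statistics.pstdev(predictions)


def blend(ensemble_mean, bs_delta, disagreement, scale):
    """Blend the two hedges; weight on the ensemble decays with disagreement.

    The exponential weighting is a hypothetical choice, not the paper's rule.
    """
    w = math.exp(-disagreement / scale)
    return w * ensemble_mean + (1.0 - w) * bs_delta


def cvar(losses, alpha=0.95):
    """Average of the worst (1 - alpha) fraction of losses (empirical estimator)."""
    ordered = sorted(losses, reverse=True)
    # Rounding guards against floating-point noise in len * (1 - alpha).
    n_tail = max(1, math.ceil(round(len(ordered) * (1.0 - alpha), 9)))
    return sum(ordered[:n_tail]) / n_tail
```

With zero disagreement the blend returns the ensemble hedge unchanged, and with very large disagreement it falls back to the Black-Scholes delta.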

AlphaLogics: A Market Logic-Driven Multi-Agent System for Scalable and Interpretable Alpha Factor Generation

2026-03-10T12:18:02Z

Factor investing is ultimately grounded in market logic, the latent mechanism behind observed alpha factors that explains why they should persist across assets and regimes. However, recent factor mining prioritizes factor discovery over logic discovery, producing complex alpha factors with unclear rationale, while market logic remains largely handcrafted and difficult to scale. To address this challenge, we propose AlphaLogics, a market logic-driven multi-agent system for factor mining. AlphaLogics consists of three key components: (i) Market Logic Mining: reverse-extracting market logic from historical factor libraries to construct an initial market logic library; (ii) Factor Generation and Optimization: using new market logics generated in (i) to guide factor generation, and optimizing factors with backtesting feedback; and (iii) Market Logic Generation and Optimization: generating new market logics conditioned on the initial market logic library, and refining each market logic by aggregating the backtest outcomes of its guided factors, continuously refreshing the library. Experiments on CSI 500 and S&P 500 show that AlphaLogics consistently improves predictive metrics and risk-adjusted returns over representative baselines, while producing a market logic library that remains empirically useful for guiding further factor discovery.

On an Optimal Stopping Problem with a Discontinuous Reward

2026-03-09T14:30:05Z

We study an optimal stopping problem with an unbounded, time-dependent and discontinuous reward function. This problem is motivated by the pricing of a variable annuity contract with guaranteed minimum maturity benefit, under the assumption that the policyholder's surrender behaviour maximizes the risk-neutral value of the contract. We consider a general fee and surrender charge function, and give a condition under which optimal stopping always occurs at maturity. Using an alternative representation for the value function of the optimization problem, we study its analytical properties and the resulting surrender (or exercise) region. In particular, we show that the non-emptiness and the shape of the surrender region are fully characterized by the fee and the surrender charge functions, which provides a powerful tool to understand their interrelation and how it affects early surrenders and the optimal surrender boundary. Under certain conditions on these two functions, we develop three representations for the value function; two are analogous to their American option counterpart, and one is new to the actuarial and American option pricing literature.

The Martingale Sinkhorn Algorithm

2026-03-09T10:03:11Z

We develop a numerical method for the martingale analogue of the Benamou--Brenier optimal transport problem, which seeks the martingale interpolating two prescribed marginals that is closest to Brownian motion. Recent contributions have established existence of the optimal martingale under finite second moment assumptions on the marginals, but numerical methods exist only in the one-dimensional setting. We introduce an iterative scheme, a martingale analogue of the celebrated Sinkhorn algorithm, and prove that it yields a Bass potential in arbitrary dimension under minimal assumptions. In particular, we show that this holds when the marginals have finite moments of order $p > 1$, thereby extending the known theory beyond the finite-second-moment regime. The proof relies on a strict descent property for the dual value of the martingale Benamou--Brenier problem. While the descent property admits a direct verification in the case of compactly supported marginals, obtaining uniform control on the iterates without assuming compact support is substantially more delicate and constitutes the main technical challenge.
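
For orientation, the classical (non-martingale) Sinkhorn iteration for entropy-regularized optimal transport between two discrete marginals is sketched below. This is the standard algorithm that the abstract's scheme is an analogue of, not the martingale version itself; the regularization strength `reg`, the iteration count, and the function name are illustrative assumptions.

```python
import math


def sinkhorn(mu, nu, cost, reg, n_iter=500):
    """Classical Sinkhorn: alternately rescale rows and columns of a Gibbs kernel.

    Returns a coupling matrix whose column sums match `nu` exactly after the
    final update and whose row sums converge to `mu`.
    """
    n, m = len(mu), len(nu)
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Row scaling: enforce the first marginal.
        u = [mu[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Column scaling: enforce the second marginal.
        v = [nu[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
```

Each half-step fits one marginal exactly while slightly disturbing the other, and the alternation converges to a coupling that satisfies both.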

Enhanced indexation using both equity assets and index options

2026-03-09T00:09:48Z

In this paper we consider how we can include index options in enhanced indexation. We present the concept of an \enquote{option strategy} which enables us to treat options as an artificial asset. An option strategy for a known set of options is a specified set of rules which detail how these options are to be traded (i.e.~bought, rolled over, sold) depending upon market conditions. We consider option strategies in the context of enhanced indexation, but we discuss how they have much wider applicability in terms of portfolio optimisation. We use an enhanced indexation approach based on second-order stochastic dominance. We consider index options for the S\&P~500, using a dataset of daily stock prices over the period 2017-2025 that has been manually adjusted to account for survivorship bias. This dataset is made publicly available for use by future researchers. Our computational results indicate that introducing option strategies in an enhanced indexation setting offers clear benefits in terms of improved out-of-sample performance. This applies whether we use equities or an exchange-traded fund as part of the enhanced indexation portfolio.