https://arxiv.org/api/u/lWBi30jXcT4XpnHf6oLKCGpuM 2026-03-18T08:45:58Z 3120 60 15 http://arxiv.org/abs/2408.11773v2 Deviations from the Nash equilibrium in a two-player optimal execution game with reinforcement learning 2026-02-13T15:27:13Z The use of reinforcement learning algorithms in financial trading is becoming increasingly prevalent. However, the autonomous nature of these algorithms can lead to unexpected outcomes that deviate from traditional game-theoretical predictions and may even destabilize markets. In this study, we examine a scenario in which two autonomous agents, modelled with Double Deep Q-Learning, learn to liquidate the same asset optimally in the presence of market impact, under the Almgren-Chriss (2000) framework. We show that the strategies learned by the agents deviate significantly from the Nash equilibrium of the corresponding market impact game. Notably, the learned strategies exhibit a supra-competitive solution, which might be compatible with tacit collusive behaviour, closely aligning with the Pareto-optimal solution. We further explore how different levels of market volatility influence the agents' performance and the equilibria they discover, including scenarios where volatility differs between the training and testing phases. 2024-08-21T16:54:53Z Fabrizio Lillo Andrea Macrì http://arxiv.org/abs/2602.12770v1 Efficient Monte Carlo Valuation of Corporate Bonds in Financial Networks 2026-02-13T09:55:11Z Valuing corporate bonds in systemic economies is challenging due to intricate webs of inter-institutional exposures. When a bank defaults, cascading losses propagate through the network, with payments determined by a system of fixed-point equations lacking closed-form solutions. Standard Monte Carlo methods cannot capture rare yet critical default events, while existing rare-event simulation techniques fail to account for higher-order network effects and scale poorly with network size. 
To overcome these challenges, we propose a novel approach -- Bi-Level Importance Sampling with Splitting -- and characterize individual bank defaults by decoupling them from the network's complex fixed-point dynamics. This separation enables a two-stage estimation process that directly generates samples from the banks' default events. We demonstrate theoretically that the method is both scalable and asymptotically optimal, and validate its effectiveness through numerical studies on empirically observed networks. 2026-02-13T09:55:11Z Dohyun Ahn Agostino Capponi http://arxiv.org/abs/2602.12030v1 Time-Inhomogeneous Volatility Aversion for Financial Applications of Reinforcement Learning 2026-02-12T15:00:28Z In finance, sequential decision problems are often faced, for which reinforcement learning (RL) emerges as a promising tool for optimisation without the need of analytical tractability. However, the objective of classical RL is the expected cumulated reward, while financial applications typically require a trade-off between return and risk. In this work, we focus on settings where one cares about the time split of the total return, ruling out most risk-aware generalisations of RL which optimise a risk measure defined on the latter. We notice that a preference for homogeneous splits, which we found satisfactory for hedging, can be unfit for other problems, and therefore propose a new risk metric which still penalises uncertainty of the single rewards, but allows for an arbitrary planning of their target levels. We study the properties of the resulting objective and the generalisation of learning algorithms to optimise it. Finally, we show numerical results on toy examples. 
2026-02-12T15:00:28Z 18 pages, 6 figures Federico Cacciamani Roberto Daluiso Marco Pinciroli Michele Trapletti Edoardo Vittori http://arxiv.org/abs/2602.10071v1 Deep Learning for Electricity Price Forecasting: A Review of Day-Ahead, Intraday, and Balancing Electricity Markets 2026-02-10T18:36:36Z Electricity price forecasting (EPF) plays a critical role in power system operation and market decision making. While existing review studies have provided valuable insights into forecasting horizons, market mechanisms, and evaluation practices, the rapid adoption of deep learning has introduced increasingly diverse model architectures, output structures, and training objectives that remain insufficiently analyzed in depth. This paper presents a structured review of deep learning methods for EPF in day-ahead, intraday, and balancing markets. Specifically, we introduce a unified taxonomy that decomposes deep learning models into backbone, head, and loss components, providing a consistent evaluation perspective across studies. Using this framework, we analyze recent trends in deep learning components across markets. Our study highlights the shift toward probabilistic, microstructure-centric, and market-aware designs. We further identify key gaps in the literature, including limited attention to intraday and balancing markets and the need for market-specific modeling strategies, thereby helping to consolidate and advance existing review studies. 2026-02-10T18:36:36Z 9 pages, 2 figures, 2 tables Runyao Yu Derek W. Bunn Julia Lin Jochen Stiasny Fabian Leimgruber Tara Esterl Yuchen Tao Lianlian Qi Yujie Chen Wentao Wang Jochen L. Cremer http://arxiv.org/abs/2602.09950v1 How can the dual martingale help solving the primal optimal stopping problem? 
2026-02-10T16:35:28Z Motivated by recent results on the dual formulation of optimal stopping problems, we investigate in this short paper how the knowledge of an approximating dual martingale can improve the efficiency of primal methods. In particular, we show on numerical examples that accurate approximations of a dual martingale efficiently reduce the variance for the primal optimal stopping problem. 2026-02-10T16:35:28Z Aurélien Alfonsi Ahmed Kebaier Jérôme Lelong http://arxiv.org/abs/2602.08182v1 Nansde-net: A neural sde framework for generating time series with memory 2026-02-09T00:53:28Z Modeling time series with long- or short-memory characteristics is a fundamental challenge in many scientific and engineering domains. While fractional Brownian motion has been widely used as a noise source to capture such memory effects, its incompatibility with Itô calculus limits its applicability in neural stochastic differential equation (SDE) frameworks. In this paper, we propose a novel class of noise, termed Neural Network-kernel ARMA-type noise (NA-noise), which is an Itô-process-based alternative capable of capturing both long- and short-memory behaviors. The kernel function defining the noise structure is parameterized via neural networks and decomposed into a product form to preserve the Markov property. Based on this noise process, we develop NANSDE-Net, a generative model that extends Neural SDEs by incorporating NA-noise. We prove the theoretical existence and uniqueness of the solution under mild conditions and derive an efficient backpropagation scheme for training. Empirical results on both synthetic and real-world datasets demonstrate that NANSDE-Net matches or outperforms existing models, including fractional SDE-Net, in reproducing long- and short-memory features of the data, while maintaining computational tractability within the Itô calculus framework. 
2026-02-09T00:53:28Z PAKDD2026 Accepted Hiromu Ozai Kei Nakagawa http://arxiv.org/abs/2501.15106v2 Solving Optimal Execution Problems via In-Context Operator Networks 2026-02-07T04:21:48Z We propose a novel transformer-based neural network architecture (ICON-OCnet) for solving optimal order execution problems in the presence of unknown price impact. Our architecture facilitates data-driven in-context operator learning for the incurred price impact by merging offline pre-training with online few-shot prompting inference. First, the operator learning component (ICON) learns the prevailing price impact environment from only a few executed trade and price impact trajectories (time series data) provided as context. Second, we employ ICON as a surrogate operator to train a neural network policy (OCnet) for the optimal order execution strategy for the price impact regime inferred from the in-context examples. We study the efficiency of our approach for linear propagator models with path-dependent transient price impact and explicitly known optimal execution strategies. In this model class, price impact persists and decays over time according to some propagator kernel. We illustrate that ICON is capable of accurately inferring the underlying price impact model from the data prompts, even for propagator kernels not seen in the training data. Moreover, we demonstrate that ICON-OCnet correctly retrieves the exact optimal order execution strategy for the model generating the in-context examples. Our introduced methodology is very general, offering a new approach to solving path-dependent optimal stochastic control problems sample-based with unknown state dynamics. 
2025-01-25T07:15:47Z 27 pages, 11 figures Tingwei Meng Moritz Voß Nils Detering Giulio Farolfi Stanley Osher Georg Menz http://arxiv.org/abs/2403.02572v2 Fill Probabilities in a Limit Order Book with State-Dependent Stochastic Order Flows 2026-02-06T15:11:44Z This paper studies the fill probabilities of limit orders placed at different price levels in a limit order book. These probabilities play a central role in execution optimization, as limit orders are not guaranteed to be executed and inherently involve a trade-off between execution cost and execution risk. We model the limit order book within a general state-dependent stochastic framework, representing its dynamics as a collection of interacting queuing systems while incorporating key stylized market features. Within this framework, we derive semi-analytical expressions for several quantities of interest under state-dependent order flows, including the probability of a mid-price change, the fill probabilities of orders placed at the best quotes, and those of orders placed deeper in the book before the opposite best quote moves. While the framework can be extended to even deeper price levels, the corresponding fill probabilities are typically negligible. We validate the proposed model through extensive numerical experiments using real foreign exchange spot market data. The results demonstrate that the model remains tractable while capturing essential order book dynamics, and that the derived expressions achieve good accuracy in estimating fill probabilities. 2024-03-05T01:04:45Z Felix Lokin Fenghui Yu http://arxiv.org/abs/2602.07096v1 RealFin: How Well Do LLMs Reason About Finance When Users Leave Things Unsaid? 2026-02-06T13:47:54Z Reliable financial reasoning requires knowing not only how to answer, but also when an answer cannot be justified. 
In real financial practice, problems often rely on implicit assumptions that are taken for granted rather than stated explicitly, causing problems to appear solvable while lacking enough information for a definite answer. We introduce REALFIN, a bilingual benchmark that evaluates financial reasoning by systematically removing essential premises from exam-style questions while keeping them linguistically plausible. Based on this, we evaluate models under three formulations that test answering, recognizing missing information, and rejecting unjustified options, and find consistent performance drops when key conditions are absent. General-purpose models tend to over-commit and guess, while most finance-specialized models fail to clearly identify missing premises. These results highlight a critical gap in current evaluations and show that reliable financial models must know when a question should not be answered. 2026-02-06T13:47:54Z Yuyang Dai Yan Lin Zhuohan Xie Yuxia Wang http://arxiv.org/abs/2602.07085v1 QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining 2026-02-06T08:08:04Z Financial markets are noisy and non-stationary, making alpha mining highly sensitive to noise in backtesting results and sudden market regime shifts. While recent agentic frameworks improve alpha mining automation, they often lack controllable multi-round search and reliable reuse of validated experience. To address these challenges, we propose QuantaAlpha, an evolutionary alpha mining framework that treats each end-to-end mining run as a trajectory and improves factors through trajectory-level mutation and crossover operations. QuantaAlpha localizes suboptimal steps in each trajectory for targeted revision and recombines complementary high-reward segments to reuse effective patterns, enabling structured exploration and refinement across mining iterations. 
During factor generation, QuantaAlpha enforces semantic consistency across the hypothesis, factor expression, and executable code, while constraining the complexity and redundancy of the generated factor to mitigate crowding. Extensive experiments on the China Securities Index 300 (CSI 300) demonstrate consistent gains over strong baseline models and prior agentic systems. When utilizing GPT-5.2, QuantaAlpha achieves an Information Coefficient (IC) of 0.1501, with an Annualized Rate of Return (ARR) of 27.75% and a Maximum Drawdown (MDD) of 7.98%. Moreover, factors mined on CSI 300 transfer effectively to the China Securities Index 500 (CSI 500) and the Standard & Poor's 500 Index (S&P 500), delivering 160% and 137% cumulative excess return over four years, respectively, which indicates strong robustness of QuantaAlpha under market distribution shifts. 2026-02-06T08:08:04Z Jun Han Shuo Zhang Wei Li Zhi Yang Yifan Dong Tu Hu Jialuo Yuan Xiaomin Yu Yumo Zhu Fangqi Lou Xin Guo Zhaowei Liu Tianyi Jiang Ruichuan An Jingping Liu Biao Wu Rongze Chen Kunyi Wang Yifan Wang Sen Hu Xinbing Kong Liwen Zhang Ronghao Chen Huacan Wang http://arxiv.org/abs/2602.06394v1 Unlocking Noisy Real-World Corpora for Foundation Model Pre-Training via Quality-Aware Tokenization 2026-02-06T05:26:59Z Current tokenization methods process sequential data without accounting for signal quality, limiting their effectiveness on noisy real-world corpora. We present QA-Token (Quality-Aware Tokenization), which incorporates data reliability directly into vocabulary construction. We make three key contributions: (i) a bilevel optimization formulation that jointly optimizes vocabulary construction and downstream performance, (ii) a reinforcement learning approach that learns merge policies through quality-aware rewards with convergence guarantees, and (iii) an adaptive parameter learning mechanism via Gumbel-Softmax relaxation for end-to-end optimization. 
Our experimental evaluation demonstrates consistent improvements: genomics (6.7 percentage point F1 gain in variant calling over BPE), finance (30% Sharpe ratio improvement). At foundation scale, we tokenize a pretraining corpus comprising 1.7 trillion base-pairs and achieve state-of-the-art pathogen detection (94.53 MCC) while reducing token count by 15%. We unlock noisy real-world corpora, spanning petabases of genomic sequences and terabytes of financial time series, for foundation model training with zero inference overhead. 2026-02-06T05:26:59Z Arvid E. Gollwitzer Paridhi Latawa David de Gruijl Deepak A. Subramanian Adrián Noriega de la Colina http://arxiv.org/abs/2511.01587v2 Numerical methods for solving PIDEs arising in swing option pricing under a two-factor mean-reverting model with jumps 2026-02-04T08:45:11Z This paper concerns the numerical valuation of swing options with discrete action times under a linear two-factor mean-reverting model with jumps. The resulting sequence of two-dimensional partial integro-differential equations (PIDEs) are convection-dominated and possess a nonlocal integral term due to the presence of jumps. Further, the initial function is nonsmooth. We propose various second-order numerical methods that can adequately handle these challenging features. The stability and convergence of these numerical methods are analysed theoretically. By ample numerical experiments, we confirm their second-order convergence behaviour. 2025-11-03T13:56:46Z Mustapha Regragui Karel J. 
in 't Hout Michèle Vanmaele Fred Espen Benth http://arxiv.org/abs/2602.03776v1 DiffLOB: Diffusion Models for Counterfactual Generation in Limit Order Books 2026-02-03T17:34:56Z Modern generative models for limit order books (LOBs) can reproduce realistic market dynamics, but remain fundamentally passive: they either model what typically happens without accounting for hypothetical future market conditions, or they require interaction with another agent to explore alternative outcomes. This limits their usefulness for stress testing, scenario analysis, and decision-making. We propose DiffLOB, a regime-conditioned Diffusion model for controllable and counterfactual generation of LOB trajectories. DiffLOB explicitly conditions the generative process on future market regimes, including trend, volatility, liquidity, and order-flow imbalance, which enables the model to answer counterfactual queries of the form: "If the future market regime were X instead of Y, how would the limit order book evolve?" Our systematic evaluation framework for counterfactual LOB generation consists of three criteria: (1) Controllable Realism, measuring how well generated trajectories can reproduce marginal distributions, temporal dependence structure and regime variables; (2) Counterfactual validity, testing whether interventions on future regimes induce consistent changes in the generated LOB dynamics; (3) Counterfactual usefulness, assessing whether synthetic counterfactual trajectories improve downstream prediction of future market regimes. 2026-02-03T17:34:56Z 12 pages, 8 figures Zhuohan Wang Carmine Ventre http://arxiv.org/abs/2602.03725v1 Quantum Speedups for Derivative Pricing Beyond Black-Scholes 2026-02-03T16:45:24Z This paper explores advancements in quantum algorithms for derivative pricing of exotics, a computational pipeline of fundamental importance in quantitative finance. 
For such cases, the classical Monte Carlo integration procedure provides the state-of-the-art provable, asymptotic performance: polynomial in problem dimension and quadratic in inverse-precision. While quantum algorithms are known to offer quadratic speedups over classical Monte Carlo methods, end-to-end speedups have been proven only in the simplified setting over the Black-Scholes geometric Brownian motion (GBM) model. This paper extends existing frameworks to demonstrate novel quadratic speedups for more practical models, such as the Cox-Ingersoll-Ross (CIR) model and a variant of Heston's stochastic volatility model, utilizing a characteristic of the underlying SDEs which we term fast-forwardability. Additionally, for general models that do not possess the fast-forwardable property, we introduce a quantum Milstein sampler, based on a novel quantum algorithm for sampling Lévy areas, which enables quantum multi-level Monte Carlo to achieve quadratic speedups for multi-dimensional stochastic processes exhibiting certain correlation types. We also present an improved analysis of numerical integration for derivative pricing, leading to substantial reductions in the resource requirements for pricing GBM and CIR models. Furthermore, we investigate the potential for additional reductions using arithmetic-free quantum procedures. Finally, we critique quantum partial differential equation (PDE) solvers as a method for derivative pricing based on amplitude estimation, identifying theoretical barriers that obstruct achieving a quantum speedup through this approach. Our findings significantly advance the understanding of quantum algorithms in derivative pricing, addressing key challenges and open questions in the field. 
2026-02-03T16:45:24Z Dylan Herman Yue Sun Jin-Peng Liu Marco Pistoia Charlie Che Rob Otter Shouvanik Chakrabarti Aram Harrow http://arxiv.org/abs/2602.03461v1 Soft-Radial Projection for Constrained End-to-End Learning 2026-02-03T12:33:44Z Integrating hard constraints into deep learning is essential for safety-critical systems. Yet existing constructive layers that project predictions onto constraint boundaries face a fundamental bottleneck: gradient saturation. By collapsing exterior points onto lower-dimensional surfaces, standard orthogonal projections induce rank-deficient Jacobians, which nullify gradients orthogonal to active constraints and hinder optimization. We introduce Soft-Radial Projection, a differentiable reparameterization layer that circumvents this issue through a radial mapping from Euclidean space into the interior of the feasible set. This construction guarantees strict feasibility while preserving a full-rank Jacobian almost everywhere, thereby preventing the optimization stalls typical of boundary-based methods. We theoretically prove that the architecture retains the universal approximation property and empirically show improved convergence behavior and solution quality over state-of-the-art optimization- and projection-based baselines. 2026-02-03T12:33:44Z Philipp J. Schneider Daniel Kuhn