http://arxiv.org/abs/2511.02518v1
Option market making with hedging-induced market impact
Updated: 2025-11-04T12:13:44Z
This paper develops a model for option market making in which the hedging activity of the market maker generates price impact on the underlying asset. The option order flow is modeled by Cox processes, with intensities depending on the state of the underlying and on the market maker's quoted prices. The resulting dynamics combine stochastic option demand with both permanent and transient impact on the underlying, leading to a coupled evolution of inventory and price. We first study market manipulation and arbitrage phenomena that may arise from the feedback between option trading and underlying impact. We then establish the well-posedness of the mixed control problem, which involves continuous quoting decisions and impulsive hedging actions. Finally, we implement a numerical method based on policy optimization to approximate optimal strategies and illustrate the interplay between option market liquidity, inventory risk, and underlying impact.
Published: 2025-11-04T12:13:44Z
Authors: Paulin Aubert, Etienne Chevalier, Vathana Ly Vath

http://arxiv.org/abs/2511.02136v1
JaxMARL-HFT: GPU-Accelerated Large-Scale Multi-Agent Reinforcement Learning for High-Frequency Trading
Updated: 2025-11-03T23:56:15Z
Agent-based modelling (ABM) approaches for high-frequency financial markets are difficult to calibrate and validate, partly due to the large parameter space created by defining fixed agent policies. Multi-agent reinforcement learning (MARL) enables more realistic agent behaviour and reduces the number of free parameters, but the heavy computational cost has so far limited research efforts.
To address this, we introduce JaxMARL-HFT (JAX-based Multi-Agent Reinforcement Learning for High-Frequency Trading), the first GPU-accelerated open-source multi-agent reinforcement learning environment for high-frequency trading (HFT) on market-by-order (MBO) data. Extending the JaxMARL framework and building on the JAX-LOB implementation, JaxMARL-HFT is designed to handle a heterogeneous set of agents, enabling diverse observation/action spaces and reward functions. It is designed flexibly, so it can also be used for single-agent RL, or extended to act as an ABM with fixed-policy agents. Leveraging JAX enables up to a 240x reduction in end-to-end training time, compared with state-of-the-art reference implementations on the same hardware. This significant speed-up makes it feasible to exploit the large, granular datasets available in high-frequency trading, and to perform the extensive hyperparameter sweeps required for robust and efficient MARL research in trading. We demonstrate the use of JaxMARL-HFT with independent Proximal Policy Optimization (IPPO) for a two-player environment, with an order execution and a market making agent, using one year of LOB data (400 million orders), and show that these agents learn to outperform standard benchmarks. The code for the JaxMARL-HFT framework is available on GitHub.
Published: 2025-11-03T23:56:15Z
Code available at: https://github.com/vmohl/JaxMARL-HFT
6th ACM International Conference on AI in Finance (ICAIF '25), November 15-18, 2025, Singapore, Singapore.
ACM, New York, NY, USA, 9 pages
Authors: Valentin Mohl, Sascha Frey, Reuben Leyland, Kang Li, George Nigmatulin, Mihai Cucuringu, Stefan Zohren, Jakob Foerster, Anisoara Calinescu

http://arxiv.org/abs/2511.02016v1
ABIDES-MARL: A Multi-Agent Reinforcement Learning Environment for Endogenous Price Formation and Execution in a Limit Order Book
Updated: 2025-11-03T19:42:17Z
We present ABIDES-MARL, a framework that combines a new multi-agent reinforcement learning (MARL) methodology with a new realistic limit-order-book (LOB) simulation system to study equilibrium behavior in complex financial market games. The system extends ABIDES-Gym by decoupling state collection from kernel interruption, enabling synchronized learning and decision-making for multiple adaptive agents while maintaining compatibility with standard RL libraries. It preserves key market features such as price-time priority and discrete tick sizes. Methodologically, we use MARL to approximate equilibrium-like behavior in multi-period trading games with a finite number of heterogeneous agents (an informed trader, a liquidity trader, noise traders, and competing market makers), all with individual price impacts. This setting bridges optimal execution and market microstructure by embedding the liquidity trader's optimization problem within a strategic trading environment. We validate the approach by solving an extended Kyle model within the simulation system, recovering the gradual price discovery phenomenon. We then extend the analysis to a liquidity trader's problem where market liquidity arises endogenously and show that, at equilibrium, execution strategies shape market-maker behavior and price dynamics.
ABIDES-MARL provides a reproducible foundation for analyzing equilibrium and strategic adaptation in realistic markets and contributes toward building economically interpretable agentic AI systems for finance.
Published: 2025-11-03T19:42:17Z
Authors: Patrick Cheridito, Jean-Loup Dupret, Zhexin Wu

http://arxiv.org/abs/2511.01471v1
Trade Execution Flow as the Underlying Source of Market Dynamics
Updated: 2025-11-03T11:30:59Z
In this work, we demonstrate experimentally that the execution flow, $I = dV/dt$, is the fundamental driving force of market dynamics. We develop a numerical framework to calculate execution flow from sampled moments using the Radon-Nikodym derivative. A notable feature of this approach is its ability to automatically determine thresholds that can serve as actionable triggers. The technique also determines the characteristic time scale directly from the corresponding eigenproblem. The methodology has been validated on actual market data to support these findings. Additionally, we introduce a framework based on the Christoffel function spectrum, which is invariant under arbitrary non-degenerate linear transformations of input attributes and offers an alternative to traditional principal component analysis (PCA), which is limited to unitary invariance.
Published: 2025-11-03T11:30:59Z
Authors: Mikhail Gennadievich Belov, Victor Victorovich Dubov, Vadim Konstantinovich Ivanov, Alexander Yurievich Maslov, Olga Vladimirovna Proshina, Vladislav Gennadievich Malyshkin

http://arxiv.org/abs/2411.06389v2
Optimal Execution with Reinforcement Learning
Updated: 2025-11-01T19:34:00Z
This study investigates the development of an optimal execution strategy through reinforcement learning, aiming to determine the most effective approach for traders to buy and sell inventory within a finite time horizon. Our proposed model leverages input features derived from the current state of the limit order book and operates at a high frequency to maximize control.
To simulate this environment and overcome the limitations associated with relying on historical data, we utilize the multi-agent market simulator ABIDES, which provides a diverse range of depth levels within the limit order book. We present a custom MDP formulation followed by the results of our methodology and benchmark the performance against standard execution strategies. Results show that the reinforcement learning agent outperforms standard strategies and offers a practical foundation for real-world trading applications.
Published: 2024-11-10T08:21:03Z
8 pages
Authors: Yadh Hafsi, Edoardo Vittori

http://arxiv.org/abs/2511.00190v1
Deep reinforcement learning for optimal trading with partial information
Updated: 2025-10-31T18:48:59Z
Reinforcement Learning (RL) applied to financial problems has been the subject of a lively area of research. The use of RL for optimal trading strategies that exploit latent information in the market is, to the best of our knowledge, not widely tackled. In this paper we study an optimal trading problem, where a trading signal follows an Ornstein-Uhlenbeck process with regime-switching dynamics. We employ a blend of RL and Recurrent Neural Networks (RNN) in order to extract as much underlying information as possible from the trading signal with latent parameters.
The latent parameters driving mean reversion, speed, and volatility are filtered from observations of the signal, and trading strategies are derived via RL. To address this problem, we propose three Deep Deterministic Policy Gradient (DDPG)-based algorithms that integrate Gated Recurrent Unit (GRU) networks to capture temporal dependencies in the signal. The first, a one-step approach (hid-DDPG), directly encodes hidden states from the GRU into the RL trader. The second and third are two-step methods: one (prob-DDPG) makes use of posterior regime probability estimates, while the other (reg-DDPG) relies on forecasts of the next signal value. Through extensive simulations with increasingly complex Markovian regime dynamics for the trading signal's parameters, as well as an empirical application to equity pair trading, we find that prob-DDPG achieves superior cumulative rewards and exhibits more interpretable strategies. By contrast, reg-DDPG provides limited benefits, while hid-DDPG offers intermediate performance with less interpretable strategies. Our results show that the quality and structure of the information supplied to the agent are crucial: embedding probabilistic insights into latent regimes substantially improves both profitability and robustness of reinforcement learning-based trading strategies.
Published: 2025-10-31T18:48:59Z
Authors: Andrea Macrì, Sebastian Jaimungal, Fabrizio Lillo

http://arxiv.org/abs/2510.26438v2
An Impulse Control Approach to Market Making in a Hawkes LOB Market
Updated: 2025-10-31T13:35:38Z
We study the optimal Market Making problem in a Limit Order Book (LOB) market simulated using a high-fidelity, mutually exciting Hawkes process. Departing from traditional Brownian-driven mid-price models, our setup captures key microstructural properties such as queue dynamics, inter-arrival clustering, and endogenous price impact.
Recognizing the realistic constraint that market makers cannot update strategies at every LOB event, we formulate the control problem within an impulse control framework, where interventions occur discretely via limit, cancel, or market orders. This leads to a high-dimensional, non-local Hamilton-Jacobi-Bellman Quasi-Variational Inequality (HJB-QVI), whose solution is analytically intractable and computationally expensive due to the curse of dimensionality. To address this, we propose a novel Reinforcement Learning (RL) approximation inspired by auxiliary control formulations. Using a two-network PPO-based architecture with self-imitation learning, we demonstrate strong empirical performance with limited training, achieving Sharpe ratios above 30 in a realistic simulated LOB. In addition, we solve the HJB-QVI using a deep learning method inspired by Sirignano and Spiliopoulos 2018 and compare the performance with the RL agent. Our findings highlight the promise of combining impulse control theory with modern deep RL to tackle optimal execution problems in jump-driven microstructural markets.
Published: 2025-10-30T12:34:06Z
Authors: Konark Jain, Nick Firoozye, Jonathan Kochems, Philip Treleaven

http://arxiv.org/abs/2510.27334v1
When AI Trading Agents Compete: Adverse Selection of Meta-Orders by Reinforcement Learning-Based Market Making
Updated: 2025-10-31T10:05:14Z
We investigate the mechanisms by which medium-frequency trading agents are adversely selected by opportunistic high-frequency traders. We use reinforcement learning (RL) within a Hawkes Limit Order Book (LOB) model in order to replicate the behaviours of high-frequency market makers. In contrast to the classical models with exogenous price impact assumptions, the Hawkes model accounts for endogenous price impact and other key properties of the market (Jain et al. 2024a).
Given the real-world impracticalities of the market maker updating strategies for every event in the LOB, we formulate the high-frequency market making agent via an impulse control reinforcement learning framework (Jain et al. 2025). The RL used in the simulation utilises Proximal Policy Optimisation (PPO) and self-imitation learning. To replicate the adverse selection phenomenon, we test the RL agent trading against a medium frequency trader (MFT) executing a meta-order and demonstrate that, with training against the MFT meta-order execution agent, the RL market making agent learns to capitalise on the price drift induced by the meta-order. Recent empirical studies have shown that medium-frequency traders are increasingly subject to adverse selection by high-frequency trading agents. As high-frequency trading continues to proliferate across financial markets, the slippage costs incurred by medium-frequency traders are likely to increase over time. However, we do not observe that increased profits for the market making RL agent necessarily cause significantly increased slippages for the MFT agent.
Published: 2025-10-31T10:05:14Z
Authors: Ali Raza Jafree, Konark Jain, Nick Firoozye

http://arxiv.org/abs/2511.07434v1
RL-Exec: Impact-Aware Reinforcement Learning for Opportunistic Optimal Liquidation, Outperforms TWAP and a Book-Liquidity VWAP on BTC-USD Replays
Updated: 2025-10-30T20:25:49Z
We study opportunistic optimal liquidation over fixed deadlines on BTC-USD limit-order books (LOB). We present RL-Exec, a PPO agent trained on historical replays augmented with endogenous transient impact (resilience), partial fills, maker/taker fees, and latency. The policy observes depth-20 LOB features plus microstructure indicators and acts under a sell-only inventory constraint to reach a residual target.
Evaluation follows a strict time split (train: Jan-2020; test: Feb-2020) and a per-day protocol: for each test day we run ten independent start times and aggregate to a single daily score, avoiding pseudo-replication. We compare the agent to (i) TWAP and (ii) a VWAP-like baseline allocating using opposite-side order-book liquidity (top-20 levels), both executed on identical timestamps and costs. Statistical inference uses one-sided Wilcoxon signed-rank tests on daily RL-baseline differences with Benjamini-Hochberg FDR correction and bootstrap confidence intervals. On the Feb-2020 test set, RL-Exec significantly outperforms both baselines and the gap increases with the execution horizon (+2-3 bps at 30 min, +7-8 bps at 60 min, +23 bps at 120 min).
Code: github.com/Giafferri/RL-Exec
Published: 2025-10-30T20:25:49Z
8 pages main text, 3 appendix pages, 10 figures
Authors: Enzo Duflot, Stanislas Robineau

http://arxiv.org/abs/2510.01956v2
Rolling intrinsic for battery valuation in day-ahead and intraday markets
Updated: 2025-10-29T17:41:31Z
Battery Energy Storage Systems (BESS) are a cornerstone of the energy transition, as their ability to shift electricity across time enables both grid stability and the integration of renewable generation. This paper investigates the profitability of different market bidding strategies for BESS in the Central European wholesale power market, focusing on the day-ahead auction and intraday trading at EPEX Spot. We employ the rolling intrinsic approach as a realistic trading strategy for continuous intraday markets, explicitly incorporating bid-ask spreads to account for liquidity constraints. Our analysis shows that multi-market bidding strategies consistently outperform single-market participation. Furthermore, we demonstrate that maximum cycle limits significantly affect profitability, indicating that more flexible strategies which relax daily cycling constraints while respecting annual limits can unlock additional value.
Published: 2025-10-02T12:24:41Z
Authors: Daniel Oeltz, Tobias Pfingsten

http://arxiv.org/abs/2510.24467v1
The Omniscient, yet Lazy, Investor
Updated: 2025-10-28T14:35:14Z
We formalize the paradox of an omniscient yet lazy investor, a perfectly informed agent who trades infrequently due to execution or computational frictions. Starting from a deterministic geometric construction, we derive a closed-form expected profit function linking trading frequency, execution cost, and path roughness. We prove existence and uniqueness of the optimal trading frequency and show that this optimum can be interpreted through the fractal dimension of the price path. A stochastic extension under fractional Brownian motion provides analytical expressions for the optimal interval and comparative statics with respect to the Hurst exponent.
Empirical illustrations on equity data confirm the theoretical scaling behavior.
Published: 2025-10-28T14:35:14Z
Authors: Stanisław M. S. Halkiewicz

http://arxiv.org/abs/2510.23150v2
Revisiting the Structure of Trend Premia: When Diversification Hides Redundancy
Updated: 2025-10-28T07:11:26Z
Recent work has emphasized the diversification benefits of combining trend signals across multiple horizons, with the medium-term window (typically six months to one year) long viewed as the "sweet spot" of trend-following. This paper revisits this conventional view by reallocating exposure dynamically across horizons using a Bayesian optimization framework designed to learn the optimal weights assigned to each trend horizon at the asset level. The common practice of equal weighting implicitly assumes that all assets benefit equally from all horizons; we show that this assumption is both theoretically and empirically suboptimal. We first optimize the horizon-level weights at the asset level to maximize the informativeness of trend signals before applying Bayesian graphical models, with sparsity and turnover control, to allocate dynamically across assets. The key finding is that the medium-term band contributes little incremental performance or diversification once short- and long-term components are included. Removing the 125-day layer improves Sharpe ratios and drawdown efficiency while maintaining benchmark correlation. We then rationalize this outcome through a minimum-variance formulation, showing that the medium-term horizon largely overlaps with its neighboring horizons. The resulting "barbell" structure, combining short- and long-term trends, captures most of the performance while reducing model complexity.
This result challenges the common belief that more horizons always improve diversification and suggests that some forms of time-scale diversification may conceal unnecessary redundancy in trend premia.
Published: 2025-10-27T09:26:24Z
42 pages, 5 figures
Authors: Alban Etienne, Jean-Jacques Ohana, Eric Benhamou, Béatrice Guez, Ethan Setrouk, Thomas Jacquot

http://arxiv.org/abs/2510.23201v1
Building Trust in Illiquid Markets: an AI-Powered Replication of Private Equity Funds
Updated: 2025-10-27T10:39:50Z
In response to growing demand for resilient and transparent financial instruments, we introduce a novel framework for replicating private equity (PE) performance using liquid, AI-enhanced strategies. Despite historically delivering robust returns, private equity's inherent illiquidity and lack of transparency raise significant concerns regarding investor trust and systemic stability, particularly in periods of heightened market volatility. Our method uses advanced graphical models to decode liquid PE proxies and incorporates asymmetric risk adjustments that emulate private equity's unique performance dynamics. The result is a liquid, scalable solution that aligns closely with traditional quarterly PE benchmarks like Cambridge Associates and Preqin. This approach enhances portfolio resilience and contributes to the ongoing discourse on safe asset innovation, supporting market stability and investor confidence.
Published: 2025-10-27T10:39:50Z
8 pages, presented at Global Finance Conference
Authors: E. Benhamou, JJ. Ohana, B. Guez, E. Setrouk, T. Jacquot

http://arxiv.org/abs/2510.23183v1
PEARL: Private Equity Accessibility Reimagined with Liquidity
Updated: 2025-10-27T10:22:38Z
In this work, we introduce PEARL (Private Equity Accessibility Reimagined with Liquidity), an AI-powered framework designed to replicate and decode private equity funds using liquid, cost-effective assets.
Relying on previous research methods such as Erik Stafford's single stock selection (Stafford) and Thomson Reuters - Refinitiv's sector approach (TR), our approach incorporates an additional asymmetry to capture the reduced volatility and better performance of private equity funds resulting from sale timing, leverage, and stock improvements through management changes. As a result, our model exhibits a strong correlation with well-established liquid benchmarks such as Stafford and TR, as well as listed private equity firms (Listed PE), while enhancing performance to better align with renowned quarterly private equity benchmarks like Cambridge Associates, Preqin, and Bloomberg Private Equity Fund indices. Empirical findings validate that our two-step approach (decoding liquid daily private equity proxies with a degree of negative return asymmetry) outperforms the initial daily proxies and yields performance more consistent with quarterly private equity benchmarks.
Published: 2025-10-27T10:22:38Z
8 pages, 1 figure, presented at 8th private markets research conference (Lausanne)
Authors: E. Benhamou, JJ. Ohana, B. Guez, E. Setrouk, T. Jacquot

http://arxiv.org/abs/2510.22834v1
Deviations from Tradition: Stylized Facts in the Era of DeFi
Updated: 2025-10-26T21:01:54Z
Decentralized Exchanges (DEXs) are now a significant component of the financial world where billions of dollars are traded daily. Unlike traditional markets, which are typically based on Limit Order Books, DEXs typically work as Automated Market Makers and, since the implementation of Uniswap v3, feature concentrated liquidity. By investigating the twenty-four most active pools in Uniswap v3 during 2023 and 2024, we empirically study how this structural change in the organization of the markets modifies the well-studied stylized facts of prices, liquidity, and order flow observed in traditional markets.
We find a series of new statistical regularities in the distributions and cross-autocorrelation functions of these variables that we are able to associate either with the market structure (e.g., the execution of orders in blocks) or with the intense activity of Maximal Extractable Value searchers, such as Just-in-Time liquidity providers and sandwich attackers.
Published: 2025-10-26T21:01:54Z
Authors: Daniele Maria Di Nosse, Federico Gatta, Fabrizio Lillo, Sebastian Jaimungal