https://arxiv.org/api/Te5n0HZJ/8s1NuM0zumPOVkAIzM
Updated: 2026-06-13T13:34:20Z

http://arxiv.org/abs/2605.15746v5
Title: The Privacy Subsidy: Kyle's $λ$ under Noise-Perturbed Order-Flow Observation
Updated: 2026-05-31T05:48:24Z
Abstract: Privacy-preserving cryptocurrency exchanges alter what the pricing mechanism observes about order flow. We derive the unique linear Kyle equilibrium when a committed Bayesian market maker observes order flow perturbed by independent Gaussian privacy noise. The price-impact coefficient and informed-trader strategy rescale by reciprocal factors of the privacy parameter (one down, one up), so their product is invariant. A welfare decomposition then identifies a closed-form per-period transfer from the protocol's LP pool to traders -- the "privacy subsidy", the break-even fee any privacy-aggregated exchange must charge. The result is the single-period closed-form privacy-noise analog of Loss-Versus-Rebalancing (Milionis et al. 2022). The primary application is shielded AMMs with explicit additive-noise injection (e.g., differential privacy); related designs (batched swaps, sealed-bid auctions, oracle-pegged crossings) require separate frameworks that we leave to future work.
Published: 2026-05-15T08:56:16Z
Comment: v5: Framing reconciliation (the privacy subsidy is the generic, competition-robust cost of pricing on a signal coarser than the settled flow); reference corrections; manuscript body refactored. 17 pages, 1 figure
Authors: Yuki Nakamura

http://arxiv.org/abs/2605.30643v1
Title: Quality-Adjusted Hit-Ratio Targeting in Corporate Bond Market Making
Updated: 2026-05-28T22:56:12Z
Abstract: Hit ratio is a common service metric for electronic corporate bond market making, but raw hit-ratio targets can be economically misleading when client flow has heterogeneous adverse-selection content. This paper extends a stochastic-control framework for OTC bond RFQ market making with hit-ratio constraints by replacing raw hit ratio with a residual-quality-adjusted hit ratio.
The key modelling distinction is that adverse post-trade markouts are first decomposed into observable credit factors, carry/rolldown, issuer-relative-value effects, index or ETF demand effects, and residual adverse selection. Only the residual component is treated as client-flow toxicity. The resulting control problem remains tractable: after dualizing the quality-hit-ratio penalty, the HJB retains separable Hamiltonians, and the dual variable is the solution of an exact one-dimensional nonlinear fixed point for each targeted tier. Under a quadratic value-function approximation, optimal quotes decompose into a riskless spread, inventory skew, credit-alpha skew, residual-toxicity charge, and quality-hit-ratio subsidy. Synthetic multi-bond simulations with nonlinear dual solves illustrate that raw hit-ratio targeting can subsidize residual-toxic flow, while residual-quality targeting reallocates service toward low-residual-toxicity flow and improves the attained service/economics frontier. A final reduced-form extension studies inventory-recycling value through risk-aware style-aligned client-flow warehousing. Sweep or portfolio-trade opportunities fill randomly, and participation is sized using the same quadratic value approximation as the RFQ quoting problem. A passive/index-demand experiment is reported in the appendix as a special case of forecastable client flow. The numerical evidence is synthetic and mechanism-oriented; no proprietary RFQ data are used.
Published: 2026-05-28T22:56:12Z
Authors: Bouna Niang

http://arxiv.org/abs/2605.30442v1
Title: When market boundaries weaken: Network reconfiguration and regime-dependent cross-asset spillovers
Updated: 2026-05-28T18:13:32Z
Abstract: Cryptocurrencies are increasingly adopted as investment assets, making their interactions with traditional financial markets central to cross-asset diversification and systemic risk.
This paper studies the integration of cryptocurrencies, fiat currencies, and S&P500 equities using a balanced panel of 381 assets from October 2017 to February 2024. We combine rolling correlation networks, consensus-based community detection, market-specific and system-wide Turbulence Indices, and VAR-based connectedness analysis to examine how market stress, network topology, and shock transmission co-evolve across regimes. The results show that cross-asset integration is episodic. In normal periods, the three asset classes remain relatively segmented, whereas under stress, local clustering increases, modular separation weakens, and communities become more compositionally mixed across asset classes. Connectedness analysis further shows that regime shifts alter the structure of transmission rather than simply increasing spillover magnitudes. In high-turbulence states, fiat-market turbulence becomes the main propagation channel, while network clustering and modularity become more involved in forecast-uncertainty transmission. These findings support the interpretation of network topology as an emergent, state-dependent amplification channel rather than a persistent exogenous driver of turbulence. The results highlight the need for regime-aware risk monitoring, since full-sample connectedness estimates can understate the coupling that arises when diversification benefits are most vulnerable.
Published: 2026-05-28T18:13:32Z
Authors: Ruixue Jing, Luis Enrique Correa Rocha

http://arxiv.org/abs/2408.02634v3
Title: CLVR Ordering of Transactions on AMMs
Updated: 2026-05-28T06:30:36Z
Abstract: This paper introduces a trade ordering rule that aims to reduce intra-block price volatility in Automated Market Maker (AMM) powered decentralized exchanges. The ordering rule introduced here, Clever Look-ahead Volatility Reduction (CLVR), operates under the (common) framework in decentralized finance that allows some entities to observe trade requests before they are settled, assemble them into "blocks", and order them as they like.
On AMM exchanges, asset prices are continuously and transparently updated as a result of each trade, and therefore transaction order has high financial value. CLVR aims to order transactions for traders' benefit. Our primary focus is intra-block price stability (minimizing volatility), which has two main benefits for traders: it reduces the transaction failure rate and lets traders receive prices closer to the reference price at which they submitted their transactions. We show that CLVR constructs an ordering which approximately minimizes price volatility with a small computation cost and can be trivially verified externally.
Published: 2024-08-05T16:58:48Z
Authors: Robert McLaughlin, Nir Chemaya, Dingyue Liu, Dahlia Malkhi

http://arxiv.org/abs/2602.18481v2
Title: AlphaForgeBench: Benchmarking End-to-End Trading Strategy Design with Large Language Models
Updated: 2026-05-27T14:41:33Z
Abstract: The rapid advancement of Large Language Models (LLMs) has led to a surge of financial benchmarks, evolving from static knowledge evaluation toward interactive trading simulations. However, existing frameworks for evaluating real-time trading largely overlook a critical failure mode: the severe behavioral instability of LLMs in sequential decision-making under financial uncertainty. Through extensive experiments, we show that when deployed as trading agents, LLMs exhibit extreme run-to-run variance, generate inconsistent action sequences even under deterministic decoding, and frequently produce irrational action flipping across adjacent time steps. We attribute these behaviors to the stateless autoregressive nature of LLMs, which lack persistent memory of prior actions, together with their sensitivity to continuous-to-discrete action mappings in portfolio allocation tasks. These deficiencies fundamentally undermine the reliability and reproducibility of many existing online and offline trading benchmarks.
To address these limitations, we propose AlphaForgeBench, a principled evaluation framework that redefines LLMs as quantitative researchers rather than stochastic trading agents. Instead of producing discrete trading actions, AlphaForgeBench requires models to generate executable alpha factors and compose factor-based trading strategies grounded in financial knowledge. This paradigm decouples reasoning from execution mechanics, enabling deterministic and reproducible evaluation while remaining aligned with real-world quantitative research workflows. Extensive experiments across multiple state-of-the-art LLMs demonstrate that AlphaForgeBench eliminates execution-induced instability and provides a rigorous benchmark for evaluating financial reasoning, strategy formulation, and alpha discovery. Webpage at https://finbrain-lab-hkustgz.github.io/AlphaForgeBench
Published: 2026-02-10T14:29:33Z
Authors: Wentao Zhang, Mingxuan Zhao, Jincheng Gao, Jieshun You, Huaiyu Jia, Yilei Zhao, Bo An, Shuo Sun

http://arxiv.org/abs/2605.28359v1
Title: From Knowing to Doing: A Memory-Controlled Benchmark for LLM Trading Agents on Stock Markets
Updated: 2026-05-27T11:57:10Z
Abstract: Evaluating whether large language model (LLM) agents can profit in capital markets is increasingly framed as end-to-end trading: place an agent in a historical market, let it trade, and measure portfolio returns. This setup is vulnerable to two evaluation failures. First, long backtests often overlap with the knowledge cutoffs of frontier LLMs, allowing memorized tickers, dates, prices, and market narratives to substitute for investment reasoning. Second, raw returns are a noisy proxy for stock-selection ability, since positive performance may come from market beta, style exposure, or favorable regimes rather than genuine alpha.
We introduce KTD-Fin (Knowing-To-Doing Financial Benchmark), an end-to-end stock-market trading benchmark that addresses both issues. KTD-Fin uses a data-side masking protocol to anonymize key identifiers and calendar information consistently across prompts and tools, separating historical market memory from investment decision-making. It also incorporates a Barra-style performance attribution framework that decomposes portfolio returns into market, style, and stock-selection alpha components.
Across ten frontier LLM agents evaluated on the Chinese CSI300 over a 2024--2026 window, masking substantially changes agent rationales, pushing them towards anonymized factor-based reasoning. Attribution analysis further shows that LLM agents' cumulative returns under leakage-controlled evaluation are largely explained by passive market and style exposure, with limited evidence of persistent stock-selection alpha. These findings suggest that financial LLM benchmarks should evaluate not only whether an agent makes money, but also whether the source of returns reflects transferable investment skill. We release KTD-Fin as a reproducible template for leakage-controlled and attribution-aware evaluation of LLM trading agents.
Published: 2026-05-27T11:57:10Z
Authors: Taojie Zhu, Wentao Zhao, Rui Sun, Beidi Luan, Jiacheng Lu, Sinuo Wang, Jing Li, Daxin Jiang, Yonghong He, Zuo Bai

http://arxiv.org/abs/2406.05854v2
Title: Can market volumes reveal traders' rationality and a new risk premium?
Updated: 2026-05-26T17:40:37Z
Abstract: An empirical analysis, suggested by optimal Merton dynamics, reveals some unexpected features of asset volumes. These features are connected to traders' belief and risk aversion. This paper proposes a trading strategy model in the optimal Merton framework that is representative of the collective behavior of heterogeneous rational traders. This model allows for the estimation of the average risk aversion of traders acting on a specific risky asset, while revealing the existence of a price of risk closely related to market price of risk and volume rate.
The empirical analysis, conducted on real data, confirms the validity of the proposed model.
Published: 2024-06-09T16:56:01Z
Authors: Francesca Mariani, Maria Cristina Recchioni, Tai-Ho Wang, Roberto Giacalone

http://arxiv.org/abs/2605.24878v1
Title: Entropy-Regularized Certainty-Equivalent Bellman Policies for Risk-Sensitive Market Making
Updated: 2026-05-24T05:43:16Z
Abstract: We study a finite-inventory risk-sensitive market making problem in which a dealer controls bid and ask quotes, faces Brownian midprice risk, and receives liquidity-taking orders through point processes with quote-dependent intensities. The objective is the certainty equivalent induced by exponential utility with terminal and running inventory penalties. We introduce an exact discrete entropy-regularized Bellman operator that applies log-sum-exp regularization to deterministic-action certainty-equivalent scores, rather than to a risk-neutral one-step reward. This distinction is essential because the exponential certainty equivalent does not commute with quote randomization.
For time step \(h\) and entropy parameter \(λ\), we prove uniform convergence to the unregularized continuous-time risk-sensitive value at rate \[ O\bigl(h+λ(1+|\log λ|)\bigr). \] We also prove certainty-equivalent performance bounds for the induced Gibbs policies under a fresh-sampling relaxed implementation, in which quote marks are sampled at potential fill events rather than frozen over a time step. Under a quadratic growth condition on the Hamiltonian in the relevant quote coordinates, these policies concentrate around the unregularized optimal quote set. Finally, we show that a lower-cost Hamiltonian-Gibbs proxy satisfies a certainty-equivalent performance bound of the same order as the exact Bellman Gibbs policy. Numerical experiments in an Avellaneda--Stoikov specification support the predicted scaling for discretization error, entropy bias, policy gap, quote concentration, and exact-versus-proxy consistency.
Published: 2026-05-24T05:43:16Z
Authors: Tenghan Zhong

http://arxiv.org/abs/2605.22667v2
Title: Imperfect Commitment in Maximal Extractable Value Auctions
Updated: 2026-05-22T20:05:31Z
Abstract: Ethereum block builders run sealed auctions among searchers, but nothing in the protocol forces a builder to honor the auction outcome after observing submitted bundles. This paper studies the commitment problem. We model a builder who defects with probability $\varepsilon$ and, upon defection, replicates a type-specific fraction $γ(τ)$ of the winning MEV opportunity. Searchers anticipate this behavior and choose between a risky first-price bid and a safe deterrence bid that makes frontrunning unprofitable. The resulting equilibrium is piecewise, with the cost of imperfect commitment depending jointly on replicability and competition. Using the libMEV dataset, we estimate $γ(τ)$ from right-tail bribe plateaus and decompose observed auction revenue against the surplus a defecting builder could capture. The results show sharp heterogeneity across MEV types: sandwich opportunities are already highly competitive, while naked arbitrage and liquidations leave substantially more surplus exposed to builder defection.
Credible MEV auctions, therefore, require not only an auction format, but also constraints on the builder's ability to use observed bid and payload information ex post.
Published: 2026-05-21T16:08:56Z
Authors: Aleksei Adadurov, Sergey Barseghyan, Anton Chtepine, Antero Eloranta, Andrei Sebyakin, Arsenii Valitov

http://arxiv.org/abs/2605.23007v1
Title: MadEvolve: Evolutionary Optimization of Trading Systems with Large Language Models
Updated: 2026-05-21T20:28:57Z
Abstract: We explore the application of LLM-driven algorithm optimization to several common tasks in quantitative finance. MadEvolve, a general-purpose algorithm optimization framework inspired by DeepMind's Alpha-Evolve, was recently developed to optimize algorithms in computational cosmology. Here we demonstrate the utility of MadEvolve for optimizing algorithmic trading strategies and alpha generation, using Bitcoin trading as an example. On our simulation and backtesting setup, we achieve significant improvements on all tasks we considered, such as evolving feature sets for signal generation, optimizing separate components of the trading strategy, and jointly evolving the feature pipeline together with the execution strategy. Additionally, we compare our method to other agentic search approaches, specifically Claude Code, and carefully evaluate p-hacking probabilities on our simulation setup.
Our findings strongly support the utility of AI-driven agentic and evolutionary algorithms for algorithmic trading and quantitative finance.
Published: 2026-05-21T20:28:57Z
Authors: Yurii Kvasiuk, Tianyi Li, Owen Colegrove, Moritz Münchmeyer

http://arxiv.org/abs/2510.15949v5
Title: ATLAS: Adaptive Trading with LLM AgentS Through Dynamic Prompt Optimization and Multi-Agent Coordination
Updated: 2026-05-20T15:24:21Z
Abstract: Large language models show promise for financial decision-making, yet deploying them as autonomous trading agents raises fundamental challenges: how to adapt instructions when rewards arrive late and obscured by market noise, how to synthesize heterogeneous information streams into coherent decisions, and how to bridge the gap between model outputs and executable market actions. We present ATLAS (Adaptive Trading with LLM AgentS), a unified multi-agent framework that integrates structured information from markets, news, and corporate fundamentals to support robust trading decisions. Within ATLAS, the central trading agent operates in an order-aware action space, ensuring that outputs correspond to executable market orders rather than abstract signals. The agent can incorporate feedback while trading using Adaptive-OPRO, a novel prompt-optimization technique that dynamically adapts the prompt by incorporating real-time, stochastic feedback, leading to increasing performance over time.
Across regime-specific equity studies and multiple LLM families, Adaptive-OPRO consistently outperforms fixed prompts, while reflection-based feedback fails to provide systematic gains.
Published: 2025-10-10T13:01:51Z
Authors: Charidimos Papadakis, Angeliki Dimitriou, Giorgos Filandrianos, Maria Lymperaiou, Konstantinos Thomas, Giorgos Stamou

http://arxiv.org/abs/2606.00061v1
Title: Reflexivity as Prompt: Does Awareness of Self-Reinforcing Market Dynamics Improve LLMs as Financial Market Forecasters?
Updated: 2026-05-19T18:10:07Z
Abstract: We study how frontier large language models (LLMs) behave as financial forecasters during boom-bust market cycles when made progressively aware of Soros's theory of reflexivity. Standard AI-assisted forecasting treats the market as an exogenous system. Reflexivity theory holds otherwise: prices shape fundamentals, and every forecaster is a participative agent in the loop it analyzes. We evaluate three frontier models - GPT5, Claude Sonnet 4.6, and Gemini 3 Pro - under four accumulating zero-shot conditions across two historically distinct episodes: the dot-com bubble (1996-2001) and the global financial crisis (2004-2009). The primary metric is directional forecasting accuracy; we also report the Sharpe ratio of an implied long/cash strategy to capture the risk-adjusted economic value of the forecasts. All inputs are anonymized and normalized to guard against memorization. We find that conditions incorporating reflexivity awareness improve forecasting accuracy differently across models and context windows, revealing that the same theoretical awareness can produce qualitatively different forecasting behavior across frontier LLMs.
Published: 2026-05-19T18:10:07Z
Authors: Eugene Park

http://arxiv.org/abs/2508.04003v2
Title: The Marginal Effects of Ethereum Network MEV Transaction Re-Ordering
Updated: 2026-05-19T16:09:02Z
Abstract: Two MEV builders now produce nearly 80\% of Ethereum blocks. Block builders have the ability to reorder transactions on the blockchain in a way that can be harmful to participants.
We estimate they would pay in the aggregate nearly \$14 million per month to ensure that they remained in the first quartile of the block. Sandwich attacks, in which a transaction is front-run, are frequent, averaging more than one per block. Gas fees on these transactions pay for nearly 15\% of the MEV payments to the validator. These attacks have especially large marginal effects and skew the distribution. Reforms such as gas fee priority or private transaction pools might be helpful.
Published: 2025-08-06T01:32:02Z
Authors: Bruce Mizrach, Nathaniel Yoshida

http://arxiv.org/abs/2606.00060v1
Title: Machine Learning-Based Bitcoin Trading Under Transaction Costs: Evidence From Walk-Forward Forecasting
Updated: 2026-05-19T14:30:49Z
Abstract: This paper investigates whether machine learning forecasts of hourly BTC-USDT returns can be converted into economically meaningful trading performance after transaction costs. Using approximately 70,000 hourly observations from 2018-2026, XGBoost, LSTM, and iTransformer are evaluated in a 27-fold walk-forward protocol. All three models produce positive gross trading performance in selected configurations, but naive sign-based strategies fail once transaction costs of ten basis points are imposed. A cost-aware execution filter, which permits trades only when the forecast magnitude exceeds a transaction-cost-based threshold, sharply reduces turnover and restores profitability in selected configurations. The strongest long-only XGBoost strategy produces annualised returns above 65% with a Sharpe ratio above one. Additional tests show that technical indicators improve performance in selected cases, EGARCH-derived features do not provide uniformly robust gains, and XGBoost is descriptively stronger than the neural alternatives, although bootstrap evidence does not support formal statistical dominance. Loss-function and model-selection effects are secondary and statistically fragile.
The results show that the main obstacle in hourly cryptocurrency trading is not only weak predictability, but also the way forecasts are converted into trades.
Published: 2026-05-19T14:30:49Z
Comment: 42 pages
Authors: Andrei Bysik, Robert Ślepaczuk

http://arxiv.org/abs/2601.05085v3
Title: Trading Electrons: Predicting DART Spread Spikes in ISO Electricity Markets
Updated: 2026-05-19T14:00:57Z
Abstract: We study the problem of forecasting and optimally trading day-ahead versus real-time (DART) price spreads in U.S. wholesale electricity markets. Building on the framework of Galarneau-Vincent et al., we extend spike prediction from a single zone to a multi-zone setting and treat both positive and negative DART spikes within a unified statistical model. To translate directional signals into economically meaningful positions, we develop a structural and market-consistent price impact model based on day-ahead bid stacks. This yields closed-form expressions for the optimal vector of zonal INC/DEC quantities, capturing asymmetric buy/sell impacts and cross-zone congestion effects. When applied to NYISO, the resulting impact-aware strategy significantly improves the risk-return profile relative to unit-size trading and highlights substantial heterogeneity across markets and seasons.
Published: 2026-01-08T16:31:39Z
Comment: 40 pages
Authors: Emma Hubert, Dimitrios Lolas, Ronnie Sircar