http://arxiv.org/abs/2507.14960v1 A Comparative Analysis of Statistical and Machine Learning Models for Outlier Detection in Bitcoin Limit Order Books 2025-07-20T13:42:36Z The detection of outliers within cryptocurrency limit order books (LOBs) is of paramount importance for comprehending market dynamics, particularly in highly volatile and nascent regulatory environments. This study conducts a comprehensive comparative analysis of robust statistical methods and advanced machine learning techniques for real-time anomaly identification in cryptocurrency LOBs. Within a unified testing environment, named AITA Order Book Signal (AITA-OBS), we evaluate the efficacy of thirteen diverse models to identify which approaches are most suitable for detecting potentially manipulative trading behaviours. An empirical evaluation, conducted via backtesting on a dataset of 26,204 records from a major exchange, demonstrates that the top-performing model, Empirical Covariance (EC), achieves a 6.70% gain, significantly outperforming a standard Buy-and-Hold benchmark. These findings underscore the effectiveness of outlier-driven strategies and provide insights into the trade-offs between model complexity, trade frequency, and performance. This study contributes to the growing corpus of research on cryptocurrency market microstructure by furnishing a rigorous benchmark of anomaly detection models and highlighting their potential for augmenting algorithmic trading and risk management. 
2025-07-20T13:42:36Z Ivan Letteri http://arxiv.org/abs/2507.15876v1 Re-evaluating Short- and Long-Term Trend Factors in CTA Replication: A Bayesian Graphical Approach 2025-07-17T12:09:29Z Commodity Trading Advisors (CTAs) have historically relied on trend-following rules that operate on vastly different horizons, from long-term breakouts that capture major directional moves to short-term momentum signals that thrive in fast-moving markets. Despite a large body of work on trend following, the relative merits and interactions of short- versus long-term trend systems remain controversial. This paper adds to the debate by (i) dynamically decomposing CTA returns into short-term trend, long-term trend and market beta factors using a Bayesian graphical model, and (ii) showing how the blend of horizons shapes the strategy's risk-adjusted performance. 2025-07-17T12:09:29Z 13 pages Eric Benhamou Jean-Jacques Ohana Alban Etienne Béatrice Guez Ethan Setrouk Thomas Jacquot http://arxiv.org/abs/2507.10701v1 Kernel Learning for Mean-Variance Trading Strategies 2025-07-14T18:17:50Z In this article, we develop a kernel-based framework for constructing dynamic, path-dependent trading strategies under a mean-variance optimisation criterion. Building on the theoretical results of Muca Cirone and Salvi (2025), we parameterise trading strategies as functions in a reproducing kernel Hilbert space (RKHS), enabling a flexible and non-Markovian approach to optimal portfolio problems. We compare this with the signature-based framework of Futter, Horvath and Wiese (2023) and demonstrate that both significantly outperform classical Markovian methods when the asset dynamics or predictive signals exhibit temporal dependencies, in both synthetic and market-data examples. Using kernels in this context provides significant modelling flexibility, as the choice of feature embedding can range from randomised signatures to the final layers of neural network architectures. 
Crucially, our framework retains closed-form solutions and provides an alternative to gradient-based optimisation. 2025-07-14T18:17:50Z 49 pages Owen Futter Nicola Muca Cirone Blanka Horvath http://arxiv.org/abs/2507.10149v1 A Coincidence of Wants Mechanism for Swap Trade Execution in Decentralized Exchanges 2025-07-14T10:53:25Z We propose a mathematically rigorous framework for identifying and completing Coincidence of Wants (CoW) cycles in decentralized exchange (DEX) aggregators. Unlike existing auction-based systems such as CoWSwap, our approach introduces an asset matrix formulation that not only verifies feasibility using oracle prices and formal conservation laws but also completes partial CoW cycles of swap orders that are discovered using graph traversal and are settled using imbalance correction. We define bridging orders and show that the resulting execution is slippage-free and capital-preserving for LPs. Applied to real-world Arbitrum swap data, our algorithm demonstrates efficient discovery of CoW cycles and supports the insertion of synthetic orders for atomic cycle closure. This work can be thought of as the detailing of a potential delta-neutral strategy by liquidity-providing market makers: a structured CoW cycle execution. 2025-07-14T10:53:25Z Abhimanyu Nag Madhur Prabhakar Tanuj Behl http://arxiv.org/abs/2409.02025v2 Logarithmic regret in the ergodic Avellaneda-Stoikov market making model 2025-07-14T07:04:00Z We analyse the regret arising from learning the price sensitivity parameter $κ$ of liquidity takers in the ergodic version of the Avellaneda-Stoikov market making model. We show that a learning algorithm based on a maximum-likelihood estimator for the parameter achieves the regret upper bound of order $\ln^2 T$ in expectation. To obtain the result we need two key ingredients. 
The first is the twice differentiability of the ergodic constant under the misspecified parameter in the Hamilton-Jacobi-Bellman (HJB) equation with respect to $κ$, which leads to a second-order performance gap. The second is the learning rate of the regularised maximum-likelihood estimator which is obtained from concentration inequalities for Bernoulli signals. Numerical experiments confirm the convergence and the robustness of the proposed algorithm. 2024-09-03T16:20:07Z Jialun Cao David Šiška Lukasz Szpruch Tanut Treetanthiploet http://arxiv.org/abs/2507.09739v1 Enhancing Trading Performance Through Sentiment Analysis with Large Language Models: Evidence from the S&P 500 2025-07-13T18:30:57Z This study integrates real-time sentiment analysis from financial news, GPT-2 and FinBERT, with technical indicators and time-series models like ARIMA and ETS to optimize S&P 500 trading strategies. Strategies that merge sentiment data with momentum and trend-based metrics, together with a benchmark buy-and-hold and a sentiment-based approach, are evaluated through asset values and returns. Results show that combining sentiment-driven insights with traditional models improves trading performance, offering a more dynamic approach to stock trading that adapts to market changes in volatile environments. 2025-07-13T18:30:57Z Haojie Liu Zihan Lin Randall R. Rojas http://arxiv.org/abs/2507.09734v1 Boltzmann Price: Toward Understanding the Fair Price in High-Frequency Markets 2025-07-13T18:17:48Z In this paper, we introduce a parametrized family of prices derived from the Maximum Entropy Principle. The price is obtained from the distribution that minimizes bias, given the bid and ask volume imbalance at the top of the order book. Under specific parameter choices, it closely approximates the mid-price or the weighted mid-price. Using probabilities of bid and ask states, we propose a model of price dynamics in which both drift and volatility are driven by volume imbalance. 
Compared to standard models like Bachelier or Geometric Brownian Motion with constant volatility, our model can generate higher kurtosis and heavy-tailed distributions. Additionally, the drift term naturally emerges as a consequence of the order book imbalance. We validate the model through simulation and demonstrate its fit to historical equity data. The model provides a theoretical framework, integrating price, volume imbalance, and spread. 2025-07-13T18:17:48Z Przemysław Rola http://arxiv.org/abs/2507.04481v1 Does Overnight News Explain Overnight Returns? 2025-07-06T17:37:50Z Over the past 30 years, nearly all the gains in the U.S. stock market have been earned overnight, while average intraday returns have been negative or flat. We find that a large part of this effect can be explained through features of intraday and overnight news. Our analysis uses a collection of 2.4 million news articles. We apply a novel technique for supervised topic analysis that selects news topics based on their ability to explain contemporaneous market returns. We find that time variation in the prevalence of news topics and differences in the responses to news topics both contribute to the difference in intraday and overnight returns. In out-of-sample tests, our approach forecasts which stocks will do particularly well overnight and particularly poorly intraday. Our approach also helps explain patterns of continuation and reversal in intraday and overnight returns. We contrast the effect of news with other mechanisms proposed in the literature to explain overnight returns. 2025-07-06T17:37:50Z Paul Glasserman Kriste Krstovski Paul Laliberte Harry Mamaysky http://arxiv.org/abs/2502.21206v3 Chronologically Consistent Large Language Models 2025-07-06T01:02:58Z Large language models are increasingly used in social sciences, but their training data can introduce lookahead bias and training leakage. 
A good chronologically consistent language model requires efficient use of training data to maintain accuracy despite time-restricted data. Here, we overcome this challenge by training a suite of chronologically consistent large language models, ChronoBERT and ChronoGPT, which incorporate only the text data that would have been available at each point in time. Despite this strict temporal constraint, our models achieve strong performance on natural language processing benchmarks, outperforming or matching widely used models (e.g., BERT), and remain competitive with larger open-weight models. Lookahead bias is model and application-specific because even if a chronologically consistent language model has poorer language comprehension, a regression or prediction model applied on top of the language model can compensate. In an asset pricing application predicting next-day stock returns from financial news, we find that ChronoBERT and ChronoGPT's real-time outputs achieve Sharpe ratios comparable to a much larger Llama model, indicating that lookahead bias is modest. Our results demonstrate a scalable, practical framework to mitigate training leakage, ensuring more credible backtests and predictions across finance and other social science domains. 2025-02-28T16:25:50Z Songrun He Linying Lv Asaf Manela Jimmy Wu http://arxiv.org/abs/2506.22142v1 Optimal Benchmark Design under Costly Manipulation 2025-06-27T11:39:18Z Price benchmarks are used to incorporate market price trends into contracts, but their use can create opportunities for manipulation by parties involved in the contract. This paper examines this issue using a realistic and tractable model inspired by smart contracts on blockchains like Ethereum. In our model, manipulation costs depend on two factors: the magnitude of adjustments to individual prices (variable costs) and the number of prices adjusted (fixed costs). 
We find that a weighted mean is the optimal benchmark when fixed costs are negligible, while the median is optimal when variable costs are negligible. In cases where both fixed and variable costs are significant, the optimal benchmark can be implemented as a trimmed mean, with the degree of trimming increasing as fixed costs become more important relative to variable costs. Furthermore, we show that the optimal weights for a mean-based benchmark are proportional to the marginal manipulation costs, whereas the median remains optimal without weighting, even when fixed costs differ across prices. 2025-06-27T11:39:18Z Ángel Hernando-Veciana http://arxiv.org/abs/2506.02869v2 Optimal Dynamic Fees in Automated Market Makers 2025-06-24T20:44:53Z Automated Market Makers (AMMs) are emerging as a popular decentralised trading platform. In this work, we determine the optimal dynamic fees in a constant function market maker. We find approximate closed-form solutions to the control problem and study the optimal fee structure. We find that there are two distinct fee regimes: one in which the AMM imposes higher fees to deter arbitrageurs, and another where fees are lowered to increase volatility and attract noise traders. Our results also show that dynamic fees that are linear in inventory and are sensitive to changes in the external price are a good approximation of the optimal fee structure and thus constitute suitable candidates when designing fees for AMMs. 2025-06-03T13:34:28Z 18 pages Leonardo Baggiani Martin Herdegen Leandro Sánchez-Betancourt http://arxiv.org/abs/2412.10823v2 FinGPT: Enhancing Sentiment-Based Stock Movement Prediction with Dissemination-Aware and Context-Enriched LLMs 2025-06-22T14:11:04Z Financial sentiment analysis is crucial for understanding the influence of news on stock prices. Recently, large language models (LLMs) have been widely adopted for this purpose due to their advanced text analysis capabilities. 
However, these models often only consider the news content itself, ignoring its dissemination, which hampers accurate prediction of short-term stock movements. Additionally, current methods often lack sufficient contextual data and explicit instructions in their prompts, limiting LLMs' ability to interpret news. In this paper, we propose a data-driven approach that enhances LLM-powered sentiment-based stock movement predictions by incorporating news dissemination breadth, contextual data, and explicit instructions. We cluster recent company-related news to assess its reach and influence, enriching prompts with more specific data and precise instructions. This data is used to construct an instruction tuning dataset to fine-tune an LLM for predicting short-term stock price movements. Our experimental results show that our approach improves prediction accuracy by 8% compared to existing methods. 2024-12-14T13:04:42Z 1st Workshop on Preparing Good Data for Generative AI: Challenges and Approaches @ AAAI 2025, ai4finance.org Yixuan Liang Yuncong Liu Neng Wang Hongyang Yang Boyu Zhang Christina Dan Wang http://arxiv.org/abs/2307.03499v3 Decentralised Finance and Automated Market Making: Execution and Speculation 2025-06-17T21:18:33Z Automated market makers (AMMs) are a new prototype of decentralised exchanges which are revolutionising market interactions. The majority of AMMs are constant product markets (CPMs) where exchange rates are set by a trading function. This work studies optimal trading and statistical arbitrage in CPMs where balancing exchange rate risk and execution costs is key. Empirical evidence shows that execution costs are accurately estimated by the convexity of the trading function. These convexity costs are linear in the trade size and are nonlinear in the depth of liquidity and in the exchange rate. We develop models for when exchange rates form in a competing centralised exchange, in a CPM, or in both venues. 
Finally, we derive computationally efficient strategies that account for stochastic convexity costs and we showcase their out-of-sample performance. 2023-07-07T10:25:59Z Forthcoming in Journal of Economic Dynamics and Control Álvaro Cartea Fayçal Drissi Marcello Monga http://arxiv.org/abs/2411.13993v2 Market Making without Regret 2025-06-17T07:56:29Z We consider a sequential decision-making setting where, at every round $t$, a market maker posts a bid price $B_t$ and an ask price $A_t$ to an incoming trader (the taker) with a private valuation for one unit of some asset. If the trader's valuation is lower than the bid price, or higher than the ask price, then a trade (sell or buy) occurs. If a trade happens at round $t$, then letting $M_t$ be the market price (observed only at the end of round $t$), the maker's utility is $M_t - B_t$ if the maker bought the asset, and $A_t - M_t$ if they sold it. We characterize the maker's regret with respect to the best fixed choice of bid and ask pairs under a variety of assumptions (adversarial, i.i.d., and their variants) on the sequence of market prices and valuations. Our upper bound analysis unveils an intriguing connection relating market making to first-price auctions and dynamic pricing. Our main technical contribution is a lower bound for the i.i.d. case with Lipschitz distributions and independence between prices and valuations. The difficulty in the analysis stems from the unique structure of the reward and feedback functions, allowing an algorithm to acquire information by graduating the "cost of exploration" in an arbitrary way. 
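The trade and utility rules stated in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' algorithm: the function names `maker_utility` and `regret_vs_best_fixed` are hypothetical, the best fixed bid and ask pair is searched over a finite grid chosen by the caller (the paper works with a continuum of prices), and a round with no trade is assumed to yield zero utility.

```python
from itertools import product
from typing import Sequence, Tuple


def maker_utility(bid: float, ask: float, valuation: float, market_price: float) -> float:
    """One-round utility of the maker, following the rules quoted in the abstract."""
    if bid > ask:
        raise ValueError("bid must not exceed ask")
    if valuation < bid:
        # The taker sells, so the maker buys at the bid and earns M_t - B_t.
        return market_price - bid
    if valuation > ask:
        # The taker buys, so the maker sells at the ask and earns A_t - M_t.
        return ask - market_price
    return 0.0  # No trade this round (assumed to give zero utility).


def regret_vs_best_fixed(
    quotes: Sequence[Tuple[float, float]],
    valuations: Sequence[float],
    market_prices: Sequence[float],
    grid: Sequence[float],
) -> float:
    """Regret of the posted quotes against the best fixed (bid, ask) pair on a grid."""
    realised = sum(
        maker_utility(b, a, v, m)
        for (b, a), v, m in zip(quotes, valuations, market_prices)
    )
    # Hindsight benchmark: total utility of the single best pair with bid <= ask.
    best_fixed = max(
        sum(maker_utility(b, a, v, m) for v, m in zip(valuations, market_prices))
        for b, a in product(grid, repeat=2)
        if b <= a
    )
    return best_fixed - realised
```

For instance, with valuations `[0.125, 0.875, 0.5]`, a constant market price of `0.5` and the grid `[0.0, 0.25, 0.5, 0.75, 1.0]`, quoting `(0.0, 1.0)` in every round never trades, while the fixed pair `(0.25, 0.75)` earns `0.25` in each of the first two rounds, so the regret of the wide quotes is `0.5`.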
2024-11-21T10:13:55Z Nicolò Cesa-Bianchi Tommaso Cesari Roberto Colomboni Luigi Foscari Vinayak Pathak http://arxiv.org/abs/2502.09172v2 LOB-Bench: Benchmarking Generative AI for Finance -- an Application to Limit Order Book Data 2025-06-16T14:02:28Z While financial data presents one of the most challenging and interesting sequence modelling tasks due to high noise, heavy tails, and strategic interactions, progress in this area has been hindered by the lack of consensus on quantitative evaluation paradigms. To address this, we present LOB-Bench, a benchmark, implemented in Python, designed to evaluate the quality and realism of generative message-by-order data for limit order books (LOB) in the LOBSTER format. Our framework measures distributional differences in conditional and unconditional statistics between generated and real LOB data, supporting flexible multivariate statistical evaluation. The benchmark also includes commonly used LOB statistics such as spread, order book volumes, order imbalance, and message inter-arrival times, along with scores from a trained discriminator network. Lastly, LOB-Bench contains "market impact metrics", i.e. the cross-correlations and price response functions for specific events in the data. We benchmark generative autoregressive state-space models, a (C)GAN, as well as a parametric LOB model and find that the autoregressive GenAI approach beats traditional model classes. 2025-02-13T10:56:58Z Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025 Peer Nagy Sascha Frey Kang Li Bidipta Sarkar Svitlana Vyetrenko Stefan Zohren Ani Calinescu Jakob Foerster
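Three of the LOB statistics named in the LOB-Bench abstract (spread, order imbalance, and message inter-arrival times) can be illustrated with a short Python sketch. The function names are hypothetical and this is not the LOB-Bench API. The imbalance formula used here, (bid volume - ask volume) / (bid volume + ask volume), is one common convention and is an assumption, since the abstract does not give a definition.

```python
from typing import List, Sequence


def spread(best_bid: float, best_ask: float) -> float:
    """Quoted spread: best ask price minus best bid price."""
    return best_ask - best_bid


def order_imbalance(bid_volume: float, ask_volume: float) -> float:
    """Volume imbalance in [-1, 1]; the normalisation is an assumed convention."""
    total = bid_volume + ask_volume
    if total == 0:
        raise ValueError("imbalance is undefined when both volumes are zero")
    return (bid_volume - ask_volume) / total


def inter_arrival_times(timestamps: Sequence[float]) -> List[float]:
    """Differences between consecutive message timestamps, in the input's units."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
```

A benchmark of this kind would compare the distributions of such statistics between generated and real data; the sketch only computes the per-snapshot or per-message quantities themselves.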