http://arxiv.org/abs/2507.09739v1 Enhancing Trading Performance Through Sentiment Analysis with Large Language Models: Evidence from the S&P 500 2025-07-13T18:30:57Z

This study integrates real-time sentiment analysis of financial news, using GPT-2 and FinBERT, with technical indicators and time-series models such as ARIMA and ETS to optimize S&P 500 trading strategies. Strategies that merge sentiment data with momentum and trend-based metrics are evaluated, alongside a benchmark buy-and-hold and a sentiment-based approach, through asset values and returns. Results show that combining sentiment-driven insights with traditional models improves trading performance, offering a more dynamic approach to stock trading that adapts to market changes in volatile environments.

2025-07-13T18:30:57Z Haojie Liu Zihan Lin Randall R. Rojas http://arxiv.org/abs/2507.09734v1 Boltzmann Price: Toward Understanding the Fair Price in High-Frequency Markets 2025-07-13T18:17:48Z

In this paper, we introduce a parametrized family of prices derived from the Maximum Entropy Principle. The price is obtained from the distribution that minimizes bias, given the bid and ask volume imbalance at the top of the order book. Under specific parameter choices, it closely approximates the mid-price or the weighted mid-price. Using probabilities of bid and ask states, we propose a model of price dynamics in which both drift and volatility are driven by volume imbalance. Compared to standard models like Bachelier or Geometric Brownian Motion with constant volatility, our model can generate higher kurtosis and heavy-tailed distributions. Additionally, the drift term naturally emerges as a consequence of the order book imbalance. We validate the model through simulation and demonstrate its fit to historical equity data. The model provides a theoretical framework, integrating price, volume imbalance, and spread.
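
The two reference prices that the abstract compares against can be sketched as follows. This is only an illustration of the standard mid-price and volume-weighted mid-price at the top of the book; it is not the Boltzmann price itself, whose formula the abstract does not state. The function and parameter names are assumptions made for this sketch.

```python
def mid_price(bid: float, ask: float) -> float:
    """Plain mid-price: the midpoint of the best bid and best ask."""
    return 0.5 * (bid + ask)


def weighted_mid_price(bid: float, ask: float,
                       bid_volume: float, ask_volume: float) -> float:
    """Volume-weighted mid-price at the top of the order book.

    The imbalance is the share of top-of-book volume resting on the bid.
    A larger bid queue pulls the price towards the ask, and vice versa.
    """
    total = bid_volume + ask_volume
    if total <= 0:
        raise ValueError("top-of-book volume must be positive")
    imbalance = bid_volume / total
    return imbalance * ask + (1.0 - imbalance) * bid


if __name__ == "__main__":
    # Hypothetical quotes: the bid queue is three times the ask queue.
    print(mid_price(100.0, 101.0))
    print(weighted_mid_price(100.0, 101.0, 300.0, 100.0))
```

With equal volumes on both sides the two prices coincide, which is why the imbalance is the natural driver of any deviation from the mid-price.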

2025-07-13T18:17:48Z Przemysław Rola http://arxiv.org/abs/2507.04481v1 Does Overnight News Explain Overnight Returns? 2025-07-06T17:37:50Z

Over the past 30 years, nearly all the gains in the U.S. stock market have been earned overnight, while average intraday returns have been negative or flat. We find that a large part of this effect can be explained through features of intraday and overnight news. Our analysis uses a collection of 2.4 million news articles. We apply a novel technique for supervised topic analysis that selects news topics based on their ability to explain contemporaneous market returns. We find that time variation in the prevalence of news topics and differences in the responses to news topics both contribute to the difference in intraday and overnight returns. In out-of-sample tests, our approach forecasts which stocks will do particularly well overnight and particularly poorly intraday. Our approach also helps explain patterns of continuation and reversal in intraday and overnight returns. We contrast the effect of news with other mechanisms proposed in the literature to explain overnight returns.

2025-07-06T17:37:50Z Paul Glasserman Kriste Krstovski Paul Laliberte Harry Mamaysky http://arxiv.org/abs/2502.21206v3 Chronologically Consistent Large Language Models 2025-07-06T01:02:58Z

Large language models are increasingly used in social sciences, but their training data can introduce lookahead bias and training leakage. A good chronologically consistent language model requires efficient use of training data to maintain accuracy despite time-restricted data. Here, we overcome this challenge by training a suite of chronologically consistent large language models, ChronoBERT and ChronoGPT, which incorporate only the text data that would have been available at each point in time. Despite this strict temporal constraint, our models achieve strong performance on natural language processing benchmarks, outperforming or matching widely used models (e.g., BERT), and remain competitive with larger open-weight models. Lookahead bias is model- and application-specific because even if a chronologically consistent language model has poorer language comprehension, a regression or prediction model applied on top of the language model can compensate. In an asset pricing application predicting next-day stock returns from financial news, we find that ChronoBERT and ChronoGPT's real-time outputs achieve Sharpe ratios comparable to a much larger Llama model, indicating that lookahead bias is modest. Our results demonstrate a scalable, practical framework to mitigate training leakage, ensuring more credible backtests and predictions across finance and other social science domains.

2025-02-28T16:25:50Z Songrun He Linying Lv Asaf Manela Jimmy Wu http://arxiv.org/abs/2506.02869v2 Optimal Dynamic Fees in Automated Market Makers 2025-06-24T20:44:53Z

Automated Market Makers (AMMs) are emerging as a popular decentralised trading platform. In this work, we determine the optimal dynamic fees in a constant function market maker. We find approximate closed-form solutions to the control problem and study the optimal fee structure. We find that there are two distinct fee regimes: one in which the AMM imposes higher fees to deter arbitrageurs, and another where fees are lowered to increase volatility and attract noise traders. Our results also show that dynamic fees that are linear in inventory and are sensitive to changes in the external price are a good approximation of the optimal fee structure and thus constitute suitable candidates when designing fees for AMMs.

2025-06-03T13:34:28Z 18 pages Leonardo Baggiani Martin Herdegen Leandro Sánchez-Betancourt http://arxiv.org/abs/2412.10823v2 FinGPT: Enhancing Sentiment-Based Stock Movement Prediction with Dissemination-Aware and Context-Enriched LLMs 2025-06-22T14:11:04Z

Financial sentiment analysis is crucial for understanding the influence of news on stock prices. Recently, large language models (LLMs) have been widely adopted for this purpose due to their advanced text analysis capabilities. However, these models often only consider the news content itself, ignoring its dissemination, which hampers accurate prediction of short-term stock movements. Additionally, current methods often lack sufficient contextual data and explicit instructions in their prompts, limiting LLMs' ability to interpret news. In this paper, we propose a data-driven approach that enhances LLM-powered sentiment-based stock movement predictions by incorporating news dissemination breadth, contextual data, and explicit instructions. We cluster recent company-related news to assess its reach and influence, enriching prompts with more specific data and precise instructions. This data is used to construct an instruction tuning dataset to fine-tune an LLM for predicting short-term stock price movements. Our experimental results show that our approach improves prediction accuracy by 8\% compared to existing methods.

2024-12-14T13:04:42Z 1st Workshop on Preparing Good Data for Generative AI: Challenges and Approaches@ AAAI 2025, ai4finance.org Yixuan Liang Yuncong Liu Neng Wang Hongyang Yang Boyu Zhang Christina Dan Wang http://arxiv.org/abs/2307.03499v3 Decentralised Finance and Automated Market Making: Execution and Speculation 2025-06-17T21:18:33Z

Automated market makers (AMMs) are a new prototype of decentralised exchanges which are revolutionising market interactions. The majority of AMMs are constant product markets (CPMs) where exchange rates are set by a trading function. This work studies optimal trading and statistical arbitrage in CPMs where balancing exchange rate risk and execution costs is key. Empirical evidence shows that execution costs are accurately estimated by the convexity of the trading function. These convexity costs are linear in the trade size and are nonlinear in the depth of liquidity and in the exchange rate. We develop models for when exchange rates form in a competing centralised exchange, in a CPM, or in both venues. Finally, we derive computationally efficient strategies that account for stochastic convexity costs and we showcase their out-of-sample performance.
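
A minimal sketch of why the trading function's convexity produces an execution cost, assuming a fee-less constant product pool with reserves x of asset X and y of asset Y (so x * y stays constant). The function names and the example reserves are hypothetical, and this is not the paper's model, only the textbook constant product mechanics.

```python
def cpm_buy_cost(x_reserve: float, y_reserve: float, quantity: float) -> float:
    """Amount of X a trader pays to take `quantity` of Y out of the pool.

    Follows from keeping x * y constant and ignoring fees:
    (x + cost) * (y - quantity) = x * y.
    """
    if not 0 < quantity < y_reserve:
        raise ValueError("quantity must be positive and below the Y reserve")
    return x_reserve * quantity / (y_reserve - quantity)


def execution_premium(x_reserve: float, y_reserve: float, quantity: float) -> float:
    """Average price paid per unit of Y minus the marginal rate x / y."""
    marginal_rate = x_reserve / y_reserve
    average_price = cpm_buy_cost(x_reserve, y_reserve, quantity) / quantity
    return average_price - marginal_rate


if __name__ == "__main__":
    x, y = 1000.0, 10.0  # hypothetical reserves, marginal rate 100
    for q in (0.01, 0.1, 1.0):
        print(q, execution_premium(x, y, q))
```

For small trades the premium is close to (x / y) * quantity / y, so it grows roughly linearly in the trade size while depending nonlinearly on the reserves, which matches the qualitative statement in the abstract.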

2023-07-07T10:25:59Z Forthcoming in Journal of Economic Dynamics and Control Álvaro Cartea Fayçal Drissi Marcello Monga http://arxiv.org/abs/2411.13993v2 Market Making without Regret 2025-06-17T07:56:29Z

We consider a sequential decision-making setting where, at every round $t$, a market maker posts a bid price $B_t$ and an ask price $A_t$ to an incoming trader (the taker) with a private valuation for one unit of some asset. If the trader's valuation is lower than the bid price, or higher than the ask price, then a trade (sell or buy) occurs. If a trade happens at round $t$, then letting $M_t$ be the market price (observed only at the end of round $t$), the maker's utility is $M_t - B_t$ if the maker bought the asset, and $A_t - M_t$ if they sold it. We characterize the maker's regret with respect to the best fixed choice of bid and ask pairs under a variety of assumptions (adversarial, i.i.d., and their variants) on the sequence of market prices and valuations. Our upper bound analysis unveils an intriguing connection relating market making to first-price auctions and dynamic pricing. Our main technical contribution is a lower bound for the i.i.d. case with Lipschitz distributions and independence between prices and valuations. The difficulty in the analysis stems from the unique structure of the reward and feedback functions, allowing an algorithm to acquire information by graduating the "cost of exploration" in an arbitrary way.
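
The per-round utility described above can be sketched directly. The function name is an assumption, and so is the convention that a round without a trade yields zero utility, which the abstract does not state explicitly.

```python
def maker_utility(bid: float, ask: float,
                  valuation: float, market_price: float) -> float:
    """Utility of the market maker in a single round.

    valuation < bid : the taker sells, the maker buys at the bid
                      and earns market_price - bid.
    valuation > ask : the taker buys, the maker sells at the ask
                      and earns ask - market_price.
    otherwise       : no trade (assumed here to give zero utility).
    """
    if bid > ask:
        raise ValueError("bid must not exceed ask")
    if valuation < bid:
        return market_price - bid
    if valuation > ask:
        return ask - market_price
    return 0.0


def cumulative_utility(bid: float, ask: float, rounds) -> float:
    """Total utility of a fixed (bid, ask) pair over (valuation, market_price) rounds."""
    return sum(maker_utility(bid, ask, v, m) for v, m in rounds)


if __name__ == "__main__":
    # Hypothetical sequence of (valuation, market price) pairs.
    rounds = [(98.0, 100.0), (102.0, 100.0), (100.0, 100.0), (98.0, 97.0)]
    print(cumulative_utility(99.0, 101.0, rounds))
```

Regret in this setting compares the cumulative utility of the learner's posted prices with that of the best fixed pair in hindsight, so `cumulative_utility` is the quantity one would maximise over candidate pairs.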

2024-11-21T10:13:55Z Nicolò Cesa-Bianchi Tommaso Cesari Roberto Colomboni Luigi Foscari Vinayak Pathak http://arxiv.org/abs/2502.09172v2 LOB-Bench: Benchmarking Generative AI for Finance -- an Application to Limit Order Book Data 2025-06-16T14:02:28Z

While financial data presents one of the most challenging and interesting sequence modelling tasks due to high noise, heavy tails, and strategic interactions, progress in this area has been hindered by the lack of consensus on quantitative evaluation paradigms. To address this, we present LOB-Bench, a benchmark, implemented in Python, designed to evaluate the quality and realism of generative message-by-order data for limit order books (LOB) in the LOBSTER format. Our framework measures distributional differences in conditional and unconditional statistics between generated and real LOB data, supporting flexible multivariate statistical evaluation. The benchmark also includes commonly used LOB statistics such as spread, order book volumes, order imbalance, and message inter-arrival times, along with scores from a trained discriminator network. Lastly, LOB-Bench contains "market impact metrics", i.e. the cross-correlations and price response functions for specific events in the data. We benchmark generative autoregressive state-space models, a (C)GAN, as well as a parametric LOB model and find that the autoregressive GenAI approach beats traditional model classes.

2025-02-13T10:56:58Z Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025 Peer Nagy Sascha Frey Kang Li Bidipta Sarkar Svitlana Vyetrenko Stefan Zohren Ani Calinescu Jakob Foerster http://arxiv.org/abs/2506.12281v1 A New Approach for the Continuous Time Kyle-Back Strategic Insider Equilibrium Problem 2025-06-14T00:06:16Z

This paper considers a continuous time Kyle-Back model which is a game problem between an insider and a market maker. The existing literature typically focuses on the existence of equilibrium by using the PDE approach, which requires a certain Markovian structure and the equilibrium is in the bridge form. We shall provide a new approach which is used widely for stochastic controls and stochastic differential games. We characterize all equilibria through a coupled system of forward backward SDEs, where the forward one is the conditional law of the inside information and the backward one is the insider's optimal value. In particular, when the time duration is small, we show that the FBSDE is wellposed and thus the game has a unique equilibrium. This is the first uniqueness result in the literature, without restricting the equilibria to a certain special structure. Moreover, this unique equilibrium may not be Markovian, indicating that the PDE approach cannot work in this case. We next study the set value of the game, which roughly speaking is the set of insider's values over all equilibria and thus is by nature unique. We show that, although the bridge type of equilibria in the literature does not satisfy the required integrability for our equilibria, its truncation serves as a desired approximate equilibrium and its value belongs to our set value. Finally, we characterize our set value through a level set of a certain standard HJB equation.

2025-06-14T00:06:16Z Bixing Qiao Jianfeng Zhang http://arxiv.org/abs/2506.11921v1 Dynamic Grid Trading Strategy: From Zero Expectation to Market Outperformance 2025-06-13T16:11:44Z

We propose a profitable trading strategy for the cryptocurrency market based on grid trading. Starting with an analysis of the expected value of the traditional grid strategy, we show that under simple assumptions, its expected return is essentially zero. We then introduce a novel Dynamic Grid-based Trading (DGT) strategy that adapts to market conditions by dynamically resetting grid positions. Our backtesting results using minute-level data from Bitcoin and Ethereum between January 2021 and July 2024 demonstrate that the DGT strategy significantly outperforms both the traditional grid and buy-and-hold strategies in terms of internal rate of return and risk control.

2025-06-13T16:11:44Z 7 pages, 8 figures. Code available at https://github.com/colachenkc/Dynamic-Grid-Trading Kai-Yuan Chen Kai-Hsin Chen Jyh-Shing Roger Jang http://arxiv.org/abs/2506.11843v1 Multi-dimensional queue-reactive model and signal-driven models: a unified framework 2025-06-13T14:45:51Z

We present a Markovian market model driven by a hidden Brownian efficient price. In particular, we extend the queue-reactive model, making its dynamics dependent on the efficient price. Our study focuses on two sub-models: a signal-driven price model where the mid-price jump rates depend on the efficient price and an observable signal, and the usual queue-reactive model dependent on the efficient price via the intensities of the order arrivals. This way, we are able to correlate the evolution of limit order books of different stocks. We prove the stability of the observed mid-price around the efficient price under natural assumptions. Precisely, we show that at the macroscopic scale, prices behave as diffusions. We also develop a maximum likelihood estimation procedure for the model, and test it numerically. Our model is then used to backtest trading strategies in a liquidation context.

2025-06-13T14:45:51Z 33 pages Emmanouil Sfendourakis http://arxiv.org/abs/2503.06928v2 FinTSBridge: A New Evaluation Suite for Real-world Financial Prediction with Advanced Time Series Models 2025-06-11T15:26:46Z

Time series forecasting has drawn growing attention in recent years, and many studies have proposed solutions to the challenges of time series prediction, aiming to improve forecasting performance. However, effectively applying these time series forecasting models to the field of financial asset pricing remains a challenging issue. There is still a need for a bridge to connect cutting-edge time series forecasting models with financial asset pricing. To bridge this gap, we have undertaken the following efforts: 1) We constructed three datasets from the financial domain; 2) We selected over ten time series forecasting models from recent studies and validated their performance in financial time series; 3) We developed new metrics, msIC and msIR, in addition to MSE and MAE, to showcase the time series correlation captured by the models; 4) We designed financial-specific tasks for these three datasets and assessed the practical performance and application potential of these forecasting models in important financial problems. We hope the newly developed evaluation suite, FinTSBridge, can provide valuable insights into the effectiveness and robustness of advanced forecasting models in financial domains.

2025-03-10T05:19:13Z ICLR 2025 Workshop Advances in Financial AI Yanlong Wang Jian Xu Tiantian Gao Hongkang Zhang Shao-Lun Huang Danny Dongning Sun Xiao-Ping Zhang http://arxiv.org/abs/2506.08992v1 Optimal hedging of an informed broker facing many traders 2025-06-10T17:04:39Z

This paper investigates the optimal hedging strategies of an informed broker interacting with multiple traders in a financial market. We develop a theoretical framework in which the broker, possessing exclusive information about the drift of the asset's price, engages with traders whose trading activities impact the market price. Using a mean-field game approach, we derive the equilibrium strategies for both the broker and the traders, illustrating the intricate dynamics of their interactions. The broker's optimal strategy involves a Stackelberg equilibrium, where the broker leads and the traders follow. Our analysis also addresses the mean field limit of finite-player models and shows the convergence to the mean-field solution as the number of traders becomes large.

2025-06-10T17:04:39Z Philippe Bergault Pierre Cardaliaguet Wenbin Yan http://arxiv.org/abs/2506.08718v1 Price Discovery in Cryptocurrency Markets 2025-06-10T12:07:11Z

This document analyzes price discovery in cryptocurrency markets by comparing centralized and decentralized exchanges, as well as spot and futures markets. The study focuses first on Ethereum (ETH) and then applies a similar approach to Bitcoin (BTC). Chapter 1 outlines the theoretical framework, emphasizing the structural differences between centralized exchanges and decentralized finance mechanisms, especially Automated Market Makers (AMMs). It also explains how to construct an order book from a liquidity pool in a decentralized setting for comparison with centralized exchanges. Chapter 2 describes the methodological tools used: Hasbrouck's Information Share, Gonzalo and Granger's Permanent-Transitory decomposition, and the Hayashi-Yoshida estimator. These are applied to explore lead-lag dynamics, cointegration, and price discovery across market types. Chapter 3 presents the empirical analysis. For ETH, it compares price dynamics on Binance and Uniswap v2 over a one-year period, focusing on five key events in 2024. For BTC, it analyzes the relationship between spot and futures prices on the CME. The study estimates lead-lag effects and cointegration in both cases. Results show that centralized markets typically lead in ETH price discovery. In futures markets, while they tend to lead overall, high-volatility periods produce mixed outcomes. The findings have key implications for traders and institutions regarding liquidity, arbitrage, and market efficiency. Various metrics are used to benchmark the performance of modified AMMs and to understand the interaction between decentralized and centralized structures.
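
Of the tools listed for Chapter 2, the Hayashi-Yoshida estimator has a particularly compact definition: it sums the products of increments of two asynchronously observed series over every pair of observation intervals that overlap. The sketch below is a straightforward quadratic-time version under that textbook definition; the function name and the toy data are assumptions, and it is not taken from the document's own code.

```python
from typing import Sequence


def hayashi_yoshida_cov(times_x: Sequence[float], values_x: Sequence[float],
                        times_y: Sequence[float], values_y: Sequence[float]) -> float:
    """Hayashi-Yoshida covariance estimate for two asynchronous series.

    Timestamps must be strictly increasing within each series. No
    resampling or interpolation onto a common grid is needed.
    """
    if len(times_x) != len(values_x) or len(times_y) != len(values_y):
        raise ValueError("timestamps and values must have equal length")
    total = 0.0
    for i in range(1, len(times_x)):
        dx = values_x[i] - values_x[i - 1]
        for j in range(1, len(times_y)):
            # Intervals (a, b] and (c, d] overlap iff max(a, c) < min(b, d).
            overlap = (max(times_x[i - 1], times_y[j - 1])
                       < min(times_x[i], times_y[j]))
            if overlap:
                total += dx * (values_y[j] - values_y[j - 1])
    return total


if __name__ == "__main__":
    # Hypothetical log-prices: X observed at times 0 and 2, Y at 0, 1 and 2.
    print(hayashi_yoshida_cov([0, 2], [0.0, 3.0], [0, 1, 2], [0.0, 2.0, 5.0]))
```

On a common observation grid the estimator reduces to the usual realized covariance. Lead-lag analyses typically evaluate such an estimator after shifting one series' timestamps by a candidate lag; that step is not shown here.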

2025-06-10T12:07:11Z Juan Plazuelo Pascual Carlos Tardon Rubio Juan Toro Cebada Angel Hernando Veciana