http://arxiv.org/abs/2507.09739v1
Enhancing Trading Performance Through Sentiment Analysis with Large Language Models: Evidence from the S&P 500
2025-07-13T18:30:57Z
This study integrates real-time sentiment analysis from financial news, GPT-2 and FinBERT, with technical indicators and time-series models like ARIMA and ETS to optimize S&P 500 trading strategies. Strategies that merge sentiment data with momentum and trend-based metrics, together with benchmark buy-and-hold and sentiment-based approaches, are evaluated through asset values and returns. Results show that combining sentiment-driven insights with traditional models improves trading performance, offering a more dynamic approach to stock trading that adapts to market changes in volatile environments.
2025-07-13T18:30:57Z
Haojie Liu, Zihan Lin, Randall R. Rojas

http://arxiv.org/abs/2507.09734v1
Boltzmann Price: Toward Understanding the Fair Price in High-Frequency Markets
2025-07-13T18:17:48Z
In this paper, we introduce a parametrized family of prices derived from the Maximum Entropy Principle. The price is obtained from the distribution that minimizes bias, given the bid and ask volume imbalance at the top of the order book. Under specific parameter choices, it closely approximates the mid-price or the weighted mid-price. Using probabilities of bid and ask states, we propose a model of price dynamics in which both drift and volatility are driven by volume imbalance. Compared to standard models like Bachelier or Geometric Brownian Motion with constant volatility, our model can generate higher kurtosis and heavy-tailed distributions. Additionally, the drift term naturally emerges as a consequence of the order book imbalance. We validate the model through simulation and demonstrate its fit to historical equity data.
The model provides a theoretical framework, integrating price, volume imbalance, and spread.
2025-07-13T18:17:48Z
Przemysław Rola

http://arxiv.org/abs/2507.04481v1
Does Overnight News Explain Overnight Returns?
2025-07-06T17:37:50Z
Over the past 30 years, nearly all the gains in the U.S. stock market have been earned overnight, while average intraday returns have been negative or flat. We find that a large part of this effect can be explained through features of intraday and overnight news. Our analysis uses a collection of 2.4 million news articles. We apply a novel technique for supervised topic analysis that selects news topics based on their ability to explain contemporaneous market returns. We find that time variation in the prevalence of news topics and differences in the responses to news topics both contribute to the difference in intraday and overnight returns. In out-of-sample tests, our approach forecasts which stocks will do particularly well overnight and particularly poorly intraday. Our approach also helps explain patterns of continuation and reversal in intraday and overnight returns. We contrast the effect of news with other mechanisms proposed in the literature to explain overnight returns.
2025-07-06T17:37:50Z
Paul Glasserman, Kriste Krstovski, Paul Laliberte, Harry Mamaysky

http://arxiv.org/abs/2502.21206v3
Chronologically Consistent Large Language Models
2025-07-06T01:02:58Z
Large language models are increasingly used in social sciences, but their training data can introduce lookahead bias and training leakage. A good chronologically consistent language model requires efficient use of training data to maintain accuracy despite time-restricted data. Here, we overcome this challenge by training a suite of chronologically consistent large language models, ChronoBERT and ChronoGPT, which incorporate only the text data that would have been available at each point in time.
Despite this strict temporal constraint, our models achieve strong performance on natural language processing benchmarks, outperforming or matching widely used models (e.g., BERT), and remain competitive with larger open-weight models. Lookahead bias is model and application-specific because even if a chronologically consistent language model has poorer language comprehension, a regression or prediction model applied on top of the language model can compensate. In an asset pricing application predicting next-day stock returns from financial news, we find that ChronoBERT and ChronoGPT's real-time outputs achieve Sharpe ratios comparable to a much larger Llama model, indicating that lookahead bias is modest. Our results demonstrate a scalable, practical framework to mitigate training leakage, ensuring more credible backtests and predictions across finance and other social science domains.
2025-02-28T16:25:50Z
Songrun He, Linying Lv, Asaf Manela, Jimmy Wu

http://arxiv.org/abs/2506.02869v2
Optimal Dynamic Fees in Automated Market Makers
2025-06-24T20:44:53Z
Automated Market Makers (AMMs) are emerging as a popular decentralised trading platform. In this work, we determine the optimal dynamic fees in a constant function market maker. We find approximate closed-form solutions to the control problem and study the optimal fee structure. We find that there are two distinct fee regimes: one in which the AMM imposes higher fees to deter arbitrageurs, and another where fees are lowered to increase volatility and attract noise traders.
Our results also show that dynamic fees that are linear in inventory and are sensitive to changes in the external price are a good approximation of the optimal fee structure and thus constitute suitable candidates when designing fees for AMMs.
2025-06-03T13:34:28Z
18 pages
Leonardo Baggiani, Martin Herdegen, Leandro Sánchez-Betancourt

http://arxiv.org/abs/2412.10823v2
FinGPT: Enhancing Sentiment-Based Stock Movement Prediction with Dissemination-Aware and Context-Enriched LLMs
2025-06-22T14:11:04Z
Financial sentiment analysis is crucial for understanding the influence of news on stock prices. Recently, large language models (LLMs) have been widely adopted for this purpose due to their advanced text analysis capabilities. However, these models often only consider the news content itself, ignoring its dissemination, which hampers accurate prediction of short-term stock movements. Additionally, current methods often lack sufficient contextual data and explicit instructions in their prompts, limiting LLMs' ability to interpret news. In this paper, we propose a data-driven approach that enhances LLM-powered sentiment-based stock movement predictions by incorporating news dissemination breadth, contextual data, and explicit instructions. We cluster recent company-related news to assess its reach and influence, enriching prompts with more specific data and precise instructions. This data is used to construct an instruction tuning dataset to fine-tune an LLM for predicting short-term stock price movements.
Our experimental results show that our approach improves prediction accuracy by 8% compared to existing methods.
2024-12-14T13:04:42Z
1st Workshop on Preparing Good Data for Generative AI: Challenges and Approaches @ AAAI 2025, ai4finance.org
Yixuan Liang, Yuncong Liu, Neng Wang, Hongyang Yang, Boyu Zhang, Christina Dan Wang

http://arxiv.org/abs/2307.03499v3
Decentralised Finance and Automated Market Making: Execution and Speculation
2025-06-17T21:18:33Z
Automated market makers (AMMs) are a new prototype of decentralised exchanges which are revolutionising market interactions. The majority of AMMs are constant product markets (CPMs) where exchange rates are set by a trading function. This work studies optimal trading and statistical arbitrage in CPMs where balancing exchange rate risk and execution costs is key. Empirical evidence shows that execution costs are accurately estimated by the convexity of the trading function. These convexity costs are linear in the trade size and are nonlinear in the depth of liquidity and in the exchange rate. We develop models for when exchange rates form in a competing centralised exchange, in a CPM, or in both venues. Finally, we derive computationally efficient strategies that account for stochastic convexity costs and we showcase their out-of-sample performance.
2023-07-07T10:25:59Z
Forthcoming in Journal of Economic Dynamics and Control
Álvaro Cartea, Fayçal Drissi, Marcello Monga

http://arxiv.org/abs/2411.13993v2
Market Making without Regret
2025-06-17T07:56:29Z
We consider a sequential decision-making setting where, at every round $t$, a market maker posts a bid price $B_t$ and an ask price $A_t$ to an incoming trader (the taker) with a private valuation for one unit of some asset. If the trader's valuation is lower than the bid price, or higher than the ask price, then a trade (sell or buy) occurs.
If a trade happens at round $t$, then letting $M_t$ be the market price (observed only at the end of round $t$), the maker's utility is $M_t - B_t$ if the maker bought the asset, and $A_t - M_t$ if they sold it. We characterize the maker's regret with respect to the best fixed choice of bid and ask pairs under a variety of assumptions (adversarial, i.i.d., and their variants) on the sequence of market prices and valuations. Our upper bound analysis unveils an intriguing connection relating market making to first-price auctions and dynamic pricing. Our main technical contribution is a lower bound for the i.i.d. case with Lipschitz distributions and independence between prices and valuations. The difficulty in the analysis stems from the unique structure of the reward and feedback functions, allowing an algorithm to acquire information by graduating the "cost of exploration" in an arbitrary way.
2024-11-21T10:13:55Z
Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Luigi Foscari, Vinayak Pathak

http://arxiv.org/abs/2502.09172v2
LOB-Bench: Benchmarking Generative AI for Finance -- an Application to Limit Order Book Data
2025-06-16T14:02:28Z
While financial data presents one of the most challenging and interesting sequence modelling tasks due to high noise, heavy tails, and strategic interactions, progress in this area has been hindered by the lack of consensus on quantitative evaluation paradigms. To address this, we present LOB-Bench, a benchmark, implemented in Python, designed to evaluate the quality and realism of generative message-by-order data for limit order books (LOB) in the LOBSTER format. Our framework measures distributional differences in conditional and unconditional statistics between generated and real LOB data, supporting flexible multivariate statistical evaluation.
The benchmark also includes commonly used LOB statistics such as spread, order book volumes, order imbalance, and message inter-arrival times, along with scores from a trained discriminator network. Lastly, LOB-Bench contains "market impact metrics", i.e. the cross-correlations and price response functions for specific events in the data. We benchmark generative autoregressive state-space models, a (C)GAN, as well as a parametric LOB model and find that the autoregressive GenAI approach beats traditional model classes.
2025-02-13T10:56:58Z
Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025
Peer Nagy, Sascha Frey, Kang Li, Bidipta Sarkar, Svitlana Vyetrenko, Stefan Zohren, Ani Calinescu, Jakob Foerster

http://arxiv.org/abs/2506.12281v1
A New Approach for the Continuous Time Kyle-Back Strategic Insider Equilibrium Problem
2025-06-14T00:06:16Z
This paper considers a continuous time Kyle-Back model which is a game problem between an insider and a market maker. The existing literature typically focuses on the existence of equilibrium by using the PDE approach, which requires a certain Markovian structure and the equilibrium is in the bridge form. We shall provide a new approach which is used widely for stochastic controls and stochastic differential games. We characterize all equilibria through a coupled system of forward backward SDEs, where the forward one is the conditional law of the inside information and the backward one is the insider's optimal value. In particular, when the time duration is small, we show that the FBSDE is well-posed and thus the game has a unique equilibrium. This is the first uniqueness result in the literature, without restricting the equilibria to a certain special structure. Moreover, this unique equilibrium may not be Markovian, indicating that the PDE approach cannot work in this case.
We next study the set value of the game, which roughly speaking is the set of insider's values over all equilibria and thus is by nature unique. We show that, although the bridge type of equilibria in the literature does not satisfy the required integrability for our equilibria, its truncation serves as a desired approximate equilibrium and its value belongs to our set value. Finally, we characterize our set value through a level set of a certain standard HJB equation.
2025-06-14T00:06:16Z
Bixing Qiao, Jianfeng Zhang

http://arxiv.org/abs/2506.11921v1
Dynamic Grid Trading Strategy: From Zero Expectation to Market Outperformance
2025-06-13T16:11:44Z
We propose a profitable trading strategy for the cryptocurrency market based on grid trading. Starting with an analysis of the expected value of the traditional grid strategy, we show that under simple assumptions, its expected return is essentially zero. We then introduce a novel Dynamic Grid-based Trading (DGT) strategy that adapts to market conditions by dynamically resetting grid positions. Our backtesting results using minute-level data from Bitcoin and Ethereum between January 2021 and July 2024 demonstrate that the DGT strategy significantly outperforms both the traditional grid and buy-and-hold strategies in terms of internal rate of return and risk control.
2025-06-13T16:11:44Z
7 pages, 8 figures. Code available at https://github.com/colachenkc/Dynamic-Grid-Trading
Kai-Yuan Chen, Kai-Hsin Chen, Jyh-Shing Roger Jang

http://arxiv.org/abs/2506.11843v1
Multi-dimensional queue-reactive model and signal-driven models: a unified framework
2025-06-13T14:45:51Z
We present a Markovian market model driven by a hidden Brownian efficient price. In particular, we extend the queue-reactive model, making its dynamics dependent on the efficient price.
Our study focuses on two sub-models: a signal-driven price model where the mid-price jump rates depend on the efficient price and an observable signal, and the usual queue-reactive model dependent on the efficient price via the intensities of the order arrivals. This way, we are able to correlate the evolution of limit order books of different stocks. We prove the stability of the observed mid-price around the efficient price under natural assumptions. Precisely, we show that at the macroscopic scale, prices behave as diffusions. We also develop a maximum likelihood estimation procedure for the model, and test it numerically. Our model is then used to backtest trading strategies in a liquidation context.
2025-06-13T14:45:51Z
33 pages
Emmanouil Sfendourakis

http://arxiv.org/abs/2503.06928v2
FinTSBridge: A New Evaluation Suite for Real-world Financial Prediction with Advanced Time Series Models
2025-06-11T15:26:46Z
Despite the growing attention to time series forecasting in recent years, many studies have proposed various solutions to address the challenges encountered in time series prediction, aiming to improve forecasting performance. However, effectively applying these time series forecasting models to the field of financial asset pricing remains a challenging issue. There is still a need for a bridge to connect cutting-edge time series forecasting models with financial asset pricing. To bridge this gap, we have undertaken the following efforts: 1) We constructed three datasets from the financial domain; 2) We selected over ten time series forecasting models from recent studies and validated their performance in financial time series; 3) We developed new metrics, msIC and msIR, in addition to MSE and MAE, to showcase the time series correlation captured by the models; 4) We designed financial-specific tasks for these three datasets and assessed the practical performance and application potential of these forecasting models in important financial problems.
We hope the newly developed evaluation suite, FinTSBridge, can provide valuable insights into the effectiveness and robustness of advanced forecasting models in financial domains.
2025-03-10T05:19:13Z
ICLR 2025 Workshop Advances in Financial AI
Yanlong Wang, Jian Xu, Tiantian Gao, Hongkang Zhang, Shao-Lun Huang, Danny Dongning Sun, Xiao-Ping Zhang

http://arxiv.org/abs/2506.08992v1
Optimal hedging of an informed broker facing many traders
2025-06-10T17:04:39Z
This paper investigates the optimal hedging strategies of an informed broker interacting with multiple traders in a financial market. We develop a theoretical framework in which the broker, possessing exclusive information about the drift of the asset's price, engages with traders whose trading activities impact the market price. Using a mean-field game approach, we derive the equilibrium strategies for both the broker and the traders, illustrating the intricate dynamics of their interactions. The broker's optimal strategy involves a Stackelberg equilibrium, where the broker leads and the traders follow. Our analysis also addresses the mean field limit of finite-player models and shows the convergence to the mean-field solution as the number of traders becomes large.
2025-06-10T17:04:39Z
Philippe Bergault, Pierre Cardaliaguet, Wenbin Yan

http://arxiv.org/abs/2506.08718v1
Price Discovery in Cryptocurrency Markets
2025-06-10T12:07:11Z
This document analyzes price discovery in cryptocurrency markets by comparing centralized and decentralized exchanges, as well as spot and futures markets. The study focuses first on Ethereum (ETH) and then applies a similar approach to Bitcoin (BTC). Chapter 1 outlines the theoretical framework, emphasizing the structural differences between centralized exchanges and decentralized finance mechanisms, especially Automated Market Makers (AMMs). It also explains how to construct an order book from a liquidity pool in a decentralized setting for comparison with centralized exchanges.
Chapter 2 describes the methodological tools used: Hasbrouck's Information Share, Gonzalo and Granger's Permanent-Transitory decomposition, and the Hayashi-Yoshida estimator. These are applied to explore lead-lag dynamics, cointegration, and price discovery across market types. Chapter 3 presents the empirical analysis. For ETH, it compares price dynamics on Binance and Uniswap v2 over a one-year period, focusing on five key events in 2024. For BTC, it analyzes the relationship between spot and futures prices on the CME. The study estimates lead-lag effects and cointegration in both cases. Results show that centralized markets typically lead in ETH price discovery. In futures markets, while they tend to lead overall, high-volatility periods produce mixed outcomes. The findings have key implications for traders and institutions regarding liquidity, arbitrage, and market efficiency. Various metrics are used to benchmark the performance of modified AMMs and to understand the interaction between decentralized and centralized structures.
2025-06-10T12:07:11Z
Juan Plazuelo Pascual, Carlos Tardon Rubio, Juan Toro Cebada, Angel Hernando Veciana