https://arxiv.org/api/AOMrU2ltHj6Vu2MO+ZmA1rK/qIE
Updated: 2026-03-16T06:40:50Z
4136015

http://arxiv.org/abs/2603.12040v1
Entropic signatures of market response under concentrated policy communication
Updated: 2026-03-12T15:16:03Z
The first 100 days of Donald Trump's second presidential term (January 20th - April 30th, 2025) featured policy actions with potential market repercussions, constituting a well-suited case study of a concentrated policy scenario. Here, we provide a first look at this period, rooted in information theory, by analyzing major stock indices across the Americas, Europe, as well as Asia and Oceania. Our approach jointly examines dispersion (standard deviation) and information complexity (entropy), but also employs a sliding-window cumulative entropy to localize extreme events. We find a notable decoupling between the first two measures, indicating that entropy is not merely a proxy for amplitude but reflects the diversity of populated outcomes. As such, they allow us to capture both market volatility and narrative constraints, signaling large and coherent moves driven by policy changes. In turn, the cumulative entropy is found to notably increase during regional episodes with high information density, providing effective signatures of such events. We argue that the obtained results indicate short-term globally coupled, yet regionally modulated, market impacts with a clear connection to the introduced policies. In what follows, the presented entropic framework emerges as an efficient complement to standard methods for characterizing markets under turbulent conditions, with potential to enhance forecasting strategies such as stochastic modeling.
Published: 2026-03-12T15:16:03Z
Comment: 20 pages, 11 figures
Authors: Ewa A. Drzazga-Szczȩśniak, Rishabh Gupta, Adam Z. Kaczmarek, Jakub T. Gnyp, Marcin W. Jarosik, Róża Waligóra, Marta Kielak, Shivam Gupta, Agata Gurzyńska, Johann Gil, Piotr Szczepanik, Józefa Kielak, Dominik Szczȩśniak

http://arxiv.org/abs/2603.11408v1
Beyond Polarity: Multi-Dimensional LLM Sentiment Signals for WTI Crude Oil Futures Return Prediction
Updated: 2026-03-12T00:42:08Z
Forecasting crude oil prices remains challenging because market-relevant information is embedded in large volumes of unstructured news and is not fully captured by traditional polarity-based sentiment measures. This paper examines whether multi-dimensional sentiment signals extracted by large language models improve the prediction of weekly WTI crude oil futures returns. Using energy-sector news articles from 2020 to 2025, we construct five sentiment dimensions covering relevance, polarity, intensity, uncertainty, and forwardness based on GPT-4o, Llama 3.2-3b, and two benchmark models, FinBERT and AlphaVantage. We aggregate article-level signals to the weekly level and evaluate their predictive performance in a classification framework. The best results are achieved by combining GPT-4o and FinBERT, suggesting that LLM-based and conventional financial sentiment models provide complementary predictive information. SHAP analysis further shows that intensity- and uncertainty-related features are among the most important predictors, indicating that the predictive value of news sentiment extends beyond simple polarity. Overall, the results suggest that multi-dimensional LLM-based sentiment measures can improve commodity return forecasting and support energy-market risk monitoring.
Published: 2026-03-12T00:42:08Z
Comment: 28 pages, 4 figures, 4 tables
Authors: Dehao Dai, Ding Ma, Dou Liu, Kerui Geng, Yiqing Wang

http://arxiv.org/abs/2603.10272v1
An operator-level ARCH Model
Updated: 2026-03-10T23:04:20Z
AutoRegressive Conditional Heteroscedasticity (ARCH) models are standard for modeling time series exhibiting volatility, with a rich literature in univariate and multivariate settings. In recent years, these models have been extended to function spaces.
However, functional ARCH and generalized ARCH (GARCH) processes established in the literature have thus far been restricted to model "pointwise" variances. In this paper, we propose a new ARCH framework for data residing in general separable Hilbert spaces that accounts for the full evolution of the conditional covariance operator. We define a general operator-level ARCH model. For a simplified Constant Conditional Correlation version of the model, we establish conditions under which such models admit strictly and weakly stationary solutions, finite moments, and weak serial dependence. Additionally, we derive consistent Yule-Walker-type estimators of the infinite-dimensional model parameters. The practical relevance of the model is illustrated through simulations and a data application to high-frequency cumulative intraday returns.
Published: 2026-03-10T23:04:20Z
Comment: 48 pages, 8 Figures, 2 Tables
Authors: Alexander Aue, Sebastian Kühnert, Gregory Rice, Jeremy VanderDoes

http://arxiv.org/abs/2603.10202v1
Hybrid Hidden Markov Model for Modeling Equity Excess Growth Rate Dynamics: A Discrete-State Approach with Jump-Diffusion
Updated: 2026-03-10T20:06:53Z
Generating synthetic financial time series that preserve statistical properties of real market data is essential for stress testing, risk model validation, and scenario design. Existing approaches, from parametric models to deep generative networks, struggle to simultaneously reproduce heavy-tailed distributions, negligible linear autocorrelation, and persistent volatility clustering. We propose a hybrid hidden Markov framework that discretizes continuous excess growth rates into Laplace quantile-defined market states and augments regime switching with a Poisson-driven jump-duration mechanism to enforce realistic tail-state dwell times. Parameters are estimated by direct transition counting, bypassing the Baum-Welch EM algorithm.
Synthetic data quality is evaluated using Kolmogorov-Smirnov and Anderson-Darling pass rates for distributional fidelity, and ACF mean absolute error for temporal structure. Applied to ten years of SPY data across 1,000 simulated paths, the framework achieves KS and AD pass rates exceeding 97% and 91% in-sample and 94% out-of-sample (calendar year 2025), partially reproducing the ARCH effect that standard regime-switching models miss. No single model dominates all quality dimensions: GARCH(1,1) reproduces volatility clustering more accurately but fails distributional tests (5.5% KS pass rate), while the standard HMM without jumps achieves higher distributional fidelity but cannot generate persistent high-volatility regimes. The proposed framework offers the best joint quality profile across distributional, temporal, and tail-coverage metrics. A Single-Index Model extension propagates the SPY factor path to a 424-asset universe, enabling scalable correlated synthetic path generation while preserving cross-sectional correlation structure.
Published: 2026-03-10T20:06:53Z
Authors: Abdulrahman Alswaidan, Jeffrey D. Varner

http://arxiv.org/abs/2602.00086v3
Impact of LLMs news Sentiment Analysis on Stock Price Movement Prediction
Updated: 2026-03-09T10:42:38Z
This paper addresses stock price movement prediction by leveraging LLM-based news sentiment analysis. Earlier works have largely focused on proposing and assessing sentiment analysis models and stock movement prediction methods, but separately. Although promising results have been achieved, a clear and in-depth understanding of the benefit of news sentiment to this task, as well as a comprehensive assessment of different architecture types in this context, is still lacking. Herein, we conduct an evaluation study that compares 3 different LLMs, namely, DeBERTa, RoBERTa and FinBERT, for sentiment-driven stock prediction.
Our results suggest that DeBERTa outperforms the other two models with an accuracy of 75% and that an ensemble model that combines the three models can increase the accuracy to about 80%. We also see that news sentiment features can slightly benefit some stock market prediction models, namely LSTM-, PatchTST- and tPatchGNN-based classifiers and PatchTST- and TimesNet-based regression models.
Published: 2026-01-22T23:33:02Z
Comment: ICLR 2026 Workshop on Advances in Financial AI (AFA)
Authors: Walid Siala (SnT, University of Luxembourg, Luxembourg), Ahmed Khanfir (RIADI, ENSI, University of Manouba, Tunisia; SnT, University of Luxembourg, Luxembourg), Mike Papadakis (SnT, University of Luxembourg, Luxembourg)

http://arxiv.org/abs/2602.00037v2
Bitcoin Price Prediction using Machine Learning and Combinatorial Fusion Analysis
Updated: 2026-03-08T20:45:43Z
In this work, we propose to apply a new model fusion and learning paradigm, known as Combinatorial Fusion Analysis (CFA), to the field of Bitcoin price prediction. Price prediction of financial products has always been a big topic in finance, as the successful prediction of the price can yield significant profit. Every machine learning model has its own strengths and weaknesses, which hinders progress toward robustness. CFA has been used to enhance models by leveraging the rank-score characteristic (RSC) function and cognitive diversity in the combination of a moderate set of diverse and relatively well-performing models. Our method utilizes both score and rank combinations as well as other weighted combination techniques. Key metrics such as RMSE and MAPE are used to evaluate our methodology's performance. Our proposal presents a notable MAPE performance of 0.19%. The proposed method greatly improves upon individual model performance and outperforms other Bitcoin price prediction models.
Published: 2026-01-19T02:41:43Z
Comment: 8 pages, 5 figures, 3 tables; Accepted to 2025 IEEE Conference on Artificial Intelligence (IEEE CAI)
Authors: Yuanhong Wu, Wei Ye, Jingyan Xu, D. Frank Hsu

http://arxiv.org/abs/2212.01048v3
Empirical Asset Pricing via Ensemble Gaussian Process Regression
Updated: 2026-03-07T14:47:40Z
We introduce an ensemble learning method based on Gaussian Process Regression (GPR) for predicting conditional expected stock returns given stock-level and macro-economic information. Our ensemble learning approach significantly reduces the computational complexity inherent in GPR inference and lends itself to general online learning tasks. We conduct an empirical analysis on a large cross-section of US stocks from 1962 to 2016. We find that our method dominates existing machine learning models statistically and economically in terms of out-of-sample $R$-squared and Sharpe ratio of prediction-sorted portfolios. Exploiting the Bayesian nature of GPR, we introduce the mean-variance optimal portfolio with respect to the prediction uncertainty distribution of the expected stock returns. It appeals to an uncertainty-averse investor and significantly dominates the equal- and value-weighted prediction-sorted portfolios, which outperform the S&P 500.
Published: 2022-12-02T09:37:29Z
Authors: Damir Filipović, Puneet Pasricha

http://arxiv.org/abs/2603.05917v1
Stock Market Prediction Using Node Transformer Architecture Integrated with BERT Sentiment Analysis
Updated: 2026-03-06T05:15:22Z
Stock market prediction presents considerable challenges for investors, financial institutions, and policymakers operating in complex market environments characterized by noise, non-stationarity, and behavioral dynamics. Traditional forecasting methods often fail to capture the intricate patterns and cross-sectional dependencies inherent in financial markets. This paper presents an integrated framework combining a node transformer architecture with BERT-based sentiment analysis for stock price forecasting.
The proposed model represents the stock market as a graph structure where individual stocks form nodes and edges capture relationships including sectoral affiliations, correlated price movements, and supply chain connections. A fine-tuned BERT model extracts sentiment from social media posts and combines it with quantitative market features through attention-based fusion. The node transformer processes historical market data while capturing both temporal evolution and cross-sectional dependencies among stocks. Experiments on 20 S&P 500 stocks spanning January 1982 to March 2025 demonstrate that the integrated model achieves a mean absolute percentage error (MAPE) of 0.80% for one-day-ahead predictions, compared to 1.20% for ARIMA and 1.00% for LSTM. Sentiment analysis reduces prediction error by 10% overall and 25% during earnings announcements, while graph-based modeling contributes an additional 15% improvement by capturing inter-stock dependencies. Directional accuracy reaches 65% for one-day forecasts. Statistical validation through paired t-tests confirms these improvements (p < 0.05 for all comparisons). The model maintains MAPE below 1.5% during high-volatility periods where baseline models exceed 2%.
Published: 2026-03-06T05:15:22Z
Comment: 14 pages, 5 figures, 10 tables, submitted to IEEE Access
Authors: Mohammad Al Ridhawi, Mahtab Haj Ali, Hussein Al Osman

http://arxiv.org/abs/2603.05260v1
Extreme Value Analysis for Finite, Multivariate and Correlated Systems with Finance as an Example
Updated: 2026-03-05T15:10:54Z
Extreme values and the tail behavior of probability distributions are essential for quantifying and mitigating risk in complex systems of all kinds. In multivariate settings, accounting for correlations is crucial. Although extreme value analysis for infinite correlated systems remains an open challenge, we propose a practical framework for handling a large but finite number of correlated time series. We develop our approach for finance as a concrete example but emphasize its generality.
We study the extremal behavior of high-frequency stock returns after rotating them into the eigenbasis of the correlation matrix. This separates and extracts various collective effects, including information on the correlated market as a whole and on correlated sectoral behavior from idiosyncratic features, while allowing us to use univariate tools of extreme value analysis. This holds even for high-frequency data where discretization effects normally complicate analysis. We employ a peaks-over-threshold approach and thereby fully avoid the analysis of block maxima. We estimate the tail shape of the rotated returns while explicitly accounting for nonstationarity, a key feature in finance and many other complex systems. Our framework facilitates tail risk estimation relative to larger trends and intraday seasonalities at both market and sectoral levels.
Published: 2026-03-05T15:10:54Z
Authors: Benjamin Köhler, Anton J. Heckens, Thomas Guhr

http://arxiv.org/abs/2506.08762v2
EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements
Updated: 2026-03-05T14:11:48Z
Large Language Models (LLMs) have made remarkable progress, surpassing human performance on several benchmarks in domains such as mathematics and coding. A key driver of this progress has been the development of benchmark datasets. In contrast, the financial domain poses higher entry barriers due to its demand for specialized expertise, and benchmarks remain relatively scarce compared to those in mathematics or coding. We introduce EDINET-Bench, an open-source Japanese financial benchmark designed to evaluate LLMs on challenging tasks such as accounting fraud detection, earnings forecasting, and industry classification. EDINET-Bench is constructed from ten years of annual reports filed by Japanese companies.
These tasks require models to process entire annual reports and integrate information across multiple tables and textual sections, demanding expert-level reasoning that is challenging even for human professionals. Our experiments show that even state-of-the-art LLMs struggle in this domain, performing only marginally better than logistic regression in binary classification tasks such as fraud detection and earnings forecasting. Our results show that simply providing reports to LLMs in a straightforward setting is not enough. This highlights the need for benchmark frameworks that better reflect the environments in which financial professionals operate, with richer scaffolding such as realistic simulations and task-specific reasoning support to enable more effective problem solving. We make our dataset and code publicly available to support future research.
Published: 2025-06-10T13:03:36Z
Comment: Accepted to ICLR 2026
Authors: Issa Sugiura, Takashi Ishida, Taro Makino, Chieko Tazuke, Takanori Nakagawa, Kosuke Nakago, David Ha

http://arxiv.org/abs/2603.05119v1
Asymptotic Separability of Diffusion and Jump Components in High-Frequency CIR and CKLS Models
Updated: 2026-03-05T12:44:58Z
This paper develops a robust parametric framework for jump detection in discretely observed CKLS-type jump-diffusion processes with high-frequency asymptotics, based on the minimum density power divergence estimator (MDPDE). The methodology exploits the intrinsic asymptotic scale separation between diffusion increments, which decay at rate $\sqrt{\Delta_n}$, and jump increments, which remain of non-vanishing stochastic magnitude. Using robust MDPDE-based estimators of the drift and diffusion coefficients, we construct standardized residuals whose extremal behavior provides a principled basis for statistical discrimination between continuous and discontinuous components.
We establish that, over diffusion intervals, the maximum of the normalized residuals converges to the Gumbel extreme-value distribution, yielding an explicit and asymptotically valid detection threshold. Building on this result, we prove classification consistency of the proposed robust detection procedure: the probability of correctly identifying all jump and diffusion increments converges to one under proper asymptotics. The MDPDE-based normalization attenuates the influence of atypical increments and stabilizes the detection boundary in the presence of discontinuities. Simulation results confirm that robustness improves finite-sample stability and reduces spurious detections without compromising asymptotic validity. The proposed methodology provides a theoretically rigorous and practically resilient robust approach to jump identification in high-frequency stochastic systems.
Published: 2026-03-05T12:44:58Z
Authors: Sourojyoti Barick

http://arxiv.org/abs/2508.11372v2
Stealing Accuracy: Predicting Day-ahead Electricity Prices with Temporal Hierarchy Forecasting (THieF)
Updated: 2026-03-04T22:00:04Z
We introduce the concept of temporal hierarchy forecasting (THieF) in predicting day-ahead electricity prices and show that reconciling forecasts for hourly products and 2- to 24-hour blocks can significantly (up to 13%) improve accuracy at all levels. These results remain consistent throughout a challenging 4-year test period (2021-2024) in the German and Spanish power markets and across model architectures, including linear regression, shallow feedforward neural networks, gradient-boosted decision trees, and a state-of-the-art, pretrained transformer.
Given that (i) trading of block products is becoming more common and (ii) the computational cost of reconciliation is comparable to that of predicting hourly prices alone, we recommend using it in daily forecasting practice.
Published: 2025-08-15T10:13:51Z
Comment: 18 pages
Authors: Arkadiusz Lipiecki, Kaja Bilinska, Nicolaos Kourentzes, Rafal Weron

http://arxiv.org/abs/2506.07711v6
The Subtle Interplay between Square-root Impact, Order Imbalance & Volatility: A Unifying Framework
Updated: 2026-03-04T09:31:44Z
In this work, we aim to reconcile several apparently contradictory observations in market microstructure: is the famous "square-root law" of metaorder impact, which decays with time, compatible with the random-walk nature of prices and the linear impact of order imbalances? Can one entirely explain the volatility of prices as resulting from the flow of uninformed metaorders that mechanically impact them? We introduce a new theoretical framework to describe metaorders with different signs, sizes and durations, which all impact prices as a square-root of volume but with a subsequent time decay. We show that, as in the original propagator model, price diffusion is ensured by the long memory of cross-correlations between metaorders. In order to account for the effect of strongly fluctuating volumes q of individual trades, we further introduce two q-dependent exponents, which allow us to describe how the moments of generalized volume imbalance and the correlation between price changes and generalized order flow imbalance scale with T. We predict in particular that the corresponding power-laws depend in a non-monotonic fashion on a parameter a, which allows one to put the same weight on all child orders or to overweight large ones, a behaviour that is clearly borne out by empirical data. We also predict that the correlation between price changes and volume imbalances should display a maximum as a function of a, which again matches observations.
Such noteworthy agreement between theory and data suggests that our framework correctly captures the basic mechanism at the heart of price formation, namely the average impact of metaorders. We argue that our results support the "Order-Driven" theory of excess volatility, and are at odds with the idea that a "Fundamental" component accounts for a large share of the volatility of financial markets.
Published: 2025-06-09T12:53:25Z
Authors: Guillaume Maitrier, Jean-Philippe Bouchaud

http://arxiv.org/abs/2508.16378v2
Sentiment-Aware Mean-Variance Portfolio Optimization for Cryptocurrencies
Updated: 2026-03-04T03:28:45Z
Cryptocurrency markets are highly volatile and influenced by both price trends and market sentiment, making effective portfolio management challenging. This paper proposes a dynamic cryptocurrency portfolio strategy that integrates technical indicators and sentiment analysis to enhance investment decision-making. Market momentum is captured using the 14-day Relative Strength Index (RSI) and Simple Moving Average (SMA), while sentiment signals are extracted from news articles with VADER and further validated using the Google Gemini large language model. These signals are incorporated into expected return estimates and used in a constrained mean-variance optimization framework. Backtesting across multiple cryptocurrencies shows that the integrated approach outperforms traditional benchmarks, including a momentum strategy, a Bitcoin Long-Short strategy, and an equal-weighted portfolio, achieving stronger risk-adjusted returns and more consistent cumulative growth. Furthermore, comparing the sentiment-only and technical-only strategies shows that incorporating sentiment information alongside technical indicators can lead to more consistent performance gains.
However, the strategies exhibit substantial drawdowns that coincide with known periods of market stress, indicating that additional risk-management components are required to improve stability.
Published: 2025-08-22T13:34:09Z
Comment: This paper has been accepted by the journal Digital Finance
Authors: Qizhao Chen

http://arxiv.org/abs/2603.02898v1
Range-Based Volatility Estimators for Monitoring Market Stress: Evidence from Local Food Price Data
Updated: 2026-03-03T11:47:45Z
Range-based volatility estimators are widely used in financial econometrics to quantify risk and market stress, yet their application to local commodity markets remains limited. This paper shows how open-high-low-close (OHLC) volatility estimators can be adapted to monitor localized market distress across diverse development contexts, including conflict-affected settings, climate-exposed regions, remote and thinly traded markets, and import- and logistics-constrained urban hubs. Using monthly food price data from the World Bank's Real-Time Prices dataset, several volatility measures, including the Parkinson, Garman-Klass, Rogers-Satchell, and Yang-Zhang estimators, are constructed and evaluated against independently documented disruption timelines. Across settings, elevated volatility aligns with episodes linked to insecurity and market fragmentation, extreme weather and disaster shocks, policy and fuel-cost adjustments, and global supply-chain and trade disruptions. Volatility also detects stress that standard momentum indicators such as the relative strength index (RSI) can miss, including symmetric or rapidly reversing shocks in which offsetting supply and demand disturbances dampen net directional price movements while amplifying intra-period dispersion.
Overall, OHLC-based volatility indicators provide a robust and interpretable signal of market disruptions and complement price-level monitoring for applications spanning financial risk, humanitarian early warning, and trade.
Published: 2026-03-03T11:47:45Z
Comment: 41 pages, 10 figures, 11 tables
Authors: Bo Pieter Johannes Andrée
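
The last abstract above names the Parkinson and Garman-Klass range-based estimators. As a rough illustration only, the sketch below implements the textbook forms of these two estimators in plain Python. It is not the paper's code: the function names `parkinson_vol` and `garman_klass_vol` are hypothetical, and the price bars are made-up numbers, not World Bank data.

```python
import math


def parkinson_vol(highs, lows):
    """Per-period Parkinson volatility from high and low prices.

    Textbook form: sigma^2 = mean(ln(H/L)^2) / (4 ln 2).
    """
    if not highs or len(highs) != len(lows):
        raise ValueError("highs and lows must be non-empty and equally long")
    squared_ranges = [math.log(h / l) ** 2 for h, l in zip(highs, lows)]
    variance = sum(squared_ranges) / (4.0 * math.log(2.0) * len(squared_ranges))
    return math.sqrt(variance)


def garman_klass_vol(opens, highs, lows, closes):
    """Per-period Garman-Klass volatility from OHLC prices.

    Textbook form:
    sigma^2 = mean(0.5 * ln(H/L)^2 - (2 ln 2 - 1) * ln(C/O)^2).
    """
    n = len(opens)
    if n == 0 or not (len(highs) == len(lows) == len(closes) == n):
        raise ValueError("OHLC series must be non-empty and equally long")
    k = 2.0 * math.log(2.0) - 1.0
    terms = [
        0.5 * math.log(h / l) ** 2 - k * math.log(c / o) ** 2
        for o, h, l, c in zip(opens, highs, lows, closes)
    ]
    # Guard against a negative average, which can only arise from
    # inconsistent bars (for example, a close outside the high-low range).
    variance = max(sum(terms) / n, 0.0)
    return math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical monthly OHLC bars for a single commodity price series.
    opens = [100.0, 104.0, 101.0, 110.0]
    highs = [106.0, 108.0, 112.0, 115.0]
    lows = [98.0, 99.0, 100.0, 103.0]
    closes = [104.0, 101.0, 110.0, 105.0]
    print("Parkinson:", parkinson_vol(highs, lows))
    print("Garman-Klass:", garman_klass_vol(opens, highs, lows, closes))
```

Both functions return volatility per bar (per month for monthly data) with no annualization. The Rogers-Satchell and Yang-Zhang estimators also named in the abstract are omitted to keep the sketch short.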