https://arxiv.org/api/yRkFSmboH18JB/T+vBhFAYdiCGA 2026-03-22T09:04:33Z 3124 105 15 http://arxiv.org/abs/2507.22936v2 Evaluating Large Language Models (LLMs) in Financial NLP: A Comparative Study on Financial Report Analysis 2026-01-19T21:50:29Z Large language models (LLMs) are increasingly used to support the analysis of complex financial disclosures, yet their reliability, behavioral consistency, and transparency remain insufficiently understood in high-stakes settings. This paper presents a controlled evaluation of five transformer-based LLMs applied to question answering over the Business sections of U.S. 10-K filings. To capture complementary aspects of model behavior, we combine human evaluation, automated similarity metrics, and behavioral diagnostics under standardized and context-controlled prompting conditions. Human assessments indicate that models differ in their average performance across qualitative dimensions such as relevance, completeness, clarity, conciseness, and factual accuracy, though inter-rater agreement is modest, reflecting the subjective nature of these criteria. Automated metrics reveal systematic differences in lexical overlap and semantic similarity across models, while behavioral diagnostics highlight variation in response stability and cross-prompt alignment. Importantly, no single model consistently dominates across all evaluation perspectives. Together, these findings suggest that apparent performance differences should be interpreted as relative tendencies under the tested conditions rather than definitive indicators of general reliability. The results underscore the need for evaluation frameworks that account for human disagreement, behavioral variability, and interpretability when deploying LLMs in financially consequential applications. 
2025-07-24T20:10:27Z 23 Pages Md Talha Mohsin http://arxiv.org/abs/2512.12727v2 EXFormer: A Multi-Scale Trend-Aware Transformer with Dynamic Variable Selection for Foreign Exchange Returns Prediction 2026-01-19T18:39:04Z Accurately forecasting daily exchange rate returns represents a longstanding challenge in international finance, as these returns are driven by a multitude of correlated market factors and exhibit high-frequency fluctuations. This paper proposes EXFormer, a novel Transformer-based architecture specifically designed for forecasting daily exchange rate returns. We introduce a multi-scale trend-aware self-attention mechanism that employs parallel convolutional branches with differing receptive fields to align observations on the basis of local slopes, preserving long-range dependencies while remaining sensitive to regime shifts. A dynamic variable selector assigns time-varying importance weights to 28 exogenous covariates related to exchange rate returns, providing pre-hoc interpretability. An embedded squeeze-and-excitation block recalibrates channel responses to emphasize informative features and suppress noise in the forecasts. Using daily data for EUR/USD, USD/JPY, and GBP/USD, we conduct out-of-sample evaluations across five different sliding windows. EXFormer consistently outperforms the random walk and other baselines, improving directional accuracy by a statistically significant margin of up to 8.5--22.8%. In nearly one year of trading backtests, the model converts these gains into cumulative returns of 18%, 25%, and 18% for the three pairs, with Sharpe ratios exceeding 1.8. When conservative transaction costs and slippage are accounted for, EXFormer retains cumulative returns of 7%, 19%, and 9%, while the other baselines yield negative returns. The robustness checks further confirm the model's superiority under high-volatility and bear-market regimes.
EXFormer furnishes both economically valuable forecasts and transparent, time-varying insights into the drivers of exchange rate dynamics for international investors, corporations, and central bank practitioners. 2025-12-14T15:00:36Z 85 pages, 11 figures Dinggao Liu Robert Ślepaczuk Zhenpeng Tang http://arxiv.org/abs/2601.22168v1 Stablecoin Design with Adversarial-Robust Multi-Agent Systems via Trust-Weighted Signal Aggregation 2026-01-18T14:21:25Z Algorithmic stablecoins promise decentralized monetary stability by maintaining a target peg through programmatic reserve management. Yet, their reserve controllers remain vulnerable to regime-blind optimization, calibrating risk parameters on fair-weather data while ignoring tail events that precipitate cascading failures. The March 2020 Black Thursday collapse, wherein MakerDAO's collateral auctions yielded $8.3M in losses and a 15% peg deviation, exposed a critical gap: existing models like SAS systematically omit extreme volatility regimes from covariance estimates, producing allocations optimal in expectation but catastrophic under adversarial stress. We present MVF-Composer, a trust-weighted Mean-Variance Frontier reserve controller incorporating a novel Stress Harness for risk-state estimation. Our key insight is deploying multi-agent simulations as adversarial stress-testers: heterogeneous agents (traders, liquidity providers, attackers) execute protocol actions under crisis scenarios, exposing reserve vulnerabilities before they manifest on-chain. We formalize a trust-scoring mechanism T: A -> [0,1] that down-weights signals from agents exhibiting manipulative behavior, ensuring the risk-state estimator remains robust to signal injection and Sybil attacks. Across 1,200 randomized scenarios with injected Black-Swan shocks (10% collateral drawdown, 50% sentiment collapse, coordinated redemption attacks), MVF-Composer reduces peak peg deviation by 57% and mean recovery time by 3.1x relative to SAS baselines. 
Ablation studies confirm the trust layer accounts for 23% of stability gains under adversarial conditions, achieving 72% adversarial agent detection. Our system runs on commodity hardware, requires no on-chain oracles beyond standard price feeds, and provides a reproducible framework for stress-testing DeFi reserve policies. 2026-01-18T14:21:25Z Shengwei You Aditya Joshi Andrey Kuehlkamp Jarek Nabrzyski http://arxiv.org/abs/2601.11201v1 Fast Times, Slow Times: Timescale Separation in Financial Timeseries Data 2026-01-16T11:23:13Z Financial time series exhibit multiscale behavior, with interaction between multiple processes operating on different timescales. This paper introduces a method for separating these processes using variance and tail stationarity criteria, framed as generalized eigenvalue problems. The approach allows for the identification of slow and fast components in asset returns and prices, with applications to parameter drift, mean reversion, and tail risk management. Empirical examples using currencies, equity ETFs and treasury yields illustrate the practical utility of the method. 2026-01-16T11:23:13Z Jan Rosenzweig http://arxiv.org/abs/2601.11097v1 KANHedge: Efficient Hedging of High-Dimensional Options Using Kolmogorov-Arnold Network-Based BSDE Solver 2026-01-16T08:57:17Z High-dimensional option pricing and hedging present significant challenges in quantitative finance, where traditional PDE-based methods struggle with the curse of dimensionality. The BSDE framework offers a computationally efficient alternative to PDE-based methods, and recently proposed deep BSDE solvers, generally utilizing conventional Multi-Layer Perceptrons (MLPs), build upon this framework to provide a scalable alternative to numerical BSDE solvers. In this research, we show that although such MLP-based deep BSDEs demonstrate promising results in option pricing, there remains room for improvement regarding hedging performance. 
To address this issue, we introduce KANHedge, a novel BSDE-based hedger that leverages Kolmogorov-Arnold Networks (KANs) within the BSDE framework. Unlike conventional MLP approaches that use fixed activation functions, KANs employ learnable B-spline activation functions that provide enhanced function approximation capabilities for continuous derivatives. We comprehensively evaluate KANHedge on both European and American basket options across multiple dimensions and market conditions. Our experimental results demonstrate that while KANHedge and MLP achieve comparable pricing accuracy, KANHedge provides improved hedging performance. Specifically, KANHedge achieves considerable reductions in hedging cost metrics, demonstrating enhanced risk control capabilities. 2026-01-16T08:57:17Z 8 pages Rushikesh Handal Masanori Hirano http://arxiv.org/abs/2601.18804v1 Deep g-Pricing for CSI 300 Index Options with Volatility Trajectories and Market Sentiment 2026-01-15T08:58:09Z Option pricing in real markets faces fundamental challenges. The Black--Scholes--Merton (BSM) model assumes constant volatility and uses a linear generator $g(t,x,y,z)=-ry$, while lacking explicit behavioral factors, resulting in systematic departures from observed dynamics. This paper extends the BSM model by learning a nonlinear generator within a deep Forward--Backward Stochastic Differential Equation (FBSDE) framework. We propose a dual-network architecture where the value network $u_θ$ learns option prices and the generator network $g_φ$ characterizes the pricing mechanism, with the hedging strategy $Z_t=σ_t X_t \nabla_x u_θ$ obtained via automatic differentiation. The framework adopts forward recursion from a learnable initial condition $Y_0=u_θ(0,\cdot)$, naturally accommodating volatility trajectory and sentiment features. 
Empirical results on CSI 300 index options show that our method reduces Mean Absolute Error (MAE) by 32.2\% and Mean Absolute Percentage Error (MAPE) by 35.3\% compared with BSM. Interpretability analysis indicates that architectural improvements are effective across all option types, while the information advantage is asymmetric between calls and puts. Specifically, call option improvements are primarily driven by sentiment features, whereas put options show more balanced contributions from volatility trajectory and sentiment features. This finding aligns with economic intuition regarding option pricing mechanisms. 2026-01-15T08:58:09Z 25 pages, 6 figures, 10 tables. Submitted to IMA Journal of Management Mathematics Yilun Zhang Zheng Tang Hexiang Sun Yufeng Shi http://arxiv.org/abs/2601.10043v1 Instruction Finetuning LLaMA-3-8B Model Using LoRA for Financial Named Entity Recognition 2026-01-15T03:41:00Z Financial named-entity recognition (NER) is one of many important approaches for translating unformatted reports and news into structured knowledge graphs. However, free, easy-to-use large language models (LLMs) often mistake organisations for people or disregard an actual monetary amount entirely. This paper takes Meta's Llama 3 8B and applies it to financial NER by combining instruction fine-tuning and Low-Rank Adaptation (LoRA). Each annotated sentence is converted into an instruction-input-output triple, enabling the model to learn task descriptions while fine-tuning with small low-rank matrices instead of updating all weights. Using a corpus of 1,693 sentences, our method obtains a micro-F1 score of 0.894 compared with Qwen3-8B, Baichuan2-7B, T5, and BERT-Base. We present dataset statistics, describe training hyperparameters, and visualize entity density, learning curves, and evaluation metrics.
Our results show that instruction tuning combined with parameter-efficient fine-tuning enables state-of-the-art performance on domain-sensitive NER. 2026-01-15T03:41:00Z Zhiming Lian http://arxiv.org/abs/2601.18801v1 Design-Robust Event-Study Estimation under Staggered Adoption Diagnostics, Sensitivity, and Orthogonalisation 2026-01-14T09:04:24Z This paper develops a design-first econometric framework for event-study and difference-in-differences estimands under staggered adoption with heterogeneous effects, emphasising (i) exact probability limits for conventional two-way fixed effects event-study regressions, (ii) computable design diagnostics that quantify contamination and negative-weight risk, and (iii) sensitivity-robust inference that remains uniformly valid under restricted violations of parallel trends. The approach is accompanied by orthogonal score constructions that reduce bias from high-dimensional nuisance estimation when conditioning on covariates. Theoretical results and Monte Carlo experiments jointly deliver a self-contained methodology paper suitable for finance and econometrics applications where timing variation is intrinsic to policy, regulation, and market-structure changes. 2026-01-14T09:04:24Z 71 pages, 9 figures, 9 tables. arXiv submission: full theoretical development; Monte Carlo evidence (Section 8); replicable empirical application to staggered state banking deregulation (Section 9) comparing TWFE event-studies to heterogeneity-robust estimators with diagnostics (weights, pre-trends, placebo) and calibrated sensitivity analysis over (B,Γ,Δ(\mathcal{R})) Craig S Wright http://arxiv.org/abs/2601.09074v1 The Fourier estimator of spot volatility: Unbounded coefficients and jumps in the price process 2026-01-14T02:07:49Z In this paper we study the Fourier estimator of Malliavin and Mancino for the spot volatility. 
We establish the convergence of the trigonometric polynomial to the volatility's path in a setting that includes the following aspects. First, the volatility is required to satisfy a mild integrability condition, but is otherwise allowed to be unbounded. Second, the price process is assumed to have cadlag paths, not necessarily continuous. We obtain convergence rates for the probability of a bad approximation in the estimated coefficients, at a speed that yields almost sure convergence, and not just convergence in probability, of the estimated reconstruction of the volatility's path. This is a new result even in the setting of continuous paths. We prove that a rescaled trigonometric polynomial approximates the quadratic jump process. 2026-01-14T02:07:49Z L. J. Espinosa González Erick Treviño Aguilar http://arxiv.org/abs/2505.07676v2 Transfer Learning Across Fixed-Income Product Classes 2026-01-13T12:47:03Z We propose a framework for transfer learning of discount curves across different fixed-income product classes. Motivated by challenges in estimating discount curves from sparse or noisy data, we extend kernel ridge regression (KR) to a vector-valued setting, formulating a convex optimization problem in a vector-valued reproducing kernel Hilbert space (RKHS). Each component of the solution corresponds to the discount curve implied by a specific product class. We introduce an additional regularization term motivated by economic principles, promoting smoothness of spread curves between product classes, and show that it leads to a valid separable kernel structure. A main theoretical contribution is a decomposition of the vector-valued RKHS norm induced by separable kernels. We further provide a Gaussian process interpretation of vector-valued KR, enabling quantification of estimation uncertainty. Illustrative examples show how transfer learning tightens confidence intervals compared to single-curve estimation.
An extensive masking experiment demonstrates that transfer learning significantly improves extrapolation performance. 2025-05-12T15:43:29Z Nicolas Camenzind Damir Filipovic http://arxiv.org/abs/2601.07792v1 Non-Convex Portfolio Optimization via Energy-Based Models: A Comparative Analysis Using the Thermodynamic HypergRaphical Model Library (THRML) for Index Tracking 2026-01-12T18:04:33Z Portfolio optimization under cardinality constraints transforms the classical Markowitz mean-variance problem from a convex quadratic problem into an NP-hard combinatorial optimization problem. This paper introduces a novel approach using THRML (Thermodynamic HypergRaphical Model Library), a JAX-based library for building and sampling probabilistic graphical models, that reformulates index tracking as probabilistic inference on an Ising Hamiltonian. Unlike traditional methods that seek a single optimal solution, THRML samples from the Boltzmann distribution of high-quality portfolios using GPU-accelerated block Gibbs sampling, providing natural regularization against overfitting. We implement three key innovations: (1) dynamic coupling strength that scales inversely with market volatility (VIX), adapting diversification pressure to market regimes; (2) rebalanced bias weights prioritizing tracking quality over momentum for index replication; and (3) sector-aware post-processing ensuring institutional-grade diversification. Backtesting on a 100-stock S&P 500 universe from 2023 to 2025 demonstrates that THRML achieves 4.31 percent annualized tracking error versus 5.66 to 6.30 percent for baselines, while simultaneously generating 128.63 percent total return against the index total return of 79.61 percent. The Diebold-Mariano test confirms statistical significance with p less than 0.0001 across all comparisons. These results position energy-based models as a promising paradigm for portfolio construction, bridging statistical mechanics and quantitative finance.
2026-01-12T18:04:33Z 10 pages, 5 figures. GPU-accelerated energy-based models for cardinality-constrained index tracking Javier Mancilla Theodoros D. Bouloumis Frederic Goguikian http://arxiv.org/abs/2502.06830v4 OrderFusion: Encoding Orderbook for End-to-End Probabilistic Intraday Electricity Price Forecasting 2026-01-12T09:55:39Z Probabilistic intraday electricity price forecasting is becoming increasingly important with the growth of renewable generation and the rise in demand-side engagement. Their uncertainties have increased the trading risks closer to delivery and the subsequent imbalance settlement costs. As a consequence, intraday trading has emerged to mitigate these risks. Unlike auction markets, intraday trading in many jurisdictions is characterized by the continuous posting of buy and sell orders on power exchange platforms. This dynamic orderbook microstructure of price formation presents special challenges for price forecasting. Conventional methods represent the orderbook via domain features aggregated from buy and sell trades, or by treating it as a multivariate time series, but such representations neglect the full buy-sell interaction structure of the orderbook. This research therefore develops a new order fusion methodology, which is an end-to-end and parameter-efficient probabilistic forecasting model that learns a full interaction-aware representation of the buy-sell dynamics. Furthermore, as quantile crossing is often a problem in probabilistic forecasting, this approach hierarchically estimates the quantiles with non-crossing constraints. Extensive experiments on the market price indices across high-liquidity (German) and low-liquidity (Austrian) markets demonstrate consistent improvements over conventional baselines, and ablation studies highlight the contributions of the main modeling components. The methodology is available at: https://runyao-yu.github.io/OrderFusion/. 
2025-02-05T15:37:21Z 10 pages, 3 figures, 5 tables Runyao Yu Yuchen Tao Fabian Leimgruber Tara Esterl Jochen Stiasny Derek W. Bunn Qingsong Wen Hongye Guo Jochen L. Cremer http://arxiv.org/abs/2601.07131v1 The Limits of Complexity: Why Feature Engineering Beats Deep Learning in Investor Flow Prediction 2026-01-12T01:46:37Z The application of machine learning to financial prediction has accelerated dramatically, yet the conditions under which complex models outperform simple alternatives remain poorly understood. This paper investigates whether advanced signal processing and deep learning techniques can extract predictive value from investor order flows beyond what simple feature engineering achieves. Using a comprehensive dataset of 2.79 million observations spanning 2,439 Korean equities from 2020--2024, we apply three methodologies: \textit{Independent Component Analysis} (ICA) to recover latent market drivers, \textit{Wavelet Coherence} analysis to characterize multi-scale correlation structure, and \textit{Long Short-Term Memory} (LSTM) networks with attention mechanisms for non-linear prediction. Our results reveal a striking finding: a parsimonious linear model using market capitalization-normalized flows (``Matched Filter'' preprocessing) achieves a Sharpe ratio of 1.30 and cumulative return of 272.6\%, while the full ICA-Wavelet-LSTM pipeline generates a Sharpe ratio of only 0.07 with a cumulative return of $-5.1\%$. The raw LSTM model collapsed to predicting the unconditional mean, achieving a hit rate of 47.5\% -- worse than random. We conclude that in low signal-to-noise financial environments, domain-specific feature engineering yields substantially higher marginal returns than algorithmic complexity. These findings establish important boundary conditions for the application of deep learning to financial prediction. 
2026-01-12T01:46:37Z Sungwoo Kang http://arxiv.org/abs/2311.15333v4 Asymptotic Error Analysis of Multilevel Stochastic Approximations for the Value-at-Risk and Expected Shortfall 2026-01-11T17:32:01Z Crépey, Frikha, and Louzi (2023) introduced a nested stochastic approximation algorithm and its multilevel acceleration to compute the value-at-risk and expected shortfall of a random financial loss. We hereby establish central limit theorems for the renormalized estimation errors associated with both algorithms as well as their averaged versions. Our findings are substantiated through a numerical example. 2023-11-26T15:39:22Z 56 pages, 1 figure, 4 tables Stéphane Crépey Noufel Frikha Azar Louzi Gilles Pagès 10.1214/24-EJP1246 http://arxiv.org/abs/2510.27277v2 Black-Scholes Model, comparison between Analytical Solution and Numerical Analysis 2026-01-10T14:28:43Z The main purpose of this article is to give a general overview and understanding of the first widely used option-pricing model, the Black-Scholes model. The history and context are presented, along with the model's usefulness and implications in economics. A brief review of fundamental calculus concepts is introduced to derive and solve the model. The equation is then solved using both an analytical method (separation of variables) and a numerical method (finite differences). Conclusions are drawn in order to understand how Black-Scholes is employed nowadays. Appendix A collects some economics notions to ease the reader's comprehension of the paper, and Appendix B provides some code scripts that allow the reader to put some of the concepts into practice. 2025-10-31T08:38:07Z Francesco Romaggi