http://arxiv.org/abs/2510.12685v2 Orderbook Feature Learning and Asymmetric Generalization in Intraday Electricity Markets 2026-02-15T18:55:32Z Accurate probabilistic forecasting of intraday electricity prices is critical for market participants to inform trading decisions. Existing studies rely on specific domain features, such as Volume-Weighted Average Price (VWAP) and the last price. However, the rich information in the orderbook remains underexplored. Furthermore, these approaches are often developed within a single country and product type, making it unclear whether the approaches are generalizable. In this paper, we extract 384 features from the orderbook and identify a set of powerful features via feature selection. Based on selected features, we present a comprehensive benchmark using classical statistical models, tree-based ensembles, and deep learning models across two countries (Germany and Austria) and two product types (60-min and 15-min). We further perform a systematic generalization study across countries and product types, from which we reveal an asymmetric generalization phenomenon: models trained on more liquid markets or products transfer well to less liquid ones, whereas the reverse transfer leads to substantial performance degradation. 2025-10-14T16:21:50Z Accepted to PSCC 2026. 9 pages, 2 figures, 5 tables Runyao Yu Ruochen Wu Yongsheng Han Jochen L. Cremer http://arxiv.org/abs/2602.14233v1 Evaluating LLMs in Finance Requires Explicit Bias Consideration 2026-02-15T17:02:01Z Large Language Models (LLMs) are increasingly integrated into financial workflows, but evaluation practice has not kept up. Finance-specific biases can inflate performance, contaminate backtests, and make reported results useless for any deployment claim. We identify five recurring biases in financial LLM applications. 
They include look-ahead bias, survivorship bias, narrative bias, objective bias, and cost bias. These biases break financial tasks in distinct ways and they often compound to create an illusion of validity. We reviewed 164 papers from 2023 to 2025 and found that no single bias is discussed in more than 28 percent of studies. This position paper argues that bias in financial LLM systems requires explicit attention and that structural validity should be enforced before any result is used to support a deployment claim. We propose a Structural Validity Framework and an evaluation checklist with minimal requirements for bias diagnosis and future system design. The material is available at https://github.com/Eleanorkong/Awesome-Financial-LLM-Bias-Mitigation. 2026-02-15T17:02:01Z Yaxuan Kong Hoyoung Lee Yoontae Hwang Alejandro Lopez-Lira Bradford Levy Dhagash Mehta Qingsong Wen Chanyeol Choi Yongjae Lee Stefan Zohren http://arxiv.org/abs/2602.14138v1 Factor Engine: A Python Library for Systematic Financial Factor Computation and Analysis 2026-02-15T13:23:08Z Factor Engine is a high-performance, open-source Python library designed for the systematic computation and analysis of financial factors. Built around a modular and extensible API that leverages Python decorators, Factor Engine enables users to define custom factors with ease and integrates seamlessly with the modern data science ecosystem. To assess its practical effectiveness, we compare the mispricing factors computed by Factor Engine to those generated using a reference Stata implementation, finding that both approaches yield highly similar results and comparable performance in backtesting analyses. Furthermore, we experimentally apply these factors within machine learning workflows for trading strategy development, illustrating their practical utility and potential for quantitative finance research. 
2026-02-15T13:23:08Z Ata Keskin http://arxiv.org/abs/2505.22121v3 Multi-period Mean-Buffered Probability of Exceedance in Defined Contribution Portfolio Optimization 2026-02-15T05:41:22Z We investigate multi-period mean-risk portfolio optimization for long-horizon Defined Contribution plans, focusing on buffered Probability of Exceedance (bPoE), a more intuitive, dollar-based alternative to Conditional Value-at-Risk (CVaR). We formulate both pre-commitment and time-consistent Mean-bPoE and Mean-CVaR portfolio optimization problems under realistic investment constraints (e.g., no leverage, no short selling) and jump-diffusion dynamics. These formulations are naturally framed as bilevel optimization problems, with an outer search over the shortfall threshold and an inner optimization over rebalancing decisions. We establish an equivalence between the pre-commitment formulations through a one-to-one correspondence of their scalarization optimal sets, while showing that no such equivalence holds in the time-consistent setting. We develop provably convergent numerical schemes for the value functions associated with both pre-commitment and time-consistent formulations of these mean-risk control problems. Using nearly a century of market data, we find that time-consistent Mean-bPoE strategies closely resemble their pre-commitment counterparts. In particular, they maintain alignment with investors' preferences for a minimum acceptable terminal wealth level, unlike time-consistent Mean-CVaR, which often leads to counterintuitive control behavior. We further show that bPoE, as a strictly tail-oriented measure, prioritizes guarding against catastrophic shortfalls while allowing meaningful upside exposure, making it especially appealing for long-horizon wealth security. These findings highlight bPoE's practical advantages for Defined Contribution investment planning. 
2025-05-28T08:47:54Z 44 pages, 10 figures Duy-Minh Dang Chang Chen http://arxiv.org/abs/2602.03874v2 ASRI: An Aggregated Systemic Risk Index for Cryptocurrency Markets 2026-02-14T18:42:42Z We introduce the Aggregated Systemic Risk Index (ASRI), comprising four weighted sub-indices: Stablecoin Concentration Risk (30%), DeFi Liquidity Risk (25%), Contagion Risk (25%), and Regulatory Opacity Risk (20%). Using data from DeFi Llama, Federal Reserve FRED, and on-chain analytics, we validate against four historical crises (Terra/Luna, Celsius/3AC, FTX, SVB). Event study analysis detects significant abnormal signals for all four (t-statistics 5.47-32.64, p < 0.01), with threshold-based detection achieving 30-day average lead time for three of four events. Walk-forward validation confirms 4/4 out-of-sample detection (18-day average lead), ruling out look-ahead bias. A three-regime HMM identifies distinct risk states with >97% persistence; structural stability tests pass (Chow p = 0.993). Benchmarking against Diebold-Yilmaz connectedness shows equivalent detection (75%) with higher precision (33.5% vs. 22.4%). Out-of-sample specificity testing on 2024-2025 data confirms zero false positives, correctly classifying the $1.5B Bybit hack as non-systemic. ASRI captures DeFi-specific vulnerabilities -- composability risk, flash loan exposure, and tokenized RWA linkages -- that SRISK and CoVaR cannot accommodate. 2026-02-01T08:42:23Z 83 pages, 7 figures, 30 tables. JEL: G01, G15, G23. Live dashboard: https://asri.dissensus.ai. Code: https://github.com/studiofarzulla/asri Murad Farzulla Andrew Maksakov http://arxiv.org/abs/2602.07018v2 The Extremity Premium: Sentiment Regimes and Adverse Selection in Cryptocurrency Markets 2026-02-14T07:04:31Z Using the Crypto Fear & Greed Index and Bitcoin daily data, we document that sentiment extremity predicts excess uncertainty beyond realized volatility. 
Extreme fear and extreme greed regimes exhibit significantly higher spreads than neutral periods -- a phenomenon we term the "extremity premium." Extended validation on the full Fear & Greed history (February 2018--January 2026, N = 2,896) confirms the finding: within-volatility-quintile comparisons show a significant premium (p < 0.001, Cohen's d = 0.21), Granger causality from uncertainty to spreads is strong (F = 211), and placebo tests reject the null (p < 0.0001). The effect replicates on Ethereum and across 6 of 7 market cycles. However, the premium is sensitive to functional form: comprehensive regression controls absorb regime effects, while nonparametric stratification preserves them. We interpret this as evidence that sentiment extremity captures volatility-regime interactions not fully represented by parametric controls -- consistent with, but not conclusively separable from, the F&G Index's embedded volatility component. An agent-based model reproduces the pattern qualitatively. The results suggest that intensity, not direction, drives uncertainty-linked liquidity withdrawal in cryptocurrency markets, though identification of "pure" sentiment effects from volatility remains an open challenge. 2026-02-01T09:31:44Z 49 pages, 6 figures, 15+ tables. JEL: C63, G12, G14. Code: https://github.com/studiofarzulla/sentiment-microstructure-abm Murad Farzulla http://arxiv.org/abs/2602.07046v2 Same Returns, Different Risks: How Cryptocurrency Markets Process Infrastructure vs Regulatory Shocks 2026-02-14T06:46:52Z We investigate whether cryptocurrency markets differentiate between infrastructure failures and regulatory enforcement at the return level, complementing a companion conditional variance analysis that finds 5.7 times larger volatility impacts from infrastructure events (p = 0.0008). 
Using event-level block bootstrap inference on 31 events across Bitcoin, Ethereum, Solana, and Cardano (2019-2025), we find no statistically significant difference in cumulative abnormal returns between infrastructure failures (-7.6%) and regulatory enforcement (-11.1%): the difference of +3.6 pp has p = 0.81 with 95% CI [-25.3%, +30.9%]. This null acquires substantive meaning alongside the companion's highly significant variance result: the same events that produce indistinguishable return responses generate dramatically different volatility signatures. Markets differentiate shock types through the risk channel -- the second moment -- rather than expected returns. The block bootstrap methodology, which resamples entire events to preserve cross-sectional correlation, reveals that prior parametric approaches systematically understate uncertainty by inflating degrees of freedom. Results are robust across eight specifications including permutation tests, leave-one-out analysis, and the Ibragimov-Mueller few-cluster test. 2026-02-04T09:59:24Z 24 pages, 7 tables. JEL: C22, C58, G12, G14. Code at https://github.com/studiofarzulla/sentiment-without-structure Murad Farzulla http://arxiv.org/abs/2601.20336v4 Do Whitepaper Claims Predict Market Behavior? Evidence from Cryptocurrency Factor Analysis 2026-02-14T06:33:00Z This study investigates whether cryptocurrency whitepaper narratives align with empirically observed market factor structure. We construct a pipeline combining zero-shot NLP classification of 38 whitepapers across 10 semantic categories with CP tensor decomposition of hourly market data (49 assets, 17,543 timestamps). Using Procrustes rotation and Tucker's congruence coefficient (phi), we find weak alignment between claims and market statistics (phi = 0.246, p = 0.339) and between claims and latent factors (phi = 0.058, p = 0.751). 
A methodological validation comparison (statistics versus factors, both derived from market data) achieves significance (p < 0.001), confirming the pipeline detects real structure. The null result indicates whitepaper narratives do not meaningfully predict market factor structure, with implications for narrative economics and investor decision-making. Entity-level analysis reveals specialized tokens (XMR, CRV, YFI) show stronger narrative-market correspondence than broad infrastructure tokens. 2026-01-28T07:50:40Z 38 pages, 9 figures, 14 tables. JEL: G14, G12, C38, C45. Code available at https://github.com/studiofarzulla/tensor-defi Murad Farzulla http://arxiv.org/abs/2312.11797v3 Data-Driven Merton's Strategies via Policy Randomization 2026-02-14T01:38:24Z We study Merton's expected utility maximization problem in an incomplete market, characterized by a factor process in addition to the stock price process, where all the model primitives are unknown. The agent under consideration is a price taker who has access only to the stock and factor value processes and the instantaneous volatility. We propose an auxiliary problem in which the agent can invoke policy randomization according to a specific class of Gaussian distributions, and prove that the mean of its optimal Gaussian policy solves the original Merton problem. With randomized policies, we are in the realm of continuous-time reinforcement learning (RL) recently developed in Wang et al. (2020) and Jia and Zhou (2022a, 2022b, 2023), enabling us to solve the auxiliary problem in a data-driven way without having to estimate the model primitives. Specifically, we establish a policy improvement theorem based on which we design both online and offline actor-critic RL algorithms for learning Merton's strategies. 
A key insight from this study is that RL in general and policy randomization in particular are useful beyond the purpose of exploration -- they can be employed as a technical tool to solve a problem that cannot be otherwise solved by mere deterministic policies. Finally, we carry out both simulation and empirical studies in a stochastic volatility environment to demonstrate the decisive outperformance of the devised RL algorithms in comparison to the conventional model-based, plug-in method. 2023-12-19T02:14:13Z 45 pages, 4 figures, 2 tables Min Dai Yuchao Dong Yanwei Jia Xun Yu Zhou http://arxiv.org/abs/2408.11773v2 Deviations from the Nash equilibrium in a two-player optimal execution game with reinforcement learning 2026-02-13T15:27:13Z The use of reinforcement learning algorithms in financial trading is becoming increasingly prevalent. However, the autonomous nature of these algorithms can lead to unexpected outcomes that deviate from traditional game-theoretical predictions and may even destabilize markets. In this study, we examine a scenario in which two autonomous agents, modelled with Double Deep Q-Learning, learn to liquidate the same asset optimally in the presence of market impact, under the Almgren-Chriss (2000) framework. We show that the strategies learned by the agents deviate significantly from the Nash equilibrium of the corresponding market impact game. Notably, the learned strategies exhibit a supra-competitive solution, which might be compatible with tacit collusive behaviour, closely aligning with the Pareto-optimal solution. We further explore how different levels of market volatility influence the agents' performance and the equilibria they discover, including scenarios where volatility differs between the training and testing phases. 
2024-08-21T16:54:53Z Fabrizio Lillo Andrea Macrì http://arxiv.org/abs/2602.12770v1 Efficient Monte Carlo Valuation of Corporate Bonds in Financial Networks 2026-02-13T09:55:11Z Valuing corporate bonds in systemic economies is challenging due to intricate webs of inter-institutional exposures. When a bank defaults, cascading losses propagate through the network, with payments determined by a system of fixed-point equations lacking closed-form solutions. Standard Monte Carlo methods cannot capture rare yet critical default events, while existing rare-event simulation techniques fail to account for higher-order network effects and scale poorly with network size. To overcome these challenges, we propose a novel approach -- Bi-Level Importance Sampling with Splitting -- and characterize individual bank defaults by decoupling them from the network's complex fixed-point dynamics. This separation enables a two-stage estimation process that directly generates samples from the banks' default events. We demonstrate theoretically that the method is both scalable and asymptotically optimal, and validate its effectiveness through numerical studies on empirically observed networks. 2026-02-13T09:55:11Z Dohyun Ahn Agostino Capponi http://arxiv.org/abs/2602.12030v1 Time-Inhomogeneous Volatility Aversion for Financial Applications of Reinforcement Learning 2026-02-12T15:00:28Z In finance, sequential decision problems are often faced, for which reinforcement learning (RL) emerges as a promising tool for optimisation without the need of analytical tractability. However, the objective of classical RL is the expected cumulated reward, while financial applications typically require a trade-off between return and risk. In this work, we focus on settings where one cares about the time split of the total return, ruling out most risk-aware generalisations of RL which optimise a risk measure defined on the latter. 
We notice that a preference for homogeneous splits, which we found satisfactory for hedging, can be unfit for other problems, and therefore propose a new risk metric which still penalises uncertainty of the single rewards, but allows for an arbitrary planning of their target levels. We study the properties of the resulting objective and the generalisation of learning algorithms to optimise it. Finally, we show numerical results on toy examples. 2026-02-12T15:00:28Z 18 pages, 6 figures Federico Cacciamani Roberto Daluiso Marco Pinciroli Michele Trapletti Edoardo Vittori http://arxiv.org/abs/2602.10071v1 Deep Learning for Electricity Price Forecasting: A Review of Day-Ahead, Intraday, and Balancing Electricity Markets 2026-02-10T18:36:36Z Electricity price forecasting (EPF) plays a critical role in power system operation and market decision making. While existing review studies have provided valuable insights into forecasting horizons, market mechanisms, and evaluation practices, the rapid adoption of deep learning has introduced increasingly diverse model architectures, output structures, and training objectives that remain insufficiently analyzed in depth. This paper presents a structured review of deep learning methods for EPF in day-ahead, intraday, and balancing markets. Specifically, we introduce a unified taxonomy that decomposes deep learning models into backbone, head, and loss components, providing a consistent evaluation perspective across studies. Using this framework, we analyze recent trends in deep learning components across markets. Our study highlights the shift toward probabilistic, microstructure-centric, and market-aware designs. We further identify key gaps in the literature, including limited attention to intraday and balancing markets and the need for market-specific modeling strategies, thereby helping to consolidate and advance existing review studies. 2026-02-10T18:36:36Z 9 pages, 2 figures, 2 tables Runyao Yu Derek W. 
Bunn Julia Lin Jochen Stiasny Fabian Leimgruber Tara Esterl Yuchen Tao Lianlian Qi Yujie Chen Wentao Wang Jochen L. Cremer http://arxiv.org/abs/2602.09950v1 How can the dual martingale help solving the primal optimal stopping problem? 2026-02-10T16:35:28Z Motivated by recent results on the dual formulation of optimal stopping problems, we investigate in this short paper how the knowledge of an approximating dual martingale can improve the efficiency of primal methods. In particular, we show on numerical examples that accurate approximations of a dual martingale efficiently reduce the variance for the primal optimal stopping problem. 2026-02-10T16:35:28Z Aurélien Alfonsi Ahmed Kebaier Jérôme Lelong http://arxiv.org/abs/2602.08182v1 Nansde-net: A neural sde framework for generating time series with memory 2026-02-09T00:53:28Z Modeling time series with long- or short-memory characteristics is a fundamental challenge in many scientific and engineering domains. While fractional Brownian motion has been widely used as a noise source to capture such memory effects, its incompatibility with Itô calculus limits its applicability in neural stochastic differential equation (SDE) frameworks. In this paper, we propose a novel class of noise, termed Neural Network-kernel ARMA-type noise (NA-noise), which is an Itô-process-based alternative capable of capturing both long- and short-memory behaviors. The kernel function defining the noise structure is parameterized via neural networks and decomposed into a product form to preserve the Markov property. Based on this noise process, we develop NANSDE-Net, a generative model that extends Neural SDEs by incorporating NA-noise. We prove the theoretical existence and uniqueness of the solution under mild conditions and derive an efficient backpropagation scheme for training. 
Empirical results on both synthetic and real-world datasets demonstrate that NANSDE-Net matches or outperforms existing models, including fractional SDE-Net, in reproducing long- and short-memory features of the data, while maintaining computational tractability within the Itô calculus framework. 2026-02-09T00:53:28Z PAKDD2026 Accepted Hiromu Ozai Kei Nakagawa