http://arxiv.org/abs/2510.12685v2
Orderbook Feature Learning and Asymmetric Generalization in Intraday Electricity Markets
Updated: 2026-02-15T18:55:32Z
Accurate probabilistic forecasting of intraday electricity prices is critical for market participants to inform trading decisions. Existing studies rely on specific domain features, such as Volume-Weighted Average Price (VWAP) and the last price. However, the rich information in the orderbook remains underexplored. Furthermore, these approaches are often developed within a single country and product type, making it unclear whether the approaches are generalizable. In this paper, we extract 384 features from the orderbook and identify a set of powerful features via feature selection. Based on selected features, we present a comprehensive benchmark using classical statistical models, tree-based ensembles, and deep learning models across two countries (Germany and Austria) and two product types (60-min and 15-min). We further perform a systematic generalization study across countries and product types, from which we reveal an asymmetric generalization phenomenon: models trained on more liquid markets or products transfer well to less liquid ones, whereas the reverse transfer leads to substantial performance degradation.
Published: 2025-10-14T16:21:50Z
Comment: Accepted to PSCC 2026. 9 pages, 2 figures, 5 tables
Authors: Runyao Yu, Ruochen Wu, Yongsheng Han, Jochen L. Cremer

http://arxiv.org/abs/2602.14233v1
Evaluating LLMs in Finance Requires Explicit Bias Consideration
Updated: 2026-02-15T17:02:01Z
Large Language Models (LLMs) are increasingly integrated into financial workflows, but evaluation practice has not kept up. Finance-specific biases can inflate performance, contaminate backtests, and make reported results useless for any deployment claim. We identify five recurring biases in financial LLM applications.
They include look-ahead bias, survivorship bias, narrative bias, objective bias, and cost bias. These biases break financial tasks in distinct ways and they often compound to create an illusion of validity. We reviewed 164 papers from 2023 to 2025 and found that no single bias is discussed in more than 28 percent of studies. This position paper argues that bias in financial LLM systems requires explicit attention and that structural validity should be enforced before any result is used to support a deployment claim. We propose a Structural Validity Framework and an evaluation checklist with minimal requirements for bias diagnosis and future system design. The material is available at https://github.com/Eleanorkong/Awesome-Financial-LLM-Bias-Mitigation.
Published: 2026-02-15T17:02:01Z
Authors: Yaxuan Kong, Hoyoung Lee, Yoontae Hwang, Alejandro Lopez-Lira, Bradford Levy, Dhagash Mehta, Qingsong Wen, Chanyeol Choi, Yongjae Lee, Stefan Zohren

http://arxiv.org/abs/2602.14138v1
Factor Engine: A Python Library for Systematic Financial Factor Computation and Analysis
Updated: 2026-02-15T13:23:08Z
Factor Engine is a high-performance, open-source Python library designed for the systematic computation and analysis of financial factors. Built around a modular and extensible API that leverages Python decorators, Factor Engine enables users to define custom factors with ease and integrates seamlessly with the modern data science ecosystem. To assess its practical effectiveness, we compare the mispricing factors computed by Factor Engine to those generated using a reference Stata implementation, finding that both approaches yield highly similar results and comparable performance in backtesting analyses.
Furthermore, we experimentally apply these factors within machine learning workflows for trading strategy development, illustrating their practical utility and potential for quantitative finance research.
Published: 2026-02-15T13:23:08Z
Authors: Ata Keskin

http://arxiv.org/abs/2505.22121v3
Multi-period Mean-Buffered Probability of Exceedance in Defined Contribution Portfolio Optimization
Updated: 2026-02-15T05:41:22Z
We investigate multi-period mean-risk portfolio optimization for long-horizon Defined Contribution plans, focusing on buffered Probability of Exceedance (bPoE), a more intuitive, dollar-based alternative to Conditional Value-at-Risk (CVaR). We formulate both pre-commitment and time-consistent Mean-bPoE and Mean-CVaR portfolio optimization problems under realistic investment constraints (e.g., no leverage, no short selling) and jump-diffusion dynamics. These formulations are naturally framed as bilevel optimization problems, with an outer search over the shortfall threshold and an inner optimization over rebalancing decisions. We establish an equivalence between the pre-commitment formulations through a one-to-one correspondence of their scalarization optimal sets, while showing that no such equivalence holds in the time-consistent setting. We develop provably convergent numerical schemes for the value functions associated with both pre-commitment and time-consistent formulations of these mean-risk control problems.
Using nearly a century of market data, we find that time-consistent Mean-bPoE strategies closely resemble their pre-commitment counterparts. In particular, they maintain alignment with investors' preferences for a minimum acceptable terminal wealth level, unlike time-consistent Mean-CVaR, which often leads to counterintuitive control behavior. We further show that bPoE, as a strictly tail-oriented measure, prioritizes guarding against catastrophic shortfalls while allowing meaningful upside exposure, making it especially appealing for long-horizon wealth security. These findings highlight bPoE's practical advantages for Defined Contribution investment planning.
Published: 2025-05-28T08:47:54Z
Comment: 44 pages, 10 figures
Authors: Duy-Minh Dang, Chang Chen

http://arxiv.org/abs/2602.03874v2
ASRI: An Aggregated Systemic Risk Index for Cryptocurrency Markets
Updated: 2026-02-14T18:42:42Z
We introduce the Aggregated Systemic Risk Index (ASRI), comprising four weighted sub-indices: Stablecoin Concentration Risk (30%), DeFi Liquidity Risk (25%), Contagion Risk (25%), and Regulatory Opacity Risk (20%). Using data from DeFi Llama, Federal Reserve FRED, and on-chain analytics, we validate against four historical crises (Terra/Luna, Celsius/3AC, FTX, SVB). Event study analysis detects significant abnormal signals for all four (t-statistics 5.47-32.64, p < 0.01), with threshold-based detection achieving 30-day average lead time for three of four events. Walk-forward validation confirms 4/4 out-of-sample detection (18-day average lead), ruling out look-ahead bias. A three-regime HMM identifies distinct risk states with >97% persistence; structural stability tests pass (Chow p = 0.993). Benchmarking against Diebold-Yilmaz connectedness shows equivalent detection (75%) with higher precision (33.5% vs. 22.4%). Out-of-sample specificity testing on 2024-2025 data confirms zero false positives, correctly classifying the $1.5B Bybit hack as non-systemic.
ASRI captures DeFi-specific vulnerabilities -- composability risk, flash loan exposure, and tokenized RWA linkages -- that SRISK and CoVaR cannot accommodate.
Published: 2026-02-01T08:42:23Z
Comment: 83 pages, 7 figures, 30 tables. JEL: G01, G15, G23. Live dashboard: https://asri.dissensus.ai. Code: https://github.com/studiofarzulla/asri
Authors: Murad Farzulla, Andrew Maksakov

http://arxiv.org/abs/2602.07018v2
The Extremity Premium: Sentiment Regimes and Adverse Selection in Cryptocurrency Markets
Updated: 2026-02-14T07:04:31Z
Using the Crypto Fear & Greed Index and Bitcoin daily data, we document that sentiment extremity predicts excess uncertainty beyond realized volatility. Extreme fear and extreme greed regimes exhibit significantly higher spreads than neutral periods -- a phenomenon we term the "extremity premium." Extended validation on the full Fear & Greed history (February 2018--January 2026, N = 2,896) confirms the finding: within-volatility-quintile comparisons show a significant premium (p < 0.001, Cohen's d = 0.21), Granger causality from uncertainty to spreads is strong (F = 211), and placebo tests reject the null (p < 0.0001). The effect replicates on Ethereum and across 6 of 7 market cycles. However, the premium is sensitive to functional form: comprehensive regression controls absorb regime effects, while nonparametric stratification preserves them. We interpret this as evidence that sentiment extremity captures volatility-regime interactions not fully represented by parametric controls -- consistent with, but not conclusively separable from, the F&G Index's embedded volatility component. An agent-based model reproduces the pattern qualitatively. The results suggest that intensity, not direction, drives uncertainty-linked liquidity withdrawal in cryptocurrency markets, though identification of "pure" sentiment effects from volatility remains an open challenge.
Published: 2026-02-01T09:31:44Z
Comment: 49 pages, 6 figures, 15+ tables. JEL: C63, G12, G14.
Code: https://github.com/studiofarzulla/sentiment-microstructure-abm
Authors: Murad Farzulla

http://arxiv.org/abs/2602.07046v2
Same Returns, Different Risks: How Cryptocurrency Markets Process Infrastructure vs Regulatory Shocks
Updated: 2026-02-14T06:46:52Z
We investigate whether cryptocurrency markets differentiate between infrastructure failures and regulatory enforcement at the return level, complementing a companion conditional variance analysis that finds 5.7 times larger volatility impacts from infrastructure events (p = 0.0008). Using event-level block bootstrap inference on 31 events across Bitcoin, Ethereum, Solana, and Cardano (2019-2025), we find no statistically significant difference in cumulative abnormal returns between infrastructure failures (-7.6%) and regulatory enforcement (-11.1%): the difference of +3.6 pp has p = 0.81 with 95% CI [-25.3%, +30.9%]. This null acquires substantive meaning alongside the companion's highly significant variance result: the same events that produce indistinguishable return responses generate dramatically different volatility signatures. Markets differentiate shock types through the risk channel -- the second moment -- rather than expected returns. The block bootstrap methodology, which resamples entire events to preserve cross-sectional correlation, reveals that prior parametric approaches systematically understate uncertainty by inflating degrees of freedom. Results are robust across eight specifications including permutation tests, leave-one-out analysis, and the Ibragimov-Mueller few-cluster test.
Published: 2026-02-04T09:59:24Z
Comment: 24 pages, 7 tables. JEL: C22, C58, G12, G14. Code at https://github.com/studiofarzulla/sentiment-without-structure
Authors: Murad Farzulla

http://arxiv.org/abs/2601.20336v4
Do Whitepaper Claims Predict Market Behavior? Evidence from Cryptocurrency Factor Analysis
Updated: 2026-02-14T06:33:00Z
This study investigates whether cryptocurrency whitepaper narratives align with empirically observed market factor structure.
We construct a pipeline combining zero-shot NLP classification of 38 whitepapers across 10 semantic categories with CP tensor decomposition of hourly market data (49 assets, 17,543 timestamps). Using Procrustes rotation and Tucker's congruence coefficient (phi), we find weak alignment between claims and market statistics (phi = 0.246, p = 0.339) and between claims and latent factors (phi = 0.058, p = 0.751). A methodological validation comparison (statistics versus factors, both derived from market data) achieves significance (p < 0.001), confirming the pipeline detects real structure. The null result indicates whitepaper narratives do not meaningfully predict market factor structure, with implications for narrative economics and investor decision-making. Entity-level analysis reveals specialized tokens (XMR, CRV, YFI) show stronger narrative-market correspondence than broad infrastructure tokens.
Published: 2026-01-28T07:50:40Z
Comment: 38 pages, 9 figures, 14 tables. JEL: G14, G12, C38, C45. Code available at https://github.com/studiofarzulla/tensor-defi
Authors: Murad Farzulla

http://arxiv.org/abs/2312.11797v3
Data-Driven Merton's Strategies via Policy Randomization
Updated: 2026-02-14T01:38:24Z
We study Merton's expected utility maximization problem in an incomplete market, characterized by a factor process in addition to the stock price process, where all the model primitives are unknown. The agent under consideration is a price taker who has access only to the stock and factor value processes and the instantaneous volatility. We propose an auxiliary problem in which the agent can invoke policy randomization according to a specific class of Gaussian distributions, and prove that the mean of its optimal Gaussian policy solves the original Merton problem. With randomized policies, we are in the realm of continuous-time reinforcement learning (RL) recently developed in Wang et al.
(2020) and Jia and Zhou (2022a, 2022b, 2023), enabling us to solve the auxiliary problem in a data-driven way without having to estimate the model primitives. Specifically, we establish a policy improvement theorem based on which we design both online and offline actor-critic RL algorithms for learning Merton's strategies. A key insight from this study is that RL in general and policy randomization in particular are useful beyond the purpose of exploration -- they can be employed as a technical tool to solve a problem that cannot be otherwise solved by mere deterministic policies. Finally, we carry out both simulation and empirical studies in a stochastic volatility environment to demonstrate the decisive outperformance of the devised RL algorithms in comparison to the conventional model-based, plug-in method.
Published: 2023-12-19T02:14:13Z
Comment: 45 pages, 4 figures, 2 tables
Authors: Min Dai, Yuchao Dong, Yanwei Jia, Xun Yu Zhou

http://arxiv.org/abs/2408.11773v2
Deviations from the Nash equilibrium in a two-player optimal execution game with reinforcement learning
Updated: 2026-02-13T15:27:13Z
The use of reinforcement learning algorithms in financial trading is becoming increasingly prevalent. However, the autonomous nature of these algorithms can lead to unexpected outcomes that deviate from traditional game-theoretical predictions and may even destabilize markets. In this study, we examine a scenario in which two autonomous agents, modelled with Double Deep Q-Learning, learn to liquidate the same asset optimally in the presence of market impact, under the Almgren-Chriss (2000) framework. We show that the strategies learned by the agents deviate significantly from the Nash equilibrium of the corresponding market impact game. Notably, the learned strategies exhibit a supra-competitive solution, which might be compatible with a tacit collusive behaviour, closely aligning with the Pareto-optimal solution.
We further explore how different levels of market volatility influence the agents' performance and the equilibria they discover, including scenarios where volatility differs between the training and testing phases.
Published: 2024-08-21T16:54:53Z
Authors: Fabrizio Lillo, Andrea Macrì

http://arxiv.org/abs/2602.12770v1
Efficient Monte Carlo Valuation of Corporate Bonds in Financial Networks
Updated: 2026-02-13T09:55:11Z
Valuing corporate bonds in systemic economies is challenging due to intricate webs of inter-institutional exposures. When a bank defaults, cascading losses propagate through the network, with payments determined by a system of fixed-point equations lacking closed-form solutions. Standard Monte Carlo methods cannot capture rare yet critical default events, while existing rare-event simulation techniques fail to account for higher-order network effects and scale poorly with network size. To overcome these challenges, we propose a novel approach -- Bi-Level Importance Sampling with Splitting -- and characterize individual bank defaults by decoupling them from the network's complex fixed-point dynamics. This separation enables a two-stage estimation process that directly generates samples from the banks' default events. We demonstrate theoretically that the method is both scalable and asymptotically optimal, and validate its effectiveness through numerical studies on empirically observed networks.
Published: 2026-02-13T09:55:11Z
Authors: Dohyun Ahn, Agostino Capponi

http://arxiv.org/abs/2602.12030v1
Time-Inhomogeneous Volatility Aversion for Financial Applications of Reinforcement Learning
Updated: 2026-02-12T15:00:28Z
In finance, sequential decision problems are often faced, for which reinforcement learning (RL) emerges as a promising tool for optimisation without the need of analytical tractability. However, the objective of classical RL is the expected cumulated reward, while financial applications typically require a trade-off between return and risk.
In this work, we focus on settings where one cares about the time split of the total return, ruling out most risk-aware generalisations of RL which optimise a risk measure defined on the latter. We notice that a preference for homogeneous splits, which we found satisfactory for hedging, can be unfit for other problems, and therefore propose a new risk metric which still penalises uncertainty of the single rewards, but allows for an arbitrary planning of their target levels. We study the properties of the resulting objective and the generalisation of learning algorithms to optimise it. Finally, we show numerical results on toy examples.
Published: 2026-02-12T15:00:28Z
Comment: 18 pages, 6 figures
Authors: Federico Cacciamani, Roberto Daluiso, Marco Pinciroli, Michele Trapletti, Edoardo Vittori

http://arxiv.org/abs/2602.10071v1
Deep Learning for Electricity Price Forecasting: A Review of Day-Ahead, Intraday, and Balancing Electricity Markets
Updated: 2026-02-10T18:36:36Z
Electricity price forecasting (EPF) plays a critical role in power system operation and market decision making. While existing review studies have provided valuable insights into forecasting horizons, market mechanisms, and evaluation practices, the rapid adoption of deep learning has introduced increasingly diverse model architectures, output structures, and training objectives that remain insufficiently analyzed in depth. This paper presents a structured review of deep learning methods for EPF in day-ahead, intraday, and balancing markets. Specifically, we introduce a unified taxonomy that decomposes deep learning models into backbone, head, and loss components, providing a consistent evaluation perspective across studies. Using this framework, we analyze recent trends in deep learning components across markets. Our study highlights the shift toward probabilistic, microstructure-centric, and market-aware designs.
We further identify key gaps in the literature, including limited attention to intraday and balancing markets and the need for market-specific modeling strategies, thereby helping to consolidate and advance existing review studies.
Published: 2026-02-10T18:36:36Z
Comment: 9 pages, 2 figures, 2 tables
Authors: Runyao Yu, Derek W. Bunn, Julia Lin, Jochen Stiasny, Fabian Leimgruber, Tara Esterl, Yuchen Tao, Lianlian Qi, Yujie Chen, Wentao Wang, Jochen L. Cremer

http://arxiv.org/abs/2602.09950v1
How can the dual martingale help solving the primal optimal stopping problem?
Updated: 2026-02-10T16:35:28Z
Motivated by recent results on the dual formulation of optimal stopping problems, we investigate in this short paper how the knowledge of an approximating dual martingale can improve the efficiency of primal methods. In particular, we show on numerical examples that accurate approximations of a dual martingale efficiently reduce the variance for the primal optimal stopping problem.
Published: 2026-02-10T16:35:28Z
Authors: Aurélien Alfonsi, Ahmed Kebaier, Jérôme Lelong

http://arxiv.org/abs/2602.08182v1
Nansde-net: A neural sde framework for generating time series with memory
Updated: 2026-02-09T00:53:28Z
Modeling time series with long- or short-memory characteristics is a fundamental challenge in many scientific and engineering domains. While fractional Brownian motion has been widely used as a noise source to capture such memory effects, its incompatibility with Itô calculus limits its applicability in neural stochastic differential equation (SDE) frameworks. In this paper, we propose a novel class of noise, termed Neural Network-kernel ARMA-type noise (NA-noise), which is an Itô-process-based alternative capable of capturing both long- and short-memory behaviors. The kernel function defining the noise structure is parameterized via neural networks and decomposed into a product form to preserve the Markov property. Based on this noise process, we develop NANSDE-Net, a generative model that extends Neural SDEs by incorporating NA-noise.
We prove the theoretical existence and uniqueness of the solution under mild conditions and derive an efficient backpropagation scheme for training. Empirical results on both synthetic and real-world datasets demonstrate that NANSDE-Net matches or outperforms existing models, including fractional SDE-Net, in reproducing long- and short-memory features of the data, while maintaining computational tractability within the Itô calculus framework.
Published: 2026-02-09T00:53:28Z
Comment: PAKDD2026 Accepted
Authors: Hiromu Ozai, Kei Nakagawa