https://arxiv.org/api/xtAf881GVaQ0K6R6eZC05zrhv+I 2026-06-21T07:49:19Z 3237 30 15 http://arxiv.org/abs/2606.17065v1 PIVOT: Bridging Black-Scholes Implied-Volatility and Price Objectives via Differentiable Jäckel Operator 2026-06-04T14:43:38Z

Modern option-learning systems operate in two coordinates: price space, where markets quote and no-arbitrage constraints are most naturally enforced, and implied volatility (IV) space, where volatility surfaces are smoothed, regularized, and evaluated. The bottleneck is interface, not approximation: Jäckel's seminal "Let's Be Rational" (LBR) solver already inverts the Black-Scholes price to machine precision efficiently. What is missing is a differentiable layer that preserves LBR in the forward pass and avoids backpropagating through its branch logic. Such a layer must also confront the unavoidable singularity of the inverse map in the low-vega regime, where the sensitivity 1/vega diverges as vega -> 0. We close this gap with PIVOT, the Price-Implied-Volatility Objective Translator. PIVOT keeps the LBR forward pass intact and supplies the backward pass by implicit differentiation through the smooth Black-Scholes/Black-76 price map, with an explicit gating contract: invalid domains return NaN, well-conditioned rows receive the exact 1/vega gradient, and low-vega rows are attenuated rather than silently regularized. On a single H100, a fused Triton kernel reaches 1.79e9 IV/s at machine precision (9.3e-14 max relative error vs. the reference C solver); end-to-end label generation sustains 48.9M/s on synthetic chains and 16.6M/s on SPX OptionMetrics. In a HyperIV-style one-day reproduction on SPX, PIVOT-augmented objectives Pareto-dominate the baselines, reducing held-out price MAE by up to 43.4%, with the strongest three-seed gated objective improving price MAE by 38.8% and IV MAE by 21.3% jointly. Cross-asset results on RUT, VIX, and NDX show directional price-MAE gains of 40.1%, 24.2%, and 16.7%, while an ungated IV-roundtrip control collapses to a degenerate near-zero surface, confirming the gate as a correctness contract rather than a tuning knob.
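The gating contract described in this abstract can be illustrated with a small stdlib-only Python sketch. It is not the PIVOT kernel: a plain bisection stands in for the LBR solver in the forward pass, and the function names, the `vega_floor` threshold, and the quadratic attenuation rule are assumptions introduced here for illustration. The backward pass uses the implicit-function identity d(sigma)/d(price) = 1/vega.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def black76_call(F, K, T, sigma, df=1.0):
    """Black-76 call price on forward F, strike K, expiry T, discount factor df."""
    s = sigma * math.sqrt(T)
    d1 = (math.log(F / K) + 0.5 * s * s) / s
    return df * (F * norm_cdf(d1) - K * norm_cdf(d1 - s))

def black76_vega(F, K, T, sigma, df=1.0):
    """Derivative of the Black-76 call price with respect to sigma."""
    s = sigma * math.sqrt(T)
    d1 = (math.log(F / K) + 0.5 * s * s) / s
    return df * F * norm_pdf(d1) * math.sqrt(T)

def implied_vol(price, F, K, T, df=1.0, lo=1e-6, hi=5.0, iters=200):
    """Forward pass. Bisection is a stand-in for LBR (assumption of this sketch).
    Prices outside the no-arbitrage band return NaN."""
    intrinsic = df * max(F - K, 0.0)
    if not (intrinsic < price < df * F):
        return float("nan")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if black76_call(F, K, T, mid, df) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gated_iv_grad(price, F, K, T, df=1.0, vega_floor=1e-3):
    """Return (sigma, d sigma / d price) under a three-way gate.
    vega_floor and the attenuation rule are hypothetical choices."""
    sigma = implied_vol(price, F, K, T, df)
    if math.isnan(sigma):
        return sigma, float("nan")           # invalid domain: NaN
    vega = black76_vega(F, K, T, sigma, df)
    if vega >= vega_floor:
        return sigma, 1.0 / vega             # well-conditioned: exact 1/vega
    # Low vega: scale 1/vega by (vega / vega_floor)^2 so the gradient
    # shrinks to zero instead of diverging, and is continuous at the floor.
    return sigma, vega / (vega_floor ** 2)
```

The gradient never touches the solver's internals: only the smooth price map is differentiated, so any forward solver that returns the same sigma could be substituted without changing the backward pass.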

2026-06-04T14:43:38Z 30 pages, 17 figures, 12 tables Raeid Saqur Yannick Limmer Anastasis Kratsios Blanka Horvath Hans Buehler http://arxiv.org/abs/2606.05733v1 Zero-Copy Semantic Contagion: An In-Memory Streaming Architecture for Evolving Attention Graphs 2026-06-04T05:48:56Z

Per-ticker forecasting models dominate financial time-series work yet remain blind to cross-company propagation: a foundry disruption in Taiwan does not register in a single-asset model until Apple's own price has already moved. To address this limitation, we introduce a heterogeneous Rust-Python streaming architecture that maps cross-company attention as a continuous-time graph driven directly from text. We show that on the ingestion side, a zero-copy Rust edge parses news records in $\sim$100 ns and scans the target equity universe in $\sim$1.2 $\mu$s. On the inference end, a multivariate Neural Hawkes Process featuring per-node continuous-time LSTM states and a bilinear latent projection propagates directed excitation, while an adaptive pruning rule bounds the computational cost of dynamic neighborhood updates. Combining these stages, we demonstrate an end-to-end processing latency of $\sim$13 ms per incoming news record on a single commodity CPU. Evaluated on a one-month temporal holdout of the FNSPID corpus (638 articles across 47 tickers), the system delivers a $1.70\times$ precision lift over random at the 90th-percentile next-day return threshold, and $3.36\times$ over a same-sector baseline. Crucially, removing the graph topology collapses precision to zero, confirming that the dynamic attention network is the sole driver of cross-company signal in this architecture.

2026-06-04T05:48:56Z Accepted to the 2026 ACM SIGMOD Workshop on Data Management for the Modern Financial Systems (FinDS). 10 pages, 4 figures Kabir Murjani http://arxiv.org/abs/2509.19663v2 Long-Range Dependence in Financial Markets: Empirical Evidence and Generative Modeling Challenges 2026-06-04T01:13:32Z

This study provides a comprehensive empirical investigation of long-range dependence (LRD) in financial markets and evaluates the ability of deep generative models to reproduce such temporal structures. Using daily data from three representative sectors--equity (S&P 500, DAX, Nikkei 225), commodities (Wheat, Corn, Soybeans), and energy (UNG, USO, XLE)--we examine the presence of LRD through three complementary approaches: rescaled range (R/S) analysis, detrended fluctuation analysis (DFA), and an ARFIMA--FIGARCH model with Student's $t$-distributed innovations. The empirical evidence suggests that while mean returns exhibit limited persistence, pronounced long memory is consistently observed in conditional volatility across most assets. Building on these findings, we assess whether Quant Generative Adversarial Networks (Quant GANs) can learn and reproduce these stylized temporal dependencies. Although the generated series successfully mimic heavy-tailed return distributions and certain aspects of volatility clustering, they generally fail to capture the magnitude and consistency of LRD observed in real data, particularly in volatility dynamics. These results highlight an important limitation of current deep generative architectures in modeling slow-decaying dependence structures and underscore the need for incorporating explicit long-memory mechanisms when synthetic financial data are intended for risk management or long-horizon forecasting applications.
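As a rough illustration of the first of the three approaches named above, the following stdlib-only Python sketch estimates a Hurst exponent by classical rescaled range (R/S) analysis. The function names, the window sizes, and the use of non-overlapping windows are assumptions of this sketch, not details taken from the paper; the plain R/S estimator is also known to be biased upward on short windows.

```python
import math

def rescaled_range(window):
    """R/S statistic of one window: range of cumulative mean-adjusted sums
    divided by the window's standard deviation."""
    n = len(window)
    mean = sum(window) / n
    cum, lo, hi, sq = 0.0, 0.0, 0.0, 0.0
    for v in window:
        d = v - mean
        cum += d
        lo, hi = min(lo, cum), max(hi, cum)
        sq += d * d
    std = math.sqrt(sq / n)
    return (hi - lo) / std if std > 0 else float("nan")

def hurst_rs(series, window_sizes=(16, 32, 64, 128, 256)):
    """Slope of log(mean R/S) against log(window size) by least squares."""
    xs, ys = [], []
    for w in window_sizes:
        stats = [rescaled_range(series[i:i + w])
                 for i in range(0, len(series) - w + 1, w)]
        stats = [s for s in stats if not math.isnan(s)]
        if stats:
            xs.append(math.log(w))
            ys.append(math.log(sum(stats) / len(stats)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den
```

An estimate near 0.5 is consistent with no long memory, while values well above 0.5 point to persistence. Applying the same function to returns and to a volatility proxy such as absolute returns is one simple way to contrast the two.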

2025-09-24T00:41:14Z 28 pages, 8 figures, 7 tables Yifan He Svetlozar Rachev http://arxiv.org/abs/2508.19006v2 Is attention truly all we need? An empirical study of asset pricing in pretrained RNN sparse and global attention models 2026-06-03T20:01:04Z

This study investigates pre-trained RNN attention models built on mainstream attention mechanisms (additive attention, Luong's three attention variants, global self-attention, and sliding-window sparse attention) for empirical asset pricing on the top 420 large-cap US stocks. It is the first paper to apply state-of-the-art (SOTA) attention mechanisms to asset pricing at large scale. These models overcome limitations of traditional machine-learning-based asset pricing, such as poorly captured temporal dependence and short memory. Moreover, the causal masks enforced in the attention mechanisms address the future-data leakage issue ignored by more advanced attention-based models such as the classic Transformer. The proposed attention models also account for the temporal sparsity of asset pricing data and mitigate potential overfitting by using simplified model structures, which provides some insights for future empirical economic research. All models are examined over three periods (pre-COVID-19, COVID-19, and one year post-COVID-19) to test their stability under extreme market conditions. The study finds that in value-weighted portfolio backtesting, the global self-attention model and the sliding-window sparse attention model are highly capable of generating absolute returns and hedging downside risk, achieving annualized Sortino ratios of 2.0 and 1.80 respectively during the COVID-19 period in the static transaction cost scenario. Moreover, the sliding-window sparse attention model delivers more stable absolute portfolio returns than the global self-attention model across stock market-capitalization sizes.
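The two masking ideas mentioned in this abstract, a causal mask and a sliding-window sparse mask, can be sketched generically in stdlib-only Python. The abstract does not give the paper's exact masking rule, so the function names and the window convention (a step sees itself and the `window - 1` steps before it) are assumptions of this sketch.

```python
import math

def causal_window_mask(seq_len, window=None):
    """mask[i][j] is True when step i may attend to step j.
    window=None gives a global causal mask; an integer window >= 1
    restricts attention to the most recent `window` steps."""
    return [[j <= i and (window is None or i - j < window)
             for j in range(seq_len)]
            for i in range(seq_len)]

def masked_softmax(scores, mask_row):
    """Softmax over the allowed positions only; masked positions get weight 0."""
    top = max(s for s, m in zip(scores, mask_row) if m)
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]
```

Because every row keeps at least its own position, the softmax is always well defined, and no weight can land on a future time step.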

2025-08-26T13:04:28Z 72 pages including appendix Shanyan Lai http://arxiv.org/abs/2602.14378v2 A Computational Framework for Financial Structures 2026-06-03T16:09:07Z

Financial structures transform stochastic cash-flow representations of underlying economic activities into ordered payments across multiple claims through systems of contractual allocation rules. While sophisticated computational models of such structures are widely used in practice, their allocation logic is typically embedded in proprietary implementations and deal-specific documentation, making systematic analysis, comparison, and verification across transactions difficult. This paper develops a computational framework that formalizes financial structures as structured allocation systems mapping stochastic inflows into hierarchically ordered payments through explicit and state-dependent allocation operators. The framework distinguishes between economic activities, financeable representations, admissible structures, feasible designs, and sustainable designs. It thereby specifies the inputs on which allocation mechanisms operate, the structural requirements that distinguish financial structures from generic contracts, and the feasibility restrictions under which a design may be undertaken.

2026-02-16T01:06:18Z Antonio Scala Andrea Monaco http://arxiv.org/abs/2605.05140v4 A Practical Guide to Strip Caplet Volatilities 2026-06-02T13:55:47Z

We study caplet stripping, the problem of recovering a caplet volatility term structure consistent with quoted cap volatilities. Many academic papers on the Libor market model assume caplet volatilities are readily available, whereas practitioners know they are not and extracting them is a complex task. This paper presents a practical workflow, structuring the presentation around a constructive algorithm. We start with criteria on the input data based on cap time-value monotonicity. If time values fail this check, we show how to correct the quotes using robust outlier detection based on the modified Z-score. The time-value proposition naturally leads to a direct non-bootstrap stripping approach by interpolating cap time values, which yields arbitrage-free caplet volatilities by construction. We then revisit the classic sequential bootstrap approach. We introduce compact-kernel transition interpolants (flat-linear and $C^1$ flat-smooth) that preserve bootstrap equivalence. Finally, for a richer, smoother curve, we introduce global search methods using midpoint node placement with positivity-preserving calibration. Pathological cases and detailed analyses of oscillations are provided in the appendix.
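The outlier screen mentioned in this abstract relies on the modified Z-score, which replaces mean and standard deviation with median and median absolute deviation (MAD). A minimal Python sketch follows; the 3.5 cutoff is the commonly quoted default rather than a value taken from the paper, the function names are hypothetical, and how the paper applies the score to cap time values is not specified here.

```python
from statistics import median

def modified_z_scores(values):
    """Modified Z-score: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case: more than half the points are identical.
        return [0.0 if v == med else float("inf") for v in values]
    return [0.6745 * (v - med) / mad for v in values]

def outlier_indices(values, threshold=3.5):
    """Indices whose absolute modified Z-score exceeds the threshold."""
    return [i for i, z in enumerate(modified_z_scores(values))
            if abs(z) > threshold]
```

Because the median and MAD are barely moved by a single bad quote, the score for that quote stays large, which is what makes the screen robust compared with an ordinary Z-score.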

2026-05-06T17:12:56Z Fabien Le Floc'h http://arxiv.org/abs/2508.13174v2 AlphaEval: A Comprehensive and Efficient Evaluation Framework for Formula Alpha Mining 2026-06-02T13:19:43Z

Formula alpha mining, which generates predictive signals from financial data, is critical for quantitative investment. Although various algorithmic approaches, such as genetic programming, reinforcement learning, and large language models, have significantly expanded the capacity for alpha discovery, systematic evaluation remains a key challenge. Existing evaluation metrics predominantly include backtesting and correlation-based measures. Backtesting is computationally intensive, inherently sequential, and sensitive to specific strategy parameters. Correlation-based metrics, though efficient, assess only predictive ability and overlook other crucial properties such as temporal stability, robustness, diversity, and interpretability. Additionally, the closed-source nature of most existing alpha mining models hinders reproducibility and slows progress in this field. To address these issues, we propose AlphaEval, a unified, parallelizable, and backtest-free evaluation framework for automated alpha mining models. AlphaEval assesses the overall quality of generated alphas along five complementary dimensions: predictive power, stability, robustness to market perturbations, financial logic, and diversity. Extensive experiments across representative alpha mining algorithms demonstrate that AlphaEval achieves evaluation consistency comparable to comprehensive backtesting, while providing richer insights and higher efficiency. Furthermore, AlphaEval identifies superior alphas more effectively than traditional single-metric screening approaches. All implementations and evaluation tools are open-sourced to promote reproducibility and community engagement.

2025-08-10T11:19:24Z Accepted by KDD2026 Hongjun Ding Binqi Chen Jinsheng Huang Taian Guo Zhengyang Mao Guoyi Shao Lutong Zou Luchen Liu Ming Zhang 10.1145/3770855.3817727 http://arxiv.org/abs/2606.03457v1 Hybrid News Sentiment Engine: Real-Time Market Analysis via Adaptive Ensemble Learning on News-Price Pairs 2026-06-02T10:34:22Z

We present a hybrid news sentiment engine that continuously learns market sentiment from paired news headlines and concurrent asset-price snapshots without requiring any neural network training or GPU compute. The system uses a three-way ensemble combining (1) a financial-domain lexicon (FinBERT-style keyword scoring), (2) an adaptive statistical TF-IDF cluster learner that organizes headlines into semantic neighborhoods and tracks their average realized price reactions, and (3) an auto-calibrating weighting mechanism that adjusts ensemble contributions based on each signal's historical correlation with actual price movements. The engine runs on a 3-hour polling cycle from the Tradeflags NewsFeed API, which provides 22 price-snapshot fields per news item spanning equity indices (ES, NQ, SPY, DJIA, NDX, IWM), commodities (CL), and cryptocurrencies (BTC, ETH). All processing occurs at sub-second latency on a CPU-only server at effectively zero marginal cost per analytic cycle. We compare our approach against established methods -- FinBERT, GPT-based scoring, VADER, and commercial sentiment APIs -- across dimensions of cost, latency, accuracy, and adaptability. Our statistical cluster learner, which adapts to changing market regimes without retraining, represents a novel contribution not found in existing sentiment systems.

2026-06-02T10:34:22Z 12 pages Andreas Aigner http://arxiv.org/abs/2606.03184v1 FinStressTS: A Parametric Synthetic Benchmark for Time-Series Forecasting in Finance 2026-06-02T05:41:50Z

Financial forecasting is difficult due to low signal-to-noise ratios, latent factors, heavy tails, regime shifts, and jumps. Real-world benchmarks offer limited failure attribution: researchers can observe underperformance, but often cannot isolate why because mechanisms are unobservable and entangled. Real financial data reveal only one realized path, making it difficult to assess tail-risk calibration or data efficiency. We introduce FinStressTS, a mechanism-aware synthetic benchmark that links model behavior to controlled structural causes. FinStressTS comprises 30 diagnostic environments around six mechanism families: volatility clustering, multi-scale persistence, heavy-tailed shocks, regime switching, self-exciting jumps, and zero-inflated processes. We evaluate two tasks: point forecasting, using NMAE across five settings, and probabilistic forecasting, using CRPS under known data-generating mechanisms. We benchmark 15 models, from classical methods (HAR, VAR) to Transformer forecasters (PatchTST, iTransformer) and deep probabilistic architectures (DeepAR, TSFlow), and use learning curves to measure sample efficiency. Our evaluation reveals three insights. First, performance is mechanism-dependent: autoregressive and linear models are highly competitive, and often outperform Transformer-based models, in several volatility-, tail-, and jump-driven environments. Second, distributional alignment matters: parametric probabilistic models such as DeepAR calibrate well in stationary settings, while flexible models can help when distributions become multimodal or sparse. Third, neural models often require more data to match simple baselines, with larger gains mainly when learning latent regimes or complex distributions. FinStressTS provides an open framework for diagnosing failure modes and advancing risk-aware forecasting.
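For the probabilistic task, CRPS can be estimated directly from forecast samples using the identity CRPS = E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from the forecast and y is the realized value. The Python sketch below is a generic sample-based estimator with a hypothetical function name, not the benchmark's evaluation code; it costs O(n^2) in the number of samples.

```python
def crps_from_samples(samples, observation):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|.
    Lower is better; a point forecast equal to the outcome scores 0."""
    n = len(samples)
    term_obs = sum(abs(s - observation) for s in samples) / n
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term_obs - 0.5 * term_spread
```

The first term rewards forecasts centred on the outcome, and the second credits the forecast's own spread, so a forecast that is wide without being better centred still scores worse than a sharp, well-centred one.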

2026-06-02T05:41:50Z KDD 2026 (Oral) Jiaze Sun Kelvin J. L. Koa Ruiyang Ni Yize Liu Haonan Chen Ke-Wei Huang 10.1145/3770855.3817578 http://arxiv.org/abs/2409.12721v3 Market Simulation under Adverse Selection 2026-06-02T01:10:06Z

In this paper, we study the effects of fill probabilities and adverse fills on the trading strategy simulation process. We specifically focus on a stochastic optimal control market-making problem and test the strategy on ES (E-mini S\&P 500), NQ (E-mini Nasdaq 100), CL (Crude Oil) and ZN (10-Year Treasury Note), which are some of the most liquid futures contracts listed on the CME (Chicago Mercantile Exchange). We provide empirical evidence that fill probabilities and adverse fills can significantly affect performance, and we propose a more prudent simulation framework to account for them. Many previous works aim to measure different types of adverse selection in the limit order book (LOB); however, they often simulate price processes and market orders independently, which can substantially inflate the performance of short-term trading strategies. Our studies show that using more realistic fill probabilities and tracking adverse fills in the simulation gives a more accurate picture of how such strategies would perform in reality.

2024-09-19T12:42:03Z Luca Lalor Anatoliy Swishchuk http://arxiv.org/abs/2605.12151v2 RED-2400: A Public Benchmark of Algorithmically-Rejected Trading Events with Outcome Labels 2026-06-01T23:41:55Z

RED-2400 is a public benchmark of 6,660 algorithmically-rejected trading events from a live Solana decentralised-exchange filter stack, observed continuously over 22 calendar days (2026-04-10T21:10Z through 2026-05-02T21:48Z, UTC). Each rejection event is linked to its post-rejection price-and-liquidity trajectory. The deposit contains 169,123 forward-outcome observations and 1,837 graveyard-tracker lifecycle snapshots, covering 1,076 distinct mints in the rejection registry and 1,075 in the forward-observation file. Outcome labels follow the five-tier classification rule introduced by a related methodology paper [Kamat 2026c]. The deposit includes a lifecycle-tracker file that permits external validation of any subset of those labels against observed token-lifecycle ground truth. Filter labels are anonymised to filter_1 through filter_8; source-collector identifiers to source_a and source_b. Liquidity and 24-hour volume are quantised to the nearest power of two, preserving heavy-tailed shape while preventing operational-threshold inference. This is the first window of a planned series; subsequent windows will extend the time horizon and enable regime-stratified analysis. "RED-2400" is a brand name, not a count; current cohort sizes are listed below and do not equal 2,400.
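The power-of-two quantisation applied to liquidity and 24-hour volume can be sketched in a few lines of Python. The abstract does not say whether "nearest" is measured on a linear or a logarithmic scale; this sketch assumes the logarithmic reading (rounding log2 of the value), and the function name and the handling of non-positive inputs are likewise assumptions.

```python
import math

def quantise_pow2(value):
    """Map a positive value to the nearest power of two in log2 space.
    Non-positive inputs are mapped to 0.0 (assumption of this sketch)."""
    if value <= 0:
        return 0.0
    return 2.0 ** round(math.log2(value))
```

Under this reading every reported value lands on a bucket of the form 2^k, so the heavy-tailed shape survives across orders of magnitude while exact thresholds inside a bucket cannot be recovered.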

2026-05-12T14:06:51Z v3: prose cleanup (abstract tightened, long sentences split, internal-branding removed). No change to data, methodology, results, tables, figures, DOI links, or companion references. PDF re-rendered from corrected source. Companion Zenodo deposit unchanged Arati U. Kamat http://arxiv.org/abs/2606.02657v1 Regime-Arrival Uncertainty in Generalization Bounds under Distribution Shift 2026-06-01T06:04:42Z

Standard generalization bounds assume that the training and deployment distributions are identical, or at least static, and do not cover regime-switching environments in which the ratio of calm to crisis states differs between training and deployment. This paper proposes a framework that generalizes regime-aware models by quantifying the extra risk due to regime composition mismatch, when distribution shifts are Markov-switching. We obtain an exact decomposition, separating regime mismatch from regime sensitivity; we extend the bound to beta-mixing data using the effective sample size corrected for the spectral gap; and we show a minimax lower bound for synthetic data and on 25 years of global equity indices. The proposed penalty is an ex post realized generalization gap, whereas the training-only estimator does not show significant correlation: the feature geometry of crises can be detected, but not the temporal arrival. Thus, the framework is not a forecast machine. Forecasting the composition of the future regime is an open question in the rare cases of regime change.

2026-06-01T06:04:42Z 23 pages, 4 tables, 3 Figures Prince Poudel http://arxiv.org/abs/2212.07944v4 Variable Clustering via Distributionally Robust Nodewise Regression 2026-06-01T03:22:24Z

We study a multi-factor block model for variable clustering and connect it to regularized subspace clustering through a distributionally robust version of nodewise regression. To solve the latter problem, we derive a convex relaxation, provide a data-driven approach for selecting the size of the robust region, and develop an ADMM algorithm for efficient implementation. We validate our method in extensive numerical studies and demonstrate its superior performance.

2022-12-15T16:23:25Z ICML 2026 Kaizheng Wang Xiao Xu Xun Yu Zhou http://arxiv.org/abs/2606.01356v1 A Formally Verified Library of Mathematical Finance in Lean 4 2026-05-31T17:27:29Z

We describe a library of mathematical finance built in the Lean 4 proof assistant, on top of Mathlib and the BrownianMotion package. It is broad: more than two hundred sorry-free theorems across eleven areas, from the measure-theoretic foundations of continuous-time stochastic calculus through derivative pricing to applied risk, portfolio, and fixed-income theory, and, to our knowledge, the most comprehensive machine-checked development of mathematical finance to date. Breadth is the setting, not the point. Two things make it more than a catalogue. It reaches into the continuous theory far enough to construct the L2 Itô integral as a bounded linear isometry and to derive, rather than assume, the risk-neutral pricing measure. And it audits its own faithfulness: every result is classified by how its Lean statement relates to the mathematics it claims, and a build-enforced gate pins the axioms each proof actually uses, so a reader can see precisely what has been proved and what has only been proved under added hypotheses. We close with a candid finding: a formal base over classical financial mathematics yields certified unification of known results rather than new financial theory. The contribution is therefore methodological and infrastructural, reusable verified foundations for mathematical finance, together with the faithfulness audit.

2026-05-31T17:27:29Z 7 pages. Lean 4 artifact (Apache-2.0): https://github.com/raphaelrrcoelho/formal-mathfin ; archived at doi:10.5281/zenodo.20477782 Raphael Coelho http://arxiv.org/abs/2606.01131v1 Tokenized but Illiquid? Evidence from Real-World Asset Markets 2026-05-31T10:05:25Z

Real-world asset tokenization is often presented as a mechanism for improving the liquidity of traditionally illiquid assets. However, on-chain representation and secondary-market liquidity are distinct outcomes. This paper examines whether tokenized real-world assets exhibit meaningful observed liquidity and identifies the token characteristics associated with higher market activity. Using token-level data from RWA.xyz and supplemental contract-level observations from Etherscan, the study constructs an Ethereum-based monthly panel of non-stablecoin real-world assets across three prominent categories: U.S. Treasury-backed tokens, gold-backed commodity tokens, and private-credit-related tokens. Liquidity is measured using turnover, active addresses, and an active-month indicator. The empirical design combines descriptive statistics, non-parametric group tests, and exploratory panel regressions suited to short and sparse token histories. The results show substantial heterogeneity across asset categories. Gold-backed tokens exhibit broader holder bases and more persistent on-chain activity than many Treasury and private-credit-related products, while outstanding asset value alone does not reliably predict observed liquidity. The paper contributes to the literature by developing a clearer empirical measurement framework for real-world-asset liquidity and showing that tokenization and liquidity should be analyzed as distinct outcomes.

2026-05-31T10:05:25Z Rischan Mafrur