https://arxiv.org/api/0XmJScqIZQMUUlnY967hfit7vHU 2026-03-18T11:46:54Z 2295 30 15 http://arxiv.org/abs/2603.00738v1 Exploratory Randomization for Discrete-Time Risk-Sensitive Benchmarked Investment Management with Reinforcement Learning 2026-02-28T17:05:39Z This paper bridges reinforcement learning (RL) and risk-sensitive stochastic control by introducing a tractable exploration mechanism for policy search in risk-sensitive portfolio management, with known and unknown model parameters, that yields an endogenous relative-entropy regularization. We construct a discrete-time risk-sensitive benchmarked investment model. This model combines a factor-based asset universe with periodic portfolio rebalancing. Exploration is incorporated through user-specified Gaussian perturbations to baseline (exploitative) controls. The risk-sensitive stochastic control problem is solved analytically using the Free Energy-Entropy Duality. The Duality recasts the control problem as a linear-quadratic-Gaussian game and introduces a natural penalty for exploration. This approach yields simple sufficiency conditions for optimality. It also induces intuitive bounds on exploration based on risk sensitivity, asset covariance, and rebalancing frequency. Additionally, the optimal investment strategy can be interpreted through the lens of fractional Kelly strategies. By connecting risk-sensitive control theory and RL, this work provides a principled parametric family for policy-gradient implementations, guiding the design of RL methods. 2026-02-28T17:05:39Z 36 pages Sebastien Lleo Wolfgang Runggaldier http://arxiv.org/abs/2508.09429v2 Optimal Control of Reserve Asset Portfolios for Stablecoins 2026-02-28T05:14:38Z Stablecoins promise par convertibility, yet issuers must balance immediate liquidity against yield on reserves to keep the peg credible. 
We study this treasury problem as a continuous-time control task with two instruments: reallocating reserves between cash and short-duration government bills, and setting a spread fee for either minting or burning the coin. Mint and redemption flows follow mutually exciting processes that reproduce clustered order flow. Peg deviations arise when immediate cash coverage is insufficient relative to outstanding supply, and the market price relaxes toward this liquidity-coverage fair value. We develop a stochastic model predictive control framework that incorporates moment closure for event intensities. Using Pontryagin's Maximum Principle, we show that the optimal reallocation control exhibits a soft-thresholding structure: no rebalancing occurs when the shadow-cost differential lies within a deadzone set by transaction costs, and reallocation scales linearly beyond that threshold up to a capacity-imposed saturation limit. Introducing settlement windows leads to a sampled-data implementation with a simple threshold (soft-thresholding) structure for rebalancing. We also establish a monotone stress-response property: as expected outflows intensify or windows lengthen, the optimal policy shifts predictably toward cash. In simulations covering various stress test scenarios, the controller preserves most bill carry in calm markets, builds cash quickly when stress emerges, and avoids unnecessary rotations under transitory signals. The proposed policy is implementation-ready and aligns naturally with operational cut-offs. Our results translate empirical flow risk into auditable treasury rules that improve peg quality without sacrificing avoidable carry. 
2025-08-13T02:07:35Z Alexander Hammerl http://arxiv.org/abs/0903.2243v5 Pragmatic Information Rates, Generalizations of the Kelly Criterion, and Financial Market Efficiency 2026-02-28T02:51:13Z This paper is part of an ongoing investigation of "pragmatic information", defined in Weinberger (2002) as "the amount of information actually used in making a decision". Because a study of information rates led to the Noiseless and Noisy Coding Theorems, two of the most important results of Shannon's theory, we begin the paper by defining a pragmatic information rate, showing that all of the relevant limits make sense, and interpreting them as the improvement in compression obtained from using the correct distribution of transmitted symbols. The first of two applications of the theory extends the information theoretic analysis of the Kelly Criterion, and its generalization, the horse race, to a series of races where the stochastic process of winning horses, payoffs, and strategies depend on some stationary process, including, but not limited to the history of previous races. If the bettor is receiving messages (side information) about the probability distribution of winners, the doubling rate of the bettor's winnings is bounded by the pragmatic information of the messages. A second application is to the question of market efficiency. An efficient market is, by definition, a market in which the pragmatic information of the "tradable past" with respect to current prices is zero. Under this definition, markets whose returns are characterized by a GARCH(1,1) process cannot be efficient. Finally, a pragmatic informational analogue to Shannon's Noisy Coding Theorem suggests that a cause of market inefficiency is that the underlying fundamentals are changing so fast that the price discovery mechanism simply cannot keep up. 
This may happen most readily in the run-up to a financial bubble, where investors' willful ignorance degrades the information-processing capabilities of the market. 2009-03-12T18:27:02Z The fundamental formula for pragmatic information is true only in the special case where the a priori probabilities q(m) are the average of the joint probabilities p(omega, m) over all incoming messages m. Also, the efficient market hypothesis (EMH) can still be true in a GARCH model, so the discussion of the EMH is confused. Edward D. Weinberger http://arxiv.org/abs/2603.00361v1 Market Dynamics of Information Avalanches 2026-02-27T22:58:34Z Financial markets convert the incremental arrival of information into asset price changes. In a sandpile model, grains of sand represent bits of data, and the size of an avalanche, governed by a scaling law, is linked to price volatility. While this model of self-organized criticality reproduces stylized facts, it also identifies a structural tension between the non-arbitrage condition and price adjustments consistent with a constant Sharpe ratio. 2026-02-27T22:58:34Z 5 pages Bernhard K Meister http://arxiv.org/abs/2602.22069v1 Pools as Portfolios: Observed arbitrage efficiency & LVR analysis of dynamic weight AMMs 2026-02-25T16:16:49Z Dynamic-weight AMMs (aka Temporal Function Market Makers, TFMMs) implement algorithmic asset allocation, analogous to index or smart beta funds, by continuously updating pools' weights. A strategy updates target weights over time, and arbitrageurs trade the pool back toward those weights. This creates a sequence of small, predictable mispricings that grow until taken, effectively executing rebalances as a series of Dutch reverse auctions. Prior theoretical and simulation work (Willetts & Harrington, 2024) predicted that this mechanism could outperform CEX-style rebalancing. 
We test that claim on two live pools on the QuantAMM protocol, one on Ethereum mainnet and one on Base, across two short rebalancing windows six months apart (July 2025 and January 2026). We perform block-level arbitrage analysis, and then measure long term outcomes using Loss-vs-Rebalancing (LVR) and Rebalancing-vs-Rebalancing (RVR) benchmarks. On mainnet, rebalancing becomes markedly more efficient over time (more frequent arbitrage trades with lower value extracted per trade), reaching performance comparable to or better than CEX-based models. On Base, rebalancing persists even when per-trade extraction is near (or below) zero, consistent with routing-driven execution, and achieves efficiencies that meet or exceed standard "perfect rebalancing" LVR baselines. These results demonstrate dynamic-weight AMMs as a competitive execution layer for tokenised funds, with superior performance on L2s where routing and lower data costs compress arbitrage spreads. 2026-02-25T16:16:49Z 9 pages plus appendix Matthew Willetts Christian Harrington http://arxiv.org/abs/2602.21173v1 Bayesian Parametric Portfolio Policies 2026-02-24T18:18:26Z Parametric Portfolio Policies (PPP) estimate optimal portfolio weights directly as functions of observable signals by maximizing expected utility, bypassing the need to model asset returns and covariances. However, PPP ignores policy risk. We show that this is consequential, leading to an overstatement of expected utility and an understatement of portfolio risk. We develop Bayesian Parametric Portfolio Policies (BPPP), which place a prior on policy coefficients thereby correcting the decision rule. We derive a general result showing that the utility gap between PPP and BPPP is strictly positive and proportional to posterior parameter uncertainty and signal magnitude. 
Under a mean-variance approximation, this correction appears as an additional estimation-risk term in portfolio variance, implying that PPP overexposes when signals are strongest and when risk aversion is high. Empirically, in a high-dimensional setting with 242 signals and six factors over 1973-2023, BPPP delivers higher Sharpe ratios, substantially lower turnover, larger investor welfare, and lower tail risk, with advantages that increase monotonically in risk aversion and are strongest during crisis episodes. 2026-02-24T18:18:26Z Miguel C. Herculano http://arxiv.org/abs/2603.13252v1 When Alpha Breaks: Two-Level Uncertainty for Safe Deployment of Cross-Sectional Stock Rankers 2026-02-24T14:02:24Z Cross-sectional ranking models are often deployed as if point predictions were sufficient: the model outputs scores and the portfolio follows the induced ordering. Under non-stationarity, rankers can fail during regime shifts. In the AI Stock Forecaster, a LightGBM ranker performs well overall at a 20-day horizon, yet the 2024 holdout coincides with an AI thematic rally and sector rotation that breaks the signal at longer horizons and weakens 20d. This motivates treating deployment as two decisions: (i) whether the strategy should trade at all, and (ii) how to control risk within active trades. We adapt Direct Epistemic Uncertainty Prediction (DEUP) to ranking by predicting rank displacement and defining an epistemic uncertainty signal ehat relative to a point-in-time (PIT-safe) baseline. Empirically, ehat is structurally coupled with signal strength (median correlation between ehat and absolute score is about 0.6 across 1,865 dates), so inverse-uncertainty sizing de-levers the strongest signals and degrades performance. 
To address this, we propose a two-level deployment policy: a strategy-level regime-trust gate G(t) that decides whether to trade (AUROC around 0.72 overall and 0.75 in FINAL) and a position-level epistemic tail-risk cap that reduces exposure only for the most uncertain predictions. The operational policy, trade only when G(t) is at least 0.2, apply volatility sizing on active dates, and cap the top epistemic tail, improves risk-adjusted performance in the 20d policy comparison and indicates DEUP adds value mainly as a tail-risk guard rather than a continuous sizing denominator. 2026-02-24T14:02:24Z 34 pages, 14 tables. Cross-sectional equity ranking; uncertainty-based abstention and tail-risk capping under regime shifts Ursina Sanderink http://arxiv.org/abs/2602.20856v1 Stochastic Discount Factors with Cross-Asset Spillovers 2026-02-24T12:58:01Z This paper develops a unified framework that links firm-level predictive signals, cross-asset spillovers, and the stochastic discount factor (SDF). Signals and spillovers are jointly estimated by maximizing the Sharpe ratio, yielding an interpretable SDF that both ranks characteristic relevance and uncovers the direction of predictive influence across assets. Out-of-sample, the SDF consistently outperforms self-predictive and expected-return benchmarks across investment universes and market states. The inferred information network highlights large, low-turnover firms as net transmitters. The framework offers a clear, economically grounded view of the informational architecture underlying cross-sectional return dynamics. 2026-02-24T12:58:01Z Doron Avramov Xin He http://arxiv.org/abs/2507.23138v4 Is Causality Necessary for Efficient Portfolios? A Computational Perspective on Predictive Validity and Model Misspecification 2026-02-23T11:36:08Z Portfolio optimization is increasingly argued to require causally identified return predictors to avoid signal inversion and optimization failure. 
This paper re-examines this claim by studying when predictive signals yield viable efficient frontiers, even under structural misspecification. We show that causal identification is not necessary for portfolio efficiency within static mean-variance and closely related quadratic portfolio optimization frameworks. Instead, efficiency is governed by geometric sufficiency conditions on predictive signals: directional alignment, ranking preservation, and calibration. We formally decompose portfolio efficiency into these three components and show that miscalibration alone attenuates Sharpe ratios even when alignment and ranking are preserved. Robustness is characterized as smooth degradation rather than collapse, with explicit attenuation behavior and continuity of performance under increasing misspecification. The theoretical results are supported by simulations and empirical analysis. Empirical validation combines equity-based illustrations with a large global bond universe spanning multiple currencies, countries, sectors, maturities along the term structure, seniority classes, and credit ratings, together with high-dimensional stress tests, nonlinear data-generating processes, rolling-window analyses, covariance regularization, realistic portfolio constraints, and bootstrap-based statistical validation. Across these settings, optimization geometry remains well-behaved whenever directional alignment is preserved. The results clarify the boundary between causality and portfolio optimization: causality may inform signal representation, but portfolio efficiency at the optimization stage is a geometric property conditional on a given representation. 
2025-07-30T22:37:09Z 38 Pages, 13 Figures, 9 Tables Alejandro Rodriguez Dominguez http://arxiv.org/abs/2602.18912v1 Overreaction as an indicator for momentum in algorithmic trading: A Case of AAPL stocks 2026-02-21T17:31:02Z This paper investigates whether short-term market overreactions can be systematically predicted and monetized as momentum signals using high-frequency emotional information and modern machine learning methods. Focusing on Apple Inc. (AAPL), we construct a comprehensive intraday dataset that combines volatility-normalized returns with transformer-based emotion features extracted from Twitter messages. Overreactions are defined as extreme return realizations relative to contemporaneous volatility and transaction costs and are modeled as a three-class prediction problem. We evaluate the performance of several nonlinear classifiers, including XGBoost, Random Forests, Deep Neural Networks, and Bidirectional LSTMs, across multiple intraday frequencies (1-, 5-, 10-, and 15-minute data). Model outputs are translated into trading strategies and assessed using risk-adjusted performance measures and formal statistical tests. The results show that machine learning models significantly outperform benchmark overreaction rules at ultra-short horizons, while classical behavioral momentum effects dominate at intermediate frequencies, particularly around 10 minutes. Explainability analysis based on SHAP reveals that volatility and negative emotions, especially fear and sadness, play a central role in driving predicted overreactions. Overall, the findings demonstrate that emotion-driven overreactions contain a predictable structure that can be exploited by machine learning models, offering new insights into the behavioral origins of intraday momentum and the interaction between sentiment, volatility, and algorithmic trading. 
2026-02-21T17:31:02Z Szymon Lis Robert Ślepaczuk Paweł Sakowski http://arxiv.org/abs/2603.04441v1 Explainable Regime Aware Investing 2026-02-21T00:33:16Z We propose an explainable regime-aware portfolio construction framework based on a strictly causal Wasserstein Hidden Markov Model. The model combines rolling Gaussian HMM inference with predictive model-order selection and template-based identity tracking using the 2-Wasserstein distance between Gaussian components. This allows regime complexity to adapt dynamically while preserving stable economic interpretation. Regime probabilities are embedded into a transaction-cost-aware mean-variance optimization framework and evaluated on a diversified daily cross-asset universe. Relative to equal-weight and SPX buy-and-hold benchmarks, the Wasserstein HMM achieves materially higher risk-adjusted performance with Sharpe ratios of 2.18 versus 1.59 and 1.18 and substantially lower maximum drawdown of negative 5.43 percent versus negative 14.62 percent for SPX. During the early 2025 equity selloff labeled Liberation Day, the strategy dynamically reduced equity exposure and shifted toward defensive assets, mitigating peak-to-trough losses. Compared to a nonparametric KNN conditional-moment estimator using the same features and optimization layer, the parametric regime model produces materially lower turnover and smoother weight evolution. The results demonstrate that regime inference stability, particularly identity preservation and adaptive complexity control, is a first-order determinant of portfolio drawdown and implementation robustness in daily asset allocation. 2026-02-21T00:33:16Z Amine Boukardagha http://arxiv.org/abs/2602.18157v1 Time consistent portfolio strategies for a general utility function 2026-02-20T11:48:38Z We study the Merton portfolio management problem within a complete market, non-constant time discount rate, and general utility framework. 
The non-constant discount rate introduces time inconsistency, which can be resolved by introducing subgame-perfect strategies. Under some asymptotic assumptions on the utility function, we show that the subgame-perfect strategy is the same as the optimal strategy, provided the discount rate is replaced by the utility-weighted discount rate $ρ(t,x)$ that depends on the time $t$ and wealth level $x$. A fixed-point iteration is used to find $ρ$. The consumption-to-wealth ratio and the investment-to-wealth ratio are given in feedback form as functions of the value function. 2026-02-20T11:48:38Z Oumar Mbodji http://arxiv.org/abs/2403.10273v2 Optimal Portfolio Choice with Cross-Impact Propagators 2026-02-19T13:25:54Z We consider a class of optimal portfolio choice problems in continuous time where the agent's transactions create both transient cross-impact, driven by a matrix-valued Volterra propagator, and temporary price impact. We formulate this problem as the maximization of a revenue-risk functional, where the agent also exploits available information on a progressively measurable price-predicting signal. We solve the maximization problem explicitly in terms of operator resolvents, by reducing the corresponding first-order condition to a coupled system of stochastic Fredholm equations of the second kind and deriving its solution. We then give sufficient conditions on the matrix-valued propagator so that the model does not permit price manipulation. We also provide an implementation of the solutions to the optimal portfolio choice problem and to the associated optimal execution problem. Our solutions yield financial insights on the influence of cross-impact on the optimal strategies and its interplay with alpha decays. 
2024-03-15T13:05:03Z 37 pages, 7 figures Eduardo Abi Jaber Eyal Neuman Sturmius Tuschmann http://arxiv.org/abs/2602.17098v1 Deep Reinforcement Learning for Optimal Portfolio Allocation: A Comparative Study with Mean-Variance Optimization 2026-02-19T05:47:23Z Portfolio Management is the process of overseeing a group of investments, referred to as a portfolio, with the objective of achieving predetermined investment goals. Portfolio optimization is a key component that involves allocating the portfolio assets so as to maximize returns while minimizing risk taken. It is typically carried out by financial professionals who use a combination of quantitative techniques and investment expertise to make decisions about the portfolio allocation. Recent applications of Deep Reinforcement Learning (DRL) have shown promising results when used to optimize portfolio allocation by training model-free agents on historical market data. Many of these methods compare their results against basic benchmarks or other state-of-the-art DRL agents but often fail to compare their performance against traditional methods used by financial professionals in practical settings. One of the most commonly used methods for this task is Mean-Variance Portfolio Optimization (MVO), which uses historical time series information to estimate expected asset returns and covariances, which are then used to optimize for an investment objective. Our work is a thorough comparison between model-free DRL and MVO for optimal portfolio allocation. We detail the specifics of how to make DRL for portfolio optimization work in practice, also noting the adjustments needed for MVO. Backtest results demonstrate strong performance of the DRL agent across many metrics, including Sharpe ratio, maximum drawdowns, and absolute returns. 2026-02-19T05:47:23Z 9 pages, 6 figures. 
Published at the FinPlan'23 Workshop, the 33rd International Conference on Automated Planning and Scheduling (ICAPS 2023) Srijan Sood Kassiani Papasotiriou Marius Vaiciulis Tucker Balch http://arxiv.org/abs/2602.16862v1 Entropy Regularization as Robustness under Bayesian Drift Uncertainty 2026-02-18T20:42:32Z We study entropy-regularized mean-variance portfolio optimization under Bayesian drift uncertainty. Gaussian policies remain optimal under partial information, the value function is quadratic in wealth, and belief-dependent coefficients admit closed-form solutions. The mean control is identical to deterministic Bayesian Markowitz feedback; entropy regularization affects only the policy variance. Additionally, this variance does not affect information gain, and instead provides belief-dependent robustness. Notably, optimal policy variance increases with posterior conviction $|m_t|$, forcing greater action randomization when mean position is most aggressive. 2026-02-18T20:42:32Z 25 pages, 2 figures Andy Au