https://arxiv.org/api/0XmJScqIZQMUUlnY967hfit7vHU
2026-03-18T11:46:54Z
22953015

http://arxiv.org/abs/2603.00738v1
Exploratory Randomization for Discrete-Time Risk-Sensitive Benchmarked Investment Management with Reinforcement Learning
2026-02-28T17:05:39Z
This paper bridges reinforcement learning (RL) and risk-sensitive stochastic control by introducing a tractable exploration mechanism for policy search in risk-sensitive portfolio management, with known and unknown model parameters, that yields an endogenous relative-entropy regularization. We construct a discrete-time risk-sensitive benchmarked investment model. This model combines a factor-based asset universe with periodic portfolio rebalancing. Exploration is incorporated through user-specified Gaussian perturbations to baseline (exploitative) controls. The risk-sensitive stochastic control problem is solved analytically using the Free Energy-Entropy Duality. The Duality recasts the control problem as a linear-quadratic-Gaussian game and introduces a natural penalty for exploration. This approach yields simple sufficiency conditions for optimality. It also induces intuitive bounds on exploration based on risk sensitivity, asset covariance, and rebalancing frequency. Additionally, the optimal investment strategy can be interpreted through the lens of fractional Kelly strategies. By connecting risk-sensitive control theory and RL, this work provides a principled parametric family for policy-gradient implementations, guiding the design of RL methods.
2026-02-28T17:05:39Z
36 pages
Sebastien Lleo
Wolfgang Runggaldier

http://arxiv.org/abs/2508.09429v2
Optimal Control of Reserve Asset Portfolios for Stablecoins
2026-02-28T05:14:38Z
Stablecoins promise par convertibility, yet issuers must balance immediate liquidity against yield on reserves to keep the peg credible.
We study this treasury problem as a continuous-time control task with two instruments: reallocating reserves between cash and short-duration government bills, and setting a spread fee for either minting or burning the coin. Mint and redemption flows follow mutually exciting processes that reproduce clustered order flow. Peg deviations arise when immediate cash coverage is insufficient relative to outstanding supply, and the market price relaxes toward this liquidity-coverage fair value. We develop a stochastic model predictive control framework that incorporates moment closure for event intensities. Using Pontryagin's Maximum Principle, we show that the optimal reallocation control exhibits a soft-thresholding structure: no rebalancing occurs when the shadow-cost differential lies within a deadzone set by transaction costs, and reallocation scales linearly beyond that threshold up to a capacity-imposed saturation limit. Introducing settlement windows leads to a sampled-data implementation with a simple threshold (soft-thresholding) structure for rebalancing. We also establish a monotone stress-response property: as expected outflows intensify or windows lengthen, the optimal policy shifts predictably toward cash. In simulations covering various stress test scenarios, the controller preserves most bill carry in calm markets, builds cash quickly when stress emerges, and avoids unnecessary rotations under transitory signals. The proposed policy is implementation-ready and aligns naturally with operational cut-offs. 
Our results translate empirical flow risk into auditable treasury rules that improve peg quality without sacrificing avoidable carry.
2025-08-13T02:07:35Z
Alexander Hammerl

http://arxiv.org/abs/0903.2243v5
Pragmatic Information Rates, Generalizations of the Kelly Criterion, and Financial Market Efficiency
2026-02-28T02:51:13Z
This paper is part of an ongoing investigation of "pragmatic information", defined in Weinberger (2002) as "the amount of information actually used in making a decision". Because a study of information rates led to the Noiseless and Noisy Coding Theorems, two of the most important results of Shannon's theory, we begin the paper by defining a pragmatic information rate, showing that all of the relevant limits make sense, and interpreting them as the improvement in compression obtained from using the correct distribution of transmitted symbols.
The first of two applications of the theory extends the information theoretic analysis of the Kelly Criterion, and its generalization, the horse race, to a series of races where the stochastic process of winning horses, payoffs, and strategies depends on some stationary process, including, but not limited to, the history of previous races. If the bettor is receiving messages (side information) about the probability distribution of winners, the doubling rate of the bettor's winnings is bounded by the pragmatic information of the messages.
A second application is to the question of market efficiency. An efficient market is, by definition, a market in which the pragmatic information of the "tradable past" with respect to current prices is zero. Under this definition, markets whose returns are characterized by a GARCH(1,1) process cannot be efficient.
Finally, a pragmatic informational analogue to Shannon's Noisy Coding Theorem suggests that a cause of market inefficiency is that the underlying fundamentals are changing so fast that the price discovery mechanism simply cannot keep up. This may happen most readily in the run-up to a financial bubble, where investors' willful ignorance degrades the information processing capabilities of the market.
2009-03-12T18:27:02Z
The fundamental formula for pragmatic information is true only in the special case where the a priori probabilities q(m) are the average of the joint probabilities p(omega, m) over all incoming messages m. Also, the efficient market hypothesis (EMH) can still be true in a GARCH model, so the discussion of the EMH is confused.
Edward D. Weinberger

http://arxiv.org/abs/2603.00361v1
Market Dynamics of Information Avalanches
2026-02-27T22:58:34Z
Financial markets convert the incremental arrival of information into asset price changes. In a sandpile model grains of sand represent bits of data, and the size of an avalanche, governed by a scaling law, is linked to price volatility. While this model of self-organized criticality reproduces stylized facts, it also identifies a structural tension between the non-arbitrage condition and price adjustments consistent with a constant Sharpe ratio.
2026-02-27T22:58:34Z
5 pages
Bernhard K Meister

http://arxiv.org/abs/2602.22069v1
Pools as Portfolios: Observed arbitrage efficiency & LVR analysis of dynamic weight AMMs
2026-02-25T16:16:49Z
Dynamic-weight AMMs (aka Temporal Function Market Makers, TFMMs) implement algorithmic asset allocation, analogous to index or smart beta funds, by continuously updating pools' weights. A strategy updates target weights over time, and arbitrageurs trade the pool back toward those weights. This creates a sequence of small, predictable mispricings that grow until taken, effectively executing rebalances as a series of Dutch reverse auctions.
Prior theoretical and simulation work (Willetts & Harrington, 2024) predicted that this mechanism could outperform CEX-style rebalancing. We test that claim on two live pools on the QuantAMM protocol, one on Ethereum mainnet and one on Base, across two short rebalancing windows six months apart (July 2025 and January 2026). We perform block-level arbitrage analysis, and then measure long term outcomes using Loss-vs-Rebalancing (LVR) and Rebalancing-vs-Rebalancing (RVR) benchmarks. On mainnet, rebalancing becomes markedly more efficient over time (more frequent arbitrage trades with lower value extracted per trade), reaching performance comparable to or better than CEX-based models. On Base, rebalancing persists even when per-trade extraction is near (or below) zero, consistent with routing-driven execution, and achieves efficiencies that meet or exceed standard "perfect rebalancing" LVR baselines. These results demonstrate dynamic-weight AMMs as a competitive execution layer for tokenised funds, with superior performance on L2s where routing and lower data costs compress arbitrage spreads.
2026-02-25T16:16:49Z
9 pages plus appendix
Matthew Willetts
Christian Harrington

http://arxiv.org/abs/2602.21173v1
Bayesian Parametric Portfolio Policies
2026-02-24T18:18:26Z
Parametric Portfolio Policies (PPP) estimate optimal portfolio weights directly as functions of observable signals by maximizing expected utility, bypassing the need to model asset returns and covariances. However, PPP ignores policy risk. We show that this is consequential, leading to an overstatement of expected utility and an understatement of portfolio risk. We develop Bayesian Parametric Portfolio Policies (BPPP), which place a prior on policy coefficients thereby correcting the decision rule. We derive a general result showing that the utility gap between PPP and BPPP is strictly positive and proportional to posterior parameter uncertainty and signal magnitude.
Under a mean--variance approximation, this correction appears as an additional estimation-risk term in portfolio variance, implying that PPP overexposes when signals are strongest and when risk aversion is high. Empirically, in a high-dimensional setting with 242 signals and six factors over 1973--2023, BPPP delivers higher Sharpe ratios, substantially lower turnover, larger investor welfare, and lower tail risk, with advantages that increase monotonically in risk aversion and are strongest during crisis episodes.
2026-02-24T18:18:26Z
Miguel C. Herculano

http://arxiv.org/abs/2603.13252v1
When Alpha Breaks: Two-Level Uncertainty for Safe Deployment of Cross-Sectional Stock Rankers
2026-02-24T14:02:24Z
Cross-sectional ranking models are often deployed as if point predictions were sufficient: the model outputs scores and the portfolio follows the induced ordering. Under non-stationarity, rankers can fail during regime shifts. In the AI Stock Forecaster, a LightGBM ranker performs well overall at a 20-day horizon, yet the 2024 holdout coincides with an AI thematic rally and sector rotation that breaks the signal at longer horizons and weakens 20d. This motivates treating deployment as two decisions: (i) whether the strategy should trade at all, and (ii) how to control risk within active trades. We adapt Direct Epistemic Uncertainty Prediction (DEUP) to ranking by predicting rank displacement and defining an epistemic uncertainty signal ehat relative to a point-in-time (PIT-safe) baseline. Empirically, ehat is structurally coupled with signal strength (median correlation between ehat and absolute score is about 0.6 across 1,865 dates), so inverse-uncertainty sizing de-levers the strongest signals and degrades performance.
To address this, we propose a two-level deployment policy: a strategy-level regime-trust gate G(t) that decides whether to trade (AUROC around 0.72 overall and 0.75 in FINAL) and a position-level epistemic tail-risk cap that reduces exposure only for the most uncertain predictions. The operational policy, trade only when G(t) is at least 0.2, apply volatility sizing on active dates, and cap the top epistemic tail, improves risk-adjusted performance in the 20d policy comparison and indicates DEUP adds value mainly as a tail-risk guard rather than a continuous sizing denominator.
2026-02-24T14:02:24Z
34 pages, 14 tables. Cross-sectional equity ranking; uncertainty-based abstention and tail-risk capping under regime shifts
Ursina Sanderink

http://arxiv.org/abs/2602.20856v1
Stochastic Discount Factors with Cross-Asset Spillovers
2026-02-24T12:58:01Z
This paper develops a unified framework that links firm-level predictive signals, cross-asset spillovers, and the stochastic discount factor (SDF). Signals and spillovers are jointly estimated by maximizing the Sharpe ratio, yielding an interpretable SDF that both ranks characteristic relevance and uncovers the direction of predictive influence across assets. Out-of-sample, the SDF consistently outperforms self-predictive and expected-return benchmarks across investment universes and market states. The inferred information network highlights large, low-turnover firms as net transmitters. The framework offers a clear, economically grounded view of the informational architecture underlying cross-sectional return dynamics.
2026-02-24T12:58:01Z
Doron Avramov
Xin He

http://arxiv.org/abs/2507.23138v4
Is Causality Necessary for Efficient Portfolios? A Computational Perspective on Predictive Validity and Model Misspecification
2026-02-23T11:36:08Z
Portfolio optimization is increasingly argued to require causally identified return predictors to avoid signal inversion and optimization failure.
This paper re-examines this claim by studying when predictive signals yield viable efficient frontiers, even under structural misspecification. We show that causal identification is not necessary for portfolio efficiency within static mean--variance and closely related quadratic portfolio optimization frameworks. Instead, efficiency is governed by geometric sufficiency conditions on predictive signals: directional alignment, ranking preservation, and calibration. We formally decompose portfolio efficiency into these three components and show that miscalibration alone attenuates Sharpe ratios even when alignment and ranking are preserved. Robustness is characterized as smooth degradation rather than collapse, with explicit attenuation behavior and continuity of performance under increasing misspecification. The theoretical results are supported by simulations and empirical analysis. Empirical validation combines equity-based illustrations with a large global bond universe spanning multiple currencies, countries, sectors, maturities along the term structure, seniority classes, and credit ratings, together with high-dimensional stress tests, nonlinear data-generating processes, rolling-window analyses, covariance regularization, realistic portfolio constraints, and bootstrap-based statistical validation. Across these settings, optimization geometry remains well-behaved whenever directional alignment is preserved. 
The results clarify the boundary between causality and portfolio optimization: causality may inform signal representation, but portfolio efficiency at the optimization stage is a geometric property conditional on a given representation.
2025-07-30T22:37:09Z
38 Pages, 13 Figures, 9 Tables
Alejandro Rodriguez Dominguez

http://arxiv.org/abs/2602.18912v1
Overreaction as an indicator for momentum in algorithmic trading: A Case of AAPL stocks
2026-02-21T17:31:02Z
This paper investigates whether short-term market overreactions can be systematically predicted and monetized as momentum signals using high-frequency emotional information and modern machine learning methods. Focusing on Apple Inc. (AAPL), we construct a comprehensive intraday dataset that combines volatility normalized returns with transformer-based emotion features extracted from Twitter messages. Overreactions are defined as extreme return realizations relative to contemporaneous volatility and transaction costs and are modeled as a three-class prediction problem. We evaluate the performance of several nonlinear classifiers, including XGBoost, Random Forests, Deep Neural Networks, and Bidirectional LSTMs, across multiple intraday frequencies (1, 5, 10, and 15 minute data). Model outputs are translated into trading strategies and assessed using risk-adjusted performance measures and formal statistical tests. The results show that machine learning models significantly outperform benchmark overreaction rules at ultra short horizons, while classical behavioral momentum effects dominate at intermediate frequencies, particularly around 10 minutes. Explainability analysis based on SHAP reveals that volatility and negative emotions, especially fear and sadness, play a central role in driving predicted overreactions.
Overall, the findings demonstrate that emotion-driven overreactions contain a predictable structure that can be exploited by machine learning models, offering new insights into the behavioral origins of intraday momentum and the interaction between sentiment, volatility, and algorithmic trading.
2026-02-21T17:31:02Z
Szymon Lis
Robert Ślepaczuk
Paweł Sakowski

http://arxiv.org/abs/2603.04441v1
Explainable Regime Aware Investing
2026-02-21T00:33:16Z
We propose an explainable regime-aware portfolio construction framework based on a strictly causal Wasserstein Hidden Markov Model. The model combines rolling Gaussian HMM inference with predictive model-order selection and template-based identity tracking using the 2-Wasserstein distance between Gaussian components. This allows regime complexity to adapt dynamically while preserving stable economic interpretation. Regime probabilities are embedded into a transaction-cost-aware mean-variance optimization framework and evaluated on a diversified daily cross-asset universe. Relative to equal-weight and SPX buy-and-hold benchmarks, the Wasserstein HMM achieves materially higher risk-adjusted performance with Sharpe ratios of 2.18 versus 1.59 and 1.18 and substantially lower maximum drawdown of negative 5.43 percent versus negative 14.62 percent for SPX. During the early 2025 equity selloff labeled Liberation Day, the strategy dynamically reduced equity exposure and shifted toward defensive assets, mitigating peak-to-trough losses. Compared to a nonparametric KNN conditional-moment estimator using the same features and optimization layer, the parametric regime model produces materially lower turnover and smoother weight evolution.
The results demonstrate that regime inference stability, particularly identity preservation and adaptive complexity control, is a first-order determinant of portfolio drawdown and implementation robustness in daily asset allocation.
2026-02-21T00:33:16Z
Amine Boukardagha

http://arxiv.org/abs/2602.18157v1
Time consistent portfolio strategies for a general utility function
2026-02-20T11:48:38Z
We study the Merton portfolio management problem in a complete market with a non-constant time discount rate and a general utility function. The non-constant discount rate introduces time inconsistency, which can be resolved by introducing subgame perfect strategies. Under some asymptotic assumptions on the utility function, we show that the subgame perfect strategy is the same as the optimal strategy, provided the discount rate is replaced by the utility weighted discount rate $ρ(t,x)$ that depends on the time $t$ and wealth level $x$. A fixed point iteration is used to find $ρ$. The consumption to wealth ratio and the investment to wealth ratio are given in feedback form as functions of the value function.
2026-02-20T11:48:38Z
Oumar Mbodji

http://arxiv.org/abs/2403.10273v2
Optimal Portfolio Choice with Cross-Impact Propagators
2026-02-19T13:25:54Z
We consider a class of optimal portfolio choice problems in continuous time where the agent's transactions create both transient cross-impact driven by a matrix-valued Volterra propagator, as well as temporary price impact. We formulate this problem as the maximization of a revenue-risk functional, where the agent also exploits available information on a progressively measurable price predicting signal. We solve the maximization problem explicitly in terms of operator resolvents, by reducing the corresponding first order condition to a coupled system of stochastic Fredholm equations of the second kind and deriving its solution. We then give sufficient conditions on the matrix-valued propagator so that the model does not permit price manipulation.
We also provide an implementation of the solutions to the optimal portfolio choice problem and to the associated optimal execution problem. Our solutions yield financial insights on the influence of cross-impact on the optimal strategies and its interplay with alpha decays.
2024-03-15T13:05:03Z
37 pages, 7 figures
Eduardo Abi Jaber
Eyal Neuman
Sturmius Tuschmann

http://arxiv.org/abs/2602.17098v1
Deep Reinforcement Learning for Optimal Portfolio Allocation: A Comparative Study with Mean-Variance Optimization
2026-02-19T05:47:23Z
Portfolio Management is the process of overseeing a group of investments, referred to as a portfolio, with the objective of achieving predetermined investment goals. Portfolio optimization is a key component that involves allocating the portfolio assets so as to maximize returns while minimizing risk taken. It is typically carried out by financial professionals who use a combination of quantitative techniques and investment expertise to make decisions about the portfolio allocation.
Recent applications of Deep Reinforcement Learning (DRL) have shown promising results when used to optimize portfolio allocation by training model-free agents on historical market data. Many of these methods compare their results against basic benchmarks or other state-of-the-art DRL agents but often fail to compare their performance against traditional methods used by financial professionals in practical settings. One of the most commonly used methods for this task is Mean-Variance Portfolio Optimization (MVO), which uses historical time series information to estimate expected asset returns and covariances, which are then used to optimize for an investment objective.
Our work is a thorough comparison between model-free DRL and MVO for optimal portfolio allocation. We detail the specifics of how to make DRL for portfolio optimization work in practice, also noting the adjustments needed for MVO. Backtest results demonstrate strong performance of the DRL agent across many metrics, including Sharpe ratio, maximum drawdowns, and absolute returns.
2026-02-19T05:47:23Z
9 pages, 6 figures. Published at the FinPlan'23 Workshop, the 33rd International Conference on Automated Planning and Scheduling (ICAPS 2023)
Srijan Sood
Kassiani Papasotiriou
Marius Vaiciulis
Tucker Balch

http://arxiv.org/abs/2602.16862v1
Entropy Regularization as Robustness under Bayesian Drift Uncertainty
2026-02-18T20:42:32Z
We study entropy-regularized mean-variance portfolio optimization under Bayesian drift uncertainty. Gaussian policies remain optimal under partial information, the value function is quadratic in wealth, and belief-dependent coefficients admit closed-form solutions. The mean control is identical to deterministic Bayesian Markowitz feedback; entropy regularization affects only the policy variance. Additionally, this variance does not affect information gain, and instead provides belief-dependent robustness. Notably, optimal policy variance increases with posterior conviction $|m_t|$, forcing greater action randomization when mean position is most aggressive.
2026-02-18T20:42:32Z
25 pages, 2 figures
Andy Au