The Subtle Interplay between Square-root Impact, Order Imbalance & Volatility: A Unifying Framework

2026-03-04T09:31:44Z

In this work, we aim to reconcile several apparently contradictory observations in market microstructure: is the famous "square-root law" of metaorder impact, which decays with time, compatible with the random-walk nature of prices and the linear impact of order imbalances? Can one entirely explain the volatility of prices as resulting from the flow of uninformed metaorders that mechanically impact them? We introduce a new theoretical framework to describe metaorders with different signs, sizes and durations, which all impact prices as a square-root of volume but with a subsequent time decay. We show that, as in the original propagator model, price diffusion is ensured by the long memory of cross-correlations between metaorders. In order to account for the effect of strongly fluctuating volumes q of individual trades, we further introduce two q-dependent exponents, which allow us to describe how the moments of generalized volume imbalance and the correlation between price changes and generalized order flow imbalance scale with T. We predict in particular that the corresponding power-laws depend in a non-monotonic fashion on a parameter a, which allows one to put the same weight on all child orders or to overweight large ones, a behaviour that is clearly borne out by empirical data. We also predict that the correlation between price changes and volume imbalances should display a maximum as a function of a, which again matches observations. Such noteworthy agreement between theory and data suggests that our framework correctly captures the basic mechanism at the heart of price formation, namely the average impact of metaorders. We argue that our results support the "Order-Driven" theory of excess volatility, and are at odds with the idea that a "Fundamental" component accounts for a large share of the volatility of financial markets.
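As a rough illustration of the square-root law mentioned above, the sketch below uses the commonly quoted form I(Q) = Y * sigma * sqrt(Q / V). This parametrization is an assumption here: the prefactor `y`, the daily volatility `daily_vol` and the daily volume `daily_volume` are not given in the abstract, and the subsequent time decay of impact is not modelled.

```python
import math

def sqrt_impact(q, daily_volume, daily_vol, y=1.0):
    """Average peak impact of a metaorder of size q under the assumed
    square-root form I(Q) = y * daily_vol * sqrt(q / daily_volume)."""
    if q < 0 or daily_volume <= 0:
        raise ValueError("q must be non-negative and daily_volume positive")
    return y * daily_vol * math.sqrt(q / daily_volume)

# Hypothetical numbers: 2% daily volatility, one million shares traded per day.
small = sqrt_impact(1_000, 1_000_000, 0.02)
large = sqrt_impact(4_000, 1_000_000, 0.02)

# Quadrupling the metaorder size only doubles the impact (concavity).
print(small, large, large / small)
```

The concavity is the point of the example: under a linear law the ratio would be 4 rather than 2.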

Sentiment-Aware Mean-Variance Portfolio Optimization for Cryptocurrencies

2026-03-04T03:28:45Z

Cryptocurrency markets are highly volatile and influenced by both price trends and market sentiment, making effective portfolio management challenging. This paper proposes a dynamic cryptocurrency portfolio strategy that integrates technical indicators and sentiment analysis to enhance investment decision-making. Market momentum is captured using the 14-day Relative Strength Index (RSI) and Simple Moving Average (SMA), while sentiment signals are extracted from news articles with VADER and further validated using the Google Gemini large language model. These signals are incorporated into expected return estimates and used in a constrained mean-variance optimization framework. Backtesting across multiple cryptocurrencies shows that the integrated approach outperforms traditional benchmarks, including momentum strategy, Bitcoin Long-Short strategy, and an equal-weighted portfolio, achieving stronger risk-adjusted returns and more consistent cumulative growth. Furthermore, comparing the sentiment-only and technical-only strategies shows that incorporating sentiment information alongside technical indicators can lead to more consistent performance gains. However, the strategies exhibit substantial drawdowns that coincide with known periods of market stress, indicating that additional risk-management components are required to improve stability.
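The two momentum indicators named above can be sketched as follows. This is a minimal version assuming a simple-average RSI (Wilder's smoothed variant is also common, and the abstract does not say which one the paper uses); the function names and the flat-window convention are illustrative choices.

```python
def sma(prices, window):
    """Simple moving average of the last `window` prices."""
    if window <= 0 or len(prices) < window:
        raise ValueError("need at least `window` prices")
    return sum(prices[-window:]) / window

def rsi(prices, period=14):
    """Relative Strength Index over the last `period` price changes,
    using plain averages of gains and losses (an assumed variant)."""
    if len(prices) < period + 1:
        raise ValueError("need at least period + 1 prices")
    window = prices[-(period + 1):]
    changes = [b - a for a, b in zip(window, window[1:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_gain == 0 and avg_loss == 0:
        return 50.0   # flat window: treated as neutral (a convention, assumed)
    if avg_loss == 0:
        return 100.0  # only gains in the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

rising = list(range(1, 17))                    # strictly increasing prices
choppy = [10 + (i % 2) for i in range(15)]     # alternating up and down moves
print(rsi(rising), rsi(choppy), sma(rising, 14))
```

How such signals are mapped into expected-return estimates for the mean-variance step is specific to the paper and is not reproduced here.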

Range-Based Volatility Estimators for Monitoring Market Stress: Evidence from Local Food Price Data

2026-03-03T11:47:45Z

Range-based volatility estimators are widely used in financial econometrics to quantify risk and market stress, yet their application to local commodity markets remains limited. This paper shows how open-high-low-close (OHLC) volatility estimators can be adapted to monitor localized market distress across diverse development contexts, including conflict-affected settings, climate-exposed regions, remote and thinly traded markets, and import- and logistics-constrained urban hubs. Using monthly food price data from the World Bank's Real-Time Prices dataset, several volatility measures -- including the Parkinson, Garman-Klass, Rogers-Satchell, and Yang-Zhang estimators -- are constructed and evaluated against independently documented disruption timelines. Across settings, elevated volatility aligns with episodes linked to insecurity and market fragmentation, extreme weather and disaster shocks, policy and fuel-cost adjustments, and global supply-chain and trade disruptions. Volatility also detects stress that standard momentum indicators such as the relative strength index (RSI) can miss, including symmetric or rapidly reversing shocks in which offsetting supply and demand disturbances dampen net directional price movements while amplifying intra-period dispersion. Overall, OHLC-based volatility indicators provide a robust and interpretable signal of market disruptions and complement price-level monitoring for applications spanning financial risk, humanitarian early warning, and trade.
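Two of the estimators listed above have short closed forms, sketched below in their textbook versions: the Parkinson variance averages ln(H/L)^2 / (4 ln 2) over periods, and the Garman-Klass variance averages 0.5 ln(H/L)^2 - (2 ln 2 - 1) ln(C/O)^2. The price series is made up for illustration, and any adaptation the paper applies to monthly food price data is not reproduced.

```python
import math

def parkinson_variance(highs, lows):
    """Parkinson range-based variance: mean of ln(H/L)^2 / (4 ln 2)."""
    n = len(highs)
    if n == 0 or n != len(lows):
        raise ValueError("highs and lows must be non-empty and equally long")
    total = sum(math.log(h / l) ** 2 for h, l in zip(highs, lows))
    return total / (4.0 * math.log(2.0) * n)

def garman_klass_variance(opens, highs, lows, closes):
    """Garman-Klass variance: mean of 0.5 ln(H/L)^2 - (2 ln 2 - 1) ln(C/O)^2."""
    if not opens or not (len(opens) == len(highs) == len(lows) == len(closes)):
        raise ValueError("OHLC series must be non-empty and equally long")
    k = 2.0 * math.log(2.0) - 1.0
    terms = [
        0.5 * math.log(h / l) ** 2 - k * math.log(c / o) ** 2
        for o, h, l, c in zip(opens, highs, lows, closes)
    ]
    return sum(terms) / len(terms)

# Hypothetical bars with open == close: no net move, but a wide range.
opens = [100.0, 100.0, 100.0]
highs = [105.0, 110.0, 103.0]
lows = [96.0, 92.0, 99.0]
closes = [100.0, 100.0, 100.0]

pk = parkinson_variance(highs, lows)
gk = garman_klass_variance(opens, highs, lows, closes)
print(pk, gk)
```

The example bars mimic the "rapidly reversing shock" case from the abstract: the close-to-close return is zero, so a directional indicator sees nothing, while both range-based estimators are strictly positive.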

Forecasting the Evolving Composition of Inbound Tourism Demand: A Bayesian Compositional Time Series Approach Using Platform Booking Data

2026-03-02T20:04:55Z

Understanding how the composition of guest origin markets evolves over time is critical for destination marketing organizations, hospitality businesses, and tourism planners. We develop and apply Bayesian Dirichlet autoregressive moving average (BDARMA) models to forecast the compositional dynamics of guest origin market shares using proprietary Airbnb booking data spanning 2017--2025 across four major destination regions. Our analysis reveals substantial pandemic-induced structural breaks in origin composition, with heterogeneous recovery patterns across markets. In our analysis, the BDARMA framework achieves the lowest forecast error for EMEA and competitive performance across destination regions, outperforming standard benchmarks including naïve forecasts, exponential smoothing, and SARIMA on log-ratio transformed data in compositionally complex markets. For EMEA destinations, BDARMA achieves 27% lower forecast error than naïve methods ($p < 0.001$), with the greatest gains where multiple origin markets compete in the 5-25% share range. By modeling compositions directly on the simplex with a Dirichlet likelihood and incorporating seasonal variation in both mean and precision parameters, our approach produces coherent forecasts that respect the unit-sum constraint while capturing complex temporal dependencies. The methodology provides destination stakeholders with probabilistic forecasts of source market shares, enabling more informed strategic planning for marketing resource allocation, infrastructure investment, and crisis response.

Reasoning on Time-Series for Financial Technical Analysis

2026-03-02T08:07:02Z

While Large Language Models have been used to produce interpretable stock forecasts, they mainly analyze textual reports rather than historical price data, the domain of Technical Analysis. This task is challenging as it switches between domains: the stock price inputs and outputs lie in the time-series domain, while the reasoning step should be in natural language. In this work, we introduce Verbal Technical Analysis (VTA), a novel framework that combines verbal and latent reasoning to produce stock time-series forecasts that are both accurate and interpretable. To reason over time-series, we convert stock price data into textual annotations and optimize the reasoning trace using an inverse Mean Squared Error (MSE) reward objective. To produce time-series outputs from textual reasoning, we condition the outputs of a time-series backbone model on the reasoning-based attributes. Experiments on stock datasets across U.S., Chinese, and European markets show that VTA achieves state-of-the-art forecasting accuracy, while the reasoning traces also perform well on evaluation by industry experts.

Coupled Supply and Demand Forecasting in Platform Accommodation Markets

2026-02-28T02:44:21Z

Tourism demand forecasting is methodologically mature, but it typically treats accommodation supply as fixed or exogenous. In platform-mediated short-term rentals, supply is elastic, decision-driven, and co-evolves with demand through pricing, information design, and interventions. I reframe the core issue as endogenous stock-out censoring: realized booked nights satisfy B_{k,t} <= min(D_{k,t}, S_{k,t}), so booking models that ignore supply learn a regime-specific ceiling and become fragile under policy changes and supply shocks. This narrative review synthesizes work from tourism forecasting, revenue management, two-sided market economics, and Bayesian time-series methods; develops a three-part coupling framework (behavioral, informational, intervention); and illustrates the identification failure with a toy simulation. I conclude with a focused research agenda for jointly forecasting supply, demand, and their compositions.
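The censoring argument above can be made concrete with a small simulation. This is not the paper's toy simulation, only a sketch under simplifying assumptions: a single market, bookings set equal to min(D, S) rather than merely bounded by it, Gaussian latent demand, and hypothetical supply levels of 90 and 150 nights.

```python
import random

def simulate_bookings(demand, supply):
    """Observed bookings when every night of demand is served up to capacity."""
    return [min(d, supply) for d in demand]

def mean(xs):
    return sum(xs) / len(xs)

rng = random.Random(0)  # fixed seed so the run is reproducible
demand = [max(0.0, rng.gauss(100.0, 20.0)) for _ in range(1000)]

booked_tight = simulate_bookings(demand, 90.0)   # supply often binds
booked_loose = simulate_bookings(demand, 150.0)  # supply rarely binds

print("mean latent demand :", round(mean(demand), 2))
print("mean bookings (S=90):", round(mean(booked_tight), 2))
print("max bookings (S=90) :", max(booked_tight))
print("mean bookings (S=150):", round(mean(booked_loose), 2))
```

A model fitted only to `booked_tight` would never see a value above 90 and would understate demand; after a supply expansion to 150 the same latent demand produces a visibly different booking series, which is the fragility the abstract describes.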

Distributional Fitting and Tail Analysis of Lead-Time Compositions: Nights vs. Revenue on Airbnb

2026-02-27T21:39:10Z

We analyze daily lead-time distributions for two Airbnb demand metrics, Nights Booked (volume) and Gross Booking Value (revenue), treating each day's allocation across 0-365 days as a compositional vector. The data span 2,557 days from January 2019 through December 2025 in a large North American region. Three findings emerge. First, GBV concentrates more heavily in mid-range horizons: beyond 90 days, GBV tail mass typically exceeds Nights by 20-50%, with ratios reaching 75% at the 180-day threshold during peak seasons. Second, Gamma and Weibull distributions fit comparably well under interval-censored cross-entropy. Gamma wins on 61% of days for Nights and 52% for GBV, with Weibull close behind at 38% and 45%. Lognormal rarely wins (<3%). Nonparametric GAMs achieve 18-80x lower CRPS but sacrifice interpretability. Third, generalized Pareto fits suggest bounded tails for both metrics at thresholds below 150 days, though this may partly reflect right-truncation at 365 days; above 150 days, estimates destabilize. Bai-Perron tests with HAC standard errors identify five structural breaks in the Wasserstein distance series, with early breaks coinciding with COVID-19 disruptions. The results show that volume and revenue lead-time shapes diverge systematically, that simple two-parameter distributions capture daily pmfs adequately, and that tail inference requires care near truncation boundaries.

FinBloom: Knowledge Grounding Large Language Model with Real-time Financial Data

2026-02-27T07:28:04Z

Large language models (LLMs) excel at generating human-like responses but often struggle with interactive tasks that require access to real-time information. This limitation poses challenges in finance, where models must access up-to-date information, such as recent news or price movements, to support decision-making. To address this, we introduce Financial Agent, a knowledge-grounding approach for LLMs to handle financial queries using real-time text and tabular data. Our contributions are threefold: First, we develop a Financial Context Dataset of over 50,000 financial queries paired with the required context. Second, we develop FinBloom 7B, a custom 7-billion-parameter LLM, by fine-tuning Bloom 7B on 14 million financial news articles from Reuters and Deutsche Presse-Agentur (DPA), alongside a random sample of 25% from 12 million Securities and Exchange Commission (SEC) filings. Third, we fine-tune FinBloom 7B using the Financial Context Dataset to serve as a Financial Agent. This agent generates relevant financial context, enabling efficient real-time data retrieval to answer user queries. By reducing latency and eliminating the need for users to manually provide accurate data, our approach significantly enhances the capability of LLMs to handle dynamic financial tasks. Our approach streamlines real-time financial decision-making, algorithmic trading, and related tasks, and is valuable in contexts with high-velocity data flows.

LLM as a Risk Manager: LLM Semantic Filtering for Lead-Lag Trading in Prediction Markets

2026-02-27T06:38:12Z

Prediction markets provide a unique setting where event-level time series are directly tied to natural-language descriptions, yet discovering robust lead-lag relationships remains challenging due to spurious statistical correlations. We propose a hybrid two-stage causal screener to address this challenge: (i) a statistical stage that uses Granger causality to identify candidate leader-follower pairs from market-implied probability time series, and (ii) an LLM-based semantic stage that re-ranks these candidates by assessing whether the proposed direction admits a plausible economic transmission mechanism based on event descriptions. Because causal ground truth is unobserved, we evaluate the ranked pairs using a fixed, signal-triggered trading protocol that maps relationship quality into realized profit and loss (PnL). On Kalshi Economics markets, our hybrid approach consistently outperforms the statistical baseline. Across rolling evaluations, the win rate increases from 51.4% to 54.5%. Crucially, the average magnitude of losing trades decreases substantially from 649 USD to 347 USD. This reduction is driven by the LLM's ability to filter out statistically fragile links that are prone to large losses, rather than relying on rare gains. These improvements remain stable across different trading configurations, indicating that the gains are not driven by specific parameter choices. Overall, the results suggest that LLMs function as semantic risk managers on top of statistical discovery, prioritizing lead-lag relationships that generalize under changing market conditions.

A Bayesian approach to out-of-sample network reconstruction

2026-02-25T12:53:31Z

Networks underpin systems that range from finance to biology, yet their structure is often only partially observed. Current reconstruction methods typically fit the parameters of a model anew to each snapshot, thus offering no guidance to predict future configurations. Here, we develop a Bayesian approach that uses the information about past network snapshots to inform a prior and predict the subsequent ones, while quantifying uncertainty. Instantiated with a single-parameter fitness model, our method infers link probabilities from node strengths and carries information forward in time. When applied to the Electronic Market for Interbank Deposit across the years 1999-2012, our method accurately recovers the number of connections per bank at subsequent times, outperforming probabilistic benchmarks designed for analogous link-prediction tasks. Notably, each predicted snapshot serves as a reliable prior for the next one, thus enabling self-sustained, out-of-sample reconstruction of evolving networks with a minimal amount of additional data.
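To illustrate how a single-parameter fitness model turns node strengths into link probabilities, the sketch below assumes the form widely used in the network-reconstruction literature, p_ij = z s_i s_j / (1 + z s_i s_j), with z calibrated so that the expected number of links matches a target. Both the functional form and the bisection calibration are assumptions; the abstract does not spell them out, and the Bayesian prior that carries information across snapshots is not modelled here.

```python
def link_prob(z, s_i, s_j):
    """Assumed fitness-model link probability between nodes i and j."""
    x = z * s_i * s_j
    return x / (1.0 + x)

def expected_links(z, strengths):
    """Expected number of undirected links, summing over pairs i < j."""
    n = len(strengths)
    return sum(
        link_prob(z, strengths[i], strengths[j])
        for i in range(n) for j in range(i + 1, n)
    )

def calibrate_z(strengths, target_links, iterations=200):
    """Find z by bisection so that expected_links(z) equals target_links."""
    n = len(strengths)
    if any(s <= 0 for s in strengths):
        raise ValueError("strengths must be positive")
    if not 0 < target_links < n * (n - 1) / 2:
        raise ValueError("target must lie strictly between 0 and n(n-1)/2")
    lo, hi = 0.0, 1.0
    while expected_links(hi, strengths) < target_links:
        hi *= 2.0  # expand the bracket until it contains the solution
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if expected_links(mid, strengths) < target_links:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

strengths = [1.0, 2.0, 3.0, 4.0]   # hypothetical node strengths
z = calibrate_z(strengths, target_links=3.0)
degrees = [
    sum(link_prob(z, s_i, s_j) for j, s_j in enumerate(strengths) if j != i)
    for i, s_i in enumerate(strengths)
]
print(z, degrees)
```

The expected degrees sum to twice the target link count, and nodes with larger strength receive more expected connections, which is the quantity the abstract reports recovering per bank.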

When Fusion Helps and When It Breaks: View-Aligned Robustness in Same-Source Financial Imaging

2026-02-25T11:45:18Z

We study same-source multi-view learning and adversarial robustness for next-day direction prediction using two deterministic, window-aligned image views derived from the same time series: an OHLCV-rendered chart (ohlcv) and a technical-indicator matrix (indic). To control label ambiguity from near-zero moves, we use an ex-post minimum-movement threshold min_move (tau) based on realized absolute next-day return, defining an offline benchmark on the subset where the absolute next-day return is at least tau. Under leakage-resistant time-block splits with embargo, we compare early fusion (channel stacking) and dual-encoder late fusion with optional cross-branch consistency. We then evaluate pixel-space L-infinity evasion attacks (FGSM/PGD) under view-constrained and joint threat models. We find that fusion is regime dependent: early fusion can suffer negative transfer under noisier settings, whereas late fusion is a more reliable default once labels stabilize. Robustness degrades sharply under tiny budgets with stable view-dependent vulnerabilities; late fusion often helps under view-constrained attacks, but joint perturbations remain challenging.

The Physics of Price Discovery: Deconvolving Information, Volatility, and the Critical Breakdown of Signal during Retail Herding

2026-02-24T16:51:42Z

How information transmits through prices -- and why this transmission breaks down -- remains poorly understood. We combine regularized deconvolution with Hawkes process analysis to study the impulse response structure of investor flows in the Korean equity market (January 2020 -- February 2025). Three findings emerge: foreign and institutional flows drive permanent price discovery while individual flows provide contrarian liquidity; individual investor surges are predominantly panic-driven and exhibit near-explosive self-excitation; and during herding episodes, institutional price impact deteriorates sharply in small-cap stocks while large-cap stocks maintain resilience. These results reframe market efficiency as a state variable -- conditional on both herding intensity and firm size -- rather than a structural constant.

Detecting and Explaining Unlawful Insider Trading: A Shapley Value and Causal Forest Approach to Identifying Key Drivers and Causal Relationships

2026-02-23T13:40:47Z

Corporate insiders trade for diverse reasons, often possessing Material Non-Public Information (MNPI). Determining whether specific trades leverage MNPI is a significant challenge due to inherent complexity. This study focuses on two critical objectives: accurately detecting Unlawful Insider Trading (UIT) and identifying key features explaining classification. The analysis demonstrates how combining Shapley Values (SHAP) and Causal Forest (CF) reveals these explanatory drivers. The findings underscore the necessity of causality in identifying and interpreting UIT, requiring the consideration of alternative scenarios and potential outcomes. Within a high-dimensional feature space, the proposed architecture integrates state-of-the-art techniques to achieve high classification accuracy. The framework provides robust feature rankings via SHAP and causal significance assessments through CF, facilitating the discovery of unique causal relationships. Statistically significant relationships are documented between the outcome and several key features, including director status, price-to-book ratio, return, and market beta. These features significantly influence the likelihood of UIT, suggesting potential links between insider behavior and factors such as information asymmetry, valuation risk, market volatility, and stock performance. The analysis draws attention to the complexities of financial causality, noting that while initial descriptors offer intuitive insights, deeper examination is required to understand nuanced impacts. These findings reaffirm the architectural flexibility of decision tree models. By incorporating heterogeneity during tree construction, these models effectively uncover latent structures within trade, finance, and governance data, characterizing fraudulent behavior while maintaining reliable results.

VOLatility Archive for Realized Estimates (VOLARE)

2026-02-23T11:31:53Z

VOLARE (VOLatility Archive for Realized Estimates - https://volare.unime.it) is an open research infrastructure providing standardized realized volatility and covariance measures constructed from ultra-high-frequency financial data. The platform processes tick-level observations across equities, exchange rates, and futures using an asset-specific pipeline that addresses heterogeneous trading calendars, microstructure noise, and timestamp precision. For equities, price series are cleaned using a documented outlier detection procedure and sampled at regular intervals. VOLARE delivers a comprehensive set of realized estimators, including realized variance, range-based measures, bipower variation, semivariances, realized quarticity, realized kernels, and multivariate covariance measures, ensuring methodological consistency and cross-asset comparability. In addition to bulk dataset download, the platform supports interactive visualization and real-time estimation of established volatility models such as HAR and MEM specifications.
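Several of the estimators listed above have one-line textbook definitions, sketched below: realized variance as the sum of squared intraday log returns, bipower variation as (pi/2) times the sum of products of adjacent absolute returns, and realized semivariances as the split of realized variance by return sign. The price path is hypothetical, and the cleaning, sampling and noise-handling steps of the VOLARE pipeline are not reproduced.

```python
import math

def log_returns(prices):
    """Log returns between consecutive sampled prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def realized_variance(returns):
    """Sum of squared intraday returns."""
    return sum(r * r for r in returns)

def bipower_variation(returns):
    """(pi / 2) * sum of |r_i| * |r_{i-1}|; robust to jumps in the limit."""
    return (math.pi / 2.0) * sum(
        abs(a) * abs(b) for a, b in zip(returns, returns[1:])
    )

def realized_semivariances(returns):
    """Return (downside, upside) components of realized variance."""
    down = sum(r * r for r in returns if r < 0)
    up = sum(r * r for r in returns if r > 0)
    return down, up

prices = [100.0, 101.0, 100.0, 102.0, 101.0]  # hypothetical sampled prices
r = log_returns(prices)
rv = realized_variance(r)
bv = bipower_variation(r)
down, up = realized_semivariances(r)
print(rv, bv, down, up)
```

By construction the two semivariances add up to the realized variance, which gives a quick consistency check when comparing downloaded series.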

Metaorder modelling and identification from public data

2026-02-23T08:28:46Z

Market-order flow in financial markets exhibits long-range correlations, a widely known stylised fact. A popular explanation for it is the Lillo-Mike-Farmer (LMF) order-splitting theory. However, quantitative tests of this theory have historically relied on proprietary datasets with trader identifiers, limiting reproducibility and cross-market validation. We show that the LMF theory can be validated using publicly available Johannesburg Stock Exchange (JSE) data by leveraging recently developed methods for reconstructing synthetic metaorders. We demonstrate the validation using 3 years of Transaction and Quote Data (TAQ) for the largest 100 stocks on the JSE, assuming that there are either N=50 or N=150 effective traders managing metaorders in the market.