http://arxiv.org/abs/2512.02203v4 Statistical Inference in Large Multi-way Networks 2026-07-16T13:31:53Z

We propose the Polyads estimator, a new method to estimate structural parameters in weighted multi-way networks while controlling for rich, arbitrary structures of fixed effects. The method is based on a series of classification tasks and is agnostic to both the number and structure of fixed effects. Unlike full maximum likelihood, our estimator does not suffer from the incidental parameter problem: it is consistent and satisfies a Central Limit Theorem with no asymptotic bias, even when some dimensions of the network are short. For sparsely connected networks, it is also computationally faster than PPML. We provide experimental evidence that our estimator yields more reliable confidence intervals, i.e., better empirical coverage, than PPML and its bias-correction strategies. These improvements hold even under model misspecification and are more pronounced in sparse settings. While PPML remains competitive in dense, low-dimensional data, our approach offers a robust alternative for multi-way models that scales efficiently with sparsity. We apply the method to French health insurance claims data to study how a 2017 physician fee reform affected the geography and gender composition of doctor-patient connections.

2025-12-01T20:50:13Z Working paper Lucas Resende Guillaume Lecué Lionel Wilner Philippe Choné http://arxiv.org/abs/2607.14825v1 Aggregation Bias in Proxy Measurement: Nighttime Lights and Local Economic Activity 2026-07-16T10:40:44Z

This paper studies when high-resolution signals aggregated to administrative units can recover unobserved local economic activity. We develop a reverse-regression framework for signals generated by activity but used to predict it at coarser spatial supports. The main theorem decomposes predictive elasticity into elementary elasticity, reverse-regression attenuation, and a spatial aggregation term driven by unit size and within-unit dispersion, showing aggregation pulls elasticities toward one. Monte Carlo evidence confirms the decomposition and clarifies transferability conditions. Applications to VIIRS nighttime lights and local GDP or income in Brazil, Italy, the United States, Indonesia, and Kenya support local calibration mainly in richer contexts.

2026-07-16T10:40:44Z Davide Fiaschi Angela Parenti Cristiano Ricci http://arxiv.org/abs/2512.06804v3 Making Event Study Plots Honest: A Functional Data Approach to Causal Inference 2026-07-16T10:28:20Z

Event study plots are the centerpiece of Difference-in-Differences (DiD) analysis, but current plotting methods cannot provide honest causal inference when the parallel trends and/or no-anticipation assumptions fail. We introduce a novel functional data approach to DiD that directly enables honest causal inference via event study plots. Our DiD estimator converges to a Gaussian process in the Banach space of continuous functions, enabling powerful simultaneous confidence bands. This theoretical contribution allows us to turn an event study plot into a rigorous honest causal inference tool through equivalence and relevance testing: Honest reference bands can be validated using equivalence testing in the pre-treatment period, and honest causal effects can be tested using relevance testing in the post-treatment period. We demonstrate the performance of our method in simulations and two case studies.

2025-12-07T11:44:32Z Chencheng Fang Dominik Liebl http://arxiv.org/abs/2603.11497v2 Variance Estimation with Dependence and Heterogeneous Means 2026-07-16T02:23:21Z

This paper develops a framework for variance estimation under dependence and heterogeneous means. It shows that consistent estimation of the variance target is impossible in general and characterizes necessary and sufficient conditions for conservative variance estimation using dual cones. To choose among the valid estimators, it formulates three criteria (minimal correction, pointwise level estimand, and pointwise MSE) and shows that an eigenvalue truncation solution is optimal under all three. This characterization and solution allow us to assess whether existing variance estimators are valid and optimal in their respective settings, and to construct the first optimal variance estimator that is simultaneously robust to heterogeneous means and cross-cluster serial correlation.

2026-03-12T03:34:57Z Luther Yap http://arxiv.org/abs/2607.14414v1 Probability of worthwhile effect of monotone-response treatments 2026-07-15T23:05:05Z

Experiments may, by design, prevent one from observing on a single subject both the response to a treatment and to its absence. Because of this, marginal distributions for both cases may be observable but not their joint distribution, thus obscuring the distribution of the treatment effect. We examine the case where we impose that the treatment effect is nonnegative, also called monotone treatment response, a common assumption relevant to many practical applications. We solve the problems of best- and worst-case probabilities that the treatment effect exceeds a given value, using an explicit construction for the dependence scheme in each case. Such problems can equivalently be described, in different contexts, as risk aggregation under dependence uncertainty and an order constraint, and as optimal transport with a particular cost function.

2026-07-15T23:05:05Z Benjamin Côté Ruodu Wang http://arxiv.org/abs/2607.14279v1 From Vector Autoregressions to AI-based Time Series Forecasting: A Review 2026-07-15T18:38:21Z

Forecasting is a central goal of time-series analysis. This review centers on three major developments in recent AI-based time-series forecasting: transformers, large pretrained models for zero-shot forecasting, and diffusion-based generative forecasters. We connect these methods to the econometric tradition built around the vector autoregression (VAR) through a common object: the conditional distribution of the future given the past. The review is organized around three long-standing challenges: high dimensionality, nonstationarity, and nonlinearity. We argue that modern methods make progress by expanding the classical forecasting template: they allow more flexible dynamics, use larger information sets and training corpora, and represent richer predictive distributions. Yet they often lack the inferential and structural tools that make classical models useful for testing, explanation, and policy analysis. We close by outlining open problems where econometric tools remain important.
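As a rough illustration of the classical template this abstract refers to, the sketch below fits a bivariate VAR(1) by least squares and iterates it forward to get the conditional mean of the future given the last observation. It is not taken from the review: the function names, the two-variable restriction, the absence of an intercept, and the simulated data are all assumptions made for the example.

```python
import random


def simulate_var1(A, n, noise_sd=0.1, seed=0):
    """Simulate n observations from a bivariate VAR(1): y_t = A y_{t-1} + e_t."""
    rng = random.Random(seed)
    path = [[0.0, 0.0]]
    for _ in range(n - 1):
        prev = path[-1]
        path.append([
            A[i][0] * prev[0] + A[i][1] * prev[1] + rng.gauss(0.0, noise_sd)
            for i in range(2)
        ])
    return path


def fit_var1(path):
    """OLS estimate of A (no intercept).

    A_hat = (sum y_t x_t') (sum x_t x_t')^{-1}, with x_t = y_{t-1}.
    """
    sxx = [[0.0, 0.0], [0.0, 0.0]]
    syx = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(path[:-1], path[1:]):
        for i in range(2):
            for j in range(2):
                sxx[i][j] += x[i] * x[j]
                syx[i][j] += y[i] * x[j]
    # Closed-form inverse of the 2x2 matrix sxx.
    det = sxx[0][0] * sxx[1][1] - sxx[0][1] * sxx[1][0]
    inv = [[sxx[1][1] / det, -sxx[0][1] / det],
           [-sxx[1][0] / det, sxx[0][0] / det]]
    return [[syx[i][0] * inv[0][j] + syx[i][1] * inv[1][j] for j in range(2)]
            for i in range(2)]


def forecast(A, last, steps):
    """Conditional mean of y_{T+steps} given y_T = last (A applied `steps` times)."""
    y = list(last)
    for _ in range(steps):
        y = [A[i][0] * y[0] + A[i][1] * y[1] for i in range(2)]
    return y


# Hypothetical coefficient matrix, chosen only so the simulated process is stable.
A_true = [[0.5, 0.1], [0.0, 0.3]]
path = simulate_var1(A_true, n=5000)
A_hat = fit_var1(path)
print(forecast(A_hat, path[-1], steps=2))
```

The point forecast here is only the mean of the conditional distribution; the richer predictive distributions the abstract mentions would need the error distribution as well.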

2026-07-15T18:38:21Z Likai Chen Weining Wang http://arxiv.org/abs/2607.14274v1 Model Uncertainty under Non-Gaussian Errors: Bayesian Model Averaging and Selection in Stochastic Frontier Models 2026-07-15T18:30:49Z

The paper investigates Bayesian Model Averaging and Selection (BMA/S) under non-standard stochastic assumptions, focusing on stochastic frontier analysis (SFA). We propose fast, reliable procedures for inference in the normal-exponential stochastic frontier model and examine whether accounting for asymmetric disturbances affects model averaging and/or selection outcomes relative to the conventional Gaussian-error BMA/S. Particular attention is given to moderate-dimensional covariate selection problems typical in SFA applications. We demonstrate that, with appropriate search strategies and parallelization techniques, exhaustive model search can be computationally feasible and, in some cases, more practical than stochastic search alternatives. A Monte Carlo simulation study is used to compare the proposed SF-BMA/S procedure with standard Gaussian-error BMA/S under varying levels of inefficiency-to-noise ratio and signal strength with respect to the data generating process. The results show that accounting for stochastic frontier structures may affect posterior inference and model averaging outcomes, especially in scenarios where efficiency analysis is most sensible.

2026-07-15T18:30:49Z 23 pages, 6 tables, 2 figures, 1 appendix (2 tables) Kamil Makieła http://arxiv.org/abs/2601.07664v2 Crypto Pricing with Hidden Factors 2026-07-15T15:42:36Z

We estimate risk premia in the cross-section of cryptocurrency returns using the Giglio-Xiu (2021) three-pass approach, allowing for omitted latent factors alongside observed stock-market and crypto-market factors. Using weekly data on a broad universe of large cryptocurrencies, we find that crypto expected returns load on both crypto-specific factors and selected equity-industry factors associated with technology and profitability, consistent with increased integration between crypto and traditional markets. In addition, we study non-tradable state variables capturing investor sentiment (Fear and Greed), speculative rotation (Altcoin Season Index), and security shocks (hacked value scaled by market capitalization), which are new to the literature. Relative to conventional Fama-MacBeth estimates, the latent-factor approach yields materially different premia for key factors, highlighting the importance of controlling for unobserved risks in crypto asset pricing.

2026-01-12T15:43:30Z Matthew Brigida http://arxiv.org/abs/2607.13916v1 Detecting unusual trading patterns on cryptocurrency exchanges by means of complexity measures 2026-07-15T14:55:34Z

Artificial transaction generation remains an important source of potential market manipulation on cryptocurrency exchanges, as it may distort reported liquidity and reduce market transparency. This study proposes a diagnostic framework for detecting unusual trading patterns based on complexity and statistical-structure measures derived from high-frequency trade-level data. The analysis considers log-returns, trading volume, and transaction counts, using tail distributions, autocorrelation functions, multifractal characteristics, approximate entropy, and detrended cross-correlations. The methodology is applied to BTC, ETH, and XRP traded on Binance, Bitget, KuCoin, and Kraken over the period from April 1 to June 30, 2025. The results reveal a pronounced anomaly on Bitget for BTC and ETH after mid-May 2025. The number of transactions increases sharply, but there is no proportional increase in traded volume or return fluctuations. This regime is characterised by numerous low-volume trades, weaker autocorrelations, reduced multifractal organisation, higher short-pattern irregularity, and weaker cross-correlations involving the transaction-count series. These features are consistent with a noise-like component in trading activity and may indicate artificially increased transaction counts, although they do not provide direct proof of wash trading. The findings show that complexity-based indicators can be useful for detecting exchange-specific trading anomalies that remain hidden in price-based measures.

2026-07-15T14:55:34Z Entropy 2026, 28(7), 804 Jakub Zwydak Marcin Wątorek Jarosław Kwapień Stanisław Drożdż 10.3390/e28070804 http://arxiv.org/abs/2607.13879v1 Global factors for local shocks in a data-scarce environment: with an application to regional fiscal multipliers in Italy 2026-07-15T14:28:46Z

We propose a novel econometric methodology for Structural Vector Autoregressions with external instruments ('proxy-SVARs' or 'SVAR-IVs') in panel data characterized by strong cross-sectional dependence, dynamic heterogeneity, and limited availability of direct external instruments for the shocks of interest. For each unit, we specify a Factor-Augmented proxy-SVAR ('proxy-FA-SVAR') that incorporates factors summarizing cross-sectional information from the non-policy variables of the system. The effects of the policy shocks are then recovered indirectly by estimating unit-specific policy reaction functions through a Minimum Distance approach. Identification relies on global instruments for the non-policy shocks; that is, proxies common to all units in the panel, internally constructed from a separate SVAR estimated on factors for the policy and non-policy variables. These global instruments can be complemented with local (idiosyncratic) instruments constructed from auxiliary unit-level SVARs. Their joint use renders the proxy-FA-SVARs overidentified and therefore statistically testable. We illustrate the methodology by estimating government spending multipliers for Italian NUTS-2 regions using annual data. The global and local instruments for the regional output shocks are obtained from Blanchard-Perotti-type SVARs.

2026-07-15T14:28:46Z Giuseppe Cavaliere Luca Fanelli Marco Mazzali http://arxiv.org/abs/2511.01680v4 Making Interpretable Discoveries from Unstructured Data: A High-Dimensional Multiple Hypothesis Testing Approach 2026-07-15T14:09:50Z

Social scientists are increasingly turning to unstructured datasets to unlock new empirical insights, e.g., estimating descriptive statistics of or causal effects on quantitative measures derived from text, audio, or video data. In many settings, unsupervised analysis is of primary interest, in that the researcher does not want to (or cannot) manually pre-specify all important aspects of the unstructured data to measure; they are interested in "discovery." This paper proposes a general and flexible framework for pursuing such discovery from unstructured data in a statistically principled way. The framework leverages recent methods from the literature on AI interpretability to map unstructured data points to high-dimensional, sparse, and interpretable "concept embeddings"; computes statistics from these concept embeddings for testing interpretable, concept-by-concept hypotheses; performs selective inference on these hypotheses using algorithms validated by new results in high-dimensional central limit theory, producing a selected set ("discoveries"); and both generates and evaluates human-interpretable natural language descriptions of these discoveries. The proposed framework has few researcher degrees of freedom, is robust to data snooping and other post-selection inference concerns, and facilitates fast and inexpensive sensitivity analysis and replication. Applications to recent descriptive and causal analyses of unstructured data in empirical economics are explored.

2025-11-03T15:42:32Z Jacob Carlson http://arxiv.org/abs/2607.13862v1 Estimation and Inference for Latent Dual Networks Using High-Dimensional IV Screening 2026-07-15T14:07:26Z

We develop a novel methodology for estimation and inference in high-dimensional panel network models with latent dual structures. The framework allows outcomes to be affected simultaneously by positive and negative interaction channels, accommodating settings in which some interactions reinforce outcomes while others generate competition and displacement effects. The proposed method identifies and estimates the network directly from the structural model using observed data without the need to pre-specify the network. Network recovery is achieved through a sequential instrumental-variable screening procedure. We establish exact support recovery and oracle-equivalent post-selection inference. An application to U.S. corporate leverage data reveals the coexistence of reinforcing and displacement interactions in firms' financial decisions.

2026-07-15T14:07:26Z Arturas Juodis George Kapetanios Vasilis Sarafidis http://arxiv.org/abs/2501.11996v4 Experimental Designs for Multi-Item Multi-Period Inventory Control 2026-07-15T13:54:47Z

Randomized experiments, or A/B testing, are the gold standard for evaluating interventions, yet they remain underutilized in inventory management. This study addresses this gap by analyzing A/B testing strategies in multi-item, multi-period inventory systems with lost sales and capacity constraints. We examine two canonical experimental designs, switchback experiments and item-level randomization, and show that both suffer from systematic bias due to interference: temporal carryover in switchbacks and cannibalization across items under capacity constraints. Under mild conditions, we characterize the direction of this bias in different scenarios. Motivated by two-sided randomization, we propose a pairwise design over items and time and analyze its bias properties. Controlled stochastic simulations verify the theoretical predictions, and trace-driven experiments on real-world fresh-retail data show that the same mechanisms persist in realistic environments with stockout substitution.

2025-01-21T09:37:14Z Accepted by the 27th ACM Conference on Economics and Computation (EC '26) Xinqi Chen Xingyu Bai Zeyu Zheng Nian Si http://arxiv.org/abs/2607.13759v1 Time preference effects in forecasting 2026-07-15T12:20:14Z

We study the evaluation of forecasts regarding the timing and occurrence of uncertain future events, such as volcanic eruptions, the start of a war, or the beginning of a recession. We show theoretically that a typical approach, evaluating the forecasts after the event has occurred, incentivizes dishonest predictions if forecasters discount future rewards in favor of more immediate benefits. An application to forecasting tournament data finds strong empirical evidence that forecasters adjust predictions in response to these incentives, implying that existing forecasts of such events likely systematically overstate the probability of early occurrence. We conclude that rewarding such forecasts in an incentive-compatible way is inherently challenging.

2026-07-15T12:20:14Z Yannick Hoga Niklas V. Lehmann http://arxiv.org/abs/2607.13564v1 Manipulation testing based on Benford's Law for discrete scores 2026-07-15T08:04:52Z

This paper addresses the problem of running variable manipulation in Regression Discontinuity Designs. Leveraging the observation that manipulation often alters the density balance around the cutoff, we detect these structural imbalances using Benford's Law, a natural statistical regularity widely applied in fraud detection. Our framework serves as a vital precautionary safeguard alongside traditional McCrary-type tests. It eliminates researcher-chosen parameters that can skew outcomes, while delivering a deeper diagnostic breakdown of the density's behavior. Crucially, whereas the classic McCrary test can overlook systemic imbalances due to its rigid symmetric setup, our method separates the data into directional components. This allows researchers to pinpoint the exact origin of a deviation and spot hidden manipulation that standard frameworks fail to capture. To achieve this, we introduce an innovative method for selecting a bandwidth consistent with Benford's Law, and construct two distinct, complementary tests using threshold values adapted from Nigrini (2012) that successfully transition the law's application from digits to probabilities. Empirical applications confirm the enhanced protective value of this diagnostic framework.
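For readers unfamiliar with the regularity this abstract builds on, the sketch below computes the first-digit frequencies expected under Benford's Law, P(d) = log10(1 + 1/d), and the mean absolute deviation of a sample from them. This is the classic first-digit check only, not the paper's bandwidth selection or its probability-based tests; the function names and sample data are assumptions made for the example, and no conformity thresholds are encoded.

```python
import math


def benford_first_digit_probs():
    """Expected first-digit shares under Benford's Law: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading nonzero digit of a nonzero number."""
    # In scientific notation the first character is the leading digit.
    return int(f"{abs(x):.15e}"[0])


def mean_absolute_deviation(values):
    """Mean absolute gap between observed and expected first-digit shares."""
    expected = benford_first_digit_probs()
    counts = {d: 0 for d in range(1, 10)}
    n = 0
    for v in values:
        if v == 0:
            continue  # zero has no leading digit
        counts[first_digit(v)] += 1
        n += 1
    if n == 0:
        raise ValueError("need at least one nonzero value")
    return sum(abs(counts[d] / n - expected[d]) for d in range(1, 10)) / 9


# Illustrative data: powers of 2 track Benford's Law closely, while
# consecutive integers do not.
powers = [2.0 ** k for k in range(1, 200)]
integers = list(range(1, 200))
print(mean_absolute_deviation(powers), mean_absolute_deviation(integers))
```

A smaller deviation means the sample's leading digits sit closer to the Benford shares; how small is small enough depends on the test being run.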

2026-07-15T08:04:52Z 5 figures, 22 pages Roy Cerqueti Marco Ventura