https://arxiv.org/api/fHFpxRRMlUDXKInkbChK+5dGqdo 2026-03-20T07:59:05Z 5183 0 15 http://arxiv.org/abs/2603.19211v1 Synthetic Control Misconceptions: Recommendations for Practice 2026-03-19T17:56:42Z To estimate the causal effect of an intervention, researchers need to identify a control group that represents what might have happened to the treatment group in the absence of that intervention. This is challenging without a randomized experiment and further complicated when few units (possibly only one) are treated. Nevertheless, when data are available on units over time, synthetic control (SC) methods provide an opportunity to construct a valid comparison by differentially weighting control units that did not receive the treatment so that their resulting pre-treatment trajectory is similar to that of the treated unit. The hope is that this weighted ``pseudo-counterfactual'' can serve as a valid counterfactual in the post-treatment time period. Since its origin twenty years ago, SC has been used over 5,000 times in the literature (Web of Science, December 2025), leading to a proliferation of descriptions of the method and guidance on proper usage that is not always accurate and does not always align with what the original developers appear to have intended. As such, a number of accepted pieces of wisdom have arisen: (1) SC is robust to various implementations; (2) covariates are unnecessary; and (3) pre-treatment prediction error should guide model selection. We describe each in detail and conduct simulations that suggest, both for standard and alternative implementations of SC, that these purported truths are not supported by empirical evidence and thus actually represent misconceptions about best practice. Instead of relying on these misconceptions, we offer practical advice for more cautious implementation and interpretation of results.
2026-03-19T17:56:42Z Robert Pickett Jennifer Hill Sarah Cowan http://arxiv.org/abs/2507.04668v2 Forward Regression via Gram-Schmidt Orthogonalization for Ultra-High Dimensional Linear Models 2026-03-19T15:27:29Z Forward regression is a classical and effective tool for variable screening in ultra-high dimensional linear models, but its standard projection-based implementation can be computationally costly and numerically unstable when predictors are strongly collinear. Motivated by this limitation, we propose an orthogonalized forward regression procedure, implemented recursively through Gram-Schmidt updates, that ranks predictors according to their unique contributions after removing the effects of variables already selected. This approach preserves the interpretability of forward regression while substantially reducing the cost of repeated projections. We further develop a path-based model size selection rule using statistics computed directly from the forward sequence, thereby avoiding cross-validation and extensive tuning. The resulting method is particularly well suited to settings in which the number of predictors far exceeds the sample size and strong collinearity renders the conventional forward fitting ineffective. Theoretically, we derive the optimal convergence rate for the proposed Gram-Schmidt forward regression, thereby extending existing results for projection-based forward regression, and further show that it enjoys the sure screening property and variable selection consistency under suitable conditions. Simulation studies and empirical examples demonstrate that it provides a favorable balance among computational efficiency, numerical stability, screening accuracy, and predictive performance, especially in highly correlated ultra-high dimensional settings. 2025-07-07T05:15:48Z Jialuo Chen Zhaoxing Gao Yifan Jiang Ruey S.
Tsay http://arxiv.org/abs/2511.17928v2 Limit Theorems for Network Data without Metric Structure 2026-03-19T14:53:36Z This paper develops limit theorems for random variables with network dependence, without requiring the individuals in the network to be located in a Euclidean or metric space. This distinguishes our approach from most existing limit theorems in network statistics and econometrics, which are based on weak dependence concepts such as strong mixing, near-epoch dependence, or $ψ$-dependence. All these weak dependence concepts presuppose an underlying metric. By relaxing the assumption of an underlying metric space, our theorems can be applied to a broader range of network data, including financial and social networks. To derive the limit theorems, we generalize the concept of functional dependence (also known as physical dependence) from time series to random variables with network dependence. Using this framework, we establish several inequalities, a law of large numbers, and central limit theorems. Furthermore, we demonstrate the verifiability of our high-level conditions by deriving primitive sufficient conditions for spatial autoregressive models, which are widely used in network data analysis. 2025-11-22T05:56:33Z Wen Jiang Yachen Wang Zeqi Wu Xingbai Xu http://arxiv.org/abs/2603.18870v1 Inference in Regression Discontinuity Designs with Clustered Data 2026-03-19T13:15:49Z Clustered sampling is prevalent in empirical regression discontinuity (RD) designs, but it has not received much attention in the theoretical literature. In this paper, we introduce a general model-based framework for such settings and derive high-level conditions under which the standard local linear RD estimator is asymptotically normal. We verify that our high-level assumptions hold across a wide range of empirical designs, including settings of growing cluster sizes. 
We further show that clustered standard errors that are currently used in practice can be either inconsistent or overly conservative in finite samples. To address these issues, we propose a novel nearest-neighbor-type variance estimator and illustrate its properties in a diverse set of empirical applications. 2026-03-19T13:15:49Z Claudia Noack Tomasz Olma Christoph Rothe http://arxiv.org/abs/2603.17381v2 An Auditable AI Agent Loop for Empirical Economics: A Case Study in Forecast Combination 2026-03-19T03:53:04Z AI coding agents make empirical specification search fast and cheap, but they also widen hidden researcher degrees of freedom. Building on an open-source agent-loop architecture, this paper adapts that framework to an empirical economics workflow and adds a post-search holdout evaluation. In a forecast-combination illustration, multiple independent agent runs outperform standard benchmarks in the original rolling evaluation, but not all continue to do so on a post-search holdout. Logged search and holdout evaluation together make adaptive specification search more transparent and help distinguish robust improvements from sample-specific discoveries. 2026-03-18T05:55:04Z 32 pages, no figure Minchul Shin http://arxiv.org/abs/2411.04380v2 Identification of Long-Term Treatment Effects via Temporal Links, Observational, and Experimental Data 2026-03-19T00:17:59Z Recent literature proposes combining short-term experimental and long-term observational data to provide alternatives to conventional observational studies for the identification of long-term average treatment effects (LTEs). This paper re-examines the identification problem and uncovers that assumptions restricting temporal link functions -- relationships between short-term and mean long-term potential outcomes -- are central in this context. 
The experimental data serve to amplify the identifying power of such assumptions; absent them, the combined data are no more informative than the observational data alone. Plausible inference thus hinges on justifiable restrictions in this class. Motivated by this, I introduce two treatment response assumptions that may be defensible based on economic theory or intuition. To utilize them and facilitate future developments, I develop a novel unifying identification framework that computationally produces sharp bounds on the LTE for a general class of temporal link function restrictions and accommodates imperfect experimental compliance -- thereby also extending existing approaches. I illustrate the method by estimating the long-term effects of Head Start participation. The findings indicate that the effects on educational attainment, employment, and criminal involvement are lasting but smaller in magnitude than those established by sibling comparisons. 2024-11-07T02:47:13Z Filip Obradović http://arxiv.org/abs/2201.06694v6 Homophily in preferences or meetings? Identifying and estimating an iterative network formation model 2026-03-18T23:44:18Z Is homophily in social and economic networks driven by a taste for homogeneity (preferences) or by a higher probability of meeting individuals with similar attributes (opportunity)? This paper studies identification and estimation of an iterative network game that distinguishes between these two mechanisms. Our approach enables us to assess the counterfactual effects of changing the meeting protocol between agents. As an application, we study the role of preferences and meetings in shaping classroom friendship networks in Brazil. In a network structure in which homophily due to preferences is stronger than homophily due to meeting opportunities, tracking students may improve welfare. Still, the relative benefit of this policy diminishes over the school semester. 
2022-01-18T01:40:53Z Luis Alvarez Cristine Pinto Vladimir Ponczek http://arxiv.org/abs/2402.00584v5 Arellano-Bond LASSO Estimator for Dynamic Linear Panel Models 2026-03-18T20:37:38Z The Arellano-Bond estimator is a fundamental method for dynamic panel data models, widely used in practice. It can be severely biased when the time series dimension of the data, $T$, is long. The source of the bias is the large degree of overidentification. We propose a simple two-step approach to deal with this problem. The first step applies LASSO to the cross-section data at each time period to select the most informative moment conditions, exploiting the approximately sparse structure of these conditions. The second step applies a linear instrumental variable estimator using the instruments constructed from the moment conditions selected in the first step. Using asymptotic sequences where the two dimensions of the panel grow with the sample size, we show that the new estimator is consistent and asymptotically normal under much weaker conditions on $T$ than the Arellano-Bond estimator. Our theory covers models with high-dimensional covariates including multiple lags of the dependent variable and strictly exogenous covariates, which are becoming common in modern applications. We illustrate our approach by applying it to weekly county-level panel data from the United States to study opening K-12 schools and other mitigation policies' short and long-term effects on COVID-19's spread. 2024-02-01T13:31:54Z Victor Chernozhukov Iván Fernández-Val Chen Huang Weining Wang http://arxiv.org/abs/2603.17881v1 Towards Measuring Disruptive Innovation Across Countries 2026-03-18T16:05:21Z The CD index is a widely used measure of disruptive inventions. Most studies compute it using USPTO data. This creates a puzzle because the US appears less disruptive than European and Asian countries. We show that this largely stems from missing international citations. 
Using a global citation network, we quantify and correct this bias. The disruptiveness advantage of non-US inventors drops by 64% to 148% of the US baseline mean. The US emerges as a disruption leader over Europe, with Asia's advantage substantially reduced. Globally integrated citation data are essential for credible measurement of disruptive innovation in international contexts. 2026-03-18T16:05:21Z Christian Rutzer Dragan Filimonovic Jeffrey T. Macher Rolf Weder http://arxiv.org/abs/2404.12882v4 The modified conditional sum-of-squares estimator for fractionally integrated models 2026-03-18T15:14:08Z In this paper, we analyse the influence of estimating a constant term on the bias of the conditional sum-of-squares (CSS) estimator in a stationary or non-stationary type-II ARFIMA ($p_1$,$d$,$p_2$) model. We derive expressions for the estimator's bias and show that the leading term can be easily removed by a simple modification of the CSS objective function. We call this new estimator the modified conditional sum-of-squares (MCSS) estimator. We show theoretically and by means of Monte Carlo simulations that its performance relative to that of the CSS estimator is markedly improved even for small sample sizes. Finally, we revisit three classical short datasets that have in the past been described by ARFIMA($p_1$,$d$,$p_2$) models with constant term, namely the post-Second World War real GNP data, the extended Nelson-Plosser data, and the Nile data. 2024-04-19T13:30:50Z Mustafa R. Kılınç Michael Massmann http://arxiv.org/abs/2603.17463v1 Multivariate GARCH and portfolio variance prediction: A forecast reconciliation perspective 2026-03-18T08:10:41Z We assess the advantage of combining univariate and multivariate portfolio risk forecasts with the aid of forecast reconciliation techniques. In our analyses, we assume knowledge of portfolio weights, a standard for portfolio risk management applications.
With an extensive simulation experiment, we show that, if the true covariance is known, forecast reconciliation improves over a standard multivariate approach, in particular when the adopted multivariate model is misspecified. However, if noisy proxies are used, correctly specified models and the misspecified ones (for instance, neglecting spillovers) turn out to be, in several cases, indistinguishable, with forecast reconciliation still providing improvements. The noise in the covariance proxy plays a crucial role in determining the improvement of both the forecast reconciliation and the correct model specification. An empirical analysis shows how forecast reconciliation can be adopted with real data to improve traditional GARCH-based portfolio variance forecasts. 2026-03-18T08:10:41Z Massimiliano Caporin Daniele Girolimetto Emanuele Lopetuso http://arxiv.org/abs/2410.16017v2 Semiparametric Bayesian Inference for a Conditional Moment Equality Model 2026-03-18T03:10:50Z I propose a semiparametric Bayesian inference framework for conditional moment equalities. The core idea is that these models deterministically map a conditional distribution of data to a structural parameter via the restriction that a conditional expectation equals zero. Consequently, a posterior for the conditional distribution leads to a posterior for the structural parameter by minimizing the distance of the conditional moments to zero. The method has similar flexibility to frequentist semiparametric estimators and does not require converting the conditional moments into unconditional moments. I also establish frequentist asymptotic optimality of my proposal via a semiparametric Bernstein-von Mises theorem (BvM), which establishes that the posterior for the structural parameter is asymptotically normal and matches the Chamberlain (1987) semiparametric efficiency bound.
The BvM conditions are verified for Gaussian process priors and complement the numerical aspects of the paper in which these priors are used to estimate welfare effects. 2024-10-21T13:49:38Z Christopher D. Walker http://arxiv.org/abs/2509.11381v2 The Honest Truth About Causal Trees: Accuracy Limits for Heterogeneous Treatment Effect Estimation 2026-03-17T19:59:56Z Recursive decision trees are widely used to estimate heterogeneous causal treatment effects in experimental and observational studies. These methods are typically implemented using CART-type recursive partitioning and are often viewed as adaptive procedures capable of discovering treatment effect heterogeneity in high-dimensional settings. We study causal tree estimators based on adaptive recursive partitioning and establish lower bounds on their estimation accuracy. Under basic conditions, we show that causal trees constructed via standard CART-type splitting rules cannot achieve polynomial-in-$n$ convergence rates in the uniform norm (where $n$ denotes the sample size). The underlying mechanism is that greedy recursive partitioning selects highly imbalanced splits with non-vanishing probability, producing terminal nodes containing very few observations and leading to large estimation variance. We further show that sample splitting (``honesty'') yields at most negligible improvements in convergence rates. As a consequence, causal tree estimators may converge arbitrarily slowly and can even be inconsistent in some settings. Our results also clarify the role of balanced partition assumptions in existing theoretical guarantees for causal forests and related ensemble methods. The analysis develops new probabilistic tools for studying adaptive recursive partitioning procedures, including non-asymptotic approximations for suprema of partial sums and Gaussian processes. As a technical by-product, we also identify and correct an error in Eicker (1979). 2025-09-14T18:29:45Z Matias D. Cattaneo Jason M. 
Klusowski Ruiqi Rae Yu http://arxiv.org/abs/2603.10272v2 An operator-level ARCH Model 2026-03-17T18:01:54Z AutoRegressive Conditional Heteroscedasticity (ARCH) models are standard for modeling time series exhibiting volatility, with a rich literature in univariate and multivariate settings. In recent years, these models have been extended to function spaces. However, functional ARCH and generalized ARCH (GARCH) processes established in the literature have thus far been restricted to model ``pointwise'' variances. In this paper, we propose a new ARCH framework for data residing in general separable Hilbert spaces that accounts for the full evolution of the conditional covariance operator. We define a general operator-level ARCH model. For a simplified Constant Conditional Correlation version of the model, we establish conditions under which such models admit strictly and weakly stationary solutions, finite moments, and weak serial dependence. Additionally, we derive consistent Yule--Walker-type estimators of the infinite-dimensional model parameters. The practical relevance of the model is illustrated through simulations and a data application to high-frequency cumulative intraday returns. 2026-03-10T23:04:20Z 48 pages, 8 Figures, 2 Tables Alexander Aue Sebastian Kühnert Gregory Rice Jeremy VanderDoes http://arxiv.org/abs/2603.08634v2 Tractable Identification of Strategic Network Formation Models with Unobserved Heterogeneity 2026-03-17T17:50:04Z We develop a tractable identification approach for strategic network formation models with both strategic link interdependence and individual unobserved heterogeneity (fixed effects). The key challenge is that endogenous network statistics (e.g. number of common friends) enter the link formation equation, while the mapping from model primitives to equilibrium network structure is generally intractable. 
Our approach sidesteps this difficulty using a ``bounding-by-$c$'' technique that treats endogenous covariates as random variables and exploits monotonicity restrictions to obtain identifying information. A central contribution is to develop a spectrum of fixed-effects handling strategies based on subnetwork configurations: tetrad-based restrictions that difference out all individual fixed effects, triad-based and weighted restrictions that combine ``difference-out'' and ``integrate-out'' steps by differencing out some fixed effects and profiling over the remainder conditional on observed characteristics, and general weighted cycle-based restrictions that unify these cases. We also provide point identification results. Preliminary simulations show that the approach can deliver informative bounds on the structural parameters. 2026-03-09T17:11:30Z Wayne Yuan Gao Ming Li Zhengyan Xu