https://arxiv.org/api/fHFpxRRMlUDXKInkbChK+5dGqdo
2026-03-20T12:24:00Z
5183015

http://arxiv.org/abs/2603.19211v1
Synthetic Control Misconceptions: Recommendations for Practice
Updated: 2026-03-19T17:56:42Z
To estimate the causal effect of an intervention, researchers need to identify a control group that represents what might have happened to the treatment group in the absence of that intervention. This is challenging without a randomized experiment and further complicated when few units (possibly only one) are treated. Nevertheless, when data are available on units over time, synthetic control (SC) methods provide an opportunity to construct a valid comparison by differentially weighting control units that did not receive the treatment so that their resulting pre-treatment trajectory is similar to that of the treated unit. The hope is that this weighted ``pseudo-counterfactual'' can serve as a valid counterfactual in the post-treatment time period. Since its origin twenty years ago, SC has been used over 5,000 times in the literature (Web of Science, December 2025), leading to a proliferation of descriptions of the method and guidance on proper usage that is not always accurate and does not always align with what the original developers appear to have intended. As such, a number of accepted pieces of wisdom have arisen: (1) SC is robust to various implementations; (2) covariates are unnecessary; and (3) pre-treatment prediction error should guide model selection. We describe each in detail and conduct simulations that suggest, both for standard and alternative implementations of SC, that these purported truths are not supported by empirical evidence and thus actually represent misconceptions about best practice.
Instead of relying on these misconceptions, we offer practical advice for more cautious implementation and interpretation of results.
Published: 2026-03-19T17:56:42Z
Authors: Robert Pickett, Jennifer Hill, Sarah Cowan

http://arxiv.org/abs/2507.04668v2
Forward Regression via Gram-Schmidt Orthogonalization for Ultra-High Dimensional Linear Models
Updated: 2026-03-19T15:27:29Z
Forward regression is a classical and effective tool for variable screening in ultra-high dimensional linear models, but its standard projection-based implementation can be computationally costly and numerically unstable when predictors are strongly collinear. Motivated by this limitation, we propose an orthogonalized forward regression procedure, implemented recursively through Gram-Schmidt updates, that ranks predictors according to their unique contributions after removing the effects of variables already selected. This approach preserves the interpretability of forward regression while substantially reducing the cost of repeated projections. We further develop a path-based model size selection rule using statistics computed directly from the forward sequence, thereby avoiding cross-validation and extensive tuning. The resulting method is particularly well suited to settings in which the number of predictors far exceeds the sample size and strong collinearity renders the conventional forward fitting ineffective. Theoretically, we derive the optimal convergence rate for the proposed Gram-Schmidt forward regression, thereby extending existing results for projection-based forward regression, and further show that it enjoys the sure screening property and variable selection consistency under suitable conditions. Simulation studies and empirical examples demonstrate that it provides a favorable balance among computational efficiency, numerical stability, screening accuracy, and predictive performance, especially in highly correlated ultra-high dimensional settings.
Published: 2025-07-07T05:15:48Z
Authors: Jialuo Chen, Zhaoxing Gao, Yifan Jiang, Ruey S. Tsay

http://arxiv.org/abs/2511.17928v2
Limit Theorems for Network Data without Metric Structure
Updated: 2026-03-19T14:53:36Z
This paper develops limit theorems for random variables with network dependence, without requiring the individuals in the network to be located in a Euclidean or metric space. This distinguishes our approach from most existing limit theorems in network statistics and econometrics, which are based on weak dependence concepts such as strong mixing, near-epoch dependence, or $ψ$-dependence. All these weak dependence concepts presuppose an underlying metric. By relaxing the assumption of an underlying metric space, our theorems can be applied to a broader range of network data, including financial and social networks. To derive the limit theorems, we generalize the concept of functional dependence (also known as physical dependence) from time series to random variables with network dependence. Using this framework, we establish several inequalities, a law of large numbers, and central limit theorems. Furthermore, we demonstrate the verifiability of our high-level conditions by deriving primitive sufficient conditions for spatial autoregressive models, which are widely used in network data analysis.
Published: 2025-11-22T05:56:33Z
Authors: Wen Jiang, Yachen Wang, Zeqi Wu, Xingbai Xu

http://arxiv.org/abs/2603.18870v1
Inference in Regression Discontinuity Designs with Clustered Data
Updated: 2026-03-19T13:15:49Z
Clustered sampling is prevalent in empirical regression discontinuity (RD) designs, but it has not received much attention in the theoretical literature. In this paper, we introduce a general model-based framework for such settings and derive high-level conditions under which the standard local linear RD estimator is asymptotically normal. We verify that our high-level assumptions hold across a wide range of empirical designs, including settings of growing cluster sizes.
We further show that clustered standard errors that are currently used in practice can be either inconsistent or overly conservative in finite samples. To address these issues, we propose a novel nearest-neighbor-type variance estimator and illustrate its properties in a diverse set of empirical applications.
Published: 2026-03-19T13:15:49Z
Authors: Claudia Noack, Tomasz Olma, Christoph Rothe

http://arxiv.org/abs/2603.17381v2
An Auditable AI Agent Loop for Empirical Economics: A Case Study in Forecast Combination
Updated: 2026-03-19T03:53:04Z
AI coding agents make empirical specification search fast and cheap, but they also widen hidden researcher degrees of freedom. Building on an open-source agent-loop architecture, this paper adapts that framework to an empirical economics workflow and adds a post-search holdout evaluation. In a forecast-combination illustration, multiple independent agent runs outperform standard benchmarks in the original rolling evaluation, but not all continue to do so on a post-search holdout. Logged search and holdout evaluation together make adaptive specification search more transparent and help distinguish robust improvements from sample-specific discoveries.
Published: 2026-03-18T05:55:04Z
Comments: 32 pages, no figure
Authors: Minchul Shin

http://arxiv.org/abs/2411.04380v2
Identification of Long-Term Treatment Effects via Temporal Links, Observational, and Experimental Data
Updated: 2026-03-19T00:17:59Z
Recent literature proposes combining short-term experimental and long-term observational data to provide alternatives to conventional observational studies for the identification of long-term average treatment effects (LTEs). This paper re-examines the identification problem and uncovers that assumptions restricting temporal link functions -- relationships between short-term and mean long-term potential outcomes -- are central in this context. The experimental data serve to amplify the identifying power of such assumptions; absent them, the combined data are no more informative than the observational data alone.
Plausible inference thus hinges on justifiable restrictions in this class. Motivated by this, I introduce two treatment response assumptions that may be defensible based on economic theory or intuition. To utilize them and facilitate future developments, I develop a novel unifying identification framework that computationally produces sharp bounds on the LTE for a general class of temporal link function restrictions and accommodates imperfect experimental compliance -- thereby also extending existing approaches. I illustrate the method by estimating the long-term effects of Head Start participation. The findings indicate that the effects on educational attainment, employment, and criminal involvement are lasting but smaller in magnitude than those established by sibling comparisons.
Published: 2024-11-07T02:47:13Z
Authors: Filip Obradović

http://arxiv.org/abs/2201.06694v6
Homophily in preferences or meetings? Identifying and estimating an iterative network formation model
Updated: 2026-03-18T23:44:18Z
Is homophily in social and economic networks driven by a taste for homogeneity (preferences) or by a higher probability of meeting individuals with similar attributes (opportunity)? This paper studies identification and estimation of an iterative network game that distinguishes between these two mechanisms. Our approach enables us to assess the counterfactual effects of changing the meeting protocol between agents. As an application, we study the role of preferences and meetings in shaping classroom friendship networks in Brazil. In a network structure in which homophily due to preferences is stronger than homophily due to meeting opportunities, tracking students may improve welfare.
Still, the relative benefit of this policy diminishes over the school semester.
Published: 2022-01-18T01:40:53Z
Authors: Luis Alvarez, Cristine Pinto, Vladimir Ponczek

http://arxiv.org/abs/2402.00584v5
Arellano-Bond LASSO Estimator for Dynamic Linear Panel Models
Updated: 2026-03-18T20:37:38Z
The Arellano-Bond estimator is a fundamental method for dynamic panel data models, widely used in practice. It can be severely biased when the time series dimension of the data, $T$, is long. The source of the bias is the large degree of overidentification. We propose a simple two-step approach to deal with this problem. The first step applies LASSO to the cross-section data at each time period to select the most informative moment conditions, exploiting the approximately sparse structure of these conditions. The second step applies a linear instrumental variable estimator using the instruments constructed from the moment conditions selected in the first step. Using asymptotic sequences where the two dimensions of the panel grow with the sample size, we show that the new estimator is consistent and asymptotically normal under much weaker conditions on $T$ than the Arellano-Bond estimator. Our theory covers models with high-dimensional covariates including multiple lags of the dependent variable and strictly exogenous covariates, which are becoming common in modern applications. We illustrate our approach by applying it to weekly county-level panel data from the United States to study opening K-12 schools and other mitigation policies' short and long-term effects on COVID-19's spread.
Published: 2024-02-01T13:31:54Z
Authors: Victor Chernozhukov, Iván Fernández-Val, Chen Huang, Weining Wang

http://arxiv.org/abs/2603.17881v1
Towards Measuring Disruptive Innovation Across Countries
Updated: 2026-03-18T16:05:21Z
The CD index is a widely used measure of disruptive inventions. Most studies compute it using USPTO data. This creates a puzzle because the US appears less disruptive than European and Asian countries.
We show that this largely stems from missing international citations. Using a global citation network, we quantify and correct this bias. The disruptiveness advantage of non-US inventors drops by 64% to 148% of the US baseline mean. The US emerges as a disruption leader over Europe, with Asia's advantage substantially reduced. Globally integrated citation data are essential for credible measurement of disruptive innovation in international contexts.
Published: 2026-03-18T16:05:21Z
Authors: Christian Rutzer, Dragan Filimonovic, Jeffrey T. Macher, Rolf Weder

http://arxiv.org/abs/2404.12882v4
The modified conditional sum-of-squares estimator for fractionally integrated models
Updated: 2026-03-18T15:14:08Z
In this paper, we analyse the influence of estimating a constant term on the bias of the conditional sum-of-squares (CSS) estimator in a stationary or non-stationary type-II ARFIMA($p_1$,$d$,$p_2$) model. We derive expressions for the estimator's bias and show that the leading term can be easily removed by a simple modification of the CSS objective function. We call this new estimator the modified conditional sum-of-squares (MCSS) estimator. We show theoretically and by means of Monte Carlo simulations that its performance relative to that of the CSS estimator is markedly improved even for small sample sizes. Finally, we revisit three classical short datasets that have in the past been described by ARFIMA($p_1$,$d$,$p_2$) models with constant term, namely the post-Second World War real GNP data, the extended Nelson-Plosser data, and the Nile data.
Published: 2024-04-19T13:30:50Z
Authors: Mustafa R. Kılınç, Michael Massmann

http://arxiv.org/abs/2603.17463v1
Multivariate GARCH and portfolio variance prediction: A forecast reconciliation perspective
Updated: 2026-03-18T08:10:41Z
We assess the advantage of combining univariate and multivariate portfolio risk forecasts with the aid of forecast reconciliation techniques. In our analyses, we assume knowledge of portfolio weights, a standard for portfolio risk management applications.
With an extensive simulation experiment, we show that, if the true covariance is known, forecast reconciliation improves over a standard multivariate approach, in particular when the adopted multivariate model is misspecified. However, if noisy proxies are used, correctly specified models and the misspecified ones (for instance, neglecting spillovers) turn out to be, in several cases, indistinguishable, with forecast reconciliation still providing improvements. The noise in the covariance proxy plays a crucial role in determining the improvement of both the forecast reconciliation and the correct model specification. An empirical analysis shows how forecast reconciliation can be adopted with real data to improve traditional GARCH-based portfolio variance forecasts.
Published: 2026-03-18T08:10:41Z
Authors: Massimiliano Caporin, Daniele Girolimetto, Emanuele Lopetuso

http://arxiv.org/abs/2410.16017v2
Semiparametric Bayesian Inference for a Conditional Moment Equality Model
Updated: 2026-03-18T03:10:50Z
I propose a semiparametric Bayesian inference framework for conditional moment equalities. The core idea is that these models deterministically map a conditional distribution of data to a structural parameter via the restriction that a conditional expectation equals zero. Consequently, a posterior for the conditional distribution leads to a posterior for the structural parameter by minimizing the distance of the conditional moments to zero. The method has similar flexibility to frequentist semiparametric estimators and does not require converting the conditional moments into unconditional moments. I also establish frequentist asymptotic optimality of my proposal via a semiparametric Bernstein-von Mises theorem (BvM), which establishes that the posterior for the structural parameter is asymptotically normal and matches the Chamberlain (1987) semiparametric efficiency bound.
The BvM conditions are verified for Gaussian process priors and complement the numerical aspects of the paper in which these priors are used to estimate welfare effects.
Published: 2024-10-21T13:49:38Z
Authors: Christopher D. Walker

http://arxiv.org/abs/2509.11381v2
The Honest Truth About Causal Trees: Accuracy Limits for Heterogeneous Treatment Effect Estimation
Updated: 2026-03-17T19:59:56Z
Recursive decision trees are widely used to estimate heterogeneous causal treatment effects in experimental and observational studies. These methods are typically implemented using CART-type recursive partitioning and are often viewed as adaptive procedures capable of discovering treatment effect heterogeneity in high-dimensional settings. We study causal tree estimators based on adaptive recursive partitioning and establish lower bounds on their estimation accuracy. Under basic conditions, we show that causal trees constructed via standard CART-type splitting rules cannot achieve polynomial-in-$n$ convergence rates in the uniform norm (where $n$ denotes the sample size). The underlying mechanism is that greedy recursive partitioning selects highly imbalanced splits with non-vanishing probability, producing terminal nodes containing very few observations and leading to large estimation variance. We further show that sample splitting (``honesty'') yields at most negligible improvements in convergence rates. As a consequence, causal tree estimators may converge arbitrarily slowly and can even be inconsistent in some settings. Our results also clarify the role of balanced partition assumptions in existing theoretical guarantees for causal forests and related ensemble methods. The analysis develops new probabilistic tools for studying adaptive recursive partitioning procedures, including non-asymptotic approximations for suprema of partial sums and Gaussian processes. As a technical by-product, we also identify and correct an error in Eicker (1979).
Published: 2025-09-14T18:29:45Z
Authors: Matias D. Cattaneo, Jason M. Klusowski, Ruiqi Rae Yu

http://arxiv.org/abs/2603.10272v2
An operator-level ARCH Model
Updated: 2026-03-17T18:01:54Z
AutoRegressive Conditional Heteroscedasticity (ARCH) models are standard for modeling time series exhibiting volatility, with a rich literature in univariate and multivariate settings. In recent years, these models have been extended to function spaces. However, functional ARCH and generalized ARCH (GARCH) processes established in the literature have thus far been restricted to model ``pointwise'' variances. In this paper, we propose a new ARCH framework for data residing in general separable Hilbert spaces that accounts for the full evolution of the conditional covariance operator. We define a general operator-level ARCH model. For a simplified Constant Conditional Correlation version of the model, we establish conditions under which such models admit strictly and weakly stationary solutions, finite moments, and weak serial dependence. Additionally, we derive consistent Yule--Walker-type estimators of the infinite-dimensional model parameters. The practical relevance of the model is illustrated through simulations and a data application to high-frequency cumulative intraday returns.
Published: 2026-03-10T23:04:20Z
Comments: 48 pages, 8 Figures, 2 Tables
Authors: Alexander Aue, Sebastian Kühnert, Gregory Rice, Jeremy VanderDoes

http://arxiv.org/abs/2603.08634v2
Tractable Identification of Strategic Network Formation Models with Unobserved Heterogeneity
Updated: 2026-03-17T17:50:04Z
We develop a tractable identification approach for strategic network formation models with both strategic link interdependence and individual unobserved heterogeneity (fixed effects). The key challenge is that endogenous network statistics (e.g. number of common friends) enter the link formation equation, while the mapping from model primitives to equilibrium network structure is generally intractable.
Our approach sidesteps this difficulty using a ``bounding-by-$c$'' technique that treats endogenous covariates as random variables and exploits monotonicity restrictions to obtain identifying information. A central contribution is to develop a spectrum of fixed-effects handling strategies based on subnetwork configurations: tetrad-based restrictions that difference out all individual fixed effects, triad-based and weighted restrictions that combine ``difference-out'' and ``integrate-out'' steps by differencing out some fixed effects and profiling over the remainder conditional on observed characteristics, and general weighted cycle-based restrictions that unify these cases. We also provide point identification results. Preliminary simulations show that the approach can deliver informative bounds on the structural parameters.
Published: 2026-03-09T17:11:30Z
Authors: Wayne Yuan Gao, Ming Li, Zhengyan Xu