Feed: https://arxiv.org/api/r0XawnMgX+jGICeofK6pDjp3S+s
Feed updated: 2026-03-20T10:45:35Z
Total results: 5183; start index: 15; items per page: 15

http://arxiv.org/abs/2603.16729v1
Title: GeMA: Learning Latent Manifold Frontiers for Benchmarking Complex Systems
Updated: 2026-03-17T16:12:30Z
Abstract: Benchmarking the performance of complex systems such as rail networks, renewable generation assets and national economies is central to transport planning, regulation and macroeconomic analysis. Classical frontier methods, notably Data Envelopment Analysis (DEA) and Stochastic Frontier Analysis (SFA), estimate an efficient frontier in the observed input-output space and define efficiency as distance to this frontier, but rely on restrictive assumptions on the production set and only indirectly address heterogeneity and scale effects. We propose Geometric Manifold Analysis (GeMA), a latent manifold frontier framework implemented via a productivity-manifold variational autoencoder (ProMan-VAE). Instead of specifying a frontier function in the observed space, GeMA represents the production set as the boundary of a low-dimensional manifold embedded in the joint input-output space. A split-head encoder learns latent variables that capture technological structure and operational inefficiency. Efficiency is evaluated with respect to the learned manifold, endogenous peer groups arise as clusters in latent technology space, a quotient construction supports scale-invariant benchmarking, and a local certification radius, derived from the decoder Jacobian and a Lipschitz bound, quantifies the geometric robustness of efficiency scores. We validate GeMA on synthetic data with non-convex frontiers, heterogeneous technologies and scale bias, and on four real-world case studies: global urban rail systems (COMET), British rail operators (ORR), national economies (Penn World Table) and a high-frequency wind-farm dataset.
Across these domains, GeMA behaves comparably to established methods when classical assumptions hold, and provides additional insight in settings with pronounced heterogeneity, non-convexity or size-related bias.
Published: 2026-03-17T16:12:30Z
Comments: Latent manifold frontiers for benchmarking complex production systems, and applications to national rail operators, wind farms, and macroeconomic productivity are presented
Authors: Jia Ming Li, Anupriya, Daniel J. Graham

http://arxiv.org/abs/2504.03228v9
Title: Weak instrumental variables due to nonlinearities in panel data: A Super Learner Control Function estimator
Updated: 2026-03-17T12:21:15Z
Abstract: A triangular structural panel data model with additively separable individual-specific effects is used to model the causal effect of a covariate on an outcome variable when there are unobservable confounders, some of which are time-invariant. In this setup, a linear reduced-form equation might be problematic when the conditional mean of the endogenous covariate given the instrumental variables is nonlinear, because ignoring the nonlinearity could lead to weak instruments (instruments that are only weakly correlated with the endogenous covariate). As a solution, we propose a triangular simultaneous equation model for panel data with additively separable individual-specific fixed effects, composed of a linear structural equation and a nonlinear reduced-form equation. The parameter of interest is the structural parameter of the endogenous variable. It is identified under the assumption that exclusion restrictions are available, using a control function approach, and it is estimated with an estimator that we call the Super Learner Control Function (SLCF) estimator. The estimation procedure consists of two main steps combined with sample splitting. First, we estimate the control function using a super learner. In the second step, we use the estimated control function to control for endogeneity in the structural equation.
Sample splitting is done across the individual dimension. The estimator is consistent and asymptotically normal, achieving a parametric rate of convergence. We perform a Monte Carlo simulation to assess the performance of the proposed estimators and conclude that the Super Learner Control Function estimators significantly outperform Within 2SLS estimators. Finally, we show that the SLCF estimator differs from both the plug-in IV estimator and a naive plug-in 2SLS estimator.
Published: 2025-04-04T07:22:18Z
Authors: Monika Avila-Marquez

http://arxiv.org/abs/2512.07709v2
Title: Bounds on inequality with incomplete data
Updated: 2026-03-17T11:35:03Z
Abstract: We develop a unified nonparametric framework for sharp partial identification and inference on inequality indices when the data contain coarsened observations of the variable of interest. We characterize the extremal allocations for all Schur-convex inequality measures, and show that sharp bounds are attained by distributions with finite support. This reduces the computational problem to finite-dimensional optimization, and for indices admitting linear-fractional representations after suitable ordering of the data (including the Gini coefficient and quantile ratios), we express the bound problems as linear or quadratic programs. We then establish $\sqrt{n}$ inference for the upper and lower bounds using a directional delta method and bootstrap confidence intervals. In applications, we compute sharp Gini bounds from household wealth data with mixed point and interval observations and use historical U.S. grouped income tables to bound time series for the Gini and quantile ratios.
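To make the partial-identification idea concrete, here is a brute-force sketch for a tiny sample; it is not the paper's method. Each observation is known only to lie in an interval `[lo, hi]` (a point observation has `lo == hi`), and we compute the largest sample Gini consistent with those intervals. Because the sample Gini is a convex function (a sum of absolute differences) divided by a positive linear one (the sample total), it is quasiconvex, so its maximum over the box of admissible samples is attained at a vertex. Enumerating all 2^n vertices is exact but only feasible for small n. The lower bound is generally not attained at a vertex and is omitted here; the paper's linear and quadratic programs handle both bounds at scale. The function names `gini` and `gini_upper_bound` are illustrative.

```python
from itertools import product


def gini(x):
    """Sample Gini coefficient: sum_i sum_j |x_i - x_j| / (2 * n^2 * mean).

    Assumes nonnegative values with a positive total.
    """
    n = len(x)
    total = sum(x)
    abs_diff_sum = sum(abs(a - b) for a in x for b in x)
    return abs_diff_sum / (2 * n * total)


def gini_upper_bound(intervals):
    """Largest sample Gini when observation i is only known to lie in [lo_i, hi_i].

    The Gini is quasiconvex (convex numerator over a positive linear denominator),
    so its maximum over the box of admissible samples is attained at a vertex.
    Enumerating all 2^n vertices is exact but only practical for small n.
    """
    best_value, best_sample = None, None
    for vertex in product(*intervals):
        if sum(vertex) <= 0:
            continue  # Gini is undefined when every value is zero
        value = gini(vertex)
        if best_value is None or value > best_value:
            best_value, best_sample = value, vertex
    return best_value, best_sample


# Observation 0 is a point; observations 1 and 2 are interval-censored.
upper, extremal_sample = gini_upper_bound([(1, 1), (2, 4), (0, 3)])
print(round(upper, 4), extremal_sample)  # 0.5333 (1, 4, 0)
```

The extremal sample pushes one censored value to its top and the other to its bottom, which is the intuition behind "spreading out" allocations for a Schur-convex index.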
Published: 2025-12-08T16:55:38Z
Authors: James Banks, Thomas Glinnan, Tatiana Komarova

http://arxiv.org/abs/2603.16035v1
Title: Identification Verification for Structural Vector Autoregressions with Sparse Heterogeneous Markov Switching Heteroskedasticity
Updated: 2026-03-17T00:41:19Z
Abstract: We propose a structural vector autoregressive model with a new and flexible specification of the volatility process, which we call Sparse Heterogeneous Markov-Switching Heteroskedasticity. In this model, the conditional variance of each structural shock changes in time according to its own Markov process. Additionally, it features a sparse representation of Markov processes, in which the number of regimes is set to exceed that of the data-generating process, with some regimes allowed to have zero occurrences throughout the sample. We complement these developments with a new distribution for normalised conditional variances that facilitates Gibbs sampling and identification verification. In effect, our model: (i) normalises the system and estimates the structural parameters more precisely than popular alternatives; (ii) can be used to verify homoskedasticity reliably and, thus, inform identification through heteroskedasticity; and (iii) features excellent forecasting performance comparable with Stochastic Volatility. Finally, revisiting a prominent macro-financial structural system, we provide evidence for the identification of the US monetary policy shock via heteroskedasticity, with estimates consistent with those reported in the literature.
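Identification through heteroskedasticity, which the model above is designed to verify, can be illustrated in its simplest textbook form: two variance regimes and a 2×2 system. If the reduced-form covariances satisfy Σ1 = B Bᵀ and Σ2 = B Λ Bᵀ with Λ diagonal and distinct diagonal entries, the eigenvectors of Σ1⁻¹Σ2 pin down B up to column signs and ordering. The sketch below assumes both regime covariances are known exactly; it is not the paper's Bayesian Markov-switching sampler, and the helper names and the example matrix `B` are hypothetical. When the diagonal entries of Λ coincide (no change in relative variances), the eigenvalues are equal and B is not identified, which is the homoskedastic case the paper's verification step is meant to detect.

```python
import math


def mat2_mul(x, y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mat2_inv(m):
    """Inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def eig2(m, tol=1e-12):
    """Eigenpairs of a 2x2 matrix with two distinct real eigenvalues (ascending)."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4 * det
    if disc <= tol:
        raise ValueError("eigenvalues not distinct: B is not identified")
    root = math.sqrt(disc)
    pairs = []
    for lam in ((trace - root) / 2, (trace + root) / 2):
        if abs(b) > tol:
            v = [b, lam - a]
        elif abs(c) > tol:
            v = [lam - d, c]
        else:  # matrix is already diagonal
            v = [1.0, 0.0] if abs(lam - a) <= abs(lam - d) else [0.0, 1.0]
        pairs.append((lam, v))
    return pairs


def identify_b(sigma1, sigma2):
    """Recover B (up to column signs) and the relative variances from two regimes.

    Assumes sigma1 = B B' and sigma2 = B diag(lams) B' with distinct lams.
    """
    m = mat2_mul(mat2_inv(sigma1), sigma2)
    columns, lams = [], []
    for lam, v in eig2(m):
        # Scale v so that v' sigma1 v = 1; then sigma1 v equals +/- a column of B.
        q = sum(v[i] * sigma1[i][j] * v[j] for i in range(2) for j in range(2))
        v = [vi / math.sqrt(q) for vi in v]
        columns.append([sum(sigma1[i][j] * v[j] for j in range(2)) for i in range(2)])
        lams.append(lam)
    b_hat = [[columns[k][i] for k in range(2)] for i in range(2)]
    return b_hat, lams


# Hypothetical structural matrix; regime-1 shock variances normalised to 1.
B = [[1.0, 0.5], [0.3, 1.0]]
Bt = [[B[j][i] for j in range(2)] for i in range(2)]
sigma1 = mat2_mul(B, Bt)
sigma2 = mat2_mul(mat2_mul(B, [[1.5, 0.0], [0.0, 4.0]]), Bt)
b_hat, lams = identify_b(sigma1, sigma2)
print([round(x, 6) for x in lams])  # [1.5, 4.0]
```

Each recovered column of `b_hat` matches a column of `B` up to sign, paired with its regime-2 variance in `lams`.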
Published: 2026-03-17T00:41:19Z
Keywords: Identification Through Heteroskedasticity, Heterogeneous Markov Switching, Sparse Markov Process, Identification Verification
Authors: Fei Shang (Guangdong University of Foreign Studies), Tomasz Woźniak (University of Melbourne)

http://arxiv.org/abs/2501.18746v3
Title: Model-Adaptive Approach to Dynamic Discrete Choice Models with Large State Spaces
Updated: 2026-03-16T22:47:02Z
Abstract: Estimation and counterfactual experiments in dynamic discrete choice models with large state spaces pose computational difficulties. This paper proposes a model-adaptive approach, based on the conjugate gradient (CG) method, to solve the linear system of fixed point equations of the policy valuation operator. We propose a model-adaptive sieve space, constructed by iteratively augmenting the space with the residual from the previous iteration. We show both theoretically and numerically that model-adaptive sieves dramatically improve performance. In particular, the approximation error decays at a superlinear rate in the sieve dimension, unlike the linear rate achieved by successive approximation. Our method works for both conditional choice probability estimators and full-solution estimators with policy iteration or Newton-Kantorovich iterations. We apply the method to analyze consumer demand for laundry detergent using Kantar's Worldpanel Take Home data. On average, our method is 80% faster than successive approximation and the exact equation solver in solving the dynamic programming problem, substantially reducing the computational cost of the Bayesian MCMC estimator.
Published: 2025-01-30T20:50:40Z
Authors: Ertian Chen

http://arxiv.org/abs/2210.10024v2
Title: Linear Regression with Centrality Measures
Updated: 2026-03-16T22:07:58Z
Abstract: This paper studies the properties of linear regression on centrality measures when network data is sparse and observed with error. We make three contributions in this setting.
First, we show that OLS estimators can become inconsistent under sparsity and characterize the threshold at which this occurs, finding that regression on eigenvector centrality is less robust to sparsity than regression on degree and diffusion centrality. Second, we derive the asymptotic distributions of the OLS estimators in regimes where they remain consistent. We show that when the target coefficients are non-zero, the estimators exhibit asymptotic bias that can be large relative to their variance, rendering conventional confidence intervals and t-tests invalid. Third, we propose bias correction and inference procedures for OLS with sparse, noisy networks. Simulations confirm that our methods perform well in such settings. We demonstrate the empirical relevance of our results in a stylized study of the relationship between consumption smoothing and informal insurance in Nyakatoke, Tanzania.
Published: 2022-10-18T17:47:43Z
Authors: Yong Cai

http://arxiv.org/abs/2507.21790v2
Title: Can large language models assist choice modelling? Insights into prompting strategies and current models capabilities
Updated: 2026-03-16T20:20:59Z
Abstract: Large Language Models (LLMs) are becoming widely used to support various workflows across different disciplines, yet their potential in discrete choice modelling remains relatively unexplored. This work examines the potential of LLMs as assistive agents in the specification and, where technically feasible, estimation of Multinomial Logit models. We implement a systematic experimental framework involving twelve versions of seven leading LLMs (ChatGPT, Claude, DeepSeek, Gemini, Gemma, Llama, and Mistral) evaluated under five experimental configurations. These configurations vary along three dimensions: (i) modelling goal (suggesting vs. suggesting and estimating MNL models); (ii) prompting strategy (Zero-Shot vs. Chain-of-Thought (CoT)); and (iii) information availability (full dataset vs. data dictionary summarising variable names and types).
Each specification suggested by the LLMs is implemented, estimated, and evaluated based on goodness-of-fit metrics, behavioural plausibility, and model complexity. Our findings reveal that proprietary LLMs can generate valid and behaviourally sound utility specifications, particularly when guided by structured prompts (CoT). Open-weight models such as Llama and Gemma struggled to produce meaningful specifications. Notably, some LLMs performed better when provided with only the data dictionary, suggesting that limiting raw data access may enhance internal reasoning capabilities. Among all LLMs, GPT o3, operating in an agentic setting, was uniquely capable of correctly estimating its own specifications by executing self-generated code. Overall, the results demonstrate both the promise and current limitations of LLMs as assistive agents in discrete choice modelling, not only for model specification but also for supporting modelling decisions and estimation, and provide practical guidance for integrating these tools into choice modellers' workflows.
Published: 2025-07-29T13:24:44Z
Comments: 35 pages, 8 figures, 14 tables
Authors: Georges Sfeir, Gabriel Nova, Stephane Hess, Sander van Cranenburgh

http://arxiv.org/abs/2008.04229v11
Title: Decision Conflict, Power Logit, and the Deferral Outside Option
Updated: 2026-03-16T17:24:10Z
Abstract: Decision makers often opt for the deferral outside option when they find it difficult to make an active choice. Unlike existing logit models with an outside option, in which that option is assigned a fixed value exogenously, this paper introduces and studies a class of logit models in which the option's value is menu-dependent, may be determined endogenously, and can be interpreted as a proxy for the varying degree of decision difficulty across menus. We focus on a special class of these models, the *power logit*.
We show that these models explain some observed choice-deferral effects that are caused by hard decisions, including non-monotonic "roller-coaster" choice-overload phenomena that are regulated by the presence or absence of a clearly dominant feasible alternative. We illustrate the usability, novel insights and explanatory gains of the proposed framework for duopolistic modelling and empirical discrete choice analysis.
Published: 2020-08-10T16:03:21Z
Authors: Georgios Gerasimou

http://arxiv.org/abs/2603.02357v2
Title: Quantile-based modeling of scale dynamics in financial returns for Value-at-Risk and Expected Shortfall forecasting
Updated: 2026-03-16T15:28:03Z
Abstract: We introduce a semiparametric approach for forecasting Value-at-Risk (VaR) and Expected Shortfall (ES) by modeling the conditional scale of financial returns, defined as the difference between two specified quantiles, via restricted quantile regression. Focusing on downside risk, VaR is derived from the left-tail quantile of rescaled returns, and ES is approximated by averaging quantiles below the VaR level. The method delivers robust, distribution-free estimates of extreme losses and captures skewness, heavy tails, and leverage effects. Simulation experiments and empirical analysis show that it often outperforms established models, including GARCH and joint VaR-ES conditional-quantile approaches. An application to daily returns on major international stock indices, spanning the COVID-19 period, highlights its effectiveness in capturing risk dynamics.
Published: 2026-03-02T19:56:54Z
Authors: Xiaochun Liu, Richard Luger
DOI: 10.1016/j.ijforecast.2025.12.002

http://arxiv.org/abs/2301.10643v4
Title: Automatic Locally Robust GMM with Machine-Learning-Generated Regressors
Updated: 2026-03-16T13:29:47Z
Abstract: Machine-learning (ML) methods now routinely generate regressors used in subsequent econometric analyses, for example, estimated propensity scores, control-function residuals, imputed covariates, learned proxies, or low-dimensional embeddings of high-dimensional data.
As these ML-generated regressors become ubiquitous, the lack of general inference methods for models that use them has become a critical limitation. Standard plug-in and Double ML procedures ignore how generated regressors enter later stages, leading to large biases and invalid inference. We develop a three-step locally robust GMM framework for inference with ML-generated regressors. A key new insight is downstream local robustness: by a functional chain rule, moment functions that are constructed to be orthogonal to the second step eliminate the complicated indirect (conditioning) effects from the ML-generated regressors. We show how to implement this automatically by estimating the associated Riesz representers through cross-fitted auxiliary regressions, allowing for generic non-Donsker ML in both early steps. In leading treatment-effect and counterfactual settings, simulations demonstrate severe bias in existing methods and bias reductions of 85-95% using our procedures.
Published: 2023-01-25T15:26:18Z
Comments: 76 pages, 5 figures
Authors: Juan Carlos Escanciano, Telmo Pérez-Izquierdo

http://arxiv.org/abs/2502.02734v3
Title: Kotlarski's lemma for dyadic models
Updated: 2026-03-16T11:18:08Z
Abstract: We show how to identify the distributions of the latent components in the two-way dyadic model for bipartite networks $y_{i,\ell} = \alpha_i + \eta_{\ell} + \varepsilon_{i,\ell}$. This is achieved by a repeated application of the extension of the classical lemma of Kotlarski (1967) in Evdokimov and White (2012). We provide two separate sets of assumptions under which all the latent distributions are identified. Both rely on some of the latent components being identically distributed.
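The identification argument rests on repeated measurements: every outcome in row i shares α_i, and every outcome in column ℓ shares η_ℓ. Kotlarski-type arguments exploit this to recover full distributions through characteristic functions; the sketch below shows only the simpler second-moment analogue, assuming mutually independent components drawn iid: Cov(y_{iℓ}, y_{iℓ'}) = Var(α) for ℓ ≠ ℓ', Cov(y_{iℓ}, y_{i'ℓ}) = Var(η) for i ≠ i', and Var(ε) is what remains of Var(y). The simulation design (normal draws, grid size, seed) and all function names are hypothetical.

```python
import random


def simulate_dyadic(n_rows, n_cols, sd_alpha, sd_eta, sd_eps, seed=0):
    """Simulate y[i][l] = alpha_i + eta_l + eps_il with independent normal components."""
    rng = random.Random(seed)
    alpha = [rng.gauss(0.0, sd_alpha) for _ in range(n_rows)]
    eta = [rng.gauss(0.0, sd_eta) for _ in range(n_cols)]
    return [[alpha[i] + eta[l] + rng.gauss(0.0, sd_eps) for l in range(n_cols)]
            for i in range(n_rows)]


def mean_offdiag_product(groups):
    """Average of d_a * d_b over distinct pairs (a, b) within each group, then across groups."""
    total = 0.0
    for g in groups:
        s = sum(g)
        m = len(g)
        # Sum over a != b of d_a * d_b equals (sum d)^2 - sum d^2.
        total += (s * s - sum(v * v for v in g)) / (m * (m - 1))
    return total / len(groups)


def variance_components(y):
    """Moment estimates of Var(alpha), Var(eta), Var(eps) from a complete grid."""
    n_rows, n_cols = len(y), len(y[0])
    mu = sum(sum(row) for row in y) / (n_rows * n_cols)
    dev = [[v - mu for v in row] for row in y]
    var_alpha = mean_offdiag_product(dev)  # pairs of outcomes sharing a row
    columns = [[dev[i][l] for i in range(n_rows)] for l in range(n_cols)]
    var_eta = mean_offdiag_product(columns)  # pairs of outcomes sharing a column
    var_y = sum(v * v for row in dev for v in row) / (n_rows * n_cols)
    return var_alpha, var_eta, var_y - var_alpha - var_eta


y = simulate_dyadic(300, 300, sd_alpha=1.0, sd_eta=0.5, sd_eps=1.0, seed=1)
# True values in this design: Var(alpha) = 1.0, Var(eta) = 0.25, Var(eps) = 1.0.
print([round(v, 2) for v in variance_components(y)])
```

Var(α) is estimated from only 300 row effects, so it is the noisiest of the three; the same limited-variation issue affects distributional identification when one side of the network is small.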
Published: 2025-02-04T21:33:32Z
Authors: Grigory Franguridi, Hyungsik Roger Moon

http://arxiv.org/abs/2511.02660v2
Title: Spectral analysis of high-dimensional spot volatility matrix with applications
Updated: 2026-03-16T08:00:00Z
Abstract: In random matrix theory, the spectral distribution of the covariance matrix has been well studied under the large dimensional asymptotic regime when the dimensionality and the sample size tend to infinity at the same rate. However, most existing theories are built upon the assumption of independent and identically distributed samples, which may be violated in practice, for example by high-frequency data, that is, observations of continuous-time processes at discrete time points. In this paper, we use high-frequency data to extend the classical spectral analysis of the covariance matrix in large-dimensional random matrix theory to the spot volatility matrix. We establish the first-order limiting spectral distribution and obtain a second-order result, namely a central limit theorem for linear spectral statistics. Moreover, we apply the results to design feasible tests for the spot volatility matrix, including identity and sphericity tests. Simulation studies confirm the finite-sample performance of the test statistics and verify our established theory.
Published: 2025-11-04T15:37:39Z
Authors: Qiang Liu, Yiming Liu, Zhi Liu, Wang Zhou

http://arxiv.org/abs/2502.09806v2
Title: Two-Sided Prioritized Ranking: A Coherency-Preserving Design for Marketplace Experiments
Updated: 2026-03-15T23:10:11Z
Abstract: Online marketplaces frequently run pricing experiments in environments where users choose from a list of items. In these settings, items compete for users' limited attention and demand, creating interference among items within a list: changing prices for any item can affect the demand for others, biasing estimates from item-level A/B tests. In addition, a key consideration in pricing experiments is preserving platform coherency across prices and item availability.
This requirement rules out experimental designs such as user-level A/B tests, as they violate platform coherency. We propose Two-Sided Prioritized Ranking (TSPR) to estimate the total average treatment effect of price changes in such settings. TSPR exploits position bias in ranked search results to create variation in treatment exposure without compromising coherency. TSPR randomizes both users and items and reorders ranked lists, prioritizing treated items for one group of users and untreated items for the other. All users see the same items at consistent prices, but their exposure to treatment differs because they pay disproportionate attention to higher-ranked positions. In semi-synthetic simulations based on Expedia hotel search data, TSPR outperforms baseline coherency-preserving experiment designs by reducing estimation bias and providing sufficient statistical power.
Published: 2025-02-13T22:48:09Z
Comments: New version with revisions and updated title
Authors: Mahyar Habibi, Zahra Khanalizadeh, Negar Ziaeian

http://arxiv.org/abs/2312.00590v6
Title: Inference on common trends in functional time series
Updated: 2026-03-15T14:50:53Z
Abstract: We study statistical inference on unit roots and cointegration for time series in a Hilbert space. We develop statistical inference on the number of common stochastic trends embedded in the time series, i.e., the dimension of the nonstationary subspace. We also consider tests of hypotheses on the nonstationary and stationary subspaces themselves. The Hilbert space can be of an arbitrarily large dimension, and our methods remain asymptotically valid even when the time series of interest takes values in a subspace of possibly unknown dimension. This has wide applicability in practice; for example, to cointegrated vector time series that are either high-dimensional or of finite dimension, to high-dimensional factor models that include a finite number of nonstationary factors, to cointegrated curve-valued (or function-valued) time series, and to nonstationary dynamic functional factor models.
To illustrate our methods, we include two empirical examples.
Published: 2023-12-01T13:55:12Z
Authors: Morten Ørregaard Nielsen, Won-Ki Seo, Dakyung Seong

http://arxiv.org/abs/2603.02456v2
Title: When Do Habits Matter? The Empirical Content of Dynamic Hedonic Models
Updated: 2026-03-14T17:30:24Z
Abstract: Hedonic models value goods through their characteristics but are typically interpreted under time-separable preferences. This assumption is restrictive: when some attributes are habit forming, observed prices reflect both contemporaneous utility and continuation values from past consumption. I develop a nonparametric revealed preference framework for dynamic hedonic valuation, deriving necessary and sufficient conditions for rationalisability over characteristics. The framework separates restrictions imposed by the hedonic price system from those imposed by intertemporal choice and provides diagnostics that quantify the severity of violations along each margin. Applied to household scanner data, I show that most failures of static hedonic valuation reflect violations of the hedonic price structure; conditional on satisfying this structure, allowing for habit formation improves behavioural fit. This alters the mapping from prices to willingness-to-pay and the implied welfare interpretation.
Published: 2026-03-02T22:57:30Z
Authors: Josephine Auer
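For contrast with the dynamic framework in the last entry, the classical static revealed-preference check (Varian's GARP, the testable content of Afriat's theorem) can be sketched in a few lines: build the direct revealed-preference relation from observed prices and bundles, take its transitive closure with Warshall's algorithm, and flag every pair where a bundle revealed preferred to another was nevertheless strictly more expensive than it at the other observation's prices. This treats each observation as a time-separable choice over goods; it is not the author's test for dynamic hedonic rationalisability, and the prices and bundles below are made up for illustration.

```python
def dot(p, q):
    """Expenditure p . q."""
    return sum(a * b for a, b in zip(p, q))


def garp_violations(prices, bundles):
    """Return pairs (t, s) violating GARP for observations (prices[t], bundles[t]).

    q_t is directly revealed preferred to q_s if p_t . q_t >= p_t . q_s.
    GARP requires: if q_t is (transitively) revealed preferred to q_s,
    then q_t must not have been strictly cheaper than q_s at prices p_s.
    """
    n = len(prices)
    cost = [[dot(prices[t], bundles[s]) for s in range(n)] for t in range(n)]
    revealed = [[cost[t][t] >= cost[t][s] for s in range(n)] for t in range(n)]
    # Warshall's algorithm for the transitive closure of the relation.
    for k in range(n):
        for i in range(n):
            if revealed[i][k]:
                for j in range(n):
                    if revealed[k][j]:
                        revealed[i][j] = True
    return [(t, s) for t in range(n) for s in range(n)
            if revealed[t][s] and cost[s][s] > cost[s][t]]


prices = [(1, 2), (2, 1)]
# Each period the consumer picks the bundle that was more expensive at current prices.
print(garp_violations(prices, [(1, 2), (2, 1)]))  # [(0, 1), (1, 0)]
# Swapping the choices makes the data consistent with utility maximisation.
print(garp_violations(prices, [(2, 1), (1, 2)]))  # []
```

In a dynamic setting with habit formation, observed choices can fail such a static test even when they are rational, which is the kind of distinction the paper's diagnostics separate.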