https://arxiv.org/api/Ju+W1q4Ji5KuQR2yN3xvCLY+39k
2026-06-10T05:58:43Z
168612015

http://arxiv.org/abs/2602.02874v1
Ten simple rules for teaching data science
2026-02-02T22:30:18Z
Teaching data science presents unique challenges and opportunities that cannot be fully addressed by simply borrowing pedagogical strategies from its parent disciplines of statistics and computer science. Here, we present ten simple rules for teaching data science, developed and refined by leading educators in the community and successfully applied in our own data science classrooms.
2026-02-02T22:30:18Z
Tiffany A. Timbers, Mine Çetinkaya-Rundel

http://arxiv.org/abs/2601.23171v1
Revisiting the Lost Submarine Problem: A Decision Theoretic Approach
2026-01-30T16:56:34Z
This article includes a discussion of the "lost submarine problem", following Morey et al. (2016). As the title of that paper suggests (The fallacy of placing confidence in confidence intervals), the example is intended to illustrate the futility of relying on the confidence interval as a formal inference statement. In the view of this author, the misgivings expressed in Morey et al. (2016) can be resolved using a decision theoretic approach. While it is true that a variety of statistical methods lead to a variety of confidence intervals, once we precisely define their purpose, a single optimal choice emerges. Furthermore, distinct purposes lead to distinct optimal choices. Therefore, that a variety of procedures exist is an advantage rather than a liability.
2026-01-30T16:56:34Z
2 figures, 11 pages
Anthony Almudevar

http://arxiv.org/abs/2601.20405v1
Position: A Potential Outcomes Perspective on Pearl's Causal Hierarchy
2026-01-28T09:09:50Z
Pearl's causal hierarchy has garnered sustained attention as a foundational lens for formulating and understanding causal questions, and has been extensively discussed within the framework of structural causal models.
In this paper, we revisit the hierarchy from a potential outcomes perspective and provide a formal, systematic classification of how various causal estimands are mapped to specific layers. Building on this classification, we summarize key identifiability challenges for estimands at different layers and review general strategies for achieving identification under varying assumptions. Our perspective is both intuitive and theoretically grounded, as higher layers of the hierarchy correspond to progressively richer features of the potential outcomes distribution, which in turn require stronger assumptions for identification. We expect this perspective to help clarify and deepen understanding of various causal estimands, particularly those in the third layer of the causal hierarchy, along with their associated identifiability challenges, identifiability strategies, and application scenarios.
2026-01-28T09:09:50Z
Peng Wu, Linbo Wang

http://arxiv.org/abs/2601.19814v1
Abundance and Economic diversity as a descriptor of cities' economic complexity
2026-01-27T17:15:54Z
Intricate interactions among firms, institutions, and spatial structures shape urban economic systems. In this study, we propose a framework based on three structural dimensions -- abundance, diversity, and longevity (ADL) of economic units -- as proxies of urban economic complexity and resilience. Using a decade of georeferenced firm-level data from Mexico City, we analyze the relationships among ADL variables using regression, spatial correlation, and time-series clustering. Our results reveal nonlinear dynamics across urban space, with power-law behavior in central zones and logarithmic saturation in peripheral areas, suggesting differentiated growth regimes. Notably, firm longevity modulates the relationship between abundance and diversity, particularly in periurban transition zones. These spatial patterns point to an emerging polycentric restructuring within a traditionally monocentric metropolis.
By integrating economic complexity theory with spatial analysis, our approach provides a scalable method to assess the adaptive capacity of urban economies. This has implications for understanding informality, designing inclusive urban policies, and navigating structural transitions in rapidly urbanizing regions.
2026-01-27T17:15:54Z
Marco A. Rosas Pulido, Roberto Murcio, Omar R. Vázquez, Carlos Gershenson

http://arxiv.org/abs/2601.15467v1
Treatment effect: a critique
2026-01-21T21:05:50Z
Two broad positions within statistics define a treatment effect, on the one hand, as a parameter of a statistical model, and on the other, as an appropriate population-level difference in outcomes or counterfactual outcomes under the different treatment regimes. This short expository paper presents some simple but consequential insights on the two formulations, contrasting the answers under the most favourable fictitious idealisation for the counterfactual framework. These observations clarify the relationship between Fisherian model-based inference and modern counterfactual formulations, and emphasise concerns, raised by Cox and others, regarding the suitability of model-free definitions as targets of inference when scientific conclusions are intended to generalise beyond the observed sample. Parts of the paper are necessarily controversial; we follow Cox (1958a) in not putting these forward in any dogmatic spirit.
2026-01-21T21:05:50Z
Presented at the Nordic-Baltic Biometrics Conference (Oslo, June 2025), and the RSS International Conference (Edinburgh, September 2025)
Heather Battey, Charlotte Edgar

http://arxiv.org/abs/2410.18939v3
Adaptive partition Factor Analysis
2026-01-21T15:01:53Z
Factor Analysis has traditionally been utilized across diverse disciplines to extrapolate latent traits that influence the behavior of multivariate observed variables.
Historically, the focus has been on analyzing data from a single study, neglecting the potential study-specific variations present in data from multiple studies. Multi-study factor analysis has emerged as a recent methodological advancement that addresses this gap by distinguishing between latent traits shared across studies and study-specific components arising from artifactual or population-specific sources of variation. In this paper, we extend the current methodologies by introducing novel shrinkage priors for the latent factors, thereby accommodating a broader spectrum of scenarios -- from the absence of study-specific latent factors to models in which factors pertain only to small subgroups nested within or shared between the studies. For the proposed construction we provide conditions for identifiability of factor loadings and guidelines to perform straightforward posterior computation via Gibbs sampling. Through comprehensive simulation studies, we demonstrate that our proposed method exhibits competitive performance across a variety of scenarios compared to existing methods, while providing richer insights. The practical benefits of our approach are further illustrated through applications to bird species co-occurrence data and ovarian cancer gene expression data.
2024-10-24T17:25:32Z
35 pages, 8 figures
Elena Bortolato, Antonio Canale

http://arxiv.org/abs/2511.06934v2
Sequential Causal Normal Form Games: Theory, Computation, and Strategic Signaling
2026-01-21T13:38:42Z
Can classical game-theoretic frameworks be extended to capture the bounded rationality and causal reasoning of AI agents? We investigate this question by extending Causal Normal Form Games (CNFGs) to sequential settings, introducing Sequential Causal Multi-Agent Systems (S-CMAS) that incorporate Pearl's Causal Hierarchy across leader-follower interactions.
While theoretically elegant -- we prove PSPACE-completeness, develop equilibrium refinements, and establish connections to signaling theory -- our comprehensive empirical investigation reveals a critical limitation: S-CNE provides zero welfare improvement over classical Stackelberg equilibrium across all tested scenarios. Through 50+ Monte Carlo simulations and hand-crafted synthetic examples, we demonstrate that backward induction with rational best-response eliminates any strategic advantage from causal layer distinctions. We construct a theoretical example illustrating conditions where benefits could emerge ($ε$-rational satisficing followers), though implementation confirms that even relaxed rationality assumptions prove insufficient when good instincts align with optimal play. This negative result provides valuable insight: classical game-theoretic extensions grounded in rational choice are fundamentally incompatible with causal reasoning advantages, motivating new theoretical frameworks beyond standard Nash equilibrium for agentic AI.
2025-11-10T10:31:43Z
AAAI 2026 Workshop on Foundations of Agentic Systems Theory
Dennis Thumm

http://arxiv.org/abs/2511.04361v2
Causal Regime Detection in Energy Markets With Augmented Time Series Structural Causal Models
2026-01-21T13:29:23Z
Energy markets exhibit complex causal relationships between weather patterns, generation technologies, and price formation, with regime changes occurring continuously rather than at discrete break points. Current approaches model electricity prices without explicit causal interpretation or counterfactual reasoning capabilities. We introduce Augmented Time Series Causal Models (ATSCM) for energy markets, extending counterfactual reasoning frameworks to multivariate temporal data with learned causal structure. Our approach models energy systems through interpretable factors (weather, generation mix, demand patterns), rich grid dynamics, and observable market variables.
We integrate neural causal discovery to learn time-varying causal graphs without requiring ground truth DAGs. Applied to real-world electricity price data, ATSCM enables novel counterfactual queries such as "What would prices be under different renewable generation scenarios?".
2025-11-06T13:45:15Z
EurIPS 2025 Workshop Causality for Impact: Practical challenges for real-world applications of causal methods
Dennis Thumm

http://arxiv.org/abs/2511.04469v4
Towards Causal Market Simulators
2026-01-21T13:14:57Z
Market generators using deep generative models have shown promise for synthetic financial data generation, but existing approaches lack causal reasoning capabilities essential for counterfactual analysis and risk assessment. We propose a Time-series Neural Causal Model VAE (TNCM-VAE) that combines variational autoencoders with structural causal models to generate counterfactual financial time series while preserving both temporal dependencies and causal relationships. Our approach enforces causal constraints through directed acyclic graphs in the decoder architecture and employs the causal Wasserstein distance for training. We validate our method on synthetic autoregressive models inspired by the Ornstein-Uhlenbeck process, demonstrating superior performance in counterfactual probability estimation with L1 distances as low as 0.03-0.10 compared to ground truth.
The model enables financial stress testing, scenario analysis, and enhanced backtesting by generating plausible counterfactual market trajectories that respect underlying causal mechanisms.
2025-11-06T15:44:07Z
ICAIF 2025 Workshop on Rethinking Financial Time-Series
Dennis Thumm, Luis Ontaneda Mijares

http://arxiv.org/abs/2601.12167v1
Using Directed Acyclic Graphs to Illustrate Common Biases in Diagnostic Test Accuracy Studies
2026-01-17T21:07:41Z
Background: Diagnostic test accuracy (DTA) studies, like etiological studies, are susceptible to various biases including reference standard error bias, partial verification bias, spectrum effect, confounding, and bias from misassumption of conditional independence. While directed acyclic graphs (DAGs) are widely used in etiological research to identify and illustrate bias structures, they have not been systematically applied to DTA studies. Methods: We developed DAGs to illustrate the causal structures underlying common biases in DTA studies. For each bias, we present the corresponding DAG structure and demonstrate the parallel with equivalent biases in etiological studies. We use real-world examples to illustrate each bias mechanism. Results: We demonstrate that five major biases in DTA studies can be represented using DAGs with clear structural parallels to etiological studies: reference standard error bias corresponds to exposure misclassification, misassumption of conditional independence creates spurious correlations similar to unmeasured confounding, spectrum effect parallels effect modification, confounding operates through backdoor paths in both settings, and partial verification bias mirrors selection bias. These DAG representations reveal the causal mechanisms underlying each bias and suggest appropriate correction strategies. Conclusions: DAGs provide a valuable framework for understanding bias structures in DTA studies and should complement existing quality assessment tools like STARD and QUADAS-2.
We recommend incorporating DAGs during study design to prospectively identify potential biases and during reporting to enhance transparency. DAG construction requires interdisciplinary collaboration and sensitivity analyses under alternative causal structures.
2026-01-17T21:07:41Z
Yang Lu, Nandini Dendukuri

http://arxiv.org/abs/2505.00356v3
On the retraining frequency of global models in retail demand forecasting
2026-01-14T09:57:38Z
In an era of increasing computational capabilities and growing environmental consciousness, organizations face a critical challenge in balancing the accuracy of forecasting models with computational efficiency and sustainability. Global forecasting models, which lower computation time, have gained significant attention over the years. However, the common practice of retraining these models with new observations raises important questions about the costs of forecasting. Using ten different machine learning and deep learning models, we analyzed various retraining scenarios, ranging from continuous updates to no retraining at all, across two large retail demand datasets. We showed that less frequent retraining strategies maintain the forecast accuracy while reducing the computational costs, providing a more sustainable approach to large-scale forecasting. We also found that machine learning models are a marginally better choice to reduce the costs of forecasting when coupled with less frequent model retraining strategies as the frequency of the data increases. Our findings challenge the conventional belief that frequent retraining is essential for maintaining forecasting accuracy. Instead, periodic retraining offers a good balance between predictive performance and efficiency, both in the case of point and probabilistic forecasting.
These insights provide actionable guidelines for organizations seeking to optimize forecasting pipelines while reducing costs and energy consumption.
2025-05-01T07:00:29Z
Machine Learning with Applications, 22, 100769 (2025)
Marco Zanotti
10.1016/j.mlwa.2025.100769

http://arxiv.org/abs/2601.08167v2
Proactive Anomaly Screen for Multiple Endpoints Using Bayesian Latent Class Modeling: A k-Step Ahead Approach
2026-01-14T03:57:47Z
In clinical trials, ensuring the quality and validity of data for downstream analysis and results is paramount, thus necessitating thorough data monitoring. This typically involves employing edit checks and manual queries during data collection. Edit checks consist of straightforward schemes programmed into relational databases, though they lack the capacity to assess data intelligently. In contrast, manual queries are initiated by data managers who manually scrutinize the collected data, identifying discrepancies needing clarification or correction. Manual queries pose significant challenges, particularly when dealing with large-scale data in late-phase clinical trials. Moreover, they are reactive rather than predictive, meaning they address issues after they arise rather than preemptively preventing errors. In this paper, we propose a joint model for multiple endpoints, focusing on primary and key secondary measures, using a Bayesian latent class approach. This model incorporates adjustments for risk monitoring factors, enabling proactive, $k$-step ahead, detection of conflicting or anomalous patterns within the data. Furthermore, we develop individualized dynamic predictions at consecutive time-points to identify potential anomalous values based on observed data. This analysis can be integrated into electronic data capture systems to provide objective alerts to stakeholders.
We present simulation results and demonstrate the effectiveness of this approach with real-world data.
2026-01-13T02:57:11Z
Yuxi Zhao, Margaret Gamalo

http://arxiv.org/abs/2601.07534v1
Bayesian Handwriting Evidence Evaluation using MANOVA via Fourier-Based Extracted Features
2026-01-12T13:35:47Z
This paper proposes a novel statistical approach that aims at the identification of valid and useful patterns in handwriting examination via Bayesian modeling. Starting from a sample of characters selected among 13 French native writers, an accurate loop reconstruction can be achieved through Fourier analysis. The contour shape of handwritten characters can be described by the first four pairs of Fourier coefficients and by the surface size. Six Bayesian models are considered for such handwritten features. These models arise from two likelihood structures: (a) a multivariate Normal model, and (b) a MANOVA model that accounts for character-level variability. For each likelihood, three different prior formulations are examined, resulting in distinct Bayesian models: (i) a conjugate Normal-Inverse-Wishart prior, (ii) a hierarchical Normal-Inverse-Wishart prior, and (iii) a Normal-LogNormal-LKJ prior specification. The hierarchical prior formulations are of primary interest because they can incorporate the between-writers variability, a distinguishing element that sets writers apart. These approaches do not allow calculation of the marginal likelihood in a closed-form expression. Therefore, bridge sampling is used to estimate it. The Bayes factor is estimated to compare the performance of the proposed models and to evaluate their efficiency for discriminating purposes. Bayesian MANOVA with Normal-LogNormal-LKJ prior showed an overall better performance, in terms of discriminatory capacity and model fitting.
Finally, a sensitivity analysis for the elicitation of the prior distribution parameters is performed.
2026-01-12T13:35:47Z
Lampis Tzai, Ioannis Ntzoufras, Silvia Bozza

http://arxiv.org/abs/2405.01342v2
Data-Driven Strategies for Detecting and Sampling Misrepresented Subgroups
2026-01-11T19:15:11Z
Economic policy research frequently examines population well-being, with a particular focus on the relationships between unequal living conditions, low educational attainment, and social exclusion. Sample surveys, such as EU-SILC, are widely used for this purpose and inform public policy; yet, their sampling designs may fail to adequately represent rare, hard-to-sample, or under-covered subgroups. This limitation can hinder socio-demographic analyses and evidence-based policy design. We propose a generalisable approach based on univariate and multivariate unsupervised learning techniques to detect outliers in survey data that may signal under-represented subgroups. Identified groups can then be characterised to inform targeted resampling strategies that improve survey inclusiveness. An empirical application using the 2019 EU-SILC data for the Italian region of Liguria shows that citizenship, material deprivation, large household size, and economic vulnerability are key indicators of under-representation.
2024-05-02T14:45:37Z
G. Lancia, F. Mecatti, E. Riccomagno

http://arxiv.org/abs/2601.05490v1
How Carbon Border Adjustment Mechanism is Energizing the EU Carbon Market and Industrial Transformation
2026-01-09T02:45:52Z
The global carbon market is fragmented and characterized by limited pricing transparency and empirical evidence, creating challenges for investors and policymakers in identifying carbon management opportunities. The European Union is among several regions that have implemented emissions pricing through an Emissions Trading System (EU ETS).
While the EU ETS has contributed to emissions reductions, it has also raised concerns related to international competitiveness and carbon leakage, particularly given the strong integration of EU industries into global value chains. To address these challenges, the European Commission proposed the Carbon Border Adjustment Mechanism (CBAM) in 2021. CBAM is designed to operate alongside the EU ETS by applying a carbon price to selected imported goods, thereby aligning carbon costs between domestic and foreign producers. It will gradually replace existing carbon leakage mitigation measures, including the allocation of free allowances under the EU ETS. The initial scope of CBAM covers electricity, cement, fertilizer, aluminium, iron, and steel. As climate policies intensify under the Paris Agreement, CBAM-like mechanisms are expected to play an increasingly important role in managing carbon-related trade risks and supporting the transition to net zero emissions.
2026-01-09T02:45:52Z
17 Pages; 4 Figures
Joseph Nyangon, Brecht Seifi