Feed: https://arxiv.org/api/tuC1geVynZWo35od2xjICZGCOLM
Updated: 2026-06-18T16:00:43Z
2357136015

http://arxiv.org/abs/2605.24356v1
Contested Temporalities in Critical Minerals and Resource Extraction for Electric Vehicles
Updated: 2026-05-23T02:35:08Z
The global push for electric vehicles (EVs) has sharply increased demand for critical minerals such as cobalt and lithium, creating a tension between rapid industrial growth and long-term sustainability. Extraction is concentrated in a few regions, notably the Democratic Republic of Congo (DRC), Chile, and Argentina, where it has produced serious socio-environmental harms, including ecosystem degradation, labour exploitation, and the displacement of Indigenous communities. In the DRC, cobalt mining is frequently linked to child labour and hazardous working conditions; in Chile, lithium extraction intensifies water scarcity and threatens local agriculture and biodiversity. Policy instruments such as the U.S. Inflation Reduction Act (IRA) seek to promote ethical sourcing, but an extraction-driven model continues to deepen global inequalities. This chapter examines the contested temporalities of the transition, in which the short-term economic incentives of extraction conflict with longer-term environmental and social goals. It argues for a place-based framework built on community-centred governance, sustainable mining practices, and circular-economy strategies, including recycling and material substitution, to align resource security with equity and ensure that the shift to EVs does not reproduce the injustices it aims to address.
Published: 2026-05-23T02:35:08Z
Comment: 31 pages, 2 figures
Author: Joseph Nyangon

http://arxiv.org/abs/2602.16376v2
Two-way Clustering Robust Variance Estimator in Quantile Regression Models
Updated: 2026-05-23T00:30:33Z
We study inference for linear quantile regression with two-way clustered data.
Using a separately exchangeable array framework and a projection decomposition of the quantile score, we characterize regime-dependent convergence rates and establish a self-normalized Gaussian approximation. We propose a two-way cluster-robust sandwich variance estimator with a kernel-based density "bread" and a projection-matched "meat", and prove consistency and validity of inference in Gaussian regimes. We also show an impossibility result for uniform inference in a non-Gaussian interaction regime.
Published: 2026-02-18T11:35:18Z
Authors: Ulrich Hounyo, Jiahao Lin

http://arxiv.org/abs/2605.24284v1
Scalable Gaussian Process for Learning Non-Ergodic Ground Motion Model from Physics-Based Simulations with Application to Power Infrastructure Assessment
Updated: 2026-05-22T23:33:19Z
This study presents the development and application of a scalable non-ergodic ground motion model (NGMM) for the Los Angeles area. The NGMM is trained and validated on physics-based simulated ground-motion data from a recent Statewide California Earthquake Center (SCEC) CyberShake study. The NGMM is formulated as a Gaussian Process (GP) regression model, where the prior median is defined as the ASK14 ergodic ground-motion model and the posterior median is obtained by learning the non-ergodic effects embedded in the training data. These non-ergodic effects include systematic site and path effects, which are represented in the GP using Matérn and specialized covariance kernels that explicitly characterize path vectors. Implementing the NGMM requires hyperparameter tuning and inference on large datasets (on the order of one million data points or more), posing significant computational challenges for conventional GP approaches. To address this scalability issue, this paper presents a suite of computational strategies, including sparse Cholesky inversion, parallel computing, GPU acceleration, and stochastic gradient descent minimization.
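The GP formulation in the NGMM abstract above (an ergodic model as the prior mean, with non-ergodic effects learned from the residuals) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: `ergodic_median` is a hypothetical constant stand-in for ASK14, the kernel is a plain Matérn 3/2 on site coordinates (no path-vector kernel), hyperparameters are fixed rather than tuned, a single noise variance stands in for the separately modeled aleatory terms, and a dense Cholesky solve replaces the sparse and GPU strategies the abstract lists.

```python
import numpy as np

def matern32(X1, X2, length_scale=10.0, variance=0.3):
    """Matern (nu = 3/2) covariance between the rows of X1 and X2, e.g. site coordinates in km."""
    d = np.sqrt(((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1))
    r = np.sqrt(3.0) * d / length_scale
    return variance * (1.0 + r) * np.exp(-r)

def ergodic_median(X):
    """Hypothetical placeholder for an ergodic model such as ASK14 (constant log-amplitude)."""
    return np.full(X.shape[0], -2.0)

def gp_posterior_mean(X_train, y_train, X_test, noise_var=0.1):
    """GP posterior mean with the ergodic model as the prior mean."""
    resid = y_train - ergodic_median(X_train)                 # non-ergodic part to be learned
    K = matern32(X_train, X_train) + noise_var * np.eye(len(X_train))
    L = np.linalg.cholesky(K)                                  # dense O(n^3) factorisation
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, resid))    # K^{-1} @ resid
    return ergodic_median(X_test) + matern32(X_test, X_train) @ alpha

# Toy usage on synthetic data (a made-up smooth site term, not CyberShake output)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 50.0, size=(200, 2))
site_term = 0.5 * np.sin(X[:, 0] / 8.0)
y = ergodic_median(X) + site_term + rng.normal(0.0, 0.3, size=200)
pred = gp_posterior_mean(X, y, X)
rmse_prior = np.sqrt(np.mean(site_term ** 2))                  # error of the ergodic model alone
rmse_gp = np.sqrt(np.mean((pred - ergodic_median(X) - site_term) ** 2))
print(round(rmse_prior, 3), round(rmse_gp, 3))
```

When the residuals are zero, the posterior mean falls back to the ergodic prior. The dense `cholesky` call is exactly the step that becomes prohibitive at the million-point scale the abstract describes.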
Despite these advances, the full CyberShake dataset (on the order of hundreds of millions of data points) remains computationally prohibitive. Therefore, aleatory variability is modeled separately using a mixed-effects formulation to represent within-event and between-event variability. The developed NGMM has two primary applications: interpolation of partially observed ground-motion fields and predictive modeling of ground motions in unobserved earthquake scenarios. Validation results on independent datasets demonstrate accurate performance in both applications. A case study of power transmission network assessment in an Mw 6.7 Puente Hills scenario further demonstrates that the developed NGMM closely reproduces physics-based simulation results.
Published: 2026-05-22T23:33:19Z
Authors: Jinyan Zhao, Grigorios Lavrentiadis, Domniki Asimaki

http://arxiv.org/abs/2605.24212v1
Distributionally Robust Transfer Learning with Structurally Missing Covariates, with Application to Cross-National Cardiac Arrest Prediction
Updated: 2026-05-22T20:53:04Z
Deploying clinical prediction models across healthcare systems often fails when key training covariates are unavailable at deployment and labeled outcomes are limited in the target domain. For example, high-performing models for out-of-hospital cardiac arrest (OHCA) rely on detailed prehospital measurements routinely collected in high-resource settings but unavailable in many international registries. Existing methods either discard missing covariates, sacrificing predictive information, or rely on untestable assumptions about their target distribution. We propose DRUM (Distributionally Robust Unsupervised transfer learning with structurally Missing covariates), a framework that transfers prediction models to target populations where certain covariates are structurally absent and outcome labels are unavailable.
DRUM partitions covariates into shared components ($X$), observed across all settings, and missing components ($A$), observed only in the source. Rather than imputing missing covariates, DRUM optimizes worst-case predictive performance over the unknown target distribution of $A \mid X$ using a neural network generator, with a robustness parameter controlling allowable deviation from the source conditional. We further develop a bias correction procedure that reduces sensitivity to nuisance estimation error. Simulations show substantial improvements in both mean and worst-case prediction error under distribution shift. Applied to cross-national OHCA prediction, transferring models from a US registry to multiple Asian registries where prehospital variables are unrecorded, DRUM yields better-calibrated predictions and improved clinical classification performance across sites.
Published: 2026-05-22T20:53:04Z
Authors: Siqi Li, Chuan Hong, Ziye Tian, Benjamin Sieu-Hon Leong, Koshi Nakagawa, Hideharu Tanaka, Sang Do Shin, Khuong Quoc Dai, Do Ngoc Son, Marcus Eng Hock Ong, Nan Liu, Molei Liu

http://arxiv.org/abs/2302.03089v2
Statistical methods for partitioning ribbon and globally-distributed flux using data from the Interstellar Boundary Explorer
Updated: 2026-05-22T19:19:28Z
NASA's Interstellar Boundary Explorer (IBEX) satellite collects data on energetic neutral atoms (ENAs) that can provide insight into the heliosphere boundary between our solar system and interstellar space. Using these data, scientists can construct maps of the ENA intensities (often expressed in terms of flux) observed in all directions. The ENA flux observed in these maps is believed to come from at least two distinct sources: one source that manifests as a ribbon of concentrated ENA flux and one source (or possibly several) that results in a smoothly-varying globally-distributed flux. Each ENA source type and its corresponding ENA intensity map is of separate scientific interest.
In this paper, we develop statistical algorithms for separating the total ENA intensity maps into two source-specific maps (ribbon and globally-distributed flux) and estimating the corresponding uncertainty. Key advantages of the proposed method include enhanced model flexibility and improved propagation of estimation uncertainty. We evaluate the proposed methods on simulated data designed to mimic realistic data settings. We also propose new methods for estimating the center of the near-elliptical ribbon in the sky, which can be used in the future to study the location and variation of the local interstellar magnetic field.
Published: 2023-02-06T19:44:14Z
Authors: Lauren J. Beesley, Dave Osthus, Kelly R. Moran, Madeline A. Stricklin, Grant David Meadors, Thomas K. Kim, Sung Jun Noh, Nehpreet K. Walia, Paul H. Janzen, Eric J. Zirnstein, Brian P. Weaver, Daniel B. Reisenfeld

http://arxiv.org/abs/2605.24123v1
Heritability: A Counterfactual Perspective
Updated: 2026-05-22T18:33:18Z
Heritability is a central concept in the long-standing debate about nature versus nurture in biological and social sciences. However, existing notions of heritability are based on strong assumptions and do not use explicit causal models. We propose a new, counterfactual definition of heritability by adopting the potential outcomes model in causal inference. Our counterfactual heritability measures the importance of genetic inheritance by the average magnitude of the difference between an individual and their hypothetical "non-identical twin" who is exposed to exactly the same environment. We provide bounds on the counterfactual heritability that can, in principle, be computed from observational data. We then compare counterfactual heritability and its associated bounds with common notions of heritability in population-based studies, twin and sibling studies, and plant breeding experiments.
Our results and comparisons highlight the importance of clarifying the causal structural assumptions and counterfactual comparisons in reasoning about heritability.
Published: 2026-05-22T18:33:18Z
Comment: 46 pages
Authors: Haochen Lei, Jieru Shi, Hongyuan Cao, Qingyuan Zhao

http://arxiv.org/abs/2605.20143v2
Semi-Parametric Bayesian Additive Regression Trees for Risk Prediction with High-Dimensional Epigenetic Signatures and Low-Dimensional Covariates
Updated: 2026-05-22T17:26:44Z
In the era of precision medicine, genome-wide epigenetic modifications offer rich data that could inform risk prediction. However, these data are high-dimensional and exhibit complex dependence structures, which makes it difficult to jointly model them with low-dimensional covariates when the goal is to obtain interpretable effect estimates for covariate adjustment. Standard Bayesian additive regression trees (BART) provide strong predictive performance but treat all predictors uniformly within the tree ensemble, obscuring the contributions of significant covariates and complicating variable selection in high-dimensional settings. We propose a semi-parametric BART model (spBART) that addresses this limitation by modeling low-dimensional covariates through a parametric component with interpretable coefficients, while capturing complex nonlinear associations among high-dimensional predictors through the tree ensemble. To perform stable variable selection, we develop a cross-validation-based procedure that aggregates posterior inclusion probabilities across folds and applies Bayesian false discovery rate control. We apply the proposed method to a pooled case-control analysis of high-dimensional genome-wide 5-hydroxymethylcytosine profiles derived from circulating cell-free DNA in two multiple myeloma studies ($N = 869$). The approach identifies a parsimonious set of candidate loci and achieves strong out-of-sample discrimination (AUC $= 0.96$) in a held-out validation set.
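The spBART variable-selection step described above (aggregate posterior inclusion probabilities across folds, then apply Bayesian false discovery rate control) can be sketched as follows. This is an illustration under assumptions, not the paper's procedure. Fold-level PIPs are combined by a simple average. The threshold uses a common Bayesian FDR rule: rank predictors by PIP and keep the largest top-ranked set whose mean of (1 - PIP) does not exceed α. The function names and the PIP values are hypothetical.

```python
import numpy as np

def aggregate_pips(fold_pips):
    """Average posterior inclusion probabilities over CV folds (rows = folds)."""
    return np.asarray(fold_pips, dtype=float).mean(axis=0)

def bayesian_fdr_select(pips, alpha=0.05):
    """Indices of predictors selected under Bayesian FDR control.

    Predictors are ranked by PIP; the largest top-k set whose estimated
    Bayesian FDR, mean(1 - PIP) over the set, is <= alpha is returned.
    """
    pips = np.asarray(pips, dtype=float)
    order = np.argsort(-pips)                              # descending PIP
    bfdr = np.cumsum(1.0 - pips[order]) / np.arange(1, len(pips) + 1)
    admissible = np.nonzero(bfdr <= alpha)[0]              # a prefix, since bfdr is nondecreasing
    if admissible.size == 0:
        return np.array([], dtype=int)
    k = admissible[-1] + 1                                 # largest admissible set size
    return np.sort(order[:k])

# Hypothetical PIPs for 4 loci from 3 folds
fold_pips = [[0.99, 0.10, 0.95, 0.50],
             [0.97, 0.20, 0.93, 0.40],
             [0.98, 0.00, 0.97, 0.60]]
agg = aggregate_pips(fold_pips)
selected = bayesian_fdr_select(agg, alpha=0.05)
print(agg, selected)
```

Because the sorted values of (1 - PIP) are nondecreasing, their running mean is too. The admissible set sizes therefore form a contiguous prefix, and taking the last one gives the largest set.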
Overall, spBART provides a unified framework for combining interpretable covariate inference with flexible modeling and variable selection in high-dimensional biomedical studies.
Published: 2026-05-19T17:31:20Z
Authors: Saurabh Bhandari, Parveen Bhatti, Brian C. -H. Chiu, Yuan Ji

http://arxiv.org/abs/2605.23858v1
Anticipating Continued Global Fertility Decline via Neural Forecasting
Updated: 2026-05-22T17:17:48Z
The accelerating shift toward low and ultra-low fertility has intensified the debate over whether countries now undergoing rapid decline are approaching stabilization or entering a more persistent low-fertility regime. Existing projection systems answer that question differently because they embed different assumptions about recovery and about the role of external drivers. To provide an empirical benchmark in this debate, we introduce NeuralTFR, an endogenous global forecasting framework based on a recurrent neural network. Drawing on a harmonized panel of historical fertility series from 196 countries and territories, the model pools cross-country information to learn demographic momentum and generate empirical prediction intervals via multi-quantile regression. Evaluated on a held-out period (2009–2023), NeuralTFR achieves lower point-forecast errors than a Naive Drift baseline and BayesTFR, the United Nations' Bayesian Hierarchical Model, while maintaining competitive uncertainty calibration.
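NeuralTFR's prediction intervals come from multi-quantile regression. That approach is typically trained with the pinball (quantile) loss, with one output per quantile level. The sketch below shows that loss in NumPy and checks numerically that, for each level, the constant minimising it sits at the corresponding empirical quantile. It is an assumption that NeuralTFR uses exactly this loss. The quantile levels and the synthetic TFR-like sample are hypothetical, and nothing here reproduces the recurrent network.

```python
import numpy as np

def pinball_loss(y_true, y_pred, quantiles):
    """Mean pinball loss over observations and quantile levels.

    y_true: shape (n,); y_pred: shape (n, q), one column per quantile level.
    """
    y_true = np.asarray(y_true, dtype=float)[:, None]
    y_pred = np.asarray(y_pred, dtype=float)
    tau = np.asarray(quantiles, dtype=float)[None, :]
    diff = y_true - y_pred                                   # > 0 when under-predicting
    return np.maximum(tau * diff, (tau - 1.0) * diff).mean()

rng = np.random.default_rng(1)
y = rng.normal(1.6, 0.3, size=2000)          # hypothetical TFR-like sample
levels = [0.1, 0.5, 0.9]
grid = np.linspace(0.5, 2.7, 2201)           # candidate constant forecasts, step 0.001

# For each level, find the constant forecast with the smallest pinball loss.
best = []
for tau in levels:
    losses = [pinball_loss(y, np.full((y.size, 1), c), [tau]) for c in grid]
    best.append(grid[int(np.argmin(losses))])
print(np.round(best, 3), np.round(np.quantile(y, levels), 3))
```

The asymmetric weights τ and (1 - τ) make the loss minimiser a quantile rather than a mean. This is why training several outputs at different levels yields an empirical interval.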
In forward projections to 2040, NeuralTFR points to broader exposure to low and very low fertility than BayesTFR, suggesting weaker support for near-term stabilization while still falling short of the most severe decline paths predicted by the Global Burden of Disease project.
Published: 2026-05-22T17:17:48Z
Authors: Daniel Ciganda, Facundo Morini, Francisco Piriz, Henrik-Alexander Schubert, Ugofilippo Basellini, Mikko Myrskylä

http://arxiv.org/abs/2605.23692v1
Trajectory-Oriented Optimization Via Adaptive Thompson Sampling And Grid Refinement: A Tutorial With The ADAPTIVE_TS Package
Updated: 2026-05-22T14:48:08Z
Stochastic simulators are increasingly used to expand the frontier of scientific knowledge and inform decision-making across real-world contexts. Simulator calibration, a process by which internal model inputs are tuned to match some external criteria, usually in the form of observed data, is a key step in model design and validation. Epidemiological simulators present an especially compelling use case, as evidenced by the recent COVID-19 pandemic. Among several calibration paradigms, trajectory-oriented optimization is an emerging approach that does not require assumptions about the stochastic behavior of the simulator replicates and is particularly effective at identifying trajectories through the lens of errors between the simulator and observed data, especially when combined with Bayesian optimization. We present a tutorial on trajectory-oriented optimization with adaptive_ts, an open-source Python package.
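The sketch below gives some intuition for how Thompson sampling can drive trajectory-oriented calibration. It is generic and deliberately does not use the adaptive_ts package, whose API the abstract does not describe. Assumptions, all hypothetical:
- a one-parameter stochastic simulator;
- a fixed grid of candidate inputs, with no grid refinement;
- trajectory mean squared error against the observed data as the objective;
- an independent Gaussian posterior on each grid point's mean error, with prior mean 0 and a known noise variance.

```python
import numpy as np

T = np.arange(20)  # time steps of the trajectory

def simulator(theta, rng):
    """Hypothetical stochastic simulator: noisy exponential-growth trajectory."""
    return np.exp(theta * T / 10.0) + rng.normal(0.0, 0.2, size=T.size)

def thompson_calibrate(grid, observed, rng, n_iter=300, prior_var=100.0, noise_var=1.0):
    """Thompson sampling on a fixed grid, minimising trajectory MSE.

    Each grid point has an independent Gaussian posterior on its mean error
    (prior mean 0, known noise variance). Each iteration draws once per grid
    point, runs the simulator at the smallest draw, and updates that point.
    """
    n = np.zeros(len(grid))       # simulator runs per grid point
    total = np.zeros(len(grid))   # summed errors per grid point
    for _ in range(n_iter):
        post_var = 1.0 / (1.0 / prior_var + n / noise_var)
        post_mean = post_var * (total / noise_var)     # prior mean is 0
        draws = rng.normal(post_mean, np.sqrt(post_var))
        i = int(np.argmin(draws))                      # most promising by sampled error
        err = np.mean((simulator(grid[i], rng) - observed) ** 2)
        n[i] += 1
        total[i] += err
    return grid[int(np.argmax(n))], n                  # most-sampled point as the estimate

rng = np.random.default_rng(42)
observed = simulator(0.7, rng)            # stand-in for observed data
grid = np.linspace(0.0, 1.5, 16)
best, runs = thompson_calibrate(grid, observed, rng)
print(best, runs.astype(int))
```

The optimistic prior mean of 0 makes every grid point attractive until it has been tried. After that, simulator runs concentrate near the input whose trajectories track the observed data most closely.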
We also provide a series of worked examples on an accompanying webpage.
Published: 2026-05-22T14:48:08Z
Authors: David O'Gara, Arindam Fadikar, Mickaël Binois, Nicholson Collier, Jonathan Ozik

http://arxiv.org/abs/2605.21893v2
Sequential Sensitivity Analysis for Multiple Assumptions: A Framework for Understanding Racial Disparity in Police Use of Force
Updated: 2026-05-22T14:22:02Z
Inferring racial discrimination in police use of force (the average causal effect of civilian race on use of force) requires two assumptions about policing prior to potential use of force: that officers do not discriminate in whom they would stop (no discrimination in stops) and that, conditional on patrol context, the probability that an encounter is with a minority rather than a white civilian does not vary across encounters (no bias in encounters). As Knox et al. (2020) show, violations of the first can mask racial disparity in force. Whether such a disparity reflects discrimination in force also depends on the second. Existing sensitivity analyses address one assumption at a time. We develop a framework that varies both sequentially and apply it to NYPD Stop, Question, and Frisk data (2003–2013). Under plausible levels of discrimination in stops, we find substantial racial disparity in force. However, the conclusion that this disparity reflects discrimination is fragile to modest departures from no bias in encounters that census-based calibration suggests are demographically feasible.
By jointly addressing both confounding channels, the framework reveals how they interact in ways that separate analyses cannot, contributing to an understanding of what generates racial disparities and how they might be addressed.
Published: 2026-05-21T02:04:38Z
Authors: Thomas Leavitt, Jake Bowers, Luke Miratrix

http://arxiv.org/abs/2605.16606v2
Beyond the Composite: Enhancing Trial Analysis through a Divide & Conquer Approach to 'Days Alive and at Home': Insights from the NOTACS trial
Updated: 2026-05-22T14:10:43Z
"Days alive and at home" (DAH) is a recent patient-centered outcome measure for perioperative trials, defined as the number of days a patient spends at home during the follow-up period. DAH typically follows a zero-inflated, left-skewed, bimodal distribution. Other increasingly used complex endpoints, such as days alive without a ventilator, share these statistical features, which arise from combining survival with another clinically relevant count outcome into a single, comprehensive measure. A key challenge for DAH and similar endpoints is the lack of a readily identifiable distributional form, which complicates the statistical design of trials using it as the primary endpoint, particularly regarding the robustness of sample size calculations and final analyses where the central limit theorem might not be suitable. Using 200 data points from the interim data of the NOTACS trial (ISRCTN14092678), whose primary endpoint was DAH, we developed a novel 'Divide & Conquer' model that breaks DAH into distinct parts modeled individually. To our knowledge, such a model has not been used before for DAH. We demonstrate that our approach significantly improves model fit compared to existing alternatives, enabling more suitable DAH data generation that can be used for simulation-based sample size calculations and evaluation of operating characteristics of the statistical test(s).
Beyond NOTACS, our work has large potential to inform the design and analysis of other trials using DAH or similar complex endpoints.
Published: 2026-05-15T20:14:17Z
Comment: 35 pages, 8 figures, 2 tables
Authors: Letao Yuan, Sofía S. Villar, Dominique-Laurent Couturier

http://arxiv.org/abs/2109.13785v11
Reducing the non-uniformity of the group draw in sports tournaments
Updated: 2026-05-22T12:45:08Z
The group draw of a sports tournament requires assigning teams to groups of (almost) the same size. The most important criteria for a draw procedure are balance, randomness, and transparency, which cannot be satisfied simultaneously if draw constraints exist. Organisers usually use the so-called Skip mechanism, a method based on a random sequential draw of the teams from pots, in order to ensure balance and transparency. However, the Skip mechanism is non-uniformly distributed: the valid assignments are not necessarily equally likely. We quantify this distortion if a group can contain at most two teams from a given set S, which poses a serious challenge for the Skip mechanism. Our study provides exact results for an arbitrary number of teams when there are three pots and two pots contain only one team from the set S, as well as complete enumeration for small problems with three pots and at most five teams per pot. We also analyse three real-world case studies from basketball and football. It turns out that the optimal design considers the pots in decreasing order according to the number of teams in the set S.
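The Skip mechanism and the uniform benchmark described in the group-draw abstract above can be compared by brute force on a toy instance. This is a minimal sketch under stated assumptions, not the paper's code:
- three pots of three teams and a hypothetical set S;
- teams drawn pot by pot;
- each drawn team placed in the first group (in alphabetical order) for which a valid completion of the draw still exists.

It enumerates every draw order to get the exact Skip distribution and every valid assignment for the uniform one, then reports their total variation distance.

```python
import itertools
from collections import Counter
from fractions import Fraction

def fits(group, team, S, cap):
    """Adding `team` keeps the group within `cap` teams from the set S."""
    return sum(t in S for t in group) + (team in S) <= cap

def can_complete(groups, teams_left, pot_of, S, cap):
    """Backtracking: can teams_left (ordered pot by pot) still be placed validly?"""
    if not teams_left:
        return True
    team, rest = teams_left[0], teams_left[1:]
    for members in groups:
        # A group accepts a pot-p team only once it holds one team from each earlier pot.
        if len(members) == pot_of[team] and fits(members, team, S, cap):
            members.append(team)
            ok = can_complete(groups, rest, pot_of, S, cap)
            members.pop()
            if ok:
                return True
    return False

def skip_draw(draw_orders, pot_of, S, n_groups, cap):
    """Skip mechanism: each drawn team goes to the first group that keeps the draw completable."""
    groups = [[] for _ in range(n_groups)]
    sequence = [t for pot in draw_orders for t in pot]
    for i, team in enumerate(sequence):
        for members in groups:
            if len(members) == pot_of[team] and fits(members, team, S, cap):
                members.append(team)
                if can_complete(groups, sequence[i + 1:], pot_of, S, cap):
                    break
                members.pop()
        else:
            raise RuntimeError("no admissible group for " + team)
    return tuple(tuple(g) for g in groups)

def skip_distribution(pots, S, cap=2):
    """Exact Skip distribution: enumerate every draw order of every pot."""
    pot_of = {t: p for p, pot in enumerate(pots) for t in pot}
    orders = list(itertools.product(*(itertools.permutations(pot) for pot in pots)))
    counts = Counter(skip_draw(o, pot_of, S, len(pots[0]), cap) for o in orders)
    return {a: Fraction(c, len(orders)) for a, c in counts.items()}

def valid_assignments(pots, S, cap=2):
    """All assignments (group g gets the g-th team of each pot permutation) respecting the cap."""
    out = []
    for perms in itertools.product(*(itertools.permutations(pot) for pot in pots)):
        groups = tuple(zip(*perms))
        if all(sum(t in S for t in g) <= cap for g in groups):
            out.append(groups)
    return out

pots = [["A1", "A2", "A3"], ["B1", "B2", "B3"], ["C1", "C2", "C3"]]
S = {"A1", "A2", "B1", "B2", "C1"}            # hypothetical set S
skip = skip_distribution(pots, S)
valid = valid_assignments(pots, S)
uniform = Fraction(1, len(valid))
tv = sum(abs(skip.get(a, Fraction(0)) - uniform) for a in valid) / 2
print(len(valid), float(tv))
```

Reordering `pots` changes which pot is drawn first. The same enumeration can therefore be used to explore how the pot order affects the distortion, which is the design question the abstract raises.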
These results can be used to identify the least distorted transparent draw procedure, and decide whether the extent of non-uniformity calls for further actions.
Published: 2021-09-28T15:06:25Z
Comment: 30 pages, 4 figures, 9 tables
Journal reference: Applied Soft Computing, 201(A): 115535, 2026
Author: László Csató
DOI: 10.1016/j.asoc.2026.115535

http://arxiv.org/abs/2311.06139v3
Joint Object Tracking and Intent Recognition
Updated: 2026-05-22T11:37:04Z
This paper presents a Bayesian framework for inferring the posterior of the augmented state of a target, incorporating its underlying goal or intent, such as any intermediate waypoints and/or the final destination. It thus addresses joint object tracking and intent recognition. Several latent intent models are proposed here within a virtual leader formulation. They capture the influence of the target's hidden goal on its instantaneous behaviour. In this context, various motion models, including for highly maneuvering objects, are also considered. The a priori unknown target intent (e.g. destination) can dynamically change over time and take any value within the state space (e.g. a location or spatial region). A sequential Monte Carlo (particle filtering) approach is introduced for the simultaneous estimation of the target's (kinematic) state and its intent. Rao-Blackwellisation is employed to enhance the statistical performance of the inference routine. Simulated data and real radar measurements are used to demonstrate the efficacy of the proposed techniques.
Published: 2023-11-10T15:56:52Z
Comment: Submitted to IEEE Transactions on Aerospace and Electronic Systems (T-AES)
Authors: Jiaming Liang, Bashar I. Ahmad, Simon Godsill

http://arxiv.org/abs/2509.03675v2
Latent space projections and atlases: A cautionary tale in deep neuroimaging using autoencoders
Updated: 2026-05-22T07:41:10Z
This study introduces a deep learning framework for the inferential exploration of latent representations in 3D brain MRI, leveraging a simple convolutional autoencoder with a hierarchical encoder and a compact latent space.
Trained on segmented gray matter images from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, the model learns latent representations that preserve neuroanatomical structure and reflect clinical variability across cognitive status. Dimensionality reduction techniques (PCA, t-SNE, PLS, UMAP) were applied to visualize and interpret the latent space, correlating it with anatomical regions defined by the AAL atlas. As a novel contribution, we propose the Latent-Regional Correlation Profiling (LRCP) framework, which combines statistical association and supervised discriminability to identify brain regions that encode clinically relevant latent information. Our results show that even minimal architectures capture meaningful patterns associated with progression to Alzheimer's disease. Interpretability is assessed by applying SHAP-based regression to a post-hoc model that predicts reconstruction error from atlas-based regional gray matter intensities, thereby identifying anatomically meaningful regions involved in class-specific reconstruction strategies. These findings are further validated using statistical agnostic methods, highlighting the importance of rigorous evaluation in neuroimaging. This work demonstrates the potential of autoencoders as exploratory tools for biomarker discovery and hypothesis generation in clinical neuroscience.
Published: 2025-09-03T19:47:24Z
Comment: 36 pages, 24 figures
Authors: J. M. Gorriz, F. Segovia, C. Jimenez, J. E. Arco, F. J. Martinez, J Ramirez, S. Abulikemu, J. Suckling

http://arxiv.org/abs/2605.23246v1
Regulatory Considerations for Using Artificial Intelligence Models to Reduce Sample Sizes in Registrational Studies
Updated: 2026-05-22T05:30:06Z
Applications of artificial intelligence (AI) in drug development continue to increase at a rapid pace.
Regulatory authorities have provided increasingly clear perspectives on the use of AI in regulated applications, including recent draft guidance from the FDA that provides a 7-step risk-based framework to assess AI model credibility for these cases. We present an application of AI models to prospectively reduce the planned sample size in a randomized controlled trial, using model-derived prognostic covariates. This can shorten trial timelines, enable faster decision making, and lower costs. When treatments are effective and tolerable, they can reach patients sooner, which makes this a compelling use case for the FDA guidance. We walk through each of the steps in the guidance, providing general recommendations for model development, evaluation, and approaches for sample size determination, with the intent of providing a clear set of guidelines on how to engage with the FDA guidance and advance responsible use of AI in drug development. We demonstrate the application with an example in Alzheimer's Disease.
Published: 2026-05-22T05:30:06Z
Comment: 22 pages, 3 figures
Authors: Aaron M. Smith, Tala Fakhouri, Run Zhuang, Jonathan R. Walsh
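A standard normal-approximation formula shows how a model-derived prognostic covariate can shrink a planned sample size. It is a minimal sketch under common textbook assumptions, not the paper's procedure or the FDA framework:
- a continuous outcome and two equal arms;
- a two-sided z-test;
- the usual approximation that adjusting for a prognostic score with correlation ρ to the outcome multiplies the outcome variance by (1 - ρ²).

The effect size, SD, and ρ values in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.05, power=0.8, rho=0.0):
    """Per-arm sample size for a two-arm, two-sided z-test on a mean difference.

    rho: assumed correlation between the AI-derived prognostic score and the
    outcome; covariate adjustment shrinks the outcome variance by (1 - rho**2).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    var = sigma ** 2 * (1 - rho ** 2)
    return math.ceil(2 * var * (z_alpha + z_beta) ** 2 / delta ** 2)

# Hypothetical design: effect of 0.3 outcome SDs, 80% power, 5% two-sided alpha
unadjusted = n_per_arm(delta=0.3, sigma=1.0)
adjusted = n_per_arm(delta=0.3, sigma=1.0, rho=0.5)
print(unadjusted, adjusted)
```

A prognostic score with ρ = 0.5 removes a quarter of the outcome variance, and the required sample size falls roughly in proportion. In practice, the credibility of that ρ estimate is what the risk-based framework asks sponsors to establish.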