Subjective Time Deformation in Intertemporal Choice: A Functional Data Analysis Approach

2026-05-29T12:59:42Z

Intertemporal choice data are usually summarized through scalar discount-rate parameters or fitted by predetermined parametric discount functions, although relevant information may lie in the shape of the whole discounting trajectory. This paper proposes a Functional Data Analysis framework for reconstructing and analyzing implicit subjective-time trajectories from discrete intertemporal equivalence judgments. Monetary equivalence responses from a multilingual questionnaire are transformed into individual discount curves, regularized by monotone smoothing, and used to recover normalized implicit subjective-time trajectories. The trajectories are examined through derivative summaries, Functional Principal Component Analysis, and clustering on standardized component scores. The empirical application, based on 107 participants, shows that heterogeneity in intertemporal choice is not fully captured by scalar discount-rate variation. The first two functional principal components explain 97.44% of the variability, indicating a low-dimensional structure. Functional clustering identifies three stable profiles of temporal deformation, supported by bootstrap stability analysis and sensitivity checks on components, algorithms, distances, smoothing specifications, and outlier treatment. Parametric benchmarks based on exponential, Weber-Fechner, and Stevens specifications provide accurate fits for many individuals, but do not fully recover the functional clustering structure. The comparison with explicit subjective-time perception measures reveals only partial alignment between implicit trajectories reconstructed from choices and directly reported temporal perception. Functional Data Analysis provides an applied statistical framework for representing intertemporal choice heterogeneity as variation in functional shape, complementing scalar discount-rate and parametric subjective-time models.
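The Functional Principal Component Analysis step mentioned above can be illustrated with a minimal sketch. It assumes curves sampled on a common grid and uses a plain singular value decomposition in place of a basis-expansion FPCA; the synthetic curves, the function name `fpca`, and all parameter values are hypothetical and do not reproduce the paper's data or smoothing choices.

```python
import numpy as np

def fpca(curves):
    """Discretized FPCA: PCA of curves sampled on a common grid (rows = individuals)."""
    centered = curves - curves.mean(axis=0)          # remove the mean curve
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)                  # share of variance per component
    scores = centered @ vt.T                         # component scores per individual
    return explained, vt, scores

# Synthetic trajectories (assumption): two dominant modes of variation plus small noise.
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 50)
n = 100
a = rng.normal(0.0, 1.0, size=(n, 1))
b = rng.normal(0.0, 0.3, size=(n, 1))
curves = grid + a * np.sin(np.pi * grid) + b * np.sin(2 * np.pi * grid)
curves += rng.normal(0.0, 0.01, size=curves.shape)

explained, components, scores = fpca(curves)
print("variance explained by first two components:", explained[:2].sum())
```

In a pipeline like the one described, the leading columns of `scores` would be standardized and passed to a clustering algorithm.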

A Kernel Score Perspective on Forecast Disagreement and the Linear Pool

2026-05-29T10:45:24Z

This paper generalizes several results on linear pooling from squared error loss to all kernel scores. The latter are a rich family of scoring rules that covers point and distribution forecasts for univariate and multivariate, discrete and continuous settings. Its members include the Continuous Ranked Probability Score for univariate distribution forecasting and the Energy Score for multivariate distribution forecasting. Our results indicate that forecast disagreement (measured as the average pairwise divergence of all component distributions) has important implications for the linear pool's performance. The results are useful for understanding and designing linear pools in general combination settings. In particular, they motivate using the linear pool (as opposed to other combination formulas) and yield a novel condition under which equal combination weights are optimal under a given kernel scoring rule.
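The role of disagreement can be checked numerically with a standard identity for kernel scores: the score of an equally weighted linear pool equals the average component score minus half the average pairwise divergence. The sketch below uses the kernel |a - b| (which yields the CRPS) and empirical component distributions of equal sample size. The component parameters and function names are hypothetical, the average runs over all ordered pairs including self-pairs (which contribute zero), and the identity shown is the textbook one, not necessarily the exact form stated in the paper.

```python
import numpy as np

def expected_kernel(x, y):
    """Mean of k(a, b) = |a - b| over all pairs (a in x, b in y)."""
    return np.abs(x[:, None] - y[None, :]).mean()

def kernel_score(sample, obs):
    """CRPS of the empirical distribution of `sample` at outcome `obs` (lower is better)."""
    return np.abs(sample - obs).mean() - 0.5 * expected_kernel(sample, sample)

def divergence(x, y):
    """Kernel divergence between two empirical distributions (non-negative)."""
    return (expected_kernel(x, y)
            - 0.5 * expected_kernel(x, x)
            - 0.5 * expected_kernel(y, y))

rng = np.random.default_rng(0)
# Three hypothetical component forecasts, each represented by 200 equally weighted draws.
params = [(0.0, 1.0), (1.0, 0.5), (-0.5, 2.0)]
components = [rng.normal(loc=m, scale=s, size=200) for m, s in params]
obs = 0.3

# With equal weights and equal sample sizes, the linear pool is the concatenated sample.
pool = np.concatenate(components)

avg_score = np.mean([kernel_score(c, obs) for c in components])
pool_score = kernel_score(pool, obs)
disagreement = np.mean([divergence(a, b) for a in components for b in components])

print(pool_score, avg_score - 0.5 * disagreement)
```

The two printed values agree up to floating-point error, so under these assumptions greater disagreement among components translates directly into a larger gain of the pool over the average component.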

Learning-to-Defer in Non-Stationary Time Series via Switching State-Space Models

2026-05-29T08:03:31Z

Learning-to-defer (L2D) routes each decision to a system's own predictor or to an external expert. Streaming time-series settings break the offline-L2D assumptions: the data are non-stationary, expert availability shifts over time, and the internal predictor is trained online. We propose L2D-SLDS, a one-stage online L2D framework based on a factorized switching linear-Gaussian state-space model over all potential residuals: a discrete regime, a shared global factor, and per-expert idiosyncratic states. The always-observed internal residual continuously updates beliefs about every unqueried expert through the shared factor, and a learner-aware query score balances immediate cost against latent-state information gain and one-step learner improvement. We prove an oracle inequality against a time-varying learn-and-defer comparator, decomposing regret into a query-bonus budget, an SLDS predictive-cost-error term~$\mathcal{E}_{\mathrm{SLDS}}$, and the internal learner's interval dynamic regret. On synthetic, Melbourne, Jena, and 24-expert Delhi benchmarks, L2D-SLDS is competitive with or improves on contextual- and non-stationary-bandit baselines while deferring on ${<}2\%$ of real-data rounds.

Coordination without communication: beyond optimisation and geometric Brownian motion

2026-05-29T07:35:55Z

We introduce a physically grounded framework for coordination in a population based on information-constrained feedback in a partially observed stochastic dynamical system. Population size evolves as a continuous-time birth-death Markov process whose transition rates respond to a shared stochastic measurement signal correlated with the underlying population state. Individuals neither communicate directly nor optimise strategies; instead, coordination emerges from macro-to-micro feedback mediated by imperfect common information. We show that geometric Brownian motion arises as a limiting case of the conditional dynamics when measurement strength and population statistics satisfy suitable conditions. More generally, varying the signal-to-noise properties of the measurement channel produces a wider class of stochastic growth processes, including diffusive and jump-like regimes, even though ensemble-average growth remains exponential. In an appropriate limit the framework recovers the stochastic multiplicative growth model of Peters and Adamou, providing a physical interpretation of coordination as inference and feedback under partial observability.

Bayesian Classification with Probit-link Split-and-merge Gaussian Process Prior in EEG-based Brain-Computer Interfaces

2026-05-29T03:05:06Z

Brain-Computer Interface (BCI) speller systems based on Event-Related Potentials (ERPs) enable users to select characters by detecting brain responses to visual stimuli, recorded through electroencephalography (EEG). One challenge is to accurately identify target-related responses, such as the P300 component. However, existing methods tend to ignore feature selection, perform feature selection without interpretability, or require large computational effort or data manipulation. To address these limitations, we propose a novel Bayesian generative modeling framework for the binary classification of EEG responses to stimuli. Our approach employs a Probit-link Split-and-merge Gaussian Process (P-SMGP) prior to perform spatial-temporal feature selection, effectively capturing the distinctions between target and non-target ERP responses. In both simulation studies and real EEG data analysis, our approach can reduce computational complexity and provide statistical interpretations of transformed ERP functions while maintaining comparable prediction accuracy. These findings underscore the value of interpretable, stimulus-level modeling for advancing predictive and personalized BCI systems.

Bayesian Rain Field Reconstruction using Commercial Microwave Links and Diffusion Model Priors

2026-05-28T23:25:19Z

Commercial Microwave Links (CMLs) offer dense spatial coverage for rainfall sensing but produce path-integrated measurements that make accurate ground-level reconstruction challenging. Existing methods typically oversimplify CMLs as point sensors and neglect the line integration relating rainfall to signal attenuation, resulting in degraded performance under heterogeneous precipitation. In this work, we view rain field reconstruction as a Bayesian inverse problem with Diffusion Models (DMs) as high-fidelity spatial priors. We show that diffusion models better preserve key rainfall statistics than censored Gaussian processes. This formulation enables training-free posterior sampling using a broad family of methods, including Plug-and-Play, Sequential Monte Carlo, and Replica Exchange methods. Experiments on synthetic and real-world datasets demonstrate consistent improvements over established CML-based reconstruction baselines.

A Bayesian Framework for Uncertainty-Aware Estimation of Main Pulmonary Artery Velocity Profiles from Phase-Contrast MRI

2026-05-28T22:18:39Z

Computational cardiovascular flow models are highly sensitive to prescribed inlet velocity profiles. While imaging-derived velocity fields provide physiologically realistic information, they can introduce increased preprocessing complexity, imaging noise, and computational burden. Simplified analytical formulations are computationally efficient but may not fully capture subject-specific flow characteristics. In this study, we present an uncertainty-aware framework that combines two-dimensional phase-contrast magnetic resonance imaging (2D PC-MRI) with mechanistic velocity-profile formulations to generate subject-specific pulmonary artery velocity representations. Imaging-derived radial velocity distributions were constructed from main pulmonary artery (MPA) PC-MRI data in canine and swine subjects using elliptical radial binning and normalization. Power-law and Womersley velocity-profile formulations were fitted within a Bayesian inference framework while accounting for uncertainty associated with imaging measurements and model representation. The two formulations were compared using regional and global weighted root mean square error (wRMSE) metrics. Both models demonstrated close agreement with the imaging-derived velocity profiles across subjects. Although the Womersley formulation provided greater flexibility near the vessel wall, it did not result in statistically significant improvements in fitting performance compared with the simpler power-law model. The proposed framework provides low-dimensional, physiologically interpretable, and uncertainty-aware velocity-profile representations that may serve as computationally efficient alternatives for subject-specific cardiovascular flow modeling.

Rectified Linear Unit Regression

2026-05-28T22:02:55Z

This paper develops a regression framework for the direct estimation of integrated functionals of conditional outcome distributions. The proposed method, termed rectified linear unit (ReLU) regression, projects the ReLU-transformed outcome onto covariates and admits a closed-form estimator. Its population regression function coincides with the integrated conditional distribution function of the outcome, and its convex conjugate, obtained via the Legendre-Fenchel transformation, recovers the integrated conditional quantile function. Both the regression and its conjugate require only mild distributional assumptions and accommodate non-continuous outcomes. We establish the uniform asymptotic distribution of the estimator and develop inference for the conjugate functional via the delta method for Hadamard directionally differentiable maps. Building on these results, we establish identification and inference for average quantile treatment effects over arbitrary subintervals of probability levels. This broadens the set of distributional parameters available to empirical work.
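The link between a ReLU-transformed outcome and the integrated distribution function rests on the standard identity E[max(t - Y, 0) | X] = \int_{-\infty}^{t} F(s | X) ds. The sketch below is an illustration under stated assumptions, not the paper's estimator: it assumes the transform max(t - Y, 0) at a single threshold t, fits ordinary least squares on a saturated binary covariate, and compares the fitted values with a direct integration of each group's empirical distribution function. The names `relu_regression` and `integrated_ecdf` and the simulated data are hypothetical.

```python
import numpy as np

def relu_regression(y, X, t):
    """Closed-form least-squares projection of max(t - y, 0) onto the columns of X."""
    z = np.maximum(t - y, 0.0)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return beta

def integrated_ecdf(y, t):
    """Integral of the empirical CDF of y from -infinity to t (step-function areas)."""
    ys = np.sort(y[y < t])
    edges = np.append(ys, t)                      # breakpoints of the step function
    heights = np.arange(1, ys.size + 1) / y.size  # CDF level on each interval
    return float(np.sum(heights * np.diff(edges)))

# Simulated data (assumption): binary covariate d shifts the outcome distribution.
rng = np.random.default_rng(2)
n = 500
d = rng.integers(0, 2, size=n)
y = 1.0 + 0.5 * d + rng.normal(size=n)
X = np.column_stack([np.ones(n), d])              # saturated design: intercept + dummy

t = 1.2
beta = relu_regression(y, X, t)
fit0, fit1 = beta[0], beta[0] + beta[1]           # fitted values for d = 0 and d = 1
print(fit0, integrated_ecdf(y[d == 0], t))
print(fit1, integrated_ecdf(y[d == 1], t))
```

Because the design is saturated, the fitted values coincide with the group-wise integrated empirical distribution functions; with continuous covariates a linear projection would only approximate the conditional functional.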

Orthogonalized Kernel Regression for Spatial and Spatio-Temporal Residual Risk: Application to School Shootings in the Contiguous United States

2026-05-28T21:59:53Z

Distinguishing background heterogeneity from excess risk is a central challenge in case-control event data when both covariates and residual spatial or spatio-temporal structure matter. We develop a covariate-adjusted kernel regression framework that embeds an orthogonalized residual risk surface within a semiparametric binary model, and extend the approach from purely spatial to explicit spatio-temporal analysis. We apply the method to 959 gun violence incidents at public schools in the contiguous United States from 2000 to 2024, using incidents from the K-12 School Shooting Database linked to official school records for the corresponding year. The fitted models identify stable school-level associations, including markedly higher risk for larger schools and for middle and high schools, while also revealing substantial residual structure beyond the background distribution of schools. After adjustment for covariates, excess risk is found to remain concentrated in a persistent central-eastern corridor of the United States, with the strongest evidence appearing in recent years. More broadly, the analysis shows how residual risk surfaces can sharpen inference by separating background heterogeneity from anomalous structure in case-control event processes evolving over space and time.

Exploring the periodicity of flight patterns

2026-05-28T19:25:40Z

Each year the American Statistical Association (ASA) hosts the Annual Data Challenge Expo, which tasks participants with analyzing a given dataset and presenting their work at the Joint Statistical Meetings (JSM). The 2025 Data Challenge Expo tasked participants with analyzing over 35 years of commercial flight data from the United States Bureau of Transportation Statistics (BTS). These data provide extensive geographic coverage and operational details for the U.S. domestic aviation market. For millions of past flights, there is information about the flight's date, origin, destination, carrier, plane, departure, and arrival. In this article, we present our analysis for the 2025 JSM Data Challenge Expo. We chose to explore patterns in the daily scheduling of departures and arrivals across airlines, airports, and time. In doing so, we observed distinct scheduling ``waves'', or periodic structures, at major airline hubs as well as at large Federal Aviation Administration (FAA) hubs. In the remainder of this article, we detail the process of visualizing periodicity in flight scheduling as well as quantifying it through the calculation of Shannon entropy. An additional element of the 2025 Data Challenge Expo is the incorporation of a second dataset, to be chosen by the participants. We detail the use of a BTS dataset with passenger enplanement (boarding) information to determine FAA hub classification (as opposed to airline-specific hubs). Furthermore, we discuss results from this visual and quantitative analysis, highlighting noticeable differences in scheduling periodicity and entropy across airports for the ``big four'', the four largest carriers in U.S. aviation: American Airlines, Delta Air Lines, United Airlines, and Southwest Airlines.
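As a rough illustration of how Shannon entropy can quantify scheduling concentration, the sketch below bins departure times into hourly slots and computes the entropy of the resulting distribution. The hourly binning, the function name `schedule_entropy`, and the two toy schedules are assumptions made for this example; the article's actual binning and data are not specified here.

```python
import numpy as np

def schedule_entropy(departure_minutes, bin_minutes=60):
    """Shannon entropy (in bits) of departures across time-of-day bins."""
    n_bins = 1440 // bin_minutes                       # 1440 minutes in a day
    bins = (np.asarray(departure_minutes) % 1440) // bin_minutes
    counts = np.bincount(bins.astype(int), minlength=n_bins)
    p = counts[counts > 0] / counts.sum()              # drop empty bins (0 * log 0 = 0)
    return float(-(p * np.log2(p)).sum())

# Toy schedule 1: one departure every 10 minutes, spread evenly over the day.
uniform = np.arange(0, 1440, 10)

# Toy schedule 2: departures packed into four hourly "waves" (hours 6, 10, 14, 18).
waves = np.concatenate([h * 60 + np.arange(0, 60, 5) for h in (6, 10, 14, 18)])

print("uniform schedule:", schedule_entropy(uniform))  # equals log2(24)
print("wave schedule:   ", schedule_entropy(waves))    # equals log2(4) = 2
```

Under this convention, lower entropy indicates departures concentrated in fewer time slots, which is one way a wave-like hub schedule could show up numerically.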

Multidimensional Item Response Theory under General Latent Distributions

2026-05-28T18:41:56Z

Multidimensional item response theory (MIRT) provides an important psychometric framework for modeling how multiple latent traits jointly influence observed item responses. In most existing estimation procedures, the latent trait distribution is assumed to be Gaussian. Although computationally convenient, this assumption can be restrictive in many applications where the latent distribution exhibits skewness, heavy tails, or multimodality. More importantly, misspecifying the latent distribution may bias the estimation of item parameters and latent traits. To address this limitation, we propose a data-driven flow-based framework for MIRT models that can capture a broad class of non-Gaussian latent distributions. The proposed approach represents the latent distribution as an invertible transformation of a simple base distribution. For efficient estimation, we further introduce a conditional flow as a function of both the observed response and the noise to approximate the posterior distribution. Under this framework, the item parameters, latent distribution, and posterior approximation can be learned jointly. Comprehensive simulation studies show that the proposed method improves item-parameter and latent-trait recovery when the true latent distribution is non-normal. An application to a personality dataset further illustrates the practical utility of the proposed framework for modeling complex latent trait distributions in large-scale data.

A Causal Framework for Estimating Heterogeneous Effects of On-Demand Tutoring

2026-05-28T18:03:51Z

This paper introduces a scalable causal inference framework for estimating the immediate, session-level effects of on-demand human tutoring embedded within adaptive learning systems. Because students seek assistance at moments of difficulty, conventional evaluation is confounded by self-selection and time-varying knowledge states. We address these challenges by integrating principled analytic sample construction with Deep Knowledge Tracing (DKT) to estimate latent mastery, followed by doubly robust estimation using Causal Forests. Applying this framework to over 5,000 middle-school mathematics tutoring sessions, we find that requesting human tutoring increases next-problem correctness by approximately 4 percentage points and accuracy on the subsequent skill encountered by approximately 3 percentage points, suggesting that the effects of tutoring have proximal transfer across knowledge components. This effect is robust to various forms of model specification and potential unmeasured confounders. Notably, these effects exhibit significant heterogeneity across sessions and students, with session-level effect estimates ranging from $-20.25$ pp to $+19.91$ pp. Our follow-up analyses suggest that typical behavioral indicators, such as student talk time, do not consistently correlate with high-impact sessions. Furthermore, treatment effects are larger for students with lower prior mastery and slightly smaller for low-SES students. This framework offers a rigorous, practical template for the evaluation and continuous improvement of on-demand human tutoring, with direct applications for emerging AI tutoring systems.

Statistical Embeddings for Similarity, Retrieval, and Interpretable Alignment of Numeric Tabular Datasets

2026-05-28T17:40:42Z

Numeric tabular datasets are the dominant data format in scientific practice, yet large language models lack native mechanisms for representing numeric datasets in a meaningful way across heterogeneous feature spaces. Existing approaches either target predictive modeling over individual datasets, which requires a shared set of variable definitions, or lack mechanisms for interpretable cross-dataset alignment. The proposed methodology characterizes numeric tabular datasets through structured exploratory data analysis descriptors, embeds those descriptors into a shared vector space using a pretrained sentence transformer, and quantifies cross-dataset similarity via Canonical Correlation Analysis (CCA). Furthermore, a penalized formulation of CCA is applied to recover sparse, interpretable variable-level correspondences between datasets, identifying which statistical descriptors or variable-level quantities drive cross-dataset alignment without requiring shared variable names or feature conventions. Differential privacy is optionally applied to the descriptor set prior to embedding, supporting deployment in sensitive data contexts without requiring access to raw observations at time of comparison. The methodology is evaluated across 15 datasets spanning general-purpose benchmarks, materials informatics, and nuclear-grade graphite characterization. Results demonstrate a total P@1 score of 0.9, with known nearest-neighbor retrieval and cluster structure remaining robust across embedding ablations and differential privacy budgets. The proposed framework provides a principled pathway for integrating heterogeneous numeric data into retrieval-augmented generation pipelines while preserving statistical context, with direct applications to data-driven algorithm selection and simulation model initialization for unknown datasets.

Betting Against Integrity: Identifying Match-Fixing Through In-Play Market Dynamics

2026-05-28T16:44:54Z

Match-fixing undermines the integrity of sport by eroding public trust and threatening the financial sustainability of clubs and leagues. The global expansion of sports betting markets has created new incentives and opportunities for manipulation, calling for rigorous, data-driven monitoring tools. Football, which accounts for the largest share of global betting turnover, remains particularly exposed: integrity reports continue to flag several suspicious matches, with past scandals in Italy and Turkey underlining the problem's persistence. This study uses high-frequency live-betting data from the Italian Serie B (2018/19-2020/21) to explore statistical approaches for detecting abnormal betting behaviour. A state-space modelling framework is employed to describe standard betting market dynamics and to predict expected betting volumes conditional on match characteristics. Deviations from these expectations can then be analysed using outlier detection techniques to identify potentially suspicious periods. The results demonstrate how statistical modelling can contribute to the early identification of irregular betting patterns, thereby supporting integrity assurance in live sports betting markets.

Visual Spatial Learning: Single-Field Spatial Interpolation Using Convolutional Neural Networks

2026-05-28T16:19:21Z

Predicting a complete spatially correlated field from sparse observations is a fundamental challenge in spatial statistics and environmental modelling. Classical interpolation methods such as Kriging rely on Gaussian process assumptions and variography, which can limit their effectiveness in non-stationary settings and require substantial domain expertise. In this work, we leverage an architecture based on convolutional neural networks (CNNs) for spatial interpolation that is trained and applied on a single partially observed field, without access to external data or prior fields. The model is supervised directly on the observed locations and learns to predict values at unobserved points on the user-defined grid. Unlike Kriging, our method does not require explicit covariance modelling or variogram estimation, and it can flexibly capture local spatial patterns in a data-driven manner. This work demonstrates the potential of CNNs for single-instance spatial interpolation under sparse supervision, offering a practical alternative to classical geostatistical methods, and extending the use of CNNs to a new problem domain.