http://arxiv.org/abs/2505.11944v1 Basic model for ranking microfinance institutions 2025-05-17T10:15:05Z

This paper discusses the challenges encountered in building a ranking model for aggregator site products, using the example of ranking microfinance institutions (MFIs) based on post-click conversion. We suggest which features of MFIs should be considered, and using an algorithm based on Markov chains, we demonstrate the "usefulness" of these features on real data. The ideas developed in this work can be applied to aggregator websites in microinsurance, especially when personal data is unavailable. Since we did not find similar datasets in the public domain, we are publishing our dataset with a detailed description of its attributes.

2025-05-17T10:15:05Z Dmitry Dudukalov Evgeny Prokopenko http://arxiv.org/abs/2404.02400v3 Improved Semi-Parametric Bounds for Tail Probability and Expected Loss: Theory and Applications 2025-05-14T05:24:51Z

Many management decisions involve accumulated random realizations for which only the first and second moments of their distribution are available. The sharp Chebyshev-type bound for the tail probability and Scarf bound for the expected loss are widely used in this setting. We revisit the tail behavior of such quantities with a focus on independence. Conventional primal-dual approaches from optimization are ineffective in this setting. Instead, we use probabilistic inequalities to derive new bounds and offer new insights. For non-identical distributions attaining the tail probability bounds, we show that the extreme values are equidistant regardless of the distributional differences. For the bound on the expected loss, we show that the impact of each random variable on the expected sum can be isolated using an extension of the Korkine identity. We illustrate how these new results open up abundant practical applications, including improved pricing of product bundles, more precise option pricing, more efficient insurance design, and better inventory management. For example, we establish a new solution to the optimal bundling problem, yielding a 17% uplift in per-bundle profits, and a new solution to the inventory problem, yielding a 5.6% cost reduction for a model with 20 retailers.
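For orientation, the two classical single-variable bounds named in this abstract can be written down directly. The sketch below implements the textbook one-sided Chebyshev (Cantelli) bound and the Scarf bound for one random variable with known mean and standard deviation. It is not the paper's improved bounds for sums of independent variables, and the function names are illustrative only.

```python
import math


def cantelli_tail_bound(mean, std, threshold):
    """Upper bound on P(X >= threshold) given only the mean and std of X."""
    gap = threshold - mean
    if gap <= 0:
        # At or below the mean, the moment information gives only the trivial bound.
        return 1.0
    return std ** 2 / (std ** 2 + gap ** 2)


def scarf_expected_loss_bound(mean, std, threshold):
    """Upper bound on E[max(X - threshold, 0)] given only the mean and std of X."""
    gap = threshold - mean
    return 0.5 * (math.sqrt(std ** 2 + gap ** 2) - gap)


if __name__ == "__main__":
    # Hypothetical demand with mean 100 and standard deviation 20, stock level 130.
    print(cantelli_tail_bound(100.0, 20.0, 130.0))
    print(scarf_expected_loss_bound(100.0, 20.0, 130.0))
```

Both functions use only the first two moments, which is the information setting the abstract describes.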

2024-04-03T02:06:54Z Zhaolin Li Artem Prokhorov http://arxiv.org/abs/2502.14581v2 A Statistical Case Against Empirical Human-AI Alignment 2025-05-12T09:51:39Z

Empirical human-AI alignment aims to make AI systems act in line with observed human behavior. While noble in its goals, we argue that empirical alignment can inadvertently introduce statistical biases that warrant caution. This position paper thus advocates against naive empirical alignment, offering prescriptive alignment and a posteriori empirical alignment as alternatives. We substantiate our principled argument by tangible examples like human-centric decoding of language models.

2025-02-20T14:12:18Z 24 pages, 2 figures, 5 tables Julian Rodemann Esteban Garces Arias Christoph Luther Christoph Jansen Thomas Augustin http://arxiv.org/abs/2505.04176v1 Developing Assessment Methods for Evaluating Learning Experience 2025-05-07T07:08:10Z

This research investigates the gender-based learning experiences of engineering students enrolled in the Probability and Statistics course, focusing on the four assessment methods employed, namely direct conceptual learning (DCL), symposium, applied deployment, and collaborative learning. The study encompasses 299 engineering students, comprising 90 females and 209 males. Multivariate Analysis of Variance (MANOVA) is used to gain deeper insight into the interplay between assessment methods and their influence on student learning. The statistical analysis reveals significant differences in learning outcomes between female and male engineering students for direct conceptual learning, symposium, and applied deployment, and no significant difference for collaborative learning. The graphical representation visually confirms both the significant differences in the first three methods and the absence of a significant difference in collaborative learning.

2025-05-07T07:08:10Z 9 pages, 4 Figures Maneesha http://arxiv.org/abs/2505.02298v1 Statisticians Training STEM Educators in Statistics Methods and Pedagogy: A Case Study of Instructor Training in Bayesian Methods 2025-05-05T00:22:40Z

Educating the next generation of scientists in statistical methodology is an important task. Educating their instructors in statistical content knowledge and pedagogical knowledge is equally important and has an indirect impact on students' learning. Statisticians are well placed to lead train-the-trainer (TTT) programs in different methods. We present our instructor training program in Bayesian methods as an effective case study of a TTT model. In addition to describing the structure of our training program in detail, we share our experience in designing and implementing it, including the challenges we faced, the opportunities created, and our recommendations for TTT programs led by statisticians.

2025-05-05T00:22:40Z Mine Dogucu Jingchen Hu Amy H Herring http://arxiv.org/abs/2411.18481v3 Bhirkuti's Test of Bias Acceptance (BTBA): Examining Its Performance in Psychometric Simulations 2025-05-02T17:08:15Z

We introduce Bhirkuti's Test of Bias Acceptance (BTBA), a standardized framework for evaluating estimator bias in Monte Carlo simulation studies. BTBA uses a simulation-specific standardized score (Z*) and a decision matrix to assess bias acceptability based on the mean and variance of Z* distributions. Under ideal conditions, Z* values should approximate a standard normal distribution (Z-distribution) with a mean near zero and variance near one in the context of simulation research. Systematic deviations from these patterns such as shifted means or inflated variances indicate bias or estimator instability in simulation-based research. BTBA visualizes these patterns using ridgeline density plots, which reveal distributional features such as central tendency, spread, skewness, and outliers. Demonstrated in a latent growth modeling context, BTBA offers a reproducible and interpretable method for diagnosing bias across varying simulation conditions. By addressing key limitations of traditional relative bias (RB) metrics, BTBA provides a theoretically grounded, distribution-aware, transparent, and replicable alternative for evaluating estimator quality, particularly in psychometric modeling, structural equation modeling, and missing data research. Through this framework, we aim to enhance methodological decision-making by integrating statistical reasoning with comprehensive visualization techniques.

2024-11-27T16:24:47Z Aneel Bhusal Todd D. Little http://arxiv.org/abs/2505.00854v1 Mapping the Intersection of Research and Policy in Centers for Medicare National Coverage Decision Memos 2025-05-01T20:36:49Z

Evidence is a crucial component of federal policy, but the interactions between the various stakeholders involved in funding, producing, and using the results of scientific research, an important class of evidence, for federal policy are poorly understood. The national coverage determination process used by the Centers for Medicare and Medicaid Services (CMS) to make significant policies on healthcare coverage is an ideal candidate for studying the interactions between stakeholders producing and utilizing scientific research for policy. Memos produced during the national coverage determination process contain information that identifies the organizations funding and producing research articles cited by CMS policy staff. I use these data to map scientific articles and their funding sources to discrete federal policies with substantial economic and health impacts. My analysis highlights that information derived from policy documents can facilitate transparency among the stakeholders involved in funding, producing, and using evidence for federal policy.

2025-05-01T20:36:49Z 31 pages, 6 figures Sean A. Klein http://arxiv.org/abs/2102.10429v2 Taylor's Theorem and Mean Value Theorem for Random Functions and Random Variables 2025-04-30T01:17:22Z

This study addresses the often-overlooked issue of measurability at intermediate points when applying Taylor's theorems to random functions and random vectors (e.g., likelihood functions with respect to estimators) in statistics. Classical Taylor-related theorems were originally developed for deterministic settings. Consequently, they do not directly extend to stochastic functions and variables and do not inherently guarantee the measurability of intermediate points. In statistical contexts, applying these theorems without properly accounting for randomness can lead to analyses that lack well-defined probabilistic interpretations. Elementary approaches, such as pointwise constructions, are insufficient for handling random quantities and establishing measurable intermediate points. Moreover, some statistical literature has implicitly disregarded this issue, often neglecting the stochastic nature of the problem and assuming that intermediate points are measurable. To address this gap, we develop multivariate Taylor's and mean value theorems tailored for random functions and random variables under mild assumptions. We provide illustrative examples demonstrating the applicability of our results to commonly used statistical methods, including maximum likelihood estimation, $M$-estimation, and profile estimation. Our findings contribute a rigorous foundation for the applications of Taylor expansions in statistics.

2021-02-20T20:02:30Z Yifan Yang Xiaoyu Zhou Ming Wang http://arxiv.org/abs/2504.21566v1 Rendering LaTeX in R 2025-04-29T03:35:32Z

The xdvir package provides functions for rendering LaTeX fragments as labels, annotations, and data symbols in R plots. There are convenient high-level functions for rendering LaTeX fragments, including labels on ggplot2 plots, plus lower-level functions for more fine control over the separate authoring, typesetting, and rendering steps. There is support for making use of LaTeX packages, including TikZ graphics. The rendered LaTeX output is fully integrated with R graphics output in the sense that LaTeX output can be positioned and sized relative to R graphics output and vice versa.

2025-04-29T03:35:32Z 24 pages, 13 figures, submitted to The R Journal Paul Murrell http://arxiv.org/abs/2504.18982v1 On Bitcoin Price Prediction 2025-04-26T17:48:11Z

In recent years, cryptocurrencies have attracted growing attention from both private investors and institutions. Among them, Bitcoin stands out for its impressive volatility and widespread influence. This paper explores the predictability of Bitcoin's price movements, drawing a parallel with traditional financial markets. We examine whether the cryptocurrency market operates under the efficient market hypothesis (EMH) or if inefficiencies still allow opportunities for arbitrage. Our methodology combines theoretical reviews, empirical analyses, machine learning approaches, and time series modeling to assess the extent to which Bitcoin's price can be predicted. We find that while, in general, the Bitcoin market tends toward efficiency, specific conditions, including information asymmetries and behavioral anomalies, occasionally create exploitable inefficiencies. However, these opportunities remain difficult to systematically identify and leverage. Our findings have implications for both investors and policymakers, particularly regarding the regulation of cryptocurrency brokers and derivatives markets.

2025-04-26T17:48:11Z Grégory Bournassenko http://arxiv.org/abs/2504.18695v1 Local Polynomial Lp-norm Regression 2025-04-25T21:04:19Z

The local least squares estimator for a regression curve cannot provide optimal results when non-Gaussian noise is present. Both theoretical and empirical evidence suggests that residuals often exhibit distributional properties different from those of a normal distribution, making it worthwhile to consider estimation based on other norms. It is suggested that $L_p$-norm estimators be used to minimize the residuals when these exhibit non-normal kurtosis. In this paper, we propose a local polynomial $L_p$-norm regression that replaces weighted least squares estimation with weighted $L_p$-norm estimation for fitting the polynomial locally. We also introduce a new method for estimating the parameter $p$ from the residuals, enhancing the adaptability of the approach. Through numerical and theoretical investigation, we demonstrate our method's superiority over local least squares in one-dimensional data and show promising outcomes for higher dimensions, specifically in 2D.
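To make the idea of weighted $L_p$-norm local fitting concrete, here is a minimal sketch of a local linear fit that minimizes the kernel-weighted sum of |residual|^p by iteratively reweighted least squares (IRLS). Several choices are assumptions made for illustration and are not taken from the paper: the Gaussian kernel, the IRLS solver, the restriction to a local line, and the fixed user-supplied p (the paper's method for estimating p from the residuals is not reproduced). Plain IRLS is also not guaranteed to converge for every value of p.

```python
import math


def _weighted_line(xs, ys, ws):
    """Weighted least-squares line through (xs, ys); returns (intercept, slope)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my - slope * mx, slope


def local_linear_lp(xs, ys, x0, bandwidth, p=1.5, n_iter=50, eps=1e-8):
    """Estimate the regression curve at x0 with a kernel-weighted L_p local line."""
    centered = [x - x0 for x in xs]
    # Gaussian kernel weights (an illustrative choice).
    kernel = [math.exp(-0.5 * (c / bandwidth) ** 2) for c in centered]
    # Start from the ordinary local least-squares fit (the p = 2 case).
    a, b = _weighted_line(centered, ys, kernel)
    for _ in range(n_iter):
        # IRLS: |r|^p = |r|^(p-2) * r^2, so reweight each point by |r|^(p-2).
        irls = [max(abs(y - a - b * c), eps) ** (p - 2)
                for c, y in zip(centered, ys)]
        weights = [k * r for k, r in zip(kernel, irls)]
        a, b = _weighted_line(centered, ys, weights)
    # The intercept of the centered fit is the estimate at x0.
    return a


if __name__ == "__main__":
    xs = [i / 49 for i in range(50)]
    ys = [2 * x + 1 for x in xs]
    ys[25] += 100.0  # one gross outlier near x0
    print(local_linear_lp(xs, ys, 0.5, 0.2, p=2.0))  # local least squares
    print(local_linear_lp(xs, ys, 0.5, 0.2, p=1.0))  # local L_1 fit
```

With p = 2 the reweighting factor is constant and the routine reduces to local least squares, which makes it easy to compare the two fits on data with heavy-tailed noise.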

2025-04-25T21:04:19Z Ladan Tazik (Dept. of Computer Science, Mathematics, Physics and Statistics, University of British Columbia, Okanagan campus) James Stafford (Dept. of Statistical Sciences, University of Toronto) John Braun (Dept. of Computer Science, Mathematics, Physics and Statistics, University of British Columbia, Okanagan campus) http://arxiv.org/abs/2504.16186v1 Analogy making as the basis of statistical inference 2025-04-22T18:23:24Z

Standard statistical theory has arguably proved to be unsuitable as a basis for constructing a satisfactory completely general framework for performing statistical inference. For example, frequentist theory has never come close to providing such a general inferential framework, which is not only attributable to the question surrounding the soundness of this theory, but also to its focus on attempting to address the problem of how to perform statistical inference only in certain special cases. Also, theories of inference that are grounded in the idea of deducing sample-based inferences about populations of interest from a given set of universally acceptable axioms, e.g. many theories that aim to justify Bayesian inference and theories of imprecise probability, suffer from the difficulty of finding such axioms that are weak enough to be widely acceptable, but strong enough to lead to methods of inference that can be regarded as being efficient. These observations justify the need to look for an alternative means by which statistical inference may be performed, and in particular, to explore the one that is offered by analogy making. What is presented here goes down this path. To be clear, this is done in a way that does not simply endorse the common use of analogy making as a supplementary means of understanding how statistical methods work, but formally develops analogy making as the foundation of a general framework for performing statistical inference. In the latter part of the paper, the use of this framework is illustrated by applying some of the most important analogies contained within it to a relatively simple but arguably still unresolved problem of statistical inference, which naturally leads to an original way being put forward of addressing issues that relate to Bartlett's and Lindley's paradoxes.

2025-04-22T18:23:24Z Possibly the final version Russell J. Bowater http://arxiv.org/abs/2504.15617v1 Spatiotemporal Assessment of Aircraft Noise Exposure Using Mobile Phone-Derived Population Estimates and High-Resolution Noise Measurements 2025-04-22T06:15:43Z

Aircraft noise exposure has traditionally been assessed using static residential population data and long-term average noise metrics, often overlooking the dynamic nature of human mobility and temporal variations in operational conditions. This study proposes a data-driven framework that integrates high-resolution noise measurements from airport monitoring terminals with mobile phone-derived de facto population estimates to evaluate noise exposure with fine spatio-temporal resolution. We develop hourly noise exposure profiles and quantify the number of individuals affected across regions and time windows, using both absolute counts and inequality metrics such as Gini coefficients. This enables a nuanced examination of not only who is exposed, but when and where the burden is concentrated. At our case study airport, operational runway patterns resulted in recurring spatial shifts in noise exposure. By incorporating de facto population data, we demonstrate that identical noise operations can yield unequal impacts depending on the time and location of population presence, highlighting the importance of accounting for population dynamics in exposure assessment. Our approach offers a scalable basis for designing population-sensitive noise abatement strategies, contributing to more equitable and transparent aviation noise management.
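As a small illustration of the inequality metric mentioned in this abstract, the sketch below computes a Gini coefficient over the number of exposed people per region for a single hour. The input numbers are invented for the example, and the abstract does not specify how the authors define or weight their Gini coefficients, so this is only the standard unweighted formula.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = equal, near 1 = concentrated)."""
    v = sorted(float(x) for x in values)
    n = len(v)
    total = sum(v)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-based formula on the sorted values.
    ranked_sum = sum(rank * x for rank, x in enumerate(v, start=1))
    return 2.0 * ranked_sum / (n * total) - (n + 1) / n


if __name__ == "__main__":
    # Hypothetical counts of people above a noise threshold in five regions.
    daytime = [1200, 1100, 1300, 1250, 1150]
    night = [50, 40, 4800, 60, 30]
    print(gini(daytime))
    print(gini(night))
```

Computing the coefficient separately for each hour gives a time series showing when the exposure burden is most concentrated in a few regions.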

2025-04-22T06:15:43Z Soohwan Oh Hyunsoo Cho Jungwoo Cho http://arxiv.org/abs/2504.11035v2 A conceptual synthesis of causal assumptions for causal discovery and inference 2025-04-21T18:55:20Z

This work presents a conceptual synthesis of causal discovery and inference frameworks, with a focus on how foundational assumptions -- causal sufficiency, causal faithfulness, and the causal Markov condition -- are formalized and operationalized across methodological traditions. Through structured tables and comparative summaries, I map core assumptions, tasks, and analytical choices from multiple causal frameworks, highlighting their connections and differences. The synthesis provides practical guidance for researchers designing causal studies, especially in settings where observational or experimental constraints challenge standard approaches. This guide spans all phases of causal analysis, including question formulation, formalization of background knowledge, selection of appropriate frameworks, choice of study design or algorithm, and interpretation. It is intended as a tool to support rigorous causal reasoning across diverse empirical domains.

2025-04-15T09:58:06Z Withdrawn for incorporation as supplementary material in a broader collaborative manuscript; integration will be reflected in a new preprint Hannah E. Correia http://arxiv.org/abs/2504.15246v1 A Refreshment Stirred, Not Shaken (III): Can Swapping Be Differentially Private? 2025-04-21T17:19:57Z

The quest for a precise and contextually grounded answer to the question in the present paper's title resulted in this stirred-not-shaken triptych, a phrase that reflects our desire to deepen the theoretical basis, broaden the practical applicability, and reduce the misperception of differential privacy (DP) -- all without shaking its core foundations. Indeed, given the existence of more than 200 formulations of DP (and counting), before even attempting to answer the titular question one must first precisely specify what it actually means to be DP. Motivated by this observation, a theoretical investigation into DP's fundamental essence resulted in Part I of this trio, which introduces a five-building-block system explicating the who, where, what, how and how much aspects of DP. Instantiating this system in the context of the United States Decennial Census, Part II then demonstrates the broader applicability and relevance of DP by comparing a swapping strategy like that used in 2010 with the TopDown Algorithm -- a DP method adopted in the 2020 Census. This paper provides nontechnical summaries of the preceding two parts as well as new discussion -- for example, on how greater awareness of the five building blocks can thwart privacy theatrics; how our results bridging traditional SDC and DP allow a data custodian to reap the benefits of both these fields; how invariants impact disclosure risk; and how removing the implicit reliance on aleatoric uncertainty could lead to new generalizations of DP.

2025-04-21T17:19:57Z 27 pages, 1 figure James Bailie Ruobin Gong Xiao-Li Meng