Feed: https://arxiv.org/api/0fnP60ilvvF0ofLlhE05C17d2gU
Feed updated: 2026-03-28T12:30:27Z

http://arxiv.org/abs/2508.03952v1
Title: A Blueprint to Design Curriculum and Pedagogy for Introductory Data Science
Updated: 2025-08-05T22:38:28Z
Abstract: As the demand for jobs in data science increases, so does the demand for universities to develop and facilitate modernized data science curricula to train students for these positions. Yet developing these courses remains challenging, especially at the introductory level. To help instructors meet this demand, we present a flexible blueprint that supports the development of a modernized introductory data science curriculum. This blueprint is narrated through the lens of our experience teaching the introductory data science course at \university{}. This is a large course that serves both STEM and non-STEM majors and incorporates technologies such as R, RStudio, Quarto, Git, and GitHub. We identify and discuss common challenges in teaching a modernized introductory data science course, detail a learning model for students to grow their understanding of data science concepts, and provide reproducible materials to help teachers adopt and adapt such a curriculum at their universities.
Published: 2025-08-05T22:38:28Z
Comment: 33 pages, 4 figures
Authors: Elijah Meyer, Mine Çetinkaya-Rundel

http://arxiv.org/abs/2508.02966v1
Title: Measuring Human Leadership Skills with Artificially Intelligent Agents
Updated: 2025-08-05T00:05:54Z
Abstract: We show that the ability to lead groups of humans is predicted by leadership skill with artificially intelligent (AI) agents. In a large pre-registered lab experiment, human leaders worked with AI agents to solve problems. Their performance on this 'AI leadership test' was strongly correlated with their causal impact on human teams, which we estimate by repeatedly and randomly assigning leaders to groups of human followers and measuring team performance.
Successful leaders of both humans and AI agents ask more questions and engage in more conversational turn-taking. They score higher on measures of social intelligence, fluid intelligence, and decision-making skill, but do not differ in gender, age, ethnicity, or education. Our findings indicate that AI agents can be effective proxies for human participants in social experiments, which greatly simplifies the measurement of leadership and teamwork skills.
Published: 2025-08-05T00:05:54Z
Authors: Ben Weidmann, Yixian Xu, David J. Deming

http://arxiv.org/abs/2405.10453v2
Title: Expected Points Above Average: A Novel NBA Player Metric Based on Bayesian Hierarchical Modeling
Updated: 2025-08-01T20:45:17Z
Abstract: In this paper, we propose two novel basketball metrics: "expected points" for team-based comparisons and "expected points above average (EPAA)" as a player-evaluation tool. Within a Bayesian hierarchical modeling framework, teams and players are clustered based on their shooting propensities and abilities using posterior predictive distributions. We illustrate the concepts for the top 100 shot takers over the last decade and offer our metric as an additional tool for evaluating players. We compare our metrics with two traditional NBA player evaluation metrics: player efficiency rating and box plus/minus. Finally, we develop a Shiny web application that allows interested readers to make additional team and player comparisons.
Published: 2024-05-16T21:40:42Z
Authors: Benjamin Williams, Erin M. Schliep, Bailey Fosdick, Ryan Elmore

http://arxiv.org/abs/2310.13826v3
Title: A p-value for Process Tracing and other N=1 Studies
Updated: 2025-07-31T21:53:51Z
Abstract: We introduce a method for calculating p-values to test causal hypotheses in qualitative research à la process tracing. As in an experiment, our p-value tells us how often one would make the same or more compelling observations favoring one theory while entertaining a rival theory.
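The "same or more compelling observations" logic in the paragraph above is the logic of an exact randomization p-value. Below is a minimal sketch, assuming Fisher's (1935) classic tea-tasting design: 8 cups are prepared, 4 with milk poured first, and the taster must pick out those 4. It illustrates the general idea only and is not the authors' process-tracing test. The function name `urn_p_value` and its parameters are hypothetical.

```python
from math import comb


def urn_p_value(n_total: int, n_marked: int, n_correct: int) -> float:
    """Exact p-value for a Fisher-style tea-tasting urn.

    n_total cups, n_marked of which are "marked" (milk first); the taster
    picks n_marked cups. Under the null of no discrimination every choice
    of n_marked cups is equally likely, so the number of correct picks is
    hypergeometric. Returns P(at least n_correct correct picks).
    """
    n_choices = comb(n_total, n_marked)
    n_at_least = sum(
        comb(n_marked, k) * comb(n_total - n_marked, n_marked - k)
        for k in range(n_correct, n_marked + 1)
    )
    return n_at_least / n_choices


print(urn_p_value(8, 4, 4))  # = 1/70, about 0.014
print(urn_p_value(8, 4, 3))  # = 17/70, about 0.243
```

A perfect score gives p = 1/70. Three correct out of four gives p = 17/70, so only a perfect score falls below a 5% threshold in this design.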
We adapt Fisher's (1935) randomization-based urn model to the reality of qualitative researchers, who cannot randomize history but can make observations about historical processes. Our test includes a method of sensitivity analysis that allows researchers to account for the possibility of observation bias. It also includes a framework for representing the varying strength of individual pieces of evidence. Altogether, these inform the robustness of qualitative causal inference. We provide simulations and replications of previously published work to illustrate how to execute our test using any type of qualitative data about events that took place within one case. This approach adds to the pluralistic turn in the use of probability theory in theory-testing process tracing by offering a simple model with provable conservatism, while relying on few assumptions whose consequences can be directly assessed.
Published: 2023-10-20T21:47:24Z
Authors: Matias Lopez, Jake Bowers

http://arxiv.org/abs/2507.23106v1
Title: Efficient inference of dynamic gene regulatory networks using discrete penalty
Updated: 2025-07-30T21:13:26Z
Abstract: Gene regulatory networks (GRNs) orchestrate cellular decision making and survival strategies. Inferring the structure of these networks from high-dimensional transcriptomics data is a central challenge in systems biology. Traditional approaches to GRN inference, such as the graphical lasso and its joint extensions, rely on an ℓ1 penalty to induce sparsity, but they can bias network recovery and require extensive hyperparameter tuning. Here, we present a scalable framework for the joint inference of dynamic GRNs using a discrete ℓ0 penalty, enabling direct and unbiased control over network sparsity. Leveraging recent algorithmic advances, we efficiently solve the resulting mixed-integer optimization problem for populations structured as arbitrary tree hypergraphs, accommodating both continuous and categorical distinctions among biological samples.
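The claim that an ℓ1 penalty biases estimates while an ℓ0 penalty does not can be seen on the simplest possible problem: estimating one coefficient b from one observation z. With an ℓ1 penalty the minimizer is soft thresholding, which shrinks every surviving estimate toward zero by λ. With an ℓ0 penalty the minimizer is hard thresholding, which leaves surviving estimates unchanged. This minimal sketch is a textbook one-dimensional illustration, not the authors' mixed-integer GRN solver. The function names are hypothetical.

```python
import math


def soft_threshold(z: float, lam: float) -> float:
    """argmin_b 0.5*(b - z)**2 + lam*|b|   (l1 penalty): shrinks by lam."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def hard_threshold(z: float, lam: float) -> float:
    """argmin_b 0.5*(b - z)**2 + lam*[b != 0]   (l0 penalty).

    Keeping b = z costs lam; setting b = 0 costs 0.5*z**2. So the estimate
    is nonzero only when |z| > sqrt(2*lam), and then it is exactly z.
    (At the tie |z| == sqrt(2*lam) both choices are optimal; we return 0.)
    """
    return z if abs(z) > math.sqrt(2.0 * lam) else 0.0


lam = 0.5
for z in (0.3, 0.9, 3.0):
    print(z, soft_threshold(z, lam), hard_threshold(z, lam))
```

With λ = 0.5, z = 3.0 becomes 2.5 under ℓ1 but stays 3.0 under ℓ0. z = 0.9 survives ℓ1 as 0.4 but is zeroed by ℓ0, because it falls below sqrt(2λ) = 1. The ℓ0 choice is combinatorial, which is why the multivariate version in the abstract becomes a mixed-integer problem.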
After validating our method on synthetic benchmarks, we apply it to single-cell and spatial transcriptomics data from glioblastoma (GBM) patient tumors. Our approach reconstructs gene networks across tumor clusters, maps network rewiring along hypoxia gradients, and reveals niche-specific differences between primary and recurrent tumors. By providing a robust and interpretable tool for GRN inference in complex tissues, our work facilitates high-resolution dissection of tumor heterogeneity and adaptation, with broad applicability to emerging large-scale transcriptomic datasets.
Published: 2025-07-30T21:13:26Z
Authors: Visweswaran Ravikumar, Aaresh Bhathena, Wajd N Al-Holou, Salar Fattahi, Arvind Rao

http://arxiv.org/abs/2409.05764v2
Title: Jackknife Empirical Likelihood Ratio Test for Cauchy Distribution
Updated: 2025-07-30T15:21:38Z
Abstract: Heavy-tailed distributions, such as the Cauchy distribution, are acknowledged for providing more accurate models for financial returns, as the normal distribution is deemed insufficient for capturing the significant fluctuations observed in real-world assets. Data sets characterized by outlier sensitivity are critically important in diverse areas, including finance, economics, telecommunications, and signal processing. This article addresses a goodness-of-fit test for the Cauchy distribution. The proposed test uses empirical likelihood methods, including the jackknife empirical likelihood (JEL) and the adjusted jackknife empirical likelihood (AJEL). Extensive Monte Carlo simulation studies are conducted to evaluate the finite-sample performance of the proposed test. The application of the proposed test is illustrated through the analysis of two real data sets.
Published: 2024-09-09T16:27:22Z
Comment: 15 pages
Authors: Ganesh Vishnu Avhad, Ananya Lahiri, Sudheesh K. Kattumannil

http://arxiv.org/abs/2507.22679v1
Title: An alternative method of adjusting for multiple comparison in medical research
Updated: 2025-07-30T13:42:36Z
Abstract:
Background: Most methods of adjusting for multiplicity focus primarily on controlling type I errors and rarely consider type II errors. We propose a new method that controls false-positive findings while ensuring sufficient statistical power.
Methods: We proposed a new multiple-comparison correction, the Beta-exponential Adjustment (BEA), which controls type I errors while also taking statistical power, and hence the probability of type II errors, into account. We conducted simulation studies to evaluate the performance characteristics of multiple-testing correction procedures. We calculated sensitivity, specificity, and power separately for different sample sizes and numbers of biomarkers, and compared them with the Bonferroni, Holm, and Benjamini-Hochberg (BH) correction methods.
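The abstract names BEA's comparators but does not give BEA's own formula, so BEA is not reproduced here. Below is a minimal plain-Python sketch of the three comparators. Bonferroni and Holm control the family-wise error rate. Benjamini-Hochberg controls the false discovery rate under independence (or certain forms of positive dependence). The function names and the example p-values are illustrative assumptions.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H_i if p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals, alpha=0.05):
    """Holm step-down: compare the k-th smallest p-value (k = 0, 1, ...) with alpha / (m - k)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if pvals[i] > alpha / (m - k):
            break  # first failure: this and all larger p-values are retained
        reject[i] = True
    return reject


def benjamini_hochberg(pvals, q=0.05):
    """BH step-up: find the largest rank r with p_(r) <= r*q/m and reject ranks 1..r."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    r_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            r_max = rank
    reject = [False] * m
    for i in order[:r_max]:
        reject[i] = True
    return reject


pvals = [0.005, 0.011, 0.02, 0.038, 0.20]  # illustrative values only
for name, fn in [("Bonferroni", bonferroni), ("Holm", holm), ("BH", benjamini_hochberg)]:
    print(name, fn(pvals))
```

On these five p-values, Bonferroni rejects one hypothesis, Holm rejects two, and BH rejects four. This is the usual ordering from most to least conservative, and it is the sensitivity gap that a power-aware correction such as BEA aims to address.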
Results: Our proposed BEA correction showed the highest sensitivity across different sample sizes and numbers of biomarkers (e.g., sensitivity of 0.8 for BEA versus 0.62 for BH with a sample size of 1000, 1000 tested biomarkers, and a positive rate of 30%). Across sample sizes and numbers of biomarkers, BEA showed specificity comparable to that of the traditional methods. Moreover, BEA had the highest statistical power of all methods when the outcome was relatively rare.
Conclusion: We proposed the BEA correction to adjust for multiple comparisons while considering statistical power. Compared with traditional correction methods, BEA showed higher sensitivity, comparable specificity, and higher statistical power under a range of conditions. BEA can serve as an alternative to traditional methods of adjusting for multiplicity, especially in studies with small sample sizes, rare outcomes, or a large number of biomarkers.
Published: 2025-07-30T13:42:36Z
Comment: 20 pages, 5 figures
Authors: Jiale Li, Zimu Wei

http://arxiv.org/abs/2507.21022v1
Title: A Generalized Cramér-Rao Bound Using Information Geometry
Updated: 2025-07-28T17:43:06Z
Abstract: In information geometry, statistical models are considered as differentiable manifolds, where each probability distribution represents a unique point on the manifold. A Riemannian metric can be systematically obtained from a divergence function using Eguchi's theory (1992); the well-known Fisher-Rao metric is obtained from the Kullback-Leibler (KL) divergence. The geometric derivation of the classical Cramér-Rao Lower Bound (CRLB) by Amari and Nagaoka (2000) is based on this metric. In this paper, we study a Riemannian metric obtained by applying Eguchi's theory to the Basu-Harris-Hjort-Jones (BHHJ) divergence (1998) and derive a generalized Cramér-Rao bound using Amari-Nagaoka's approach. There are potential applications for this bound in robust estimation.
Published: 2025-07-28T17:43:06Z
Comment: Presented at the IEEE International Symposium on Information Theory (ISIT 2025)
Authors: Satyajit Dhadumia, M. Ashok Kumar

http://arxiv.org/abs/2505.09619v5
Title: Machine Learning Solutions Integrated in an IoT Healthcare Platform for Heart Failure Risk Stratification
Updated: 2025-07-28T09:08:11Z
Abstract: The management of chronic Heart Failure (HF) presents significant challenges in modern healthcare, requiring continuous monitoring, early detection of exacerbations, and personalized treatment strategies.
In this paper, we present a predictive model based on Machine Learning (ML) techniques to identify patients at HF risk. The model is an ensemble learning approach, a modified stacking technique. It uses two specialized models leveraging clinical and echocardiographic features, and then a meta-model that combines the predictions of these two models. We initially assessed the model on a real dataset, and the results suggest that it performs well in stratifying patients at HF risk. Specifically, we obtained high sensitivity (95%), ensuring that nearly all high-risk patients are identified. Accuracy was 84%, which can be considered moderate in some ML contexts. However, it is acceptable given our priority of identifying patients at risk of HF. These patients will be asked to participate in the telemonitoring program of the PrediHealth research project, on which some of the authors of this paper are working. The initial findings also suggest that ML-based risk stratification models can serve as valuable decision-support tools, not only in the PrediHealth project but also for healthcare professionals, aiding in early intervention and personalized patient management. To better understand the value and potential of our predictive model, we also compared its results with those obtained from three baseline models. The preliminary results indicate that our predictive model outperforms these baselines, which treat the features as a single flat set, i.e., without grouping them into clinical and echocardiographic features.
Published: 2025-04-07T14:07:05Z
Authors: Aiman Faiz, Claudio Pascarelli, Gianvito Mitrano, Gianluca Fimiani, Marina Garofano, Mariangela Lazoi, Claudio Passino, Alessia Bramanti

http://arxiv.org/abs/2503.15382v3
Title: The information mismatch, and how to fix it
Updated: 2025-07-28T03:49:48Z
Abstract: We live in unprecedented times in terms of our ability to use evidence to inform medical care. For example, we can perform data-driven post-test probability calculations.
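For concreteness, a post-test probability for a binary test follows from Bayes' rule, given a pre-test probability p, a sensitivity Se, and a specificity Sp. After a positive result it is Se·p / (Se·p + (1 − Sp)(1 − p)), or equivalently post-test odds = pre-test odds × Se / (1 − Sp). The sketch below implements only that standard formula, not the authors' corrected version. In the setting this abstract goes on to criticize, `pre_test` would come from a covariate-dependent calculator, while `sensitivity` and `specificity` would be unadjusted values. The function names are hypothetical.

```python
def post_test_probability(pre_test: float, sensitivity: float,
                          specificity: float, positive: bool = True) -> float:
    """Bayes' rule for a binary diagnostic test result."""
    if positive:
        num = sensitivity * pre_test                       # true positives
        den = num + (1.0 - specificity) * (1.0 - pre_test)  # + false positives
    else:
        num = (1.0 - sensitivity) * pre_test               # false negatives
        den = num + specificity * (1.0 - pre_test)          # + true negatives
    return num / den


def post_test_via_odds(pre_test: float, sensitivity: float,
                       specificity: float) -> float:
    """Same positive-test result via odds: post-odds = pre-odds * LR+."""
    lr_pos = sensitivity / (1.0 - specificity)
    post_odds = pre_test / (1.0 - pre_test) * lr_pos
    return post_odds / (1.0 + post_odds)


print(post_test_probability(0.10, 0.90, 0.80))
print(post_test_probability(0.10, 0.90, 0.80, positive=False))
print(post_test_via_odds(0.10, 0.90, 0.80))
```

Suppose p = 0.10, Se = 0.90, and Sp = 0.80. A positive result raises the probability to 1/3, and a negative result lowers it to about 0.014.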
However, there is work to do. As has been previously noted, sensitivity and specificity, which play a key role in post-test probability calculations, are defined without adjustment for patient covariates. In light of this, there have been multiple recommendations that sensitivity and specificity be adjusted for covariates. However, there is less work on the downstream clinical impact of unadjusted sensitivity and specificity, which we discuss here. We argue that unadjusted sensitivity and specificity, when combined with covariate-dependent pre-test probability scores (which are more easily available nowadays given the multitude of online calculators), can lead to a post-test probability that contains an "information mismatch." We write the equations behind such an information mismatch and discuss the steps that can be taken to fix it.
Published: 2025-03-19T16:19:25Z
Authors: Samuel J. Weisenthal, Amit K. Chowdhry

http://arxiv.org/abs/2407.18572v2
Title: Bernoulli amputation
Updated: 2025-07-25T07:56:50Z
Abstract: An approach to amputation, the process of introducing missing values into a complete dataset, is presented. It allows one to construct missingness indicators in a flexible and principled way via copulas and Bernoulli margins, and to incorporate dependence in missingness patterns. Besides more classical missingness models such as missing completely at random, missing at random, and missing not at random, the approach can model structured missingness such as block missingness and, via mixtures, monotone missingness, which are patterns of missing data frequently found in real-life datasets. Properties such as joint missingness probabilities or missingness correlation are derived mathematically.
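The abstract does not say which copula families the paper uses, so the sketch below assumes a Gaussian copula, one standard choice. It couples two Bernoulli missingness indicators so that each keeps its marginal missingness probability p_j. A correlation parameter `rho` makes joint missingness more likely than under independence. Because the indicators here ignore the data values entirely, this sketch only produces missing-completely-at-random mechanisms. All names are hypothetical.

```python
import random
from statistics import NormalDist


def copula_missingness(n, p1, p2, rho, seed=0):
    """Draw n pairs of missingness indicators (M1, M2), where 1 = missing.

    Assumed construction (Gaussian copula with Bernoulli margins):
    Z ~ N(0, [[1, rho], [rho, 1]]), Uj = Phi(Zj), Mj = 1 if Uj <= pj.
    Each Mj is Bernoulli(pj); rho controls dependence between them.
    """
    rng = random.Random(seed)
    phi = NormalDist().cdf  # standard normal CDF
    pairs = []
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z1 = x1
        z2 = rho * x1 + (1.0 - rho ** 2) ** 0.5 * x2  # corr(z1, z2) = rho
        pairs.append((int(phi(z1) <= p1), int(phi(z2) <= p2)))
    return pairs


pairs = copula_missingness(50_000, p1=0.2, p2=0.3, rho=0.8)
m1 = sum(a for a, _ in pairs) / len(pairs)        # approx. 0.2
m2 = sum(b for _, b in pairs) / len(pairs)        # approx. 0.3
both = sum(a * b for a, b in pairs) / len(pairs)  # joint missingness rate
print(f"P(M1)={m1:.3f}  P(M2)={m2:.3f}  P(both)={both:.3f}  product={m1 * m2:.3f}")
```

With rho > 0, the empirical joint missingness probability clearly exceeds the product of the marginals. With rho = 0, the two agree up to sampling noise.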
The approach is demonstrated with mathematical examples and empirical illustrations using a well-known dataset.
Published: 2024-07-26T07:55:25Z
Authors: Marius Hofert, James Jackson, Niels Hagenbuch

http://arxiv.org/abs/2507.11833v2
Title: R2 priors for Grouped Variance Decomposition in High-dimensional Regression
Updated: 2025-07-24T21:55:23Z
Abstract: We introduce the Group-R2 decomposition prior, a hierarchical shrinkage prior that extends R2-based priors to structured regression settings with known groups of predictors. By decomposing the prior distribution of the coefficient of determination R2 in two stages, first across groups and then within groups, the prior enables interpretable control over model complexity and sparsity. We derive theoretical properties of the prior, including marginal distributions of coefficients, tail behavior, and connections to effective model complexity. Through simulation studies, we evaluate the conditions under which grouping improves predictive performance and parameter recovery compared with priors that do not account for groups. Our results provide practical guidance for prior specification and highlight both the strengths and limitations of incorporating grouping into R2-based shrinkage priors.
Published: 2025-07-16T01:40:56Z
Comment: 43 pages, 16 figures
Authors: Javier Enrique Aguilar, David Kohns, Aki Vehtari, Paul-Christian Bürkner

http://arxiv.org/abs/2501.12596v2
Title: Adapting OpenAI's CLIP Model for Few-Shot Image Inspection in Manufacturing Quality Control: An Expository Case Study with Multiple Application Examples
Updated: 2025-07-14T15:52:38Z
Abstract: This expository paper introduces a simplified approach to image-based quality inspection in manufacturing using OpenAI's CLIP (Contrastive Language-Image Pretraining) model adapted for few-shot learning. While CLIP has demonstrated impressive capabilities in general computer vision tasks, its direct application to manufacturing inspection presents challenges due to the domain gap between its training data and industrial applications.
We evaluate CLIP's effectiveness through five case studies: metallic pan surface inspection, 3D printing extrusion profile analysis, stochastic textured surface evaluation, automotive assembly inspection, and microstructure image classification. Our results show that CLIP can achieve high classification accuracy with relatively small learning sets (50-100 examples per class) for single-component and texture-based applications. However, performance degrades with complex multi-component scenes. We provide a practical implementation framework that enables quality engineers to quickly assess CLIP's suitability for their specific applications before pursuing more complex solutions. This work establishes CLIP-based few-shot learning as an effective baseline approach that balances implementation simplicity with robust performance, as demonstrated in several manufacturing quality control applications.
Published: 2025-01-22T02:45:30Z
Comment: 36 pages, 13 figures
Authors: Fadel M. Megahed, Ying-Ju Chen, Bianca Maria Colosimo, Marco Luigi Giuseppe Grasso, L. Allison Jones-Farmer, Sven Knoth, Hongyue Sun, Inez Zwetsloot

http://arxiv.org/abs/2507.08921v1
Title: Are Betting Markets Better than Polling in Predicting Political Elections?
Updated: 2025-07-11T17:03:39Z
Abstract: Political elections are one of the most significant aspects of what constitutes the fabric of the United States. In recent history, typical polling estimates have largely lacked precision in predicting election outcomes. This has not only caused uncertainty for American voters but has also affected campaign strategies, spending, and fundraising efforts. One intriguing aspect of traditional polling is the type of question asked: the questions largely focus on asking individuals who they intend to vote for. However, they do not always probe who voters think will win, regardless of who they want to win.
In contrast, online betting markets allow individuals to wager money on who they expect to win, which may capture who individuals think will win in an especially salient manner. The current study used both descriptive and predictive analytics to determine whether data from Polymarket, the world's largest online betting market, provided insights that differed from traditional presidential polling. Overall, the findings suggest that Polymarket was superior to polling in predicting the outcome of the 2024 presidential election, particularly in swing states. The results are in line with research on "Wisdom of Crowds" theory, which suggests that a large group of people is often accurate in predicting outcomes, even if its members are not necessarily experts or closely aligned with the issue at hand. Overall, our results suggest that betting markets, such as Polymarket, could be used to predict presidential elections and/or other real-world events. However, future investigations are needed to fully unpack and understand the current study's intriguing results, including their alignment with Wisdom of Crowds theory and their portability to other events.
Published: 2025-07-11T17:03:39Z
Comment: 30 pages, 4 figures
Authors: Laurie E. Cutting, Sarah S. Hughes-Berheim, Paul M. Johnson, Hiba Baroud, Brett Goldstein

http://arxiv.org/abs/2411.08547v4
Title: Frequentist Statistics as Internalist Reliabilism
Updated: 2025-07-11T10:58:51Z
Abstract: There has long been an impression that reliabilism implies externalism and that frequentist statistics, due to its reliabilist nature, is inherently externalist. I argue, however, that frequentist statistics can plausibly be understood as a form of internalist reliabilism: internalist in the conventional sense, yet reliabilist in certain unconventional and intriguing ways. Crucially, in developing the thesis that reliabilism does not imply externalism, my aim is not to stretch the meaning of 'reliabilism' merely to sever the implication.
Instead, it is to gain a deeper understanding of frequentist statistics, which stands as one of the most sustained attempts by scientists to develop an epistemology for their own use.
Published: 2024-11-13T11:52:16Z
Authors: Hanti Lin