Differing Roles of Leisure and Productivity in GDP - A Machine Learning based comparative analysis of Germany and USA

2026-05-31T13:32:39Z

The GDP of a country is modelled as the relative interaction between two agents: working hours, reflecting the social choice of a population, and Total Factor Productivity, reflecting the collective investment in productivity enhancers. It is shown that a Random Forest model can accurately predict the GDP from these two factors. The differences in the choices made by Germany and the USA are analysed through Gini importance, SHAP plots and partial dependence. It is shown that the differences in the social structure of the countries are reflected in the relative contribution of working hours and productivity to the GDP.
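Partial dependence, one of the three diagnostics named above, can be computed for any fitted model by fixing one feature at a grid value and averaging the predictions over the data. The sketch below is model-agnostic and uses a toy `predict` function (hours times productivity) as a stand-in for the Random Forest; the function, the sample rows and the grid are illustrative assumptions, not the paper's data or model.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction over all rows with one feature forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # overwrite the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


# Toy stand-in for a fitted model: output = working hours * productivity.
def toy_predict(row):
    return row[0] * row[1]


# Hypothetical observations: [annual working hours, productivity index].
rows = [[1500.0, 1.0], [1800.0, 2.0]]
hours_curve = partial_dependence(toy_predict, rows, feature_index=0, grid=[1000.0, 2000.0])
print(hours_curve)
```

Comparing such curves for two countries shows how strongly the predicted output responds to working hours in each, holding the observed productivity distribution fixed.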

Tokenized but Illiquid? Evidence from Real-World Asset Markets

2026-05-31T10:05:25Z

Real-world asset tokenization is often presented as a mechanism for improving the liquidity of traditionally illiquid assets. However, on-chain representation and secondary-market liquidity are distinct outcomes. This paper examines whether tokenized real-world assets exhibit meaningful observed liquidity and identifies the token characteristics associated with higher market activity. Using token-level data from RWA.xyz and supplemental contract-level observations from Etherscan, the study constructs an Ethereum-based monthly panel of non-stablecoin real-world assets across three prominent categories: U.S. Treasury-backed tokens, gold-backed commodity tokens, and private-credit-related tokens. Liquidity is measured using turnover, active addresses, and an active-month indicator. The empirical design combines descriptive statistics, non-parametric group tests, and exploratory panel regressions suited to short and sparse token histories. The results show substantial heterogeneity across asset categories. Gold-backed tokens exhibit broader holder bases and more persistent on-chain activity than many Treasury and private-credit-related products, while outstanding asset value alone does not reliably predict observed liquidity. The paper contributes to the literature by developing a clearer empirical measurement framework for real-world-asset liquidity and showing that tokenization and liquidity should be analyzed as distinct outcomes.
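The abstract names three liquidity measures without defining them. A minimal sketch, assuming turnover is monthly transfer volume divided by outstanding value and that a month counts as active when at least one transfer occurs; both definitions and all field names are illustrative assumptions rather than the paper's specification.

```python
from dataclasses import dataclass


@dataclass
class TokenMonth:
    token: str
    month: str                # e.g. "2025-06"
    transfer_volume: float    # value transferred on-chain during the month
    outstanding_value: float  # value of tokens outstanding
    active_addresses: int     # distinct addresses that sent or received
    transfer_count: int       # number of transfers in the month


def turnover(row):
    """Assumed definition: transfer volume relative to outstanding value."""
    if row.outstanding_value <= 0:
        return 0.0
    return row.transfer_volume / row.outstanding_value


def is_active(row):
    """Assumed definition: a month is active if any transfer happened."""
    return row.transfer_count > 0


def summarize(rows):
    """Average the three measures over a token's monthly panel."""
    n = len(rows)
    return {
        "mean_turnover": sum(turnover(r) for r in rows) / n,
        "mean_active_addresses": sum(r.active_addresses for r in rows) / n,
        "active_month_share": sum(1 for r in rows if is_active(r)) / n,
    }


# Hypothetical two-month history for one token.
panel = [
    TokenMonth("TOKEN_A", "2025-06", 50.0, 100.0, 10, 5),
    TokenMonth("TOKEN_A", "2025-07", 0.0, 100.0, 0, 0),
]
summary = summarize(panel)
print(summary)
```

The example shows why outstanding value alone says little about liquidity: both months have the same outstanding value, but only one shows any activity.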

Machine Learning Surrogate Modeling for Homogenization of Hyperelastic Materials with Boolean Microstructures

2026-05-31T00:51:52Z

Data-driven surrogate models are an alternative to numerical homogenization of heterogeneous materials. In this contribution, a supervised learning approach is presented for predicting effective Lamé parameters of hyperelastic composites from low-dimensional microstructural descriptors. The data set is based on previously published numerical homogenization results for ensembles of two-phase stochastic microstructures generated by planar Boolean models, covering variations of inclusion shape, phase contrast, and area fraction; see Brändel, Brands, Maike, Rheinbach, Schröder, Schwarz and Stoyan (2022). A neural network is trained on combinations of scalar and curve-valued statistical descriptors, including the area fraction, a derived scalar shape descriptor $τ$, the two-point correlation function $S_2(r)$, and the lineal-path function $\ell(z)$. Additional data representing limiting cases of the parameter space are incorporated to stabilize training and improve extrapolation behavior. The surrogate is evaluated by leave-one-grain-type-out cross-validation in order to assess generalization to unseen grain geometries. Numerical results demonstrate that additional descriptors can reduce relative errors. A predictor trained with $τ$ and $S_2(r)$ provides a compact representation with good quantitative accuracy and regular dense response behavior. Adding the lineal-path function $\ell(z)$ further reduces the error at the available data points, indicating that it is a promising additional descriptor; however, dense post-training response evaluations show that improved pointwise accuracy does not automatically guarantee physically admissible behavior between sampled parameter values. This motivates future work on physically constrained surrogate models, loss formulations, bounded output parametrizations, and a more systematic representation of curve-valued geometric descriptors.
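As a concrete illustration of a curve-valued descriptor, the two-point correlation function $S_2(r)$ of a binary microstructure image is the probability that two points a distance $r$ apart both lie in the inclusion phase, so $S_2(0)$ equals the area fraction. The sketch below evaluates it along the x-direction only, with periodic wrap-around, on a tiny hand-made image; the directional restriction, the periodicity and the image itself are simplifying assumptions.

```python
def area_fraction(image):
    """Fraction of pixels that belong to the inclusion phase (value 1)."""
    total = sum(len(row) for row in image)
    return sum(sum(row) for row in image) / total


def two_point_correlation(image, max_r):
    """S_2(r) along x for r = 0..max_r, using periodic wrap-around."""
    ny = len(image)
    nx = len(image[0])
    s2 = []
    for r in range(max_r + 1):
        hits = 0
        for row in image:
            for x in range(nx):
                # Both end points of the segment must lie in phase 1.
                hits += row[x] * row[(x + r) % nx]
        s2.append(hits / (nx * ny))
    return s2


# Tiny illustrative two-phase image (1 = inclusion, 0 = matrix).
image = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
phi = area_fraction(image)
s2 = two_point_correlation(image, max_r=2)
print(phi, s2)
```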

Cellular Sheaf Neural Operators for Structure-Preserving Surrogate Modeling of Constrained PDEs

2026-05-31T00:49:25Z

Neural operators provide fast surrogate models for PDE simulations, but standard architectures often treat geometry and discretization as secondary to field data. Physical states are usually represented as grid-channel stacks, even when different quantities naturally belong on vertices, edges, faces, cells, boundaries, or interfaces and must satisfy compatibility constraints. We propose Cellular Sheaf Neural Operators, a discretization-aware framework for structure-preserving neural PDE surrogates. The method represents PDE states on oriented cell complexes, couples local feature spaces through learned restriction maps, and uses incidence/Hodge-informed message passing to follow computational geometry. Learned update heads pass through coboundary or flux maps, allowing selected constraints to arise from cell-complex structure rather than only from loss penalties. For magnetohydrodynamics, this yields face-based magnetic-flux updates driven by edge electromotive fields and finite-volume-style fluid updates driven by learned face fluxes and cell sources. On turbulent MHD and fusion-equilibrium surrogate tasks, the method improves structure-sensitive diagnostics, including rollout behavior, divergence control, spectral error, and equilibrium-regression accuracy. These results indicate that cellular-sheaf structure is a useful inductive bias for neural PDE surrogates in constrained multiphysics systems.
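The idea that a constraint can follow from cell-complex structure rather than from a loss penalty can be seen in a two-dimensional analogue of the flux update described above. In the sketch below, magnetic fluxes live on cell edges, the electromotive field lives on vertices, and the update is a discrete curl, so the per-cell divergence stays at rounding level for any electromotive field. The random `emf` array is a stand-in for a learned update head; grid size, time step and the periodic boundary are assumptions, and this is not the paper's architecture.

```python
import random


def divergence(bx, by):
    """Net outward flux per cell on a periodic n x n grid with unit spacing."""
    n = len(bx)
    return [
        [bx[(i + 1) % n][j] - bx[i][j] + by[i][(j + 1) % n] - by[i][j]
         for j in range(n)]
        for i in range(n)
    ]


def flux_update(bx, by, emf, dt):
    """Advance edge fluxes with the discrete curl of a vertex-based EMF."""
    n = len(bx)
    new_bx = [
        [bx[i][j] - dt * (emf[i][(j + 1) % n] - emf[i][j]) for j in range(n)]
        for i in range(n)
    ]
    new_by = [
        [by[i][j] + dt * (emf[(i + 1) % n][j] - emf[i][j]) for j in range(n)]
        for i in range(n)
    ]
    return new_bx, new_by


def max_abs_divergence(bx, by):
    return max(abs(v) for row in divergence(bx, by) for v in row)


rng = random.Random(0)
n = 8
bx = [[0.0] * n for _ in range(n)]
by = [[0.0] * n for _ in range(n)]
for _ in range(10):
    # Arbitrary EMF values; a trained network output would take this place.
    emf = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    bx, by = flux_update(bx, by, emf, dt=0.1)
residual = max_abs_divergence(bx, by)
print(residual)
```

Because each vertex value enters the divergence of a cell twice with opposite signs, the contributions cancel identically, which is the discrete statement that the divergence of a curl vanishes.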

Graph Attention-Based Virtual Metrology for Film Deposition Processes in Semiconductor Manufacturing

2026-05-30T23:18:32Z

Artificial intelligence-driven semiconductor manufacturing increasingly operates at nanometer and angstrom scales, where precise process control depends on accurate and timely metrology. However, physical metrology is limited by measurement latency, cost, and sampling constraints, restricting its scalability in high-volume production. Virtual metrology (VM) has emerged as an effective alternative by predicting wafer-level characteristics from equipment sensor data. Despite recent advances, many existing VM models remain correlation-driven and lack the ability to capture structured dependencies among heterogeneous process variables, while providing limited interpretability. This study presents a graph attention-based VM framework for film deposition processes that integrates temporal feature learning with structured parameter-layer dependency modeling. The proposed approach represents each step-parameter pair as a node and extracts temporal embeddings from high-frequency equipment traces using convolutional feature encoders. A parameter-to-layer graph attention mechanism is employed to model directional dependencies, enabling each film layer to aggregate relevant process information. The framework is evaluated using industrial deposition data collected from production wafers, where the model predicts film thickness from multivariate sensor signals. Experimental results demonstrate improved predictive performance compared to baseline models. In addition, analysis of the learned attention weights reveals interpretable parameter-layer relationships consistent with physical process behavior, capturing dominant process factors and temporal dependencies across deposition stages. These results indicate that the proposed framework enhances prediction accuracy and provides meaningful insight into process dynamics, supporting effective monitoring and optimization in semiconductor manufacturing.

An Exploratory Study into using Machine-Learning for Fast Step-by-step Emulation of Numerical Mechanical Thrombectomy Simulations for Ischemic Stroke

2026-05-30T20:54:55Z

The treatment of ischemic stroke using mechanical thrombectomy involves difficult decisions under intense time constraints. Numerical physics simulations can in theory help operators make better decisions regarding treatment approaches and device selection, but are too slow to do so in practice. In this thesis, we investigate whether current machine-learning-based surrogates can accurately emulate these simulations in a step-by-step manner while making them significantly faster. To do this we train three surrogate models on two simulations that involve a simplified aspiration procedure, with varying levels of geometric complexity. Our results show that two of our models accurately predict single simulation steps and provide substantial speedups, especially when combined with specific data augmentations. However, the models showed a lack of stability when emulating simulations with complex geometries over longer time periods. Overall, this work provides a foundation for future studies to develop stable methods that scale to realistic numerical physics simulations of mechanical thrombectomy.

A multimodal dataset of photoplethysmography and continuous behavioral responses to ASMR and nature videos

2026-05-30T14:36:10Z

Autonomous Sensory Meridian Response (ASMR) is a somatosensory phenomenon characterized by pleasant tingling sensations and cardiovascular slowing. However, ASMR research has been hindered by a dearth of standardized, open-access multimodal datasets. To address this limitation, we present REST-ASMR (Response to Environmental & Sensory Triggers), a synchronized multimodal dataset designed to capture behavioral reports and physiological dynamics during ASMR, with nature-relaxation videos as control stimuli. The dataset includes high-resolution photoplethysmography (PPG), time-aligned audiovisual stimuli, and continuous subjective annotations from 34 participants. Technical validation showed high stimulus efficacy (97% responder rate), significant stimulus-specific inter-subject agreement (p < 0.05), and a robust PPG-derived ASMR-specific cardiovascular deceleration. Additionally, a Bidirectional Long Short-Term Memory model successfully predicted subjective ASMR tingle states, achieving video-level ASMR vs. Nature classification with perfect accuracy and a frame-level global mean accuracy of 75.51%, macro F1-score of 71.86%, and 100% Nature-baseline specificity, under a strict, leakage-free subject-video double-independent 4-fold cross-validation. REST-ASMR constitutes a dense temporal foundation for affective computing, multimodal research, and the development of personalized models of relaxation-related responses.
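Accuracy and macro F1 can diverge when classes are imbalanced, which is why both are reported. A minimal sketch of the two frame-level metrics on made-up labels (1 = tingle, 0 = no tingle); the labels are illustrative and unrelated to the dataset's results.

```python
def accuracy(y_true, y_pred):
    """Share of frames whose predicted label matches the annotation."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical frame-level annotations and predictions.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(acc, f1)
```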

Higher-order Network Analysis of Human Mobility Data

2026-05-30T13:55:51Z

The detailed study of individual human mobility requires large-scale high-resolution datasets, but collecting such datasets in a way that is both statistically powerful and privacy preserving is a challenging and expensive task. In response, researchers have built tools to generate complex synthetic populations of agents that can be used to simulate synthetic individual mobility data, potentially obviating the difficulties of data collection. While these simulation-based approaches offer a promising avenue for expanding individual mobility research, it is difficult to assess whether such tools are effective at generating realistic mobility traces. In this work, we develop a framework for comparing observed and simulated mobility data using a higher-order network framework that focuses on analyzing patterns of movement in the paths individuals take through the underlying infrastructure network. We apply our framework to a case study comparing the NetMob 2025 Data Challenge Dataset, which includes individual mobility data for thousands of residents of the Île-de-France region, with a sophisticated open-source synthetic population and mobility simulation model of the same region. We show that while simulated mobility data is indeed promising as a surrogate for observed mobility, there are some key limitations to the simulation paradigm from a path-based perspective, which we discuss along with potential future remediations and open challenges for higher-order mobility network analysis.
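What a path-based, higher-order view adds over an ordinary network can be shown with a few toy paths. In the sketch below, the next stop after `C` looks random when only the current node is known, but is fully determined once the previous node is also known. This is a generic higher-order construction on hypothetical station names, not necessarily the authors' exact framework.

```python
from collections import Counter


def second_order_counts(paths):
    """Count transitions between consecutive edges, i.e. second-order links."""
    counts = Counter()
    for path in paths:
        for a, b, c in zip(path, path[1:], path[2:]):
            counts[((a, b), (b, c))] += 1
    return counts


def next_step_probabilities(paths, history):
    """Empirical P(next node | the last len(history) nodes visited)."""
    history = tuple(history)
    k = len(history)
    counts = Counter()
    for path in paths:
        for i in range(len(path) - k):
            if tuple(path[i:i + k]) == history:
                counts[path[i + k]] += 1
    total = sum(counts.values())
    return {node: c / total for node, c in counts.items()} if total else {}


# Hypothetical trips through a small infrastructure network.
paths = [
    ["A", "C", "D"],
    ["B", "C", "E"],
    ["A", "C", "D"],
    ["B", "C", "E"],
]
first_order = next_step_probabilities(paths, ["C"])
second_order = next_step_probabilities(paths, ["A", "C"])
print(first_order, second_order)
```

A simulator that reproduces the first-order edge counts but mixes up which travellers continue to `D` or `E` would match the observed data at first order and fail at second order.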

To Wait or To Probe: Arbitrage Competition on High-Throughput Blockchains

2026-05-30T13:07:24Z

Maximal Extractable Value (MEV) on high-throughput blockchains can be captured through targeted search, where bots identify opportunities off-chain and submit route-committed transactions, or through probabilistic search, where bots submit repeated attempts that resolve opportunity discovery during on-chain execution. This distinction has direct implications for spam, blockspace consumption, and protocol fee revenue. We model how ordering granularity, fee floors, and opportunity-access shocks shape competition between these architectures. Using cyclic arbitrage data on Base from June 2025 to February 2026, we develop a trace-level classifier for search architectures and show that the resulting labels correspond to distinct execution behavior. We test the model across three episodes: Flashblocks selects against broad on-chain probabilistic scanners; token-launch opportunity shocks temporarily revive probabilistic search; and higher fee floors select against probabilistic bots whose opportunity flow cannot sustain repeated attempts. In our sample, probabilistic search accounts for only 23% of arbitrage activity but produces 95% of spam and consumes 20% of Base gas. After Base's configuration changes, protocol fee revenue shifts toward successful arbitrages and away from spam, probabilistic bots pay higher priority fees, and spam consumes a smaller share of blockspace.

When Single Answer Is Not Enough: Rethinking Single-Step Retrosynthesis Benchmarks for LLMs

2026-05-29T23:40:13Z

Recent progress has expanded the use of large language models (LLMs) in drug discovery, including synthesis planning. However, objective evaluation of retrosynthesis performance remains limited. Existing benchmarks and metrics typically rely on published synthetic procedures and Top-K accuracy based on a single ground truth, which does not capture the open-ended nature of real-world synthesis planning. We propose a new benchmarking framework for single-step retrosynthesis that evaluates both general-purpose and chemistry-specialized LLMs using ChemCensor, a novel metric for chemical plausibility. By emphasizing plausibility over exact match, this approach better aligns with human synthesis planning practices. We also introduce CREED, a novel dataset comprising millions of ChemCensor-validated reaction records for LLM training, and use it to train a model that improves over the LLM baselines under this benchmark.
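The limitation of single-ground-truth Top-K accuracy can be made concrete with a toy comparison. In the sketch below, `is_plausible` is a hypothetical stand-in for a plausibility judge; it is not ChemCensor, whose scoring is not described here, and the route labels are placeholders rather than real reactions.

```python
def top_k_accuracy(predictions, ground_truth, k):
    """Share of targets whose single recorded answer appears in the top k."""
    hits = sum(
        1 for preds, truth in zip(predictions, ground_truth) if truth in preds[:k]
    )
    return hits / len(ground_truth)


def top_k_plausibility(predictions, is_plausible, k):
    """Share of targets with at least one plausible suggestion in the top k."""
    hits = sum(
        1 for preds in predictions if any(is_plausible(p) for p in preds[:k])
    )
    return hits / len(predictions)


# Placeholder labels standing in for predicted reactant sets.
predictions = [["route_B", "route_A"], ["route_C"]]
ground_truth = ["route_A", "route_D"]

# Hypothetical judge: these routes are assumed to be chemically reasonable.
plausible_routes = {"route_A", "route_B", "route_C"}


def is_plausible(route):
    return route in plausible_routes


exact_top1 = top_k_accuracy(predictions, ground_truth, k=1)
exact_top2 = top_k_accuracy(predictions, ground_truth, k=2)
plausible_top1 = top_k_plausibility(predictions, is_plausible, k=1)
print(exact_top1, exact_top2, plausible_top1)
```

Under exact match the model scores zero at k = 1 even though every top suggestion is assumed reasonable, which is the gap a plausibility-based benchmark is meant to close.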

Streami: An MPI Data-Parallel Library to Compute Field Lines on GPUs

2026-05-29T20:55:35Z

We present Streami, an extensible GPU-accelerated library for the computation of field lines in fluid flows on high-performance computers. Streami acts as a thin layer used for both post-hoc and in-situ analysis and can interface with existing MPI applications. We discuss Streami's application programming interface, key design decisions that led to Streami's high performance and extensibility, as well as extensions to support different fluid flow field representations. We also present a sample application for rapid prototyping and interactive seed point placement. Streami is released under a permissive open-source software license.

(HB-ARFM) History-Bootstrapped Flow Matching for Inverse Boiling Reconstruction

2026-05-29T20:39:28Z

Reconstructing spatiotemporal fields from partial observations is fundamental to scientific inference, from inferring atmospheric states from satellite data to recovering fluid states from imaging. When observations are incomplete, the inverse problem is fundamentally ill-posed: even when the underlying PDE dynamics are Markovian in the full state, partial observation operators induce a non-Markovian posterior that cannot be resolved from a single timestep. We propose history-bootstrapped autoregressive flow matching (HB-ARFM) for spatiotemporal inverse reconstruction under partial observability. Observation history bootstraps the initial reconstruction via conditional flow matching, reducing ambiguities. The same conditional transport model is then applied autoregressively, conditioning on both new observations and past predictions to propagate the reconstruction forward in time. We evaluate the method on boiling dynamics reconstruction, recovering full velocity and temperature fields from interface geometry and motion. Across two inverse tasks with varying observation sparsity, HB-ARFM produces physically and temporally valid reconstructions where other models fail.
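The conditional flow matching ingredient can be sketched in a few lines. The version below assumes the common linear interpolation path $x_t = (1-t)\,x_0 + t\,x_1$ with target velocity $x_1 - x_0$, and uses a closed-form velocity function in place of a trained network; the path choice, the toy conditioning (the condition is simply the target field) and all names are assumptions, and the autoregressive rollout of HB-ARFM is not reproduced.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 to data x1 at time t."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def cfm_loss(velocity_fn, pairs, cond, rng):
    """Monte Carlo estimate of the flow matching loss at random times t."""
    total = 0.0
    for x0, x1 in pairs:
        t = rng.random()
        xt = interpolate(x0, x1, t)
        target = [b - a for a, b in zip(x0, x1)]  # velocity of the linear path
        pred = velocity_fn(xt, t, cond)
        total += sum((p - q) ** 2 for p, q in zip(pred, target))
    return total / len(pairs)


def euler_sample(velocity_fn, x0, cond, steps):
    """Integrate dx/dt = v(x, t, cond) from t = 0 to t = 1."""
    x = list(x0)
    dt = 1.0 / steps
    for s in range(steps):
        v = velocity_fn(x, s * dt, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def ideal_velocity(x, t, cond):
    """Exact velocity when the condition pins down a single target field."""
    return [(c - xi) / (1.0 - t) for c, xi in zip(cond, x)]


rng = random.Random(0)
target_field = [1.0, -2.0, 0.5]  # hypothetical full state to reconstruct
pairs = [([rng.gauss(0.0, 1.0) for _ in target_field], target_field) for _ in range(3)]
loss = cfm_loss(ideal_velocity, pairs, target_field, rng)
sample = euler_sample(ideal_velocity, [0.0, 0.0, 0.0], target_field, steps=8)
print(loss, sample)
```

In a real setting the velocity function would be a neural network trained to minimise this loss, with the observation history supplied as the condition.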

Interpretable Neural ODEs for Gene Regulatory Network Discovery under Perturbations

2026-05-29T19:29:00Z

Modern high-throughput biological datasets containing thousands of perturbations enable large-scale discovery of causal graphs that represent regulatory interactions between genes. Differentiable causal graphical models and regression-based methods have been developed to infer gene regulatory networks (GRNs) from interventional datasets. However, existing approaches fail to capture the non-linear dynamics of biological processes such as cellular differentiation. To address this limitation, we propose PerturbODE, a novel framework that employs interpretable neural ordinary differential equations (neural ODEs) to model cell state trajectories under perturbations and derive the underlying causal GRN from the neural ODE parameters, enabling downstream simulation of unseen genetic interventions. The GRN is encoded via a single-hidden-layer feedforward network, implicitly grouping genes into interpretable co-regulated modules. We demonstrate PerturbODE's efficacy in GRN inference and extension to perturbation response prediction across both simulated and real overexpression datasets.

Training Diffusion Language Models for Black-Box Optimization

2026-05-29T18:01:09Z

We study offline black-box optimization (BBO), aiming to discover improved designs from an offline dataset of designs and labels, a problem common in robotics and DNA with limited labeled samples. While recent work applies autoregressive LLMs to BBO by formatting tasks as natural-language prompts, their left-to-right design generation struggles to capture the strong bidirectional dependencies inherent in design problems. To address this, we propose adapting diffusion LLMs to offline BBO to leverage their bidirectional modeling capabilities. However, a domain gap exists between the natural text pre-training of diffusion LLMs and the heterogeneous signals in BBO (prompts, designs, and labels). To bridge this gap, we construct a unified prompt--response corpus and introduce delimiter tokens to explicitly mark field boundaries for domain adaptation. We further propose a two-stage post-training framework to align the diffusion LLM generation with high-label designs. The first stage performs supervised fine-tuning on the unified dataset via masked-response prediction, and the second stage adopts reinforcement learning with rewards defined by label improvements. Our method achieves state-of-the-art results on Design-Bench under small-data settings with highly efficient training, requiring only $1.5$ H100 GPU hours for discrete tasks. Code for our work is available here: https://github.com/zpointS/DiBO.

Can dents and gouges compromise the structural integrity of hydrogen transport pipelines?

2026-05-29T17:22:30Z

Repurposing natural gas pipelines for hydrogen transport requires understanding how external defects, like dents and gouges, affect structural integrity under H$_2$ exposure. To address this, we combine experiments with a new hydrogen embrittlement model aimed at large plastic straining scenarios, which integrates: (i) multi-trap hydrogen transport, (ii) finite-strain plasticity, and (iii) a hydrogen- and triaxiality-dependent damage law. Each constituent of the model is validated with experiments on X65 pipeline steel: (i) hydrogen permeation, (ii) full-scale pipe-indentation, and (iii) mechanical testing at different hydrogen and triaxiality levels. The validated model is used to study \textit{passive} (indent before H$_2$ exposure) and \textit{active} (indent with H$_2$) dents and gouges. Results reveal that hydrogen does not significantly increase the damage severity of those defects, unless hydrogen egress is completely precluded at the outer surface of a pipeline that is being pressurised internally and contains a pre-existing \textit{passive} dent with a gouge.