http://arxiv.org/abs/2605.09927v1
Title: Information Extraction of Nested Complex Structure of Quantum Cascade Lasers via Large Language Models
Updated: 2026-05-11T03:23:46Z
Abstract: The rapid advancement of Large Language Models (LLMs) has transformed scientific research workflows, including by enabling the automated extraction of data directly from published literature. Most existing efforts, however, focus on extracting simple labeled key-value entities, whereas many scientific applications require more complex, hierarchically structured data. A representative example is Quantum Cascade Lasers (QCLs), whose device architectures are defined by tens of interdependent parameters organized in nested layer sequences. In this work we propose a JSON-Schema Guided Information Extraction Pipeline (JSG-IE) that enables reliable extraction of deeply structured device data without model fine-tuning. By transforming extraction into a schema-constrained generation task, our approach significantly improves structural consistency and accuracy. Across 12 state-of-the-art LLMs, a properly designed JSON Schema improves performance by 5.7% over conventional prompting, and the highest $F_1$ score, 83.4%, is achieved by the reasoning-enabled Kimi-k2-thinking model. Importantly, this performance enhancement is most significant for mid-tier and open-source models, where $F_1$ gains reach as high as 24.1%, effectively enabling these widely accessible models to achieve extraction fidelity previously restricted to much larger architectures.
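The following is a minimal sketch of the validation side of schema-constrained extraction, assuming the model has already returned JSON text. The schema is a hypothetical and heavily simplified description of a nested QCL layer sequence. The field names `wavelength_um`, `active_region`, `periods`, `layers`, `material`, `thickness_nm` and `doped` are illustrative and are not taken from JSG-IE. `validate` is a hand-rolled checker that supports only the `type`, `required`, `properties` and `items` keywords. The sketch does not call an LLM and is not the authors' pipeline. A production system would more likely pass the schema to the model's structured-output interface and use a complete JSON Schema validator.

```python
import json

# Hypothetical, heavily simplified schema for a QCL device with a nested layer sequence.
QCL_SCHEMA = {
    "type": "object",
    "required": ["wavelength_um", "active_region"],
    "properties": {
        "wavelength_um": {"type": "number"},
        "active_region": {
            "type": "object",
            "required": ["periods", "layers"],
            "properties": {
                "periods": {"type": "integer"},
                "layers": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "required": ["material", "thickness_nm"],
                        "properties": {
                            "material": {"type": "string"},
                            "thickness_nm": {"type": "number"},
                            "doped": {"type": "boolean"},
                        },
                    },
                },
            },
        },
    },
}

# Map JSON Schema type names to the Python types produced by json.loads.
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
}


def validate(instance, schema, path="$"):
    """Return a list of violations (empty if the instance conforms).

    Supports only the `type`, `required`, `properties` and `items` keywords.
    """
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it explicitly for numeric types.
        is_bool = isinstance(instance, bool) and expected in ("number", "integer")
        if is_bool or not isinstance(instance, _TYPES[expected]):
            return [f"{path}: expected {expected}, got {type(instance).__name__}"]
    errors = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required key '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


# Hypothetical model output in which one nested thickness was emitted as a string.
llm_output = (
    '{"wavelength_um": 4.6, "active_region": {"periods": 30, "layers": ['
    '{"material": "InGaAs", "thickness_nm": 4.1}, '
    '{"material": "AlInAs", "thickness_nm": "1.2"}]}}'
)
for problem in validate(json.loads(llm_output), QCL_SCHEMA):
    print(problem)
```

Returning the JSON path of each violation, rather than a single pass/fail flag, is one way to support targeted re-prompting about a specific nested field.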
The JSG-IE framework provides a scalable path toward the automated construction of high-fidelity device databases, accelerating data-driven optoelectronic design.
Published: 2026-05-11T03:23:46Z
Authors: Xiao Fang, Ming Lü, Hanwen Liang, Xingshen Song, Kele Xu, Hui Cai, Chaofan Zhang

http://arxiv.org/abs/2605.09843v1
Title: Sparse Spectral Imaging for Thickness Mapping of 3R-MoS$_2$ on PDMS
Updated: 2026-05-11T00:54:38Z
Abstract: We present a non-destructive, spatially resolved thickness characterization method for rhombohedral (3R) molybdenum disulfide (MoS$_2$) on polydimethylsiloxane (PDMS) substrates. Unlike broadband spectroscopic approaches, the proposed method reduces the measurement to a small number of discrete intensity images, enabling direct thickness mapping with a conventional microscope architecture and commercially available bandpass filters. Our approach combines a systematic framework for selecting optimal discrete wavelength samples of the material's reflectance with a robust thickness retrieval algorithm based on a multivariate Gaussian probability model. By sampling the reflectance with just five strategically chosen near-infrared bandpass filters, we demonstrate thickness characterization up to 691 nm with a mean 95% confidence-interval width of 8.3 nm. The method is adaptable to other van der Waals materials and conventional optical thin-film systems. It therefore provides a foundation for scalable, real-time thickness characterization in, e.g., dry-transfer fabrication workflows, where thickness screening remains a critical bottleneck for the production of van der Waals heterostructure devices.
Published: 2026-05-11T00:54:38Z
Comment: 25 pages, 13 figures
Authors: Benjamin Laudert, Fatemeh Abtahi, Sarka Vavreckova, Sebastian W. Schmitt, Falk Eilenberger

http://arxiv.org/abs/2605.11018v1
Title: Correction of STEM Distortions
Updated: 2026-05-10T15:16:02Z
Abstract: The manuscript considers Scanning Transmission Electron Microscopy (STEM) images and derives the transformations needed to correct various distortions that occur during scanning.
These transformations form the basis for the correction algorithms implemented in the CEOS Panta Rhei and TEMDM software. The manuscript is intended as a technical reference and is meant to be published only on arXiv rather than in peer-reviewed journals.
Published: 2026-05-10T15:16:02Z
Authors: Pavel Potapov, Giulio Guzzinati

http://arxiv.org/abs/2605.13878v1
Title: Revealing dynamics of non-autonomous complex systems from data
Updated: 2026-05-10T09:20:55Z
Abstract: Discovering governing equations from data is crucial for understanding complex systems in many fields, from science to engineering. Yet a versatile computational toolbox for this long-standing challenge is still lacking, owing to the inherent non-autonomy and unknowability of the underlying dynamics. Here, we introduce a data-driven approach for inferring non-autonomous dynamical equations by identifying an optimal set of basis functions within the model space, enabling the reconstruction of complex system behavior under simplified prior specifications. Our method demonstrates its effectiveness for equation discovery on canonical synthetic systems such as the cusp bifurcation and coupled Kuramoto oscillators. Furthermore, we apply the approach to leaf cellular energy, unmanned aerial vehicle navigation, chick-heart aggregates, and a marine fish community, using simple basis-function libraries. Leveraging the inferred equations, we accurately predict the evolution of these empirical systems and further uncover their governing laws. Our approach offers a novel paradigm for revealing the underlying dynamics of a wide range of real-world systems.
Published: 2026-05-10T09:20:55Z
Authors: Chengzuo Zhuge, Zheng Jiang, Zhefan Xu, Wei Chen

http://arxiv.org/abs/2603.20904v4
Title: Sparse Weak-Form Discovery of Stochastic Generators
Updated: 2026-05-09T13:15:27Z
Abstract: We propose a novel data-driven algorithm for the sparsimonious symbolic discovery of stochastic differential equations (SDEs).
The central novelty of our approach lies in extending the weak-formulation framework to stochastic SINDy. This explicitly avoids computing the noisy finite-difference derivative estimates that arise in Kramers-Moyal-based formulations, thus improving robustness to stochasticity and measurement noise. We further show that using spatial Gaussian test functions in place of conventional temporal test functions in the weak formulation preserves unbiasedness in expectation and mitigates the structural regression bias that commonly emerges in temporal test-function approaches. We validate the algorithm on three standard stochastic systems. For these systems we recover all active nonlinear terms with coefficient errors below 4%, stationary-density total-variation distances below 0.01, and autocorrelation functions that faithfully reproduce the true relaxation timescales across all three benchmarks.
Published: 2026-03-21T18:28:10Z
Comment: 21 pages, 5 figures
Authors: Eshwar R A, Gajanan V. Honnavar

http://arxiv.org/abs/2605.08788v1
Title: The Phase Structure of Metallic Money: An MPTT Framework for the Spanish Price Revolution
Updated: 2026-05-09T08:17:11Z
Abstract: The Spanish Price Revolution is usually treated as a classic case in which American bullion inflows expanded the money supply and generated inflation. This view captures the first phase of the episode but fails to explain why the same monetary expansion did not continue to produce proportional price growth after 1600. We develop a two-phase Money Phase Transition Theory (MPTT) model in which the classical monetary relation is recovered before a transition point, while a second-phase correction term modifies the money-price transmission coefficient after the transition. Using annual Spanish CPI and reconstructed money-supply data, we show that 1500-1600 was a high-transmission metallic inflationary phase: CPI increased approximately 3.35-fold while the money supply increased approximately 3.73-fold.
After 1600, the money supply continued to rise, increasing approximately 1.82-fold during 1600-1650, while CPI rose only approximately 1.22-fold. A classical one-phase model fitted on 1500-1600 therefore overpredicts post-1600 prices when extrapolated forward. The MPTT two-phase model with transition point tau = 1600 estimates beta_1 = 0.949, gamma = -0.812, and beta_2 = beta_1 + gamma = 0.137, indicating a sharp post-transition weakening of monetary transmission. An unrestricted break scan identifies a deeper BIC-minimizing break around 1636. These results suggest that the Spanish Price Revolution was not a single monotonic bullion-inflation process but the rise and exhaustion of high-transmission metallic money inflation.
Published: 2026-05-09T08:17:11Z
Comment: 12 pages, 2 figures
Authors: Ran Huang

http://arxiv.org/abs/2601.17621v3
Title: Non-parametric finite-sample credible intervals with one-dimensional priors: a middle ground between Bayesian and frequentist intervals
Updated: 2026-05-08T15:55:28Z
Abstract: We present a method for constructing statistical intervals that occupy a natural middle ground between Bayesian and frequentist statistical intervals, previously unexplored in the literature. To a p% Bayesian credible interval we should assign a p% belief after observing both the dataset and the interval. To a p% frequentist interval we can generally assign a p% belief only before observing either the data or the interval. To the intervals proposed here we can assign a p% belief after observing the interval, but not necessarily after inspecting the full dataset ourselves.
Even in fully non-parametric problems, this requires only a prior over the parameter(s) of interest, not a high-dimensional prior over the full distribution, while maintaining many of the practical and philosophical advantages of Bayesian methods. We believe these methods may therefore provide significant advances in statistical methodology for a number of fields. This work is meant as a proof of principle: we concretely implement such intervals for two different problems and study the properties of the resulting intervals. We discuss promising directions where the proposed type of interval may provide significant advantages.
Published: 2026-01-24T22:53:50Z
Authors: Tim Ritmeester

http://arxiv.org/abs/2605.07819v1
Title: Probabilistic denoising for reliable signal extraction in spectroscopy
Updated: 2026-05-08T14:50:13Z
Abstract: While deep learning offers powerful capabilities for scientific research, its application is often hindered by a lack of quantitative reliability. To address this, we introduce a probabilistic denoising framework that simultaneously extracts denoised signals and element-wise predictive uncertainties from noisy data. We demonstrate this approach on three-dimensional angle-resolved photoemission spectroscopy data, showing that the model reliably recovers the spectral features of a cuprate superconductor from Poisson-distributed noise with an average count of only 0.02 electrons per voxel. Crucially, we show that these predicted uncertainties can be propagated into subsequent superconducting gap analyses, enabling quantitative parameter extraction with scientifically meaningful error bars. Furthermore, we validate the broad applicability of our approach by successfully extending it to two-dimensional X-ray diffraction data.
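The propagation of predicted uncertainties into a downstream analysis can be illustrated with plain Monte Carlo sampling. This sketch rests on three assumptions. First, the denoiser outputs an independent Gaussian mean and standard deviation for every element; real predictive errors may be correlated. Second, an intensity-weighted centroid serves as a hypothetical stand-in for the superconducting gap fit. Third, the curve and noise level are synthetic. The helper names `propagate` and `make_centroid` are illustrative.

```python
import math
import random
import statistics


def propagate(mu, sigma, estimator, n_samples=2000, seed=0):
    """Monte Carlo propagation of element-wise uncertainties into a derived scalar.

    Assumes each element is an independent Gaussian with mean mu[i] and standard
    deviation sigma[i]. Returns the mean and standard deviation of estimator(sample).
    """
    if len(mu) != len(sigma):
        raise ValueError("mu and sigma must have the same length")
    rng = random.Random(seed)
    values = []
    for _ in range(n_samples):
        sample = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
        values.append(estimator(sample))
    return statistics.fmean(values), statistics.stdev(values)


def make_centroid(xs):
    """Hypothetical stand-in for a gap fit: intensity-weighted centroid on grid xs."""
    def centroid(ys):
        return sum(x * y for x, y in zip(xs, ys)) / sum(ys)
    return centroid


# Synthetic "denoised" curve: a Gaussian peak at x = 20 on a grid from 0 to 40,
# with a constant predicted standard deviation of 0.05 per element.
xs = [0.5 * i for i in range(81)]
mu = [math.exp(-0.5 * ((x - 20.0) / 3.0) ** 2) for x in xs]
sigma = [0.05] * len(xs)
centroid = make_centroid(xs)

mean, std = propagate(mu, sigma, centroid)
print(f"centroid = {mean:.3f} +/- {std:.3f}")
```

The derived quantity is recomputed on every sampled curve. Any estimator, including a full nonlinear fit, could therefore replace `centroid` without changing `propagate`.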
Ultimately, the probabilistic denoising approach establishes uncertainty-aware deep learning not merely as a visualization tool but as a rigorous framework for scientific data analysis.
Published: 2026-05-08T14:50:13Z
Comment: 8 pages, 5 figures
Authors: Younsik Kim, Changyoung Kim

http://arxiv.org/abs/2605.07714v1
Title: Selectivity- and Activity-Aware Catalyst Descriptors for CO$_2$ Hydrogenation on Alloy Nanocatalysts using Machine-Learned Force Fields
Updated: 2026-05-08T13:17:37Z
Abstract: Adsorption energy distributions (AEDs) have emerged as a powerful and increasingly adopted descriptor of catalytic performance in high-entropy alloys and, more recently, in conventional metallic alloy nanocrystal catalysts. By accounting for diverse adsorption sites and crystallographic facets, AEDs more fully represent nanoparticle-based catalytic surfaces. They show strong promise for accelerating the rational design and discovery of heterogeneous catalysts, especially for CO$_2$ hydrogenation. However, previous approaches have not sufficiently resolved facet-specific contributions, despite the catalytic significance and prevalence of certain Miller planes in nanoscale catalysts, which limits their applicability in predicting activity and selectivity. Here, we introduce an updated facet-resolved framework for predicting catalytic activity, which also provides insight into selectivity toward C1 products. Universal machine-learned force fields trained on Open Catalyst Project data were employed to compute adsorption energetics across 226 experimentally observed metals, binary alloys, and ternary alloys, encompassing 1.4 million adsorption sites on 2,626 crystallographically distinct surfaces. Using statistical and unsupervised learning techniques, we analyzed facet-specific AEDs to identify highly active and methanol-selective facets.
Our approach provides insight into the relationship between structure and catalytic performance metrics such as activity and selectivity. It also presents a set of alloy compositions and their respective surface orientations for experimental validation toward highly selective CO$_2$ hydrogenation.
Published: 2026-05-08T13:17:37Z
Comment: 30 pages, 5 figures + 1 toc, 2 tables, Supplementary Information
Authors: Prajwal Pisal, Ondřej Krejčí, Patrick Rinke

http://arxiv.org/abs/2507.01064v3
Title: Functional Renormalization for Signal Detection: Dimensional Analysis and Dimensional Phase Transition for Nearly Continuous Spectra Effective Field Theory
Updated: 2026-05-08T10:28:16Z
Abstract: Signal detection in high dimensions is a critical challenge in data science. Standard methods based on random matrix theory provide sharp detection thresholds for finite-rank perturbations, such as the well-known Baik-Ben Arous-Péché (BBP) transition. However, they are often insufficient for realistic data exhibiting nearly continuous (extensive-rank) signal distributions that merge with the noise bulk. In this regime, typically associated with real-world scenarios such as images for computer vision tasks, the signal does not manifest as a clear outlier but as a deformation of the spectral density's geometry. We use the functional renormalisation group (FRG) framework to probe these subtle spectral deformations. Treating the empirical spectrum as an effective field theory, we define a scale-dependent "canonical dimension" that acts as a sensitive order parameter for the spectral geometry. We show that this dimension undergoes a sharp crossover, interpreted as a "dimensional phase transition", at signal-to-noise ratios significantly lower than the standard BBP threshold. This dimensional instability is shown to correlate with spontaneous symmetry breaking in the effective potential and with a deviation of eigenvector statistics from the universal Porter-Thomas distribution, confirming the consistency of the method.
Such behaviour aligns with recent theoretical results on the "extensive spike model", in which signal information persists inside the noise bulk before any spectral gap opens. We validate our approach on realistic datasets, demonstrating that the FRG flow consistently detects the onset of this bulk deformation. Finally, we explore a formalisation of this methodology for analysing nearly continuous spectra. We propose a heuristic criterion for signal detection and a method to estimate the number of independent noise components based on the stability of these canonical dimensions.
Published: 2025-06-30T18:00:09Z
Comment: 36 pages; updated figures
Journal ref: J. Stat. Mech. (2026) 043403
Authors: Riccardo Finotello, Vincent Lahoche, Dine Ousmane Samary
DOI: 10.1088/1742-5468/ae5a21

http://arxiv.org/abs/2408.11065v2
Title: Statistical Patterns in the Equations of Physics and the Emergence of a Meta-Law of Nature
Updated: 2026-05-07T20:50:35Z
Abstract: Physics seeks to uncover the laws of Nature and express them through mathematical equations. Despite the vast diversity of natural phenomena, physical equations exhibit structural regularities that set them apart from arbitrary mathematical expressions. Principles such as dimensional analysis have long guided the formulation of physical models, but the exploration of more subtle statistical patterns within the equations of physics remains an open question. Here, by analysing four corpora of physics equations and applying advanced implicit-likelihood techniques, we find that the frequency of mathematical operators follows an exponential decay law, in contrast to Zipf's power law for word frequencies in natural languages. This reveals a statistical meta-law of physics, possibly reflecting a combination of communication efficiency and constraints imposed by Nature itself. The meta-law offers practical benefits for symbolic regression by drastically narrowing down the space of physically plausible expressions.
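The sketch below makes the contrast between an exponential decay and Zipf's power law concrete. It fits both forms to operator frequencies ordered by rank, using ordinary least squares on log-frequency. This is a deliberately simple stand-in for the implicit-likelihood analysis described in the abstract, not a reproduction of it. The counts are synthetic values generated from an exponential law with an assumed rate of 0.4, and the helper names `ols` and `compare_laws` are illustrative.

```python
import math


def ols(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sum of squared residuals)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def compare_laws(counts):
    """Fit log f = a - lam*r (exponential) and log f = a - s*log r (Zipf-like power law)."""
    ranked = sorted(counts, reverse=True)
    ranks = list(range(1, len(ranked) + 1))
    log_f = [math.log(c) for c in ranked]
    _, b_exp, sse_exp = ols(ranks, log_f)
    _, b_pow, sse_pow = ols([math.log(r) for r in ranks], log_f)
    return {"exp_rate": -b_exp, "exp_sse": sse_exp,
            "zipf_exponent": -b_pow, "pow_sse": sse_pow}


# Synthetic counts for 15 operators following f(r) = 1000 * exp(-0.4 r) (invented, not corpus data).
counts = [round(1000 * math.exp(-0.4 * r)) for r in range(1, 16)]
fit = compare_laws(counts)
print(fit)
```

Both models have two free parameters, so comparing their residual sums of squares is a reasonable first look. On real counts, a likelihood-based comparison that respects the discreteness of the data would be more appropriate.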
More broadly, the meta-law may inform the development of language models that can generate coherent mathematical representations, advancing the automation of physical law discovery.
Published: 2024-08-12T18:34:57Z
Comment: 11 pages, 5 figures, 2 tables
Journal ref: Philos Trans A Math Phys Eng Sci (2026) 384 (2317): 20250091
Authors: Andrei Constantin, Deaglan Bartlett, Harry Desmond, Pedro G. Ferreira
DOI: 10.1098/rsta.2025.0091

http://arxiv.org/abs/2605.05936v1
Title: Analysis of Mixed Radiation Fields at the MoEDAL Experiment Based on Real-Time Data from a Timepix Detector Network
Updated: 2026-05-07T09:43:35Z
Abstract: The primary objective of this work is to determine the fluences and characteristics of fast neutrons, other hadrons, and highly ionizing particles in the environment of the MoEDAL experiment at the Large Hadron Collider. These particles constitute an experimental background for the passive Nuclear Track Detectors (NTDs) that MoEDAL uses to search for tracks potentially produced by Dirac magnetic monopoles. Of particular concern are particles whose tracks in the NTDs are indistinguishable from those of monopoles. The study is based on data acquired by the Timepix hybrid silicon pixel detector network, which was the first and only active detector system installed and operated as part of the MoEDAL experiment, from 2013 to 2018. The Timepix detector network enables real-time measurements of mixed radiation fields, including the composition, spectral properties, and directional characteristics of individual radiation components across different regions of the MoEDAL experimental area. The paper presents detailed results of the radiation field analysis with emphasis on neutrons and highly ionizing particles, including their directional distributions.
The first results demonstrating the spatial tracking capabilities of the Timepix detectors are also reported, illustrating the reconstruction of particle direction and energy-loss profiles from individual detector frames.
Published: 2026-05-07T09:43:35Z
Comment: 23 pages, 10 figures, submitted to European Physical Journal Special Topics, special issue dedicated to the MoEDAL-MAPP Experiment
Authors: Benedikt Bergmann, Petr Burian, Josef Janeček, Claude Leroy, Petr Mánek, James Pinfold, Stanislav Pospíšil, Richard Soluk, Michal Suk

http://arxiv.org/abs/2605.05679v1
Title: Bayesian leave-one-out cross-validation for astrophysical model comparison using gravitational-wave background data
Updated: 2026-05-07T05:13:31Z
Abstract: Previous work showed that ultralight-dark-matter solitons can provide dynamical friction for supermassive black-hole binaries, suppressing low-frequency power in the pulsar-timing-array gravitational-wave background and constraining the particle mass and effective ultralight-dark-matter fraction. Here we extend that analysis by comparing the predictive performance of four models: simplified and realistic ultralight-dark-matter implementations, a phenomenological environmental-hardening model, and a gravitational-wave-only model. We use Bayesian leave-one-out cross-validation on the five lowest pulsar-timing-array frequency bins. The phenomenological model gives the largest expected log predictive density, but its advantage over the other models is not large compared with the estimated standard errors. The current data therefore do not decisively prefer one model overall. The clearest pairwise result is within the ultralight-dark-matter framework: the simplified model outperforms the realistic implementation in all five frequency bins.
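Comparing a difference in expected log predictive density (ELPD) against its standard error can be sketched from pointwise leave-one-out log predictive densities, one per held-out frequency bin. The snippet uses the standard estimator SE = sqrt(n · Var_i(d_i)) for the summed pointwise differences d_i. The five values per model are invented placeholders, not results from the paper. In practice they would come from exact refits or an importance-sampling approximation of leave-one-out. The name `elpd_difference` is illustrative.

```python
import math
import statistics


def elpd_difference(lpd_a, lpd_b):
    """ELPD difference (model A minus model B) and its standard error.

    lpd_a, lpd_b: pointwise leave-one-out log predictive densities evaluated
    on the same held-out points. Uses SE = sqrt(n * sample variance of the
    pointwise differences).
    """
    if len(lpd_a) != len(lpd_b):
        raise ValueError("both models must be scored on the same held-out points")
    diffs = [a - b for a, b in zip(lpd_a, lpd_b)]
    n = len(diffs)
    return sum(diffs), math.sqrt(n * statistics.variance(diffs))


# Placeholder pointwise values for five frequency bins (invented, not from the paper).
model_a = [-1.10, -0.95, -1.30, -1.05, -1.20]
model_b = [-1.25, -0.90, -1.45, -1.15, -1.10]
diff, se = elpd_difference(model_a, model_b)
print(f"elpd_A - elpd_B = {diff:.2f} +/- {se:.2f}")
```

With these placeholder values, the difference (0.25) is smaller than its standard error (about 0.26), the kind of outcome the abstract describes as not decisive. With only five points, this normal-approximation standard error is itself rough.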
Current pulsar-timing-array data are therefore compatible with ultralight-dark-matter-induced low-frequency suppression, but do not yet significantly distinguish ultralight dark matter from more generic environmental descriptions of supermassive-black-hole-binary evolution.
Published: 2026-05-07T05:13:31Z
Comment: 8 pages, 3 figures
Authors: Shreyas Tiruvaskar, Chris Gordon

http://arxiv.org/abs/2602.10136v2
Title: Collective and nonlinear structure of wind power correlations
Updated: 2026-05-06T08:51:55Z
Abstract: We describe the correlation structure of wind power fluctuations in a farm of 80 turbines, sampled over 5 years. We report the presence of universal, collective, and nonlinear correlations, which are responsible for the excess persistency and intermittency of farm-aggregated power output. A first cross-correlation analysis of turbine production reveals a dynamical scaling transition (à la Family-Vicsek) from local decoherence to large-scale turbulence-driven scaling. This transition is responsible for the geographical smoothing effect previously reported beyond farm scale [M. M. Bandi, Phys. Rev. Lett. 118, 028301 (2017)]. A second, bivariate analysis reveals long-range correlations of non-Gaussian features, which are responsible for their amplification in the total farm output. These findings provide a new perspective on wind power variability, highlighting the importance of nonlinear correlations in power production dynamics. By better characterising these fluctuations, our results can inform strategies for grid management, storage optimization, and wind farm design, ultimately improving the integration of wind energy into modern power systems.
Published: 2026-02-08T03:15:24Z
Comment: 11 pages, 6 figures, supplemental in pdf file
Authors: Samy E. Lakhal, J. E. Sardonia, M. M. Bandi

http://arxiv.org/abs/2605.04218v1
Title: Bayesian hypergraph inference from scarce and noisy dynamical observations
Updated: 2026-05-05T19:00:27Z
Abstract: Inferring higher-order interaction structure from observations of dynamics is a central challenge in complex systems, particularly when data are scarce, noisy, or concentrated in lower-dimensional regions of state space. We develop Bayes-THIS, a Bayesian extension of Taylor-based Hypergraph Inference using SINDy (THIS). THIS reconstructs hypergraph structure from time-series data by identifying sparse Taylor coefficients associated with pairwise and higher-order interactions. Bayes-THIS replaces fixed-threshold sparse regression with sparse Bayesian regression using automatic relevance determination. It thereby explicitly models residual variance and applies adaptive, term-wise coefficient shrinkage, improving robustness in data-limited, high-noise, and ill-conditioned regimes. The resulting Gaussian posterior also enables an uncertainty-aware inference workflow. A posterior predictive check assesses whether the data contain sufficient higher-order signal to reliably support inference beyond a pairwise model, and credible-interval pruning selects hyperedges whose inferred coefficients are statistically distinguishable from zero. Finally, we characterize a fundamental limitation of the Taylor-based inference framework. When higher-order interactions concentrate on nodes that lack lower-order connections, the Taylor expansion systematically inflates lower-order coefficient estimates, producing spurious edges indistinguishable from genuine lower-order interactions. This structural non-identifiability cannot be resolved by either THIS or Bayes-THIS.
Published: 2026-05-05T19:00:27Z
Comment: 16 pages, 8 figures
Authors: Katerina Tang, Vivek Srikrishnan, Jackson Kulik
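The sparse Bayesian regression with automatic relevance determination (ARD) and the credible-interval pruning described for Bayes-THIS can be sketched with the classical MacKay/Tipping fixed-point updates. This is a generic ARD implementation under stated assumptions, not the authors' code:

- The library matrix `X` is an arbitrary set of candidate terms. In Bayes-THIS its columns would be Taylor terms for pairwise and higher-order interactions.
- The data are synthetic.
- The pruning rule keeps a term when its 95% central credible interval excludes zero.
- The function names `mat_inv`, `ard_regression` and `credible_prune` are illustrative.

```python
import math
import random


def mat_inv(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ard_regression(X, y, n_iter=300, alpha_max=1e10):
    """Sparse Bayesian linear regression y = X w + noise with one prior precision per term.

    Prior w_j ~ N(0, 1/alpha_j), noise ~ N(0, 1/beta). Returns the posterior mean,
    the posterior covariance and the estimated noise variance 1/beta.
    """
    n, p = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yk for row, yk in zip(X, y)) for i in range(p)]
    ybar = sum(y) / n
    beta = n / sum((yk - ybar) ** 2 for yk in y)  # initial noise precision = 1/var(y)
    alpha = [1.0] * p

    def posterior(alpha, beta):
        precision = [[beta * xtx[i][j] + (alpha[i] if i == j else 0.0) for j in range(p)]
                     for i in range(p)]
        cov = mat_inv(precision)
        mean = [beta * sum(cov[i][j] * xty[j] for j in range(p)) for i in range(p)]
        return mean, cov

    for _ in range(n_iter):
        mu, cov = posterior(alpha, beta)
        # gamma_j in [0, 1] measures how well term j is determined by the data.
        gamma = [1.0 - alpha[j] * cov[j][j] for j in range(p)]
        # MacKay updates; terms whose alpha reaches alpha_max are effectively switched off.
        alpha = [min(max(g, 1e-12) / max(m * m, 1e-300), alpha_max) for g, m in zip(gamma, mu)]
        sse = sum((yk - sum(xi * mi for xi, mi in zip(row, mu))) ** 2 for row, yk in zip(X, y))
        beta = max(n - sum(gamma), 1e-12) / max(sse, 1e-300)
    mu, cov = posterior(alpha, beta)
    return mu, cov, 1.0 / beta


def credible_prune(mu, cov, z=1.96):
    """Indices of terms whose central credible interval mu_j +/- z*sd_j excludes zero."""
    return [j for j in range(len(mu)) if abs(mu[j]) > z * math.sqrt(cov[j][j])]


# Synthetic demo: 6 candidate terms, of which 0, 2 and 5 are truly active.
rng = random.Random(0)
true_w = [1.5, 0.0, -2.0, 0.0, 0.0, 0.8]
X = [[rng.gauss(0.0, 1.0) for _ in true_w] for _ in range(200)]
y = [sum(w * x for w, x in zip(true_w, row)) + rng.gauss(0.0, 0.1) for row in X]

mu, cov, noise_var = ard_regression(X, y)
print("posterior means:", [round(m, 3) for m in mu])
print("kept terms:", credible_prune(mu, cov))
print("noise variance:", round(noise_var, 4))
```

For larger problems, scikit-learn's `ARDRegression` provides a maintained ARD implementation. The pure-Python version here only aims to make explicit the updates for the posterior covariance, the per-term precisions `alpha` and the noise precision `beta`.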