http://arxiv.org/abs/2605.10331v1 Constraint-aware functional cloning for stable and transferable machine-learned density functional theory 2026-05-11T10:33:06Z

We study a simple but useful test for neural exchange-correlation (XC) functionals: can a neural model reproduce an established XC functional when it is used self-consistently? We call this test functional cloning. The model is trained at the GGA level to reproduce a known semilocal functional, using either a constrained or an unconstrained architecture. The motivation is that an XC functional is not used on a fixed input. In a Kohn-Sham self-consistent-field calculation it contributes to the potential, and the resulting density is part of the outcome of the same calculation. A good pointwise fit to sampled density descriptors is therefore not by itself enough. Because the target functional is known, the error can be measured directly. We compare the clones on sampled descriptors, molecular total energies, energy differences, transfer between PySCF and SIESTA, and equations of state for crystalline solids. The constrained models reproduce the reference functional more accurately in molecular self-consistent calculations. They also give better initial parameters for later optimization against correlated molecular energies. An additional observation is that the constrained architecture already gives a reasonable solid-state baseline before cloning, as seen from randomly initialized constrained models. Clones trained only on molecular densities transfer well to solids, reproducing reference lattice constants and bulk moduli across metallic, covalent, ionic, oxide, and layered systems. Cross-code tests show that energy differences are relatively robust, while total energies depend strongly on whether the cloning descriptors come from all-electron or pseudopotential densities. These results make functional cloning a useful diagnostic before full self-consistent training of neural XC functionals.

2026-05-11T10:33:06Z Sara Navarro-Rodríguez Alec Wills Kimberly J. Daas María Camarasa-Gómez Marivi Fernández-Serra http://arxiv.org/abs/2601.21310v2 A Deterministic Framework for Neural Network Quantum States in Quantum Chemistry 2026-05-11T10:04:57Z

We present a deterministic optimization framework for Neural Network Quantum States (NQS) designed to bypass the sampling variance and slow mixing issues inherent in stochastic optimization. By projecting a neural backflow ansatz onto dynamically evolving configuration subspaces, our method provides a systematic route for optimizing the selected variational component of the wavefunction, and it estimates the residual correlation through a post-hoc second-order perturbative correction. The implementation uses a hybrid CPU-GPU architecture that shows empirical sub-linear wall-time scaling with respect to the subspace size over the tested range, enabling the calculation of strongly correlated systems, such as the chromium dimer, within Hilbert spaces of $10^{23}$ configurations. Benchmarks on molecular bond dissociations demonstrate that this deterministic approach yields stable convergence and accuracies comparable to selected reference methods in the tested systems.

2026-01-29T06:13:41Z Zheng Che http://arxiv.org/abs/2605.10266v1 Overfitting by design: neural network density functionals for water 2026-05-11T09:30:32Z

In density functional theory, simpler exchange-correlation (XC) approximations such as the local density approximation (LDA) are favored for computational speed but rely on limited information, leading to a trade-off between accuracy and generality. Machine-learned XC approximations have attracted considerable interest as a way to address this problem. Here, we train a neural network LDA using a differentiable Kohn-Sham solver, imparting system-specific expertise for water and sacrificing generality for accuracy. Our model achieves 1 kcal/mol errors on gold-standard coupled cluster ionization and atomization energies, and improves predictions of spectral lines, electron density distribution, and equilibrium geometry from as few as eight configurations used for training. We proceed to perform transfer learning and obtain results comparable to higher-rung PBE and B3LYP functionals on the WATER27 subset of the GMTKN55 database, even when only a single two-molecule binding energy is used in the transfer process. This result opens the door for specialist functionals to be trained on different systems from little data, enhancing predictions while maintaining low training costs. Our approach of training a modified XC density functional approximation (DFA) furthermore allows for a highly interpretable result, as the neural network directly corresponds to a correction of the XC energy per electron.
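
To make "a correction of the XC energy per electron" concrete, the following minimal sketch evaluates the standard spin-unpolarized LDA exchange energy per electron, $\varepsilon_x(n) = -\tfrac{3}{4}(3/\pi)^{1/3} n^{1/3}$ (Hartree atomic units), and integrates it over the hydrogen 1s density. The `delta_eps` hook, its additive form, and the radial grid are assumptions made for illustration only; they are not taken from the paper, which may parameterize its correction differently and also includes correlation.

```python
import numpy as np

# Dirac/Slater exchange constant in Hartree atomic units.
C_X = 0.75 * (3.0 / np.pi) ** (1.0 / 3.0)

def eps_x_lda(n):
    """Spin-unpolarized LDA exchange energy per electron for density n."""
    return -C_X * np.cbrt(n)

def energy(n, r, delta_eps=None):
    """Integrate n * eps over a radial grid r (spherical density assumed).

    delta_eps is a hypothetical learned correction to the energy per
    electron; it stands in for a neural network and is optional.
    """
    eps = eps_x_lda(n)
    if delta_eps is not None:
        eps = eps + delta_eps(n)  # assumed additive form, for illustration
    f = 4.0 * np.pi * r**2 * n * eps
    # Trapezoidal rule written out explicitly to avoid version-specific helpers.
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(r)))

r = np.linspace(0.0, 30.0, 30001)   # radial grid in bohr (illustrative)
n_h = np.exp(-2.0 * r) / np.pi      # exact hydrogen 1s density
e_x = energy(n_h, r)                # uncorrected LDA exchange energy
```

With no correction, `e_x` comes out at approximately -0.213 Ha, the analytic spin-unpolarized LDA exchange energy for this density. Passing a callable as `delta_eps` shifts the energy per electron pointwise, which is the sense in which such a model stays interpretable.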

2026-05-11T09:30:32Z 14 pages excl. references, 7 figures. Supplemental material included as separate PDF Karim K. Alaa El-Din Antonius v. Strachwitz Ana Coutinho Dutra Sam M. Vinko http://arxiv.org/abs/2605.10209v1 Analytical Representation for the Electronic Contribution of the Nuclear Schiff Interaction Hamiltonian 2026-05-11T08:54:18Z

The nuclear Schiff interaction (NSI) arises from a nuclear force that simultaneously violates spatial parity (P) and time reversal (T) symmetries, where T symmetry is equivalent to CP symmetry under CPT invariance. Detecting the NSI experimentally is important because CP violation is critical for explaining why the amount of matter in the Universe is far greater than that of antimatter. Measuring the NSI in molecules requires both precise experiments and theoretical calculations that incorporate electronic and nuclear wavefunctions. Conventionally, the electronic terms have been approximated using a first-order power series expansion of the electronic radial function, an approach that yields the well-known nuclear Schiff moment (NSM), but this approximation may not be sufficiently accurate. In this study, we introduce a new, accurate analytical expression for the electronic terms based on Gaussian basis sets, which avoids any truncation of the power series. We find that the previous numerical approach overestimates the values for RaO and LrF by more than 50% and 300%, respectively, in the nuclear-radius region. In contrast to the numerical calculations, the calculations based on the analytical expression are less sensitive to the choice of basis functions. Furthermore, we develop a new basis set that accurately describes the behavior of wave functions in both the interior and exterior regions of the nucleus. We also demonstrate that an even-tempered basis set is preferable to an energy-optimized basis set for calculating the NSI electronic term in molecules.
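
For readers unfamiliar with the term, an even-tempered basis set takes its Gaussian exponents from a geometric progression, $\alpha_k = \alpha_0 \beta^k$. The sketch below generates such a sequence; the function name and the values of `alpha0`, `beta`, and `n` are illustrative assumptions and are not the parameters used in the paper.

```python
import numpy as np

def even_tempered_exponents(alpha0: float, beta: float, n: int) -> np.ndarray:
    """Return n Gaussian exponents alpha_k = alpha0 * beta**k for k = 0..n-1."""
    if alpha0 <= 0.0 or beta <= 1.0 or n < 1:
        raise ValueError("require alpha0 > 0, beta > 1, n >= 1")
    return alpha0 * beta ** np.arange(n)

# Illustrative values only: five exponents starting at 0.05 with ratio 3.
exps = even_tempered_exponents(alpha0=0.05, beta=3.0, n=5)
```

Because consecutive exponents share a fixed ratio, the set spans tight (near-nucleus) and diffuse functions uniformly on a logarithmic scale, in contrast to exponents optimized individually for the energy.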

2026-05-11T08:54:18Z Satoshi Toda Yasuto Masuda Naohiro Tomiyama Kota Yanase Bijaya Kumar Sahoo Masahiko Hada Minori Abe http://arxiv.org/abs/2605.10132v1 Chiral Porphyrin Monolayers on Ferromagnetic Thin Films: Ultrafast Spectroscopy of Hybrid Interfaces 2026-05-11T07:44:35Z

Hybrid ferromagnetic metal/organic interfaces (spinterfaces) exhibit unique properties, including spin filtering. In parallel, chiral organic molecules can themselves induce efficient spin filtering, leading to unexpectedly high spin polarizations. Here, we investigate how the proximity of gold-capped Co/Ni ferromagnetic multilayers influences the spectroscopic properties and photoinduced electron dynamics of chiral oligopeptides bearing a porphyrin chromophore. The molecules are covalently attached to the gold cap via a chiral linker, forming a self-assembled monolayer. The porphyrin macrocycles adopt an orientation parallel to the surface, resulting in the formation of J-like aggregates. Photoinduced dynamics are probed using femtosecond pump-probe transient absorption spectroscopy. Despite excitation of only a single molecular layer, a clear transient absorption signal of the porphyrin singlet excited state is observed. Adsorption on the metal surface leads to a pronounced reduction of the excited-state lifetime. However, no signatures of long-lived photoinduced charge-transfer products are detected. Furthermore, no dependence of the excited-state dynamics on either the magnetization direction of the ferromagnetic layer or the molecular chirality is observed.

2026-05-11T07:44:35Z Karol Hauza Anna Lewandowska-Andralojc Ruslan Salikhov Jürgen Lindner Gotard Burdzinski Marcin Kwit Bronislaw Marciniak Aleksandra Lindner http://arxiv.org/abs/2605.09978v1 Collective resonance light scattering from thermally relaxing systems in cavities 2026-05-11T04:40:15Z

We study steady-state resonance light scattering from ensembles of noninteracting molecules, both in free space and inside optical cavities, while accounting for local thermal relaxation. The scattering spectra are obtained from steady-state solutions of either the Schrödinger equation or a Liouville-space master equation. In the absence of a cavity, the spectra exhibit an elastic peak at the incident-photon energy and an inelastic fluorescence peak near the molecular excitation energy. Inside a cavity, the fluorescence peak splits into upper- and lower-polaritonic peaks in the strong-coupling regime. We analyze how the elastic and inelastic spectral features scale with the number of molecules under fixed cavity-molecule coupling and identify distinct collective trends in the Rayleigh peak intensity and in the integrated polaritonic or fluorescence spectral weight. The two theoretical approaches yield qualitatively consistent results while highlighting different aspects of thermally induced relaxation and dephasing.
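
The splitting of the fluorescence peak into polaritonic peaks, and its dependence on the number of molecules at fixed coupling, can be illustrated with a textbook single-excitation model of one cavity mode coupled to $N$ identical molecules. This is a sketch under stated assumptions: it ignores the thermal relaxation and dephasing that the paper treats, and the function names and parameter values are hypothetical.

```python
import numpy as np

def single_excitation_hamiltonian(n_mol, omega_c, omega_m, g):
    """Cavity mode (index 0) coupled with strength g to n_mol molecules."""
    h = np.zeros((n_mol + 1, n_mol + 1))
    h[0, 0] = omega_c                     # cavity photon energy
    idx = np.arange(1, n_mol + 1)
    h[idx, idx] = omega_m                 # molecular excitation energies
    h[0, 1:] = g                          # cavity-molecule coupling
    h[1:, 0] = g
    return h

def polariton_energies(n_mol, omega_c, omega_m, g):
    """Return (lower, upper) polariton energies from the extreme eigenvalues."""
    ev = np.linalg.eigvalsh(single_excitation_hamiltonian(n_mol, omega_c, omega_m, g))
    return ev[0], ev[-1]

# Illustrative resonant case: splitting grows as 2 * g * sqrt(N).
omega, g = 2.0, 0.01
splittings = {n: np.subtract(*polariton_energies(n, omega, omega, g)[::-1])
              for n in (1, 4, 16)}
```

On resonance the two bright eigenvalues are $\omega \pm g\sqrt{N}$, while the remaining $N-1$ states stay at $\omega$, so at fixed per-molecule coupling the splitting in this model grows as $\sqrt{N}$.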

2026-05-11T04:40:15Z Bingyu Cui http://arxiv.org/abs/2412.04442v3 Linear-Scaling Potential-Free Data-Driven Molecular Dynamics for Arbitrary-Sized Water Clusters $(\text{H}_2\text{O})_n$ 2026-05-11T03:52:25Z

Conventional molecular dynamics (MD) simulation approaches, such as $\textit{ab initio}$ MD (AIMD) and empirical force field MD (EFFMD), face significant trade-offs between physical accuracy and computational efficiency. This work presents a linear-scaling potential-free data-driven molecular dynamics (PDMD) framework for predicting system energy and atomic forces of arbitrary-sized water clusters $(\text{H}_2\text{O})_n$. Specifically, PDMD employs a Gaussian-based atomic geometry descriptor to generate high-dimensional, equivariant features, then leverages ChemGNN, a graph neural network model that adaptively learns the atomic chemical environments without requiring $\textit{a priori}$ knowledge. Through an iterative self-consistent training approach, the converged PDMD achieves a mean absolute error of 1.39 meV/atom for energy and 50.7 meV/angstrom for forces, outperforming the state-of-the-art DeepMD by $\sim$5x in energy accuracy and $\sim$3x in force accuracy. As a result, the linear-scaling PDMD can reproduce the AIMD properties of water clusters at orders-of-magnitude lower computational cost, as illustrated by simulations of systems consisting of thousands or more molecules. These results demonstrate that the proposed PDMD offers multiphase predictive power and enables ultra-fast, general-purpose MD simulations while retaining AIMD-level accuracy. This accuracy is achieved by efficiently capturing many-body potentials that are critical in numerous polyatomic systems but are often missing in EFFMD. Moreover, we have constructed an $\textit{ab initio}$ dataset with over 300,000 $(\text{H}_2\text{O})_n$ structures, standardized in a unified PyTorch Geometric framework, to support scalable evaluation of artificial intelligence methods for molecular dynamics.

2024-12-05T18:56:26Z Hongyu Yan Yong Wei Minghan Chen Hanning Chen http://arxiv.org/abs/2404.17756v3 Origins of suppressed self-diffusion of nanoscale constituents of a complex liquid 2026-05-10T23:22:42Z

Understanding and ultimately controlling the transformations and properties of nanoscale systems, from proteins to synthetic nanomaterial assemblies, is limited by the inability to uncover their dynamics on their characteristic length and time scales. Here, we nevertheless demonstrate this ability using MHz X-ray photon correlation spectroscopy (XPCS), directly elucidating the characteristic microsecond dynamics of density fluctuations of semiconductor nanocrystals (NCs), not only in a colloidal dispersion but also in a liquid phase consisting of densely packed, yet mobile, NCs with no long-range order. We find the wavevector-dependent fluctuation rates in the liquid phase are suppressed relative to those in the colloidal phase and relative to observations of densely packed repulsive particles. We show that the suppressed rates are due to a substantial decrease in the self-diffusion of NCs, which we attribute to explicit attractive interactions. Using coarse-grained simulations, we find that the extracted shape and strength of the interparticle potential explain the stability of the liquid phase, in contrast to the gelation observed via XPCS in many other charged colloidal systems. This work opens the door to elucidating fast, condensed phase dynamics in complex fluids and other nanoscale soft matter, such as densely packed proteins and non-equilibrium self-assembly processes, in addition to designing microscopic strategies to avert gelation.

2024-04-27T02:14:32Z 7 pages, 4 figures Christian P. N. Tanner Vivian R. K. Wall Mumtaz Gababa Joshua Portner Ahhyun Jeong Matthew J. Hurley Nicholas Leonard Jonathan G. Raybin James K. Utterback Ahyoung Kim Andrei Fluerasu Yanwen Sun Johannes Moeller Alexey Zozulya Wonhyuk Jo Felix Brausse James Wrigley Ulrike Boesenberg Jan-Etienne Pudell Joerg Hallmann Wei Lu Roman Shayduk Mohamed Youssef Anders Madsen David T. Limmer Dmitri V. Talapin Samuel W. Teitelbaum Naomi S. Ginsberg http://arxiv.org/abs/2605.09752v1 Polarizable Embedding QM/MM for Periodic Systems 2026-05-10T21:00:21Z

A general polarizable embedded (PE) quantum mechanics/molecular mechanics (QM/MM) scheme for periodic systems is presented, describing mutual polarization of the two subsystems. The QM system, described with density functional theory (DFT), is coupled to a single center multipole expansion (SCME) model, characterising H$_2$O molecules in the MM region. In SCME the H$_2$O molecules are ascribed anisotropic dipole and quadrupole polarizabilities and permanent multipoles up to and including the hexadecapole. Our embedding scheme achieves a smooth and efficient convergence of the periodic interaction potential by introducing single and clustered multipole expansion points in the far field. By choosing the near- and far-field expansions of the potential carefully, the PE-QM/MM calculation matches the level of accuracy of the QM calculation. In the short range, the electrostatic interaction between the QM and MM subsystems is damped with real-space, pairwise isotropic damping functions, resulting in a screened interaction and preventing over-polarization. In molecular dynamics simulations the two subsystems are separated with the elastic scattering assisted flexible inner region [Kirchhoff et al. JCTC, 2021, 17, 9, 5863], ensuring a smooth transition in the radial distribution at the boundary between the two subsystems.

2026-05-10T21:00:21Z 13 pages, 8 figures. SI 28 pages, 7 figures Julian Bessner Anoop Ajaya Kumar Nair Magnus Andreas Hilduberg Christiansen Timo Jacob Hannes Jónsson Elvar Örn Jónsson http://arxiv.org/abs/2605.09495v1 Enabling Structure-Only Initialization and Out-of-Distribution Generalization in GNN-based Molecular Dynamics Simulators 2026-05-10T12:00:21Z

Machine learning-based simulators offer the potential to model the dynamics of complex systems more efficiently than classical approaches, while retaining differentiability, a key property for materials design. Graph neural network (GNN)-based simulators have shown strong performance across a range of physical domains, including molecular dynamics. However, their reliance on temporal context for accurate prediction limits their use in inverse design settings, where simulations must be initialized from a single static configuration. Moreover, inverse design requires robust out-of-distribution (OOD) generalization, as candidate structures typically lie outside the training domain. Here, we address both challenges by introducing two complementary strategies that enable stable and accurate structure-only initialization of GNN-based simulations. To directly target OOD generalization, we propose an inference-time physics-based optimization framework that constrains model predictions to remain physically consistent during rollout. In addition, we introduce a differentiable, GNN-based barostat that enables accurate tracking of system dimensions and pressure, critical for capturing macroscopic responses and supporting OOD generalization. We evaluate these approaches in the context of uniaxial compression of disordered elastic networks spanning a broad range of geometries, Poisson ratios, and microscopic behaviors. We find that, together, these methods substantially improve rollout stability and enable reliable OOD generalization, including regimes with distinct, more complex dynamics than those in the training data. These results show that, when properly initialized and constrained, GNN-based simulators can serve as efficient and generalizable tools for materials discovery and structural optimization, advancing their use in materials, molecular, and dynamical system design.

2026-05-10T12:00:21Z 10 pages, 7 figures S. A. Shteingolts Salman N. Salman Dan Mendels http://arxiv.org/abs/2505.07027v2 LLM-Augmented Chemical Synthesis and Design Decision Programs 2026-05-10T08:27:52Z

Retrosynthesis, the process of breaking down a target molecule into simpler precursors through a series of valid reactions, stands at the core of organic chemistry and drug development. Although recent machine learning (ML) research has advanced single-step retrosynthetic modeling and subsequent route searches, these solutions remain restricted by the extensive combinatorial space of possible pathways. Concurrently, large language models (LLMs) have exhibited remarkable chemical knowledge, hinting at their potential to tackle complex decision-making tasks in chemistry. In this work, we explore whether LLMs can successfully navigate the highly constrained, multi-step retrosynthesis planning problem. We introduce an efficient scheme for encoding reaction pathways and present a new route-level search strategy, moving beyond the conventional step-by-step reactant prediction. Through comprehensive evaluations, we show that our LLM-augmented approach excels at retrosynthesis planning and extends naturally to the broader challenge of synthesizable molecular design.

2025-05-11T15:43:00Z ICML 2025 Haorui Wang Jeff Guo Lingkai Kong Rampi Ramprasad Philippe Schwaller Yuanqi Du Chao Zhang http://arxiv.org/abs/2605.09394v1 Systematic Fine-Tuning of MACE Interatomic Potentials for Catalysis 2026-05-10T07:43:42Z

Once trained, machine-learned interatomic potentials (MLIPs) provide a fast and accurate way to study catalytic reaction pathways, but their performance strongly depends on the training set. Here, we compare nine MLIPs trained with different data sets and strategies, including from-scratch (FS) training and fine-tuning (FT) of large foundation models. The models are evaluated on reaction energies, $E_{r}$, and reaction energy barriers, $E_{a}$, for 141 reactions, including CO$_2$ reduction to C$_2$ and C$_3$ products, propane dehydrogenation, hydrogen intercalation on Pd, and out-of-distribution oxygen evolution reaction (OER) on metal oxides. FS models trained with 5%--10% perturbed high-energy configurations from molecular dynamics or contour exploration reduce the error by more than twofold compared with models trained only on relaxation trajectories. In contrast, FT MLIPs are less sensitive to sampling and transfer well to out-of-distribution reactions. An MLIP fine-tuned on metallic catalysts achieves a 0.30 eV MAE for OER on iridium oxide polymorphs, outperforming out-of-the-box MACE-MH-1 by 0.08 eV and the best FS model by 0.14 eV. A model fine-tuned to O and OH adsorption on metal oxides gives a 0.19 eV reaction-barrier MAE for out-of-distribution CO$_2$RR on Cu, comparable to an FS model trained on in-distribution C--C bond-breaking reactions. Finally, a large MLIP fine-tuned on 49,860 configurations gives the best overall performance across metallic and metal-oxide catalysts and was used to screen a large left-out set of bimetallic alloys, achieving a 0.15 eV MAE for $E_{r}$, even for adsorbates on unseen Miller-index surfaces such as (532). This work identifies the training configurations needed for accurate FS and FT MLIPs for catalytic reaction modeling.

2026-05-10T07:43:42Z Nima Karimitari Jacob Clary Derek Vigil-Fowler Ravishankar Sundararaman Gábor Csányi Christopher Sutton http://arxiv.org/abs/2605.09311v1 Teaching Molecular Dynamics to a Non-Autoregressive Ionic Transport Predictor 2026-05-10T04:09:01Z

Unlike most static material properties widely studied in the machine learning literature, ionic transport properties are inherently dynamic, making their fast and accurate prediction from static atomic structures challenging. The current standard approach, molecular dynamics (MD) simulations, suffers from prohibitively high computational cost. Recent autoregressive learning-based MD acceleration methods requiring sequential inference remain slow and prone to error accumulation; in contrast, existing non-autoregressive material property prediction models are less accurate because they fail to exploit dynamics. Moreover, existing methods typically benefit from datasets either with or without atomic trajectories, but not both. To overcome these limitations, we propose a non-autoregressive learning framework based on auxiliary modality learning, which treats atomic trajectories as an auxiliary modality during training but does not require them at inference. This enables the predictor to learn dynamics without sequential inference while benefiting from both types of datasets. As a result, our framework achieves over 200 times speedup compared to autoregressive models on the dataset with atomic trajectories while substantially reducing prediction error relative to non-autoregressive benchmarks across both types of datasets. Our code is available at https://github.com/jykim-git/MD.

2026-05-10T04:09:01Z International Conference on Machine Learning (ICML 2026) (to appear) (Please cite our conference version.) Jiyeon Kim Byungju Lee Won-Yong Shin http://arxiv.org/abs/2605.08994v1 Beyond the Black Box: An Interpretable Machine Learning Framework for Predicting Electronic Structure Microdescriptors and Structure-Performance Relationships in Fe-based Catalytic Systems 2026-05-09T15:25:30Z

The current catalyst discovery and development pipeline for energy-intensive applications like methane conversion remains bottlenecked by expensive trial-and-error experimentation, irreproducible chemical intuition, and a lack of frameworks linking complex catalytic design spaces to performance. This work presents an interpretable machine learning framework that integrates SHAP-based feature importance analysis (Explainable AI) with tree-based ensembles (Random Forest and Bayesian-optimized CatBoost) to characterize Fe-zeolite and oxide-supported catalysts for the partial oxidation of methane (POM). Despite limited data, the framework decodes complex structure-performance relationships by identifying and ranking thermodynamic, structural, and geometric microdescriptors that influence the electronic band gap and govern macroscale performance metrics such as selectivity, activity, and stability. This work explicitly demonstrates that thermodynamic lattice stability and geometric factors, rather than bulk stoichiometry, are the primary drivers of the electronic band gap (a critical proxy for redox reactivity). Non-linear models achieve an $R^2$ of 0.61 to 0.77, significantly outperforming traditional linear baselines ($R^2$ = 0.32). This workflow provides both a lightweight, generalizable methodology and a prioritized list of physical features for accelerated catalyst screening. These features can subsequently be integrated into microkinetic and reaction engineering models to create digital twins of complex reactor systems and to enable predictive optimization in autonomous R&D laboratories.

2026-05-09T15:25:30Z 27 pages, 10 figures Oyinkansola Romiluyi http://arxiv.org/abs/2605.08960v1 CrystalREPA: Transferring Physical Priors from Universal MLIPs to Crystal Generative Models 2026-05-09T13:56:23Z

Crystal generative models mainly learn what stable crystals look like, with little explicit supervision for what makes them stable. We reveal a substantial representation gap between state-of-the-art crystal generative models and pretrained universal machine learning interatomic potentials (MLIPs) via energy probing, and show this gap can be closed by a simple training-time alignment. We propose Crystal REPresentation Alignment (CrystalREPA), a plug-and-play framework that aligns the atom-wise hidden states of generative encoders with frozen MLIP representations through an element-aware contrastive objective, transferring stability-aware atomistic priors with marginal training overhead and no additional inference cost. Across three generative frameworks, ten MLIP teachers, and two benchmark datasets, CrystalREPA consistently improves the thermodynamic stability, structural validity, and structural fidelity of generated crystals. Equally important, we find that an MLIP's transfer effectiveness is poorly predicted by its accuracy on standard leaderboards (e.g., Matbench Discovery) but strongly predicted by the distinguishability of its atom-wise representation space, yielding a practical, accuracy-independent criterion for selecting MLIP teachers for generative transfer.

2026-05-09T13:56:23Z Chengqian Zhang Yucheng Jin Duo Zhang Tiejun Li Han Wang