https://arxiv.org/api/ucZDXkOD2dtZS+hIxx+fAoB0sKQ 2026-03-22T14:43:47Z 26775 45 15 http://arxiv.org/abs/2601.00581v2 AceFF: A State-of-the-Art Machine Learning Potential for Small Molecules 2026-03-16T15:31:48Z We introduce AceFF, a pre-trained machine learning interatomic potential (MLIP) optimized for small molecule drug discovery. While MLIPs have emerged as efficient alternatives to Density Functional Theory (DFT), generalizability across diverse chemical spaces remains difficult. AceFF addresses this via a refined TensorNet2 architecture trained on a comprehensive dataset of drug-like compounds. This approach yields a force field that balances high-throughput inference speed with DFT-level accuracy. AceFF fully supports the essential medicinal chemistry elements (H, B, C, N, O, F, Si, P, S, Cl, Br, I) and is explicitly trained to handle charged states. Validation against rigorous benchmarks, including complex torsional energy scans, molecular dynamics trajectories, batched minimizations, and tests of force and energy accuracy, demonstrates that AceFF is state-of-the-art for organic molecules in the accuracy and speed regime important for drug discovery. The AceFF-2 model weights and inference code are available at https://huggingface.co/Acellera/AceFF-2.0. 2026-01-02T05:47:37Z Stephen E. Farr Stefan Doerr Antonio Mirarchi Francesc Sabanes Zariquiey Gianni De Fabritiis http://arxiv.org/abs/2603.15307v1 A Kolmogorov-Arnold Surrogate Model for Chemical Equilibria: Application to Solid Solutions 2026-03-16T14:04:53Z The computational cost of geochemical solvers is a challenging matter. For reactive transport simulations, where chemical calculations are performed up to billions of times, it is crucial to reduce the total computational time. Existing publications have explored various machine-learning approaches to determine the most effective data-driven surrogate model. 
In particular, multilayer perceptrons are widely employed due to their ability to recognize nonlinear relationships. In this work, we focus on the recent Kolmogorov-Arnold networks, where learnable spline-based functions replace classical fixed activation functions. This architecture has achieved higher accuracy with fewer trainable parameters and has become increasingly popular for solving partial differential equations. First, we train a surrogate model based on an existing cement system benchmark. Then, we move to an application case for the geological disposal of nuclear waste, i.e., the determination of radionuclide-bearing solids solubilities. To the best of our knowledge, this work is the first to investigate co-precipitation with radionuclide incorporation using data-driven surrogate models, considering increasing levels of thermodynamic complexity from simple mechanical mixtures to non-ideal solid solutions of binary (Ba,Ra)SO$_4$ and ternary (Sr,Ba,Ra)SO$_4$ systems. On the cement benchmark, we demonstrate that the Kolmogorov-Arnold architecture outperforms multilayer perceptrons in both absolute and relative error metrics, reducing them by 62% and 59%, respectively. On the binary and ternary radium solid solution models, Kolmogorov-Arnold networks maintain median prediction errors near $1\times10^{-3}$. This is the first step toward employing surrogate models to speed up reactive transport simulations and optimize the safety assessment of deep geological waste repositories. 2026-03-16T14:04:53Z Leonardo Boledi Dirk Bosbach Jenna Poonoosamy http://arxiv.org/abs/2603.10992v2 Bayesian Optimization with Gaussian Processes to Accelerate Stationary Point Searches 2026-03-16T04:02:24Z Accelerating the exploration of stationary points on potential energy surfaces by building local surrogates spans decades of effort. Done correctly, surrogates reduce required evaluations by an order of magnitude while preserving the accuracy of the underlying theory. 
We present a unified Bayesian Optimization view of minimization, single point saddle searches, and double ended saddle searches through a unified six-step surrogate loop, differing only in the inner optimization target and acquisition criterion. The framework uses Gaussian process regression with derivative observations, inverse-distance kernels, and active learning. The Optimal Transport GP extensions of farthest point sampling with Earth mover's distance, MAP regularization via variance barrier and oscillation detection, and adaptive trust radius form concrete extensions of the same basic methodology, improving accuracy and efficiency. We also demonstrate random Fourier features decouple hyperparameter training from predictions enabling favorable scaling for high-dimensional systems. Accompanying pedagogical Rust code demonstrates that all applications use the exact same Bayesian optimization loop, bridging the gap between theoretical formulation and practical execution. 2026-03-11T17:20:28Z 60 pages, 24 figures. Invited article for ACS Physical Chemistry Au Rohit Goswami Institute IMX and Lab-COSMO, École polytechnique fédérale de Lausanne http://arxiv.org/abs/2603.14737v1 Reproducible Orchestration of Best Practices for Reaction Path Optimization with the Nudged Elastic Band 2026-03-16T02:22:17Z The nudged elastic band (NEB) method is the standard approach for finding minimum energy paths and transition states on potential energy surfaces. Practical NEB calculations require several pre-processing steps: endpoint minimization, structural alignment, and initial path generation. These steps are typically handled by ad-hoc scripts or manual intervention, introducing errors and hindering reproducibility. We present a fully automated, open-source Snakemake workflow for small gas phase molecules that couples modern machine learning potentials (PET-MAD) to the eOn saddle point search software. 
Each step of the calculation lifecycle is encoded as an explicit dependency graph, from model retrieval and endpoint preparation through path initialization and band optimization. The workflow resolves all software dependencies from conda-forge, ensuring identical execution across platforms. Validation on the HCN to HNC isomerization demonstrates that the automated pipeline recovers the known single-barrier energy profile and product energy without manual intervention. 2026-03-16T02:22:17Z 13 pages, 6 figures Rohit Goswami Institute IMX and Lab-COSMO, École polytechnique fédérale de Lausanne http://arxiv.org/abs/2603.14700v1 Design Space of Self-Consistent Electrostatic Machine Learning Interatomic Potentials 2026-03-16T01:16:08Z Machine learning interatomic potentials (MLIPs) have become widely used tools in atomistic simulations. For much of the history of this field, the most commonly employed architectures were based on short-ranged atomic energy contributions, and the assumption of locality still persists in many modern foundation models. While this approach has enabled efficient and accurate modelling for many use cases, it poses intrinsic limitations for systems where long-range electrostatics, charge transfer, or induced polarization play a central role. A growing body of work has proposed extensions that incorporate electrostatic effects, ranging from locally predicted atomic charges to self-consistent models. While these models have demonstrated success for specific examples, their underlying assumptions and fundamental limitations are not yet well understood. In this work, we present a framework for treating electrostatics in MLIPs by viewing existing models as coarse-grained approximations to density functional theory (DFT). This perspective makes explicit the approximations involved, clarifies the physical meaning of the learned quantities, and reveals connections and equivalences between several previously proposed models. 
Using this formalism, we identify key design choices that define a broader design space of self-consistent electrostatic MLIPs. We implement salient points in this space using the MACE architecture and a shared representation of the charge density, enabling controlled comparisons between different approaches. Finally, we evaluate these models on two instructive test cases: metal-water interfaces, which probe the contrasting electrostatic response of conducting and insulating systems, and charged vacancies in silicon dioxide. Our results highlight the limitations of existing approaches and demonstrate how more expressive self-consistent models are needed to resolve failures. 2026-03-16T01:16:08Z William J. Baldwin Ilyes Batatia Martin Vondrák Johannes T. Margraf Gábor Csányi http://arxiv.org/abs/2603.14677v1 Stochastic Collision Theory of Magnetism in Radical Fluids 2026-03-16T00:24:26Z How stochastic, microscopic events generate deterministic, macroscopic properties is a fundamental question in physics. We address this question by developing a quantum master equation model for concentrated radical solutions, where random molecular collisions govern the magnetic properties of the system. Our theory reveals a simple mechanism: the first-order exchange contribution averages to zero over collisions, while the second-order term survives as an effective ferromagnetic coupling that enhances magnetization. The model captures the experimentally observed trends in magnetic behavior that deviate from conventional theories. Because the mechanism arises from statistical averaging, it may apply to a broader class of soft matter phenomena, including liquid crystals. 
2026-03-16T00:24:26Z 5 pages, 4 figures Yoshiaki Uchida Ryohei Kishi http://arxiv.org/abs/2603.14675v1 Acrylamide Conformers: A Revision of Published Density Functional Theory Studies 2026-03-16T00:10:04Z Acrylamide, with PubChem identifier CID=6579, is listed as having four stable conformers, in contrast with several journal publications characterizing only two or three. In this revision summary the discrepancy is clarified. Through very high precision density functional theory (DFT) calculations, three stable conformers and the three transition state barriers between them are verified to exist and validated with our own DFT calculations. The most stable conformer is a planar molecular structure termed "sys" or "trans" in the literature. Meanwhile, a less stable structure termed "skew" pertains to two 3-dimensional structures that are energy-degenerate but differ in structure by being mirror images of each other. Vibrational spectra, partial atomic charges, Cartesian coordinates, and Intrinsic Reaction Coordinate paths are summarized and recalculated with DFT at the wB97XD/Def2TZVPP level for the three stable acrylamide isomers: the sys/trans lowest-energy structure and the two mirrored skew structures. 2026-03-16T00:10:04Z 5 pages, 2 figures, 4 tables William Scott Estela Blaisten-Barojas http://arxiv.org/abs/2505.08195v4 Aitomia: Your Intelligent Assistant for AI-Driven Atomistic and Quantum Chemical Simulations 2026-03-16T00:08:00Z We have developed Aitomia - a platform powered by AI to assist in performing AI-driven atomistic and quantum chemical (QC) simulations. This evolving intelligent assistant platform is equipped with chatbots and AI agents to help experts and guide non-experts in setting up and running atomistic simulations, analyzing simulation results, and summarizing them for the user in both textual and graphical forms. 
Aitomia combines LLM-based agents with the MLatom platform to support AI-driven atomistic simulations as well as conventional quantum-chemical calculations, including DFT, semiempirical methods such as GFN2-xTB, and selected high-level wavefunction-based methods, through interfaces to widely used programs such as Gaussian, ORCA, PySCF, and xtb, covering tasks from ground-state and excited-state calculations to geometry optimization, thermochemistry, and spectra simulations. The multi-agent implementation enables autonomous execution of complex computational workflows, such as reaction enthalpy calculations. Aitomia was the first intelligent assistant publicly launched on cloud computing platforms for broad-scope atomistic simulations (Aitomistic Lab@XMU at https://atom.xmu.edu.cn and Aitomistic Hub at https://aitomistic.xyz). Aitomia lowers the barrier to performing atomistic simulations, thereby democratizing simulations and accelerating research and development in relevant fields. 2025-05-13T03:11:41Z Jinming Hu Hassan Nawaz Yi-Fan Hou Yuting Rui Lijie Chi Yuxinxin Chen Arif Ullah Pavlo O. Dral http://arxiv.org/abs/2603.14515v1 Excited Pfaffians: Generalized Neural Wave Functions Across Structure and State 2026-03-15T17:51:16Z Neural-network wave functions in Variational Monte Carlo (VMC) have achieved great success in accurately representing both ground and excited states. However, achieving sufficient numerical accuracy in state overlaps requires increasing the number of Monte Carlo samples, and consequently the computational cost, with the number of states. We present a nearly constant sample-size approach, Multi-State Importance Sampling (MSIS), that leverages samples from all states to estimate pairwise overlap. To efficiently evaluate all states for all samples, we introduce Excited Pfaffians. Inspired by Hartree-Fock, this architecture represents many states within a single neural network. 
Excited Pfaffians also serve as generalized wave functions, allowing a single model to represent multi-state potential energy surfaces. On the carbon dimer, we match the $O(N_s^4)$-scaling natural excited states while training $>200\times$ faster and modeling 50\% more states. Our favorable scaling enables us to be the first to use neural networks to find all distinct energy levels of the beryllium atom. Finally, we demonstrate that a single wave function can represent excited states across various molecules. 2026-03-15T17:51:16Z Nicholas Gao Till Grutschus Frank Noé Stephan Günnemann http://arxiv.org/abs/2603.14466v1 Explicit, Machine-Learned Two-Body Potentials for Molecular Simulations 2026-03-15T16:15:04Z A new pairwise hybrid machine-learning/molecular mechanics (ML/MM) potential is introduced that is conceived for application to large, heterogeneous condensed-phase systems. The PhysNet ML method describes monomers and short-range dimer interactions, while a classical MM force field describes pairwise interactions beyond a defined switching distance. Models are fitted to MP2 dimer and pairwise cluster energies, and the quality of each model is assessed at different switching distances and using MM approaches with and without detailed distributed charge electrostatics. The applicability of the approach to molecular dynamics simulations is demonstrated for a basic implementation applied to a small model system. Dichloromethane and acetone are used as test systems to demonstrate the accuracy of the approach in describing pairwise reference data, and also to highlight the limitations of the pairwise approach for systems that exhibit significant many-body effects in condensed phase, paving the way for the addition of a general many-body correction in future work. 2026-03-15T16:15:04Z Kham Lek Chaton Eric D. 
Boittier Mike Devereux Markus Meuwly http://arxiv.org/abs/2603.15686v1 Life cycle assessment for all organic chemicals 2026-03-15T15:53:29Z Chemicals are embedded in nearly every aspect of modern society, yet their production poses substantial sustainability concerns. Achieving a sustainable chemical industry requires detailed Life Cycle Assessment (LCA); however, current assessments face many unknowns due to limited, partly inconsistent, and untransparent data coverage since existing Life Cycle Inventory (LCI) databases account for only a tiny fraction of traded chemicals. Here, we introduce the Chemical RetrosYnthesiS for Transparent Assessment of Life-cycles (CRYSTAL) framework, which automatically generates consistent and transparent LCI data for organic chemicals based on their molecular structure using retrosynthesis and machine-learned gate-to-gate inventories. Using the predictive power of CRYSTAL, we create a consistent database for more than 70000 organic chemicals, comprising over 110000 transparent LCI datasets that quantify both feedstock and energy demands, together with associated auxiliary materials, biosphere flows, and waste flows. From this comprehensive database, we identify 50 key environmental hotspots driving high impacts of organic chemical production across multiple environmental categories and pivotal hub chemicals that are most critical for downstream chemical production. In providing this comprehensive data foundation, the CRYSTAL framework offers systematic guidance for targeted engineering and policy interventions. Its transparent, modular nature is designed to shift chemical LCA from a reliance on "unknown unknowns" to a collaboratively improvable mapping of "known unknowns". 
2026-03-15T15:53:29Z 24 pages, 9 figures Shaohan Chen Tim Langhorst Julian Nöhl Christopher Oberschelp Martin Pillich Johannes Schilling André Bardow http://arxiv.org/abs/2603.14414v1 Auto-WHATMD : Automated Wasserstein-based High-dimensional feature extraction Analysis of Trajectories from Molecular Dynamics 2026-03-15T14:57:52Z Comparing multiple protein systems with variation such as different binding ligands or mutations, and understanding their effects is one of the objectives in molecular dynamics simulations. Representation of these systems by a few features enables quantitative comparison. However, because molecular dynamics simulation trajectories are high-dimensional spatiotemporal data, selection of key features relies on domain expertise, sometimes introducing arbitrary assumptions. Here, we present an approach that uses the optimal transport distance to compare high-dimensional trajectory data, and employs simulated annealing to identify the residues that best distinguish multiple systems. We term this algorithm auto-WHATMD (automated Wasserstein-based High-dimensional feature extraction Analysis for Trajectories of Molecular Dynamics). We applied auto-WHATMD to multiple protein-ligand systems of bromodomain 4 with different ligands, identifying the most discriminative residues in the loop region. Moreover, even a few selected residues were sufficient to capture the correlation with ligand-binding affinities, indicating that auto-WHATMD effectively prioritizes the most informative residues. Our approach can be used to efficiently determine key residues and design features for multiple analogous systems. 
2026-03-15T14:57:52Z Sosuke Asano Ikki Yasuda Katsuhiro Endo Yoshinori Hirano Kenji Yasuoka http://arxiv.org/abs/2603.14314v1 Carbon black and hydrogen production from methane pyrolysis: measured and modeled insights from integrated gas and particle diagnostics in shock tubes 2026-03-15T10:16:09Z Methane (CH4) pyrolysis is a promising route to co-produce hydrogen (H2) and carbon black (CB) while avoiding emissions associated with steam-methane reforming and furnace black processes. Model development of pyrolytic CB synthesis requires experimental observations of concurrent gas chemistry, particulate formation, and morphology. This work presents a combined experimental and modeling study of CH4 pyrolysis behind reflected shock waves in 5% CH4/Argon mixtures at post-reflected shock temperatures (T5) of 1850-2450 K and P5 around 4.5 atm. Laser absorption diagnostics quantified CH4, C2H4, and C2H2 mole fractions, while multiwavelength extinction (633 and 1064 nm) resolved time-dependent particle formation and the temperature-dependent evolution of optical maturity. Simulations reproduce small-molecule speciation well, but large variations in predicted polycyclic aromatic hydrocarbons (PAHs) persist among models. Coupled gas-particle simulations capture accurate volume fraction (fv) trends and the influence of gas dynamics but underpredict induction times at high T5. Samples collected at the shock tube endwall were analyzed by transmission electron microscopy (TEM) to quantify primary particle size distributions and nanostructure arrangement. Image segmentation and manual measurements showed reduced primary particle size growth (dp) with increasing T5, while graphitic nanostructure generally increased. This study provides an integrated benchmark for improving models of CB and H2 production from CH4 pyrolysis by constraining gas-phase kinetics, PAH-driven inception, particle dynamics, and particle maturity. 
The results highlight that accurate partitioning of mass between particle number and particle size is an important constraint for further model development. 2026-03-15T10:16:09Z 18 pages, 18 figures, 1 table. Includes Supplementary Material. Submitted to Carbon Gibson Clark Mohammad Adib Chengze Li Taylor M. Rault Jesse W. Streicher Enoch Dames M. Reza Kholghy Ronald K. Hanson http://arxiv.org/abs/2507.08442v3 Extending Nonlocal Kinetic Energy Density Functionals to Isolated Systems via a Density-Functional-Dependent Kernel 2026-03-15T06:42:02Z The Wang-Teter-like nonlocal kinetic energy density functional (KEDF) in the framework of orbital-free density functional theory, while successful in some bulk systems, exhibits a critical Blanc-Cances instability [J. Chem. Phys. 122, 214106 (2005)] when applied to isolated systems, where the total energy becomes unbounded from below. We trace this instability to the use of an ill-defined average charge density, which causes the functional to simultaneously violate the scaling law and the positivity of the Pauli energy. By rigorously constructing a density-functional-dependent kernel, we resolve these pathologies while preserving the formal exactness of the original framework. By systematically benchmarking single-atom systems of 56 elements, we find the resulting KEDF retains computational efficiency while achieving an order-of-magnitude accuracy enhancement over the WT KEDF. In addition, the new KEDF preserves WT's superior accuracy in bulk metals, outperforming the semilocal functionals in both regimes. 2025-07-11T09:34:34Z 7 pages, 6 figures Liang Sun Mohan Chen http://arxiv.org/abs/2603.14155v1 The Python Simulations of Chemistry Framework: 10 years of an open-source quantum chemistry project 2026-03-14T23:42:43Z Over the past decade, the Python-based Simulations of Chemistry Framework (PySCF) has developed into a widely used open-source platform for electronic structure theory and quantum chemical method development. 
This article reviews the major advances since the previous overview in 2020, covering new modules and methodology, infrastructure changes, and performance benchmarks. 2026-03-14T23:42:43Z Qiming Sun Matthew R Hermes Xiaojie Wu Huanchen Zhai Xing Zhang Abdelrahman M. Ahmed Juan José Aucar Oliver J. Backhouse Samragni Banerjee Peng Bao Nikolay A. Bogdanov Kyle Bystrom Frédéric Chapoton Ning-Yuan Chen Ivan Yu. Chernyshov Helen S. Clifford Sander Cohen-Janes Zhi-Hao Cui Nike Dattani Linus Bjarne Dittmer Sebastian Ehlert Janus Juul Eriksen Francesco A. Evangelista Simon A. Ewing Ardavan Farahvash Kevin Focke Yang Gao Kevin E. Gasperich Nathan Gillispie Jonas Greiner Matthew R. Hennefarth Jan Hermann Christopher Hillenbrand Joonatan Huhtasalo Basil Ibrahim Bhavnesh Jangid Alireza Nejati Javaremi Andrew J. Jenkins Yu Jin Daniel S. King Derk Pieter Kooi Henrik R. Larsson Bryan Tak Gwong Lau Seunghoon Lee Susi Lehtola Chenghan Li Hao Li Jiachen Li Rui Li Shuhang Li Aleksandr O. Lykhin Nastasia Mauger Pablo del Mazo-Sevillano Jonathan Moussa Kousuke Nakano Verena A. Neufeld Linqing Peng Hung Q. Pham Peter Pinski Pavel Pokhilko Zhichen Pu Yubing Qian Stephen Jon Quiton Wanja T. Schulze Thais R. Scott Aniruddha Seal James E. T. Smith Kori E. Smyser Terrence Stahl Chong Sun Kevin J. Sung Egor Trushin Shiv Upadhyay Ethan A. Vo Thijs Vogels Shirong Wang Tai Wang Xiao Wang Xubo Wang Yuanheng Wang Mark Williamson Junjie Yang Hong-Zhou Ye Chia-Nan Yeh Haiyang Yu Jincheng Yu Victor Wen-zhe Yu Chaoqun Zhang Dayou Zhang Zijun Zhao Zehao Zhou Andrew J. Zhu Tianyu Zhu Timothy C. Berkelbach Laura Gagliardi Sandeep Sharma Alexander Sokolov Garnet Kin-Lic Chan