https://arxiv.org/api/w6facgsLIOd05Mx8STj0bvWqd2Q 2026-06-21T20:35:46Z 9918 810 15 http://arxiv.org/abs/2403.11522v4 LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers 2025-12-27T14:21:25Z

While polyhedral compilers have shown success in implementing advanced code transformations, they still face challenges in selecting the ones that lead to the most profitable speedups. This has motivated the use of machine learning based cost models to guide the search for polyhedral optimizations. State-of-the-art polyhedral compilers have demonstrated a viable proof-of-concept of such an approach. While promising, this approach still faces significant limitations. State-of-the-art polyhedral compilers that use a deep learning cost model only support a small subset of affine transformations, limiting their ability to explore complex code transformations. Furthermore, their applicability does not scale beyond simple programs, thus excluding many program classes from their scope, such as those with non-rectangular iteration domains or multiple loop nests. These limitations significantly impact the generality of such compilers and autoschedulers and put into question the whole approach. In this paper, we introduce LOOPer, the first polyhedral autoscheduler that uses a deep learning based cost model and covers a large space of affine transformations and programs. LOOPer allows the optimization of an extensive set of programs while being effective at applying complex sequences of polyhedral transformations. We implement and evaluate LOOPer and show that it achieves competitive speedups over the state-of-the-art. On the PolyBench benchmarks, LOOPer achieves a geometric mean speedup of 1.84x over Tiramisu and 1.42x over Pluto, two state-of-the-art polyhedral autoschedulers.

2024-03-18T07:22:31Z 2025 34th International Conference on Parallel Architectures and Compilation Techniques (PACT) Massinissa Merouani Afif Boudaoud Iheb Nassim Aouadj Nassim Tchoulak Islem Kara Bernou Hamza Benyamina Fatima Benbouzid-Si Tayeb Karima Benatchba Hugh Leather Riyadh Baghdadi 10.1109/PACT65351.2025.00028 http://arxiv.org/abs/2511.00592v2 Agentic Auto-Scheduling: An Experimental Study of LLM-Guided Loop Optimization 2025-12-27T10:04:35Z

Automatic code optimization remains a difficult challenge, particularly for complex loop nests on modern hardware. This paper investigates a novel approach to code optimization where Large Language Models (LLMs) guide the process through a closed-loop interaction with a compiler. We present ComPilot, an experimental framework that leverages off-the-shelf LLMs, without any task-specific fine-tuning, as interactive optimization agents. ComPilot establishes a feedback loop where an LLM proposes transformations for a given loop nest to a compiler. The compiler attempts the transformations, reporting back legality status and measured speedup or slowdown. The LLM utilizes this concrete feedback to iteratively refine its optimization strategy. Our extensive evaluation across the PolyBench benchmark suite demonstrates the effectiveness of this zero-shot approach. ComPilot achieves geometric mean speedups of 2.66x (single run) and 3.54x (best-of-5 runs) over the original code. Furthermore, ComPilot demonstrates competitive performance against the state-of-the-art Pluto polyhedral optimizer, outperforming it in many cases. This experimental study demonstrates that general-purpose LLMs can effectively guide the code optimization process when grounded by compiler feedback, opening promising research directions for agentic AI in code optimization.

2025-11-01T15:32:34Z Accepted at the 34th International Conference on Parallel Architectures and Compilation Techniques (PACT 2025). 12 pages, plus appendix 2025 34th International Conference on Parallel Architectures and Compilation Techniques (PACT) Massinissa Merouani Islem Kara Bernou Riyadh Baghdadi 10.1109/PACT65351.2025.00027 http://arxiv.org/abs/2512.22390v1 Eliminate Branches by Melding IR Instructions 2025-12-26T21:32:26Z

Branch mispredictions cause catastrophic performance penalties in modern processors, leading to performance loss. While hardware predictors and profile-guided techniques exist, data-dependent branches with irregular patterns remain challenging. Traditional if-conversion eliminates branches via software predication but faces limitations on architectures like x86. It often fails on paths containing memory instructions or incurs excessive instruction overhead by fully speculating large branch bodies. This paper presents Melding IR Instructions (MERIT), a compiler transformation that eliminates branches by aligning and melding similar operations from divergent paths at the IR instruction level. By observing that divergent paths often perform structurally similar operations with different operands, MERIT adapts sequence alignment to discover merging opportunities and employs safe operand-level guarding to ensure semantic correctness without hardware predication. Implemented as an LLVM pass and evaluated on 102 programs from four benchmark suites, MERIT achieves a geometric mean speedup of 10.9% with peak improvements of 32x compared to hardware branch predictor, demonstrating the effectiveness with reduced static instruction overhead.

2025-12-26T21:32:26Z Yuze Li Srinivasan Ramachandra Sharma Charitha Saumya Ali R. Butt Kirshanthan Sundararajah http://arxiv.org/abs/2512.22383v1 Symbolic Specification and Reasoning for Quantum Data and Operations 2025-12-26T20:57:42Z

In quantum information and computation research, symbolic methods have been widely used for human specification and reasoning about quantum states and operations. At the same time, they are essential for ensuring the scalability and efficiency of automated reasoning and verification tools for quantum algorithms and programs. However, a formal theory for symbolic specification and reasoning about quantum data and operations is still lacking, which significantly limits the practical applicability of automated verification techniques in quantum computing. In this paper, we present a general logical framework, called Symbolic Operator Logic $\mathbf{SOL}$, which enables symbolic specification and reasoning about quantum data and operations. Within this framework, a classical first-order logical language is embedded into a language of formal operators used to specify quantum data and operations, including their recursive definitions. This embedding allows reasoning about their properties modulo a chosen theory of the underlying classical data (e.g., Boolean algebra or group theory), thereby leveraging existing automated verification tools developed for classical computing. It should be emphasised that this embedding of classical first-order logic into $\mathbf{SOL}$ is precisely what makes the symbolic method possible. We envision that this framework can provide a conceptual foundation for the formal verification and automated theorem proving of quantum computation and information in proof assistants such as Lean, Coq, and related systems.

2025-12-26T20:57:42Z Mingsheng Ying http://arxiv.org/abs/2212.11580v6 A Theory of Conversion Relations for Prefixed Units of Measure 2025-12-26T14:35:49Z

Units of measure with prefixes and conversion rules are given a formal semantic model in terms of categorial group theory. Basic structures and both natural and contingent semantic operations are defined. Conversion rules are represented as a class of ternary relations with both group-like and category-like properties. A hierarchy of subclasses is explored, each satisfying stronger useful algebraic properties than the preceding, culminating in a direct efficient conversion-by-rewriting algorithm.

2022-12-22T10:16:17Z Fundamenta Informaticae, Volume 195, Issues 1-4: Relational and Algebraic Methods in Computer Science 2024 (December 29, 2025) fi:12436 Baltasar Trancón y Widemann Markus Lepper 10.46298/fi.12436 http://arxiv.org/abs/2512.23740v1 Towards representation agnostic probabilistic programming 2025-12-25T15:51:58Z

Current probabilistic programming languages and tools tightly couple model representations with specific inference algorithms, preventing experimentation with novel representations or mixed discrete-continuous models. We introduce a factor abstraction with five fundamental operations that serve as a universal interface for manipulating factors regardless of their underlying representation. This enables representation-agnostic probabilistic programming where users can freely mix different representations (e.g. discrete tables, Gaussians distributions, sample-based approaches) within a single unified framework, allowing practical inference in complex hybrid models that current toolkits cannot adequately express.

2025-12-25T15:51:58Z Accepted at LAFI@POPL25 Ole Fenske Maximilian Popko Sebastian Bader Thomas Kirste http://arxiv.org/abs/2512.21596v1 Quantitative Verification of Omega-regular Properties in Probabilistic Programming 2025-12-25T09:26:29Z

Probabilistic programming provides a high-level framework for specifying statistical models as executable programs with built-in randomness and conditioning. Existing inference techniques, however, typically compute posterior distributions over program states at fixed time points, most often at termination, thereby failing to capture the temporal evolution of probabilistic behaviors. We introduce temporal posterior inference (TPI), a new framework that unifies probabilistic programming with temporal logic by computing posterior distributions over execution traces that satisfy omega-regular specifications, conditioned on possibly temporal observations. To obtain rigorous quantitative guarantees, we develop a new method for computing upper and lower bounds on the satisfaction probabilities of omega-regular properties. Our approach decomposes Rabin acceptance conditions into persistence and recurrence components and constructs stochastic barrier certificates that soundly bound each component. We implement our approach in a prototype tool, TPInfer, and evaluate it on a suite of benchmarks, demonstrating effective and efficient inference over rich temporal properties in probabilistic models.

2025-12-25T09:26:29Z Peixin Wang Jianhao Bai Min Zhang C. -H. Luke Ong http://arxiv.org/abs/2512.23738v1 Enforcing Temporal Constraints for LLM Agents 2025-12-25T06:12:13Z

LLM-based agents are deployed in safety-critical applications, yet current guardrail systems fail to prevent violations of temporal safety policies, requirements that govern the ordering and sequencing of agent actions. For instance, agents may access sensitive data before authenticating users or process refunds to unauthorized payment methods, violations that require reasoning about sequences of action rather than an individual action. Existing guardrails rely on imprecise natural language instructions or post-hoc monitoring, and provide no formal guarantees that agents will satisfy temporal constraints. We present Agent-C, a novel framework that provides run-time guarantees ensuring LLM agents adhere to formal temporal safety properties. Agent-C introduces a domain-specific language for expressing temporal properties (e.g., authenticate before accessing data), translates specifications to first-order logic, and uses SMT solving to detect non-compliant agent actions during token generation. When the LLM attempts to generate a non-compliant tool call, Agent-C leverages constrained generation techniques to ensure that every action generated by the LLM complies with the specification, and to generate a compliant alternative to a non-compliant agent action. We evaluate Agent-C across two real-world applications: retail customer service and airline ticket reservation system, and multiple language models (open and closed-source). Our results demonstrate that Agent-C achieves perfect safety (100% conformance, 0% harm), while improving task utility compared to state-of-the-art guardrails and unrestricted agents. On SoTA closed-source models, Agent-C improves conformance (77.4% to 100% for Claude Sonnet 4.5 and 83.7% to 100% for GPT-5), while simultaneously increasing utility (71.8% to 75.2% and 66.1% to 70.6%, respectively), representing a new SoTA frontier for reliable agentic reasoning.

2025-12-25T06:12:13Z Adharsh Kamath Sishen Zhang Calvin Xu Shubham Ugare Gagandeep Singh Sasa Misailovic http://arxiv.org/abs/2512.20214v2 Error Localization, Certificates, and Hints for Probabilistic Program Verification via Slicing (Extended Version) 2025-12-24T12:57:44Z

This paper focuses on effective user diagnostics generated during the deductive verification of probabilistic programs. Our key principle is based on providing slices for (1) error reporting, (2) proof simplification, and (3) preserving successful verification results. By formally defining these different notions on HeyVL, an existing quantitative intermediate verification language (IVL), our concepts (and implementation) can be used to obtain diagnostics for a range of probabilistic programming languages. Slicing for error reporting is a novel notion of error localization for quantitative assertions. We demonstrate slicing-based diagnostics on a variety of proof rules such as quantitative versions of the specification statement and invariant-based loop rules, and formally prove the correctness of specialized error messages and verification hints. We implemented our user diagnostics into the deductive verifier Caesar. Our novel implementation -- called \emph{Brutus} -- can search for slices which do or do not verify, corresponding to each of the three diagnostic notions. For error reporting (1), it exploits a binary search-based algorithm that minimizes error-witnessing slices. To solve for slices that verify (2 and 3), we empirically compare different algorithms based on unsatisfiable cores, minimal unsatisfiable subset enumeration, and a direct SMT encoding of the slicing problem. Our empirical evaluation of Brutus on existing and new benchmarks shows that we can find slices that are both small and informative.

2025-12-23T10:15:28Z Accepted at the European Symposium on Programming (ESOP) 2026 Philipp Schröer Darion Haase Joost-Pieter Katoen http://arxiv.org/abs/2512.21373v1 AInsteinBench: Benchmarking Coding Agents on Scientific Repositories 2025-12-24T08:11:11Z

We introduce AInsteinBench, a large-scale benchmark for evaluating whether large language model (LLM) agents can operate as scientific computing development agents within real research software ecosystems. Unlike existing scientific reasoning benchmarks which focus on conceptual knowledge, or software engineering benchmarks that emphasize generic feature implementation and issue resolving, AInsteinBench evaluates models in end-to-end scientific development settings grounded in production-grade scientific repositories. The benchmark consists of tasks derived from maintainer-authored pull requests across six widely used scientific codebases, spanning quantum chemistry, quantum computing, molecular dynamics, numerical relativity, fluid dynamics, and cheminformatics. All benchmark tasks are carefully curated through multi-stage filtering and expert review to ensure scientific challenge, adequate test coverage, and well-calibrated difficulty. By leveraging evaluation in executable environments, scientifically meaningful failure modes, and test-driven verification, AInsteinBench measures a model's ability to move beyond surface-level code generation toward the core competencies required for computational scientific research.

2025-12-24T08:11:11Z Titouan Duston Shuo Xin Yang Sun Daoguang Zan Aoyan Li Shulin Xin Kai Shen Yixiao Chen Qiming Sun Ge Zhang Jiashuo Liu Huan Zhou Jingkai Liu Zhichen Pu Yuanheng Wang Bo-Xuan Ge Xin Tong Fei Ye Zhi-Chao Zhao Wen-Biao Han Zhoujian Cao Yueran Zhao Weiluo Ren Qingshen Long Yuxiao Liu Anni Huang Yidi Du Yuanyuan Rong Jiahao Peng http://arxiv.org/abs/2507.18885v4 A Minimalist Proof Language for Neural Theorem Proving over Isabelle/HOL 2025-12-23T17:57:57Z

Neural Theorem Proving (NTP) employs LLMs to automate formal proofs in proof assistants. While LLMs have achieved relatively remarkable success in informal reasoning tasks using natural languages, the transition to mechanized formal theorem proving presents persistent challenges. Mechanized proof languages often contain many syntactic constructs and diverse, specialized proof tactics, which facilitate expert use but have no direct counterpart in informal mathematical proofs. These prover-specific idioms represent an additional burden for LLM-based NTPs that might be otherwise successful in generating informal proofs. Seeking to bridge this gap between formal proof construction and informal reasoning, in order to better facilitate NTP, this work approaches these challenges from a language design perspective. We look at common reasoning patterns in informal proofs and in existing mechanized proofs, and design Minilang -- a minimalist proof language that captures these reasoning patterns. In contrast to proof languages (informal and formal) that often feature a large collection of operations with unclear semantic boundaries, Minilang is deliberately kept minimalist -- its core design comprises only 10 operations, each with clear semantic distinctions. We further develop a rule-based translator from Isabelle's language (Isar) to Minilang, translating ~340K existing proofs with an ~85% success rate. Using this translated corpus, we finetune two LLMs to compare machine learning performance on Minilang versus the original Isar. Experiments show Minilang benefits the two LLMs by improving the pass@1 success rate on the PISA benchmark by up to 20/29 percentage points in comparison to the Isar-based LLMs w/wo Sledgehammer. The pass@1 rate reaches 69.1%, exceeding the prior work Baldur's pass@64 (65.7%); the pass@8 rate reaches 79.2%, exceeding the SOTA on PISA (71.0%) achieved by Magnushammer.

2025-07-25T02:04:56Z Accepted in OOPSLA'26 Qiyuan Xu Renxi Wang Peixin Wang Haonan Li Conrad Watt http://arxiv.org/abs/2511.21878v2 Advancing Automated In-Isolation Validation in Repository-Level Code Translation 2025-12-23T15:17:24Z

Repository-level code translation aims to migrate entire repositories across programming languages while preserving functionality automatically. Despite advancements in repository-level code translation, validating the translations remains challenging. This paper proposes TRAM, which combines context-aware type resolution with mock-based in-isolation validation to achieve high-quality translations between programming languages. Prior to translation, TRAM retrieves API documentation and contextual code information for each variable type in the source language. It then prompts a large language model (LLM) with retrieved contextual information to resolve type mappings across languages with precise semantic interpretations. Using the automatically constructed type mapping, TRAM employs a custom serialization/deserialization workflow that automatically constructs equivalent mock objects in the target language. This enables each method fragment to be validated in isolation, without the high cost of using agents for translation validation, or the heavy manual effort required by existing approaches that rely on language interoperability. TRAM demonstrates state-of-the-art performance in Java-to-Python translation, underscoring the effectiveness of its integration of RAG-based type resolution with reliable in-isolation validation.

2025-11-26T19:53:46Z Kaiyao Ke Ali Reza Ibrahimzada Rangeet Pan Saurabh Sinha Reyhaneh Jabbarvand http://arxiv.org/abs/2512.20396v1 Symmaries: Automatic Inference of Formal Security Summaries for Java Programs 2025-12-23T14:33:31Z

We introduce a scalable, modular, and sound approach for automatically constructing formal security specifications for Java bytecode programs in the form of method summaries. A summary provides an abstract representation of a method's security behavior, consisting of the conditions under which the method can be securely invoked, together with specifications of information flows and aliasing updates. Such summaries can be consumed by static code analysis tools and also help developers understand the behavior of code segments, such as libraries, in order to evaluate their security implications when reused in applications. Our approach is implemented in a tool called Symmaries, which automates the generation of security summaries. We applied Symmaries to Java API libraries to extract their security specifications and to large real-world applications to evaluate its scalability. Our results show that the tool successfully scales to analyze applications with hundreds of thousands of lines of code, and that Symmaries achieves a promising precision depending on the heap model used. We prove the soundness of our approach in terms of guaranteeing termination-insensitive non-interference.

2025-12-23T14:33:31Z Narges Khakpour Nicolas Berthier http://arxiv.org/abs/2503.12511v3 SACTOR: LLM-Driven Correct and Idiomatic C to Rust Translation with Static Analysis and FFI-Based Verification 2025-12-22T20:28:48Z

Translating software written in C to Rust has significant benefits in improving memory safety. However, manual translation is cumbersome, error-prone, and often produces unidiomatic code. Large language models (LLMs) have demonstrated promise in producing idiomatic translations, but offer no correctness guarantees. We propose SACTOR, an LLM-driven C-to-Rust translation tool that employs a two-step process: an initial "unidiomatic" translation to preserve semantics, followed by an "idiomatic" refinement to align with Rust standards. To validate correctness of our function-wise incremental translation that mixes C and Rust, we use end-to-end testing via the foreign function interface. We evaluate SACTOR on 200 programs from two public datasets and on two more real-world scenarios (a 50-sample subset of CRust-Bench and the libogg library), comparing multiple LLMs. Across datasets, SACTOR delivers high end-to-end correctness and produces safe, idiomatic Rust with up to 7 times fewer Clippy warnings; On CRust-Bench, SACTOR achieves an average (across samples) of 85% unidiomatic and 52% idiomatic success, and on libogg it attains full unidiomatic and up to 78% idiomatic coverage on GPT-5.

2025-03-16T14:05:26Z 35 pages, 15 figures Previously named as "LLM-Driven Multi-step Translation from C to Rust using Static Analysis" Tianyang Zhou Ziyi Zhang Haowen Lin Somesh Jha Mihai Christodorescu Kirill Levchenko Varun Chandrasekaran http://arxiv.org/abs/2410.19849v2 Deep Learning and Machine Learning -- Python Data Structures and Mathematics Fundamental: From Theory to Practice 2025-12-22T19:10:13Z

This book provides a comprehensive introduction to the foundational concepts of machine learning (ML) and deep learning (DL). It bridges the gap between theoretical mathematics and practical application, focusing on Python as the primary programming language for implementing key algorithms and data structures. The book covers a wide range of topics, including basic and advanced Python programming, fundamental mathematical operations, matrix operations, linear algebra, and optimization techniques crucial for training ML and DL models. Advanced subjects like neural networks, optimization algorithms, and frequency domain methods are also explored, along with real-world applications of large language models (LLMs) and artificial intelligence (AI) in big data management. Designed for both beginners and advanced learners, the book emphasizes the critical role of mathematical principles in developing scalable AI solutions. Practical examples and Python code are provided throughout, ensuring readers gain hands-on experience in applying theoretical knowledge to solve complex problems in ML, DL, and big data analytics.

2024-10-22T06:55:53Z 298 pages Silin Chen Ziqian Bi Junyu Liu Benji Peng Sen Zhang Xuanhe Pan Jiawei Xu Jinlang Wang Keyu Chen Caitlyn Heqi Yin Pohsun Feng Yizhu Wen Tianyang Wang Ming Li Jintao Ren Qian Niu Xinyuan Song Ming Liu