https://arxiv.org/api/w6facgsLIOd05Mx8STj0bvWqd2Q2026-06-21T20:35:46Z991881015http://arxiv.org/abs/2403.11522v4LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers2025-12-27T14:21:25ZWhile polyhedral compilers have shown success in implementing advanced code transformations, they still face challenges in selecting the ones that lead to the most profitable speedups. This has motivated the use of machine learning based cost models to guide the search for polyhedral optimizations. State-of-the-art polyhedral compilers have demonstrated a viable proof-of-concept of such an approach. While promising, this approach still faces significant limitations. State-of-the-art polyhedral compilers that use a deep learning cost model only support a small subset of affine transformations, limiting their ability to explore complex code transformations. Furthermore, their applicability does not scale beyond simple programs, thus excluding many program classes from their scope, such as those with non-rectangular iteration domains or multiple loop nests. These limitations significantly impact the generality of such compilers and autoschedulers and put into question the whole approach. In this paper, we introduce LOOPer, the first polyhedral autoscheduler that uses a deep learning based cost model and covers a large space of affine transformations and programs. LOOPer allows the optimization of an extensive set of programs while being effective at applying complex sequences of polyhedral transformations. We implement and evaluate LOOPer and show that it achieves competitive speedups over the state-of-the-art. On the PolyBench benchmarks, LOOPer achieves a geometric mean speedup of 1.84x over Tiramisu and 1.42x over Pluto, two state-of-the-art polyhedral autoschedulers.2024-03-18T07:22:31Z2025 34th International Conference on Parallel Architectures and Compilation Techniques (PACT)Massinissa MerouaniAfif BoudaoudIheb Nassim AouadjNassim TchoulakIslem Kara BernouHamza BenyaminaFatima Benbouzid-Si TayebKarima BenatchbaHugh LeatherRiyadh Baghdadi10.1109/PACT65351.2025.00028http://arxiv.org/abs/2511.00592v2Agentic Auto-Scheduling: An Experimental Study of LLM-Guided Loop Optimization2025-12-27T10:04:35ZAutomatic code optimization remains a difficult challenge, particularly for complex loop nests on modern hardware. This paper investigates a novel approach to code optimization where Large Language Models (LLMs) guide the process through a closed-loop interaction with a compiler. We present ComPilot, an experimental framework that leverages off-the-shelf LLMs, without any task-specific fine-tuning, as interactive optimization agents. ComPilot establishes a feedback loop where an LLM proposes transformations for a given loop nest to a compiler. The compiler attempts the transformations, reporting back legality status and measured speedup or slowdown. The LLM utilizes this concrete feedback to iteratively refine its optimization strategy. Our extensive evaluation across the PolyBench benchmark suite demonstrates the effectiveness of this zero-shot approach. ComPilot achieves geometric mean speedups of 2.66x (single run) and 3.54x (best-of-5 runs) over the original code. Furthermore, ComPilot demonstrates competitive performance against the state-of-the-art Pluto polyhedral optimizer, outperforming it in many cases. This experimental study demonstrates that general-purpose LLMs can effectively guide the code optimization process when grounded by compiler feedback, opening promising research directions for agentic AI in code optimization.2025-11-01T15:32:34ZAccepted at the 34th International Conference on Parallel Architectures and Compilation Techniques (PACT 2025). 12 pages, plus appendix2025 34th International Conference on Parallel Architectures and Compilation Techniques (PACT)Massinissa MerouaniIslem Kara BernouRiyadh Baghdadi10.1109/PACT65351.2025.00027http://arxiv.org/abs/2512.22390v1Eliminate Branches by Melding IR Instructions2025-12-26T21:32:26ZBranch mispredictions cause catastrophic performance penalties in modern processors, leading to performance loss. While hardware predictors and profile-guided techniques exist, data-dependent branches with irregular patterns remain challenging. Traditional if-conversion eliminates branches via software predication but faces limitations on architectures like x86. It often fails on paths containing memory instructions or incurs excessive instruction overhead by fully speculating large branch bodies.
This paper presents Melding IR Instructions (MERIT), a compiler transformation that eliminates branches by aligning and melding similar operations from divergent paths at the IR instruction level. By observing that divergent paths often perform structurally similar operations with different operands, MERIT adapts sequence alignment to discover merging opportunities and employs safe operand-level guarding to ensure semantic correctness without hardware predication. Implemented as an LLVM pass and evaluated on 102 programs from four benchmark suites, MERIT achieves a geometric mean speedup of 10.9% with peak improvements of 32x compared to hardware branch predictor, demonstrating the effectiveness with reduced static instruction overhead.2025-12-26T21:32:26ZYuze LiSrinivasan Ramachandra SharmaCharitha SaumyaAli R. ButtKirshanthan Sundararajahhttp://arxiv.org/abs/2512.22383v1Symbolic Specification and Reasoning for Quantum Data and Operations2025-12-26T20:57:42ZIn quantum information and computation research, symbolic methods have been widely used for human specification and reasoning about quantum states and operations. At the same time, they are essential for ensuring the scalability and efficiency of automated reasoning and verification tools for quantum algorithms and programs. However, a formal theory for symbolic specification and reasoning about quantum data and operations is still lacking, which significantly limits the practical applicability of automated verification techniques in quantum computing.
In this paper, we present a general logical framework, called Symbolic Operator Logic $\mathbf{SOL}$, which enables symbolic specification and reasoning about quantum data and operations. Within this framework, a classical first-order logical language is embedded into a language of formal operators used to specify quantum data and operations, including their recursive definitions. This embedding allows reasoning about their properties modulo a chosen theory of the underlying classical data (e.g., Boolean algebra or group theory), thereby leveraging existing automated verification tools developed for classical computing. It should be emphasised that this embedding of classical first-order logic into $\mathbf{SOL}$ is precisely what makes the symbolic method possible.
We envision that this framework can provide a conceptual foundation for the formal verification and automated theorem proving of quantum computation and information in proof assistants such as Lean, Coq, and related systems.2025-12-26T20:57:42ZMingsheng Yinghttp://arxiv.org/abs/2212.11580v6A Theory of Conversion Relations for Prefixed Units of Measure2025-12-26T14:35:49ZUnits of measure with prefixes and conversion rules are given a formal semantic model in terms of categorial group theory. Basic structures and both natural and contingent semantic operations are defined. Conversion rules are represented as a class of ternary relations with both group-like and category-like properties. A hierarchy of subclasses is explored, each satisfying stronger useful algebraic properties than the preceding, culminating in a direct efficient conversion-by-rewriting algorithm.2022-12-22T10:16:17ZFundamenta Informaticae, Volume 195, Issues 1-4: Relational and Algebraic Methods in Computer Science 2024 (December 29, 2025) fi:12436Baltasar Trancón y WidemannMarkus Lepper10.46298/fi.12436http://arxiv.org/abs/2512.23740v1Towards representation agnostic probabilistic programming2025-12-25T15:51:58ZCurrent probabilistic programming languages and tools tightly couple model representations with specific inference algorithms, preventing experimentation with novel representations or mixed discrete-continuous models. We introduce a factor abstraction with five fundamental operations that serve as a universal interface for manipulating factors regardless of their underlying representation. This enables representation-agnostic probabilistic programming where users can freely mix different representations (e.g. discrete tables, Gaussians distributions, sample-based approaches) within a single unified framework, allowing practical inference in complex hybrid models that current toolkits cannot adequately express.2025-12-25T15:51:58ZAccepted at LAFI@POPL25Ole FenskeMaximilian PopkoSebastian BaderThomas Kirstehttp://arxiv.org/abs/2512.21596v1Quantitative Verification of Omega-regular Properties in Probabilistic Programming2025-12-25T09:26:29ZProbabilistic programming provides a high-level framework for specifying statistical models as executable programs with built-in randomness and conditioning. Existing inference techniques, however, typically compute posterior distributions over program states at fixed time points, most often at termination, thereby failing to capture the temporal evolution of probabilistic behaviors. We introduce temporal posterior inference (TPI), a new framework that unifies probabilistic programming with temporal logic by computing posterior distributions over execution traces that satisfy omega-regular specifications, conditioned on possibly temporal observations. To obtain rigorous quantitative guarantees, we develop a new method for computing upper and lower bounds on the satisfaction probabilities of omega-regular properties. Our approach decomposes Rabin acceptance conditions into persistence and recurrence components and constructs stochastic barrier certificates that soundly bound each component. We implement our approach in a prototype tool, TPInfer, and evaluate it on a suite of benchmarks, demonstrating effective and efficient inference over rich temporal properties in probabilistic models.2025-12-25T09:26:29ZPeixin WangJianhao BaiMin ZhangC. -H. Luke Onghttp://arxiv.org/abs/2512.23738v1Enforcing Temporal Constraints for LLM Agents2025-12-25T06:12:13ZLLM-based agents are deployed in safety-critical applications, yet current guardrail systems fail to prevent violations of temporal safety policies, requirements that govern the ordering and sequencing of agent actions. For instance, agents may access sensitive data before authenticating users or process refunds to unauthorized payment methods, violations that require reasoning about sequences of action rather than an individual action. Existing guardrails rely on imprecise natural language instructions or post-hoc monitoring, and provide no formal guarantees that agents will satisfy temporal constraints. We present Agent-C, a novel framework that provides run-time guarantees ensuring LLM agents adhere to formal temporal safety properties. Agent-C introduces a domain-specific language for expressing temporal properties (e.g., authenticate before accessing data), translates specifications to first-order logic, and uses SMT solving to detect non-compliant agent actions during token generation. When the LLM attempts to generate a non-compliant tool call, Agent-C leverages constrained generation techniques to ensure that every action generated by the LLM complies with the specification, and to generate a compliant alternative to a non-compliant agent action. We evaluate Agent-C across two real-world applications: retail customer service and airline ticket reservation system, and multiple language models (open and closed-source). Our results demonstrate that Agent-C achieves perfect safety (100% conformance, 0% harm), while improving task utility compared to state-of-the-art guardrails and unrestricted agents. On SoTA closed-source models, Agent-C improves conformance (77.4% to 100% for Claude Sonnet 4.5 and 83.7% to 100% for GPT-5), while simultaneously increasing utility (71.8% to 75.2% and 66.1% to 70.6%, respectively), representing a new SoTA frontier for reliable agentic reasoning.2025-12-25T06:12:13ZAdharsh KamathSishen ZhangCalvin XuShubham UgareGagandeep SinghSasa Misailovichttp://arxiv.org/abs/2512.20214v2Error Localization, Certificates, and Hints for Probabilistic Program Verification via Slicing (Extended Version)2025-12-24T12:57:44ZThis paper focuses on effective user diagnostics generated during the deductive verification of probabilistic programs. Our key principle is based on providing slices for (1) error reporting, (2) proof simplification, and (3) preserving successful verification results. By formally defining these different notions on HeyVL, an existing quantitative intermediate verification language (IVL), our concepts (and implementation) can be used to obtain diagnostics for a range of probabilistic programming languages. Slicing for error reporting is a novel notion of error localization for quantitative assertions. We demonstrate slicing-based diagnostics on a variety of proof rules such as quantitative versions of the specification statement and invariant-based loop rules, and formally prove the correctness of specialized error messages and verification hints.
We implemented our user diagnostics into the deductive verifier Caesar. Our novel implementation -- called \emph{Brutus} -- can search for slices which do or do not verify, corresponding to each of the three diagnostic notions. For error reporting (1), it exploits a binary search-based algorithm that minimizes error-witnessing slices. To solve for slices that verify (2 and 3), we empirically compare different algorithms based on unsatisfiable cores, minimal unsatisfiable subset enumeration, and a direct SMT encoding of the slicing problem. Our empirical evaluation of Brutus on existing and new benchmarks shows that we can find slices that are both small and informative.2025-12-23T10:15:28ZAccepted at the European Symposium on Programming (ESOP) 2026Philipp SchröerDarion HaaseJoost-Pieter Katoenhttp://arxiv.org/abs/2512.21373v1AInsteinBench: Benchmarking Coding Agents on Scientific Repositories2025-12-24T08:11:11ZWe introduce AInsteinBench, a large-scale benchmark for evaluating whether large language model (LLM) agents can operate as scientific computing development agents within real research software ecosystems. Unlike existing scientific reasoning benchmarks which focus on conceptual knowledge, or software engineering benchmarks that emphasize generic feature implementation and issue resolving, AInsteinBench evaluates models in end-to-end scientific development settings grounded in production-grade scientific repositories. The benchmark consists of tasks derived from maintainer-authored pull requests across six widely used scientific codebases, spanning quantum chemistry, quantum computing, molecular dynamics, numerical relativity, fluid dynamics, and cheminformatics. All benchmark tasks are carefully curated through multi-stage filtering and expert review to ensure scientific challenge, adequate test coverage, and well-calibrated difficulty. By leveraging evaluation in executable environments, scientifically meaningful failure modes, and test-driven verification, AInsteinBench measures a model's ability to move beyond surface-level code generation toward the core competencies required for computational scientific research.2025-12-24T08:11:11ZTitouan DustonShuo XinYang SunDaoguang ZanAoyan LiShulin XinKai ShenYixiao ChenQiming SunGe ZhangJiashuo LiuHuan ZhouJingkai LiuZhichen PuYuanheng WangBo-Xuan GeXin TongFei YeZhi-Chao ZhaoWen-Biao HanZhoujian CaoYueran ZhaoWeiluo RenQingshen LongYuxiao LiuAnni HuangYidi DuYuanyuan RongJiahao Penghttp://arxiv.org/abs/2507.18885v4A Minimalist Proof Language for Neural Theorem Proving over Isabelle/HOL2025-12-23T17:57:57ZNeural Theorem Proving (NTP) employs LLMs to automate formal proofs in proof assistants. While LLMs have achieved relatively remarkable success in informal reasoning tasks using natural languages, the transition to mechanized formal theorem proving presents persistent challenges. Mechanized proof languages often contain many syntactic constructs and diverse, specialized proof tactics, which facilitate expert use but have no direct counterpart in informal mathematical proofs. These prover-specific idioms represent an additional burden for LLM-based NTPs that might be otherwise successful in generating informal proofs. Seeking to bridge this gap between formal proof construction and informal reasoning, in order to better facilitate NTP, this work approaches these challenges from a language design perspective. We look at common reasoning patterns in informal proofs and in existing mechanized proofs, and design Minilang -- a minimalist proof language that captures these reasoning patterns. In contrast to proof languages (informal and formal) that often feature a large collection of operations with unclear semantic boundaries, Minilang is deliberately kept minimalist -- its core design comprises only 10 operations, each with clear semantic distinctions. We further develop a rule-based translator from Isabelle's language (Isar) to Minilang, translating ~340K existing proofs with an ~85% success rate. Using this translated corpus, we finetune two LLMs to compare machine learning performance on Minilang versus the original Isar. Experiments show Minilang benefits the two LLMs by improving the pass@1 success rate on the PISA benchmark by up to 20/29 percentage points in comparison to the Isar-based LLMs w/wo Sledgehammer. The pass@1 rate reaches 69.1%, exceeding the prior work Baldur's pass@64 (65.7%); the pass@8 rate reaches 79.2%, exceeding the SOTA on PISA (71.0%) achieved by Magnushammer.2025-07-25T02:04:56ZAccepted in OOPSLA'26Qiyuan XuRenxi WangPeixin WangHaonan LiConrad Watthttp://arxiv.org/abs/2511.21878v2Advancing Automated In-Isolation Validation in Repository-Level Code Translation2025-12-23T15:17:24ZRepository-level code translation aims to migrate entire repositories across programming languages while preserving functionality automatically. Despite advancements in repository-level code translation, validating the translations remains challenging. This paper proposes TRAM, which combines context-aware type resolution with mock-based in-isolation validation to achieve high-quality translations between programming languages. Prior to translation, TRAM retrieves API documentation and contextual code information for each variable type in the source language. It then prompts a large language model (LLM) with retrieved contextual information to resolve type mappings across languages with precise semantic interpretations. Using the automatically constructed type mapping, TRAM employs a custom serialization/deserialization workflow that automatically constructs equivalent mock objects in the target language. This enables each method fragment to be validated in isolation, without the high cost of using agents for translation validation, or the heavy manual effort required by existing approaches that rely on language interoperability. TRAM demonstrates state-of-the-art performance in Java-to-Python translation, underscoring the effectiveness of its integration of RAG-based type resolution with reliable in-isolation validation.2025-11-26T19:53:46ZKaiyao KeAli Reza IbrahimzadaRangeet PanSaurabh SinhaReyhaneh Jabbarvandhttp://arxiv.org/abs/2512.20396v1Symmaries: Automatic Inference of Formal Security Summaries for Java Programs2025-12-23T14:33:31ZWe introduce a scalable, modular, and sound approach for automatically constructing formal security specifications for Java bytecode programs in the form of method summaries. A summary provides an abstract representation of a method's security behavior, consisting of the conditions under which the method can be securely invoked, together with specifications of information flows and aliasing updates. Such summaries can be consumed by static code analysis tools and also help developers understand the behavior of code segments, such as libraries, in order to evaluate their security implications when reused in applications. Our approach is implemented in a tool called Symmaries, which automates the generation of security summaries. We applied Symmaries to Java API libraries to extract their security specifications and to large real-world applications to evaluate its scalability. Our results show that the tool successfully scales to analyze applications with hundreds of thousands of lines of code, and that Symmaries achieves a promising precision depending on the heap model used. We prove the soundness of our approach in terms of guaranteeing termination-insensitive non-interference.2025-12-23T14:33:31ZNarges KhakpourNicolas Berthierhttp://arxiv.org/abs/2503.12511v3SACTOR: LLM-Driven Correct and Idiomatic C to Rust Translation with Static Analysis and FFI-Based Verification2025-12-22T20:28:48ZTranslating software written in C to Rust has significant benefits in improving memory safety. However, manual translation is cumbersome, error-prone, and often produces unidiomatic code. Large language models (LLMs) have demonstrated promise in producing idiomatic translations, but offer no correctness guarantees. We propose SACTOR, an LLM-driven C-to-Rust translation tool that employs a two-step process: an initial "unidiomatic" translation to preserve semantics, followed by an "idiomatic" refinement to align with Rust standards. To validate correctness of our function-wise incremental translation that mixes C and Rust, we use end-to-end testing via the foreign function interface. We evaluate SACTOR on 200 programs from two public datasets and on two more real-world scenarios (a 50-sample subset of CRust-Bench and the libogg library), comparing multiple LLMs. Across datasets, SACTOR delivers high end-to-end correctness and produces safe, idiomatic Rust with up to 7 times fewer Clippy warnings; On CRust-Bench, SACTOR achieves an average (across samples) of 85% unidiomatic and 52% idiomatic success, and on libogg it attains full unidiomatic and up to 78% idiomatic coverage on GPT-5.2025-03-16T14:05:26Z35 pages, 15 figures Previously named as "LLM-Driven Multi-step Translation from C to Rust using Static Analysis"Tianyang ZhouZiyi ZhangHaowen LinSomesh JhaMihai ChristodorescuKirill LevchenkoVarun Chandrasekaranhttp://arxiv.org/abs/2410.19849v2Deep Learning and Machine Learning -- Python Data Structures and Mathematics Fundamental: From Theory to Practice2025-12-22T19:10:13ZThis book provides a comprehensive introduction to the foundational concepts of machine learning (ML) and deep learning (DL). It bridges the gap between theoretical mathematics and practical application, focusing on Python as the primary programming language for implementing key algorithms and data structures. The book covers a wide range of topics, including basic and advanced Python programming, fundamental mathematical operations, matrix operations, linear algebra, and optimization techniques crucial for training ML and DL models. Advanced subjects like neural networks, optimization algorithms, and frequency domain methods are also explored, along with real-world applications of large language models (LLMs) and artificial intelligence (AI) in big data management. Designed for both beginners and advanced learners, the book emphasizes the critical role of mathematical principles in developing scalable AI solutions. Practical examples and Python code are provided throughout, ensuring readers gain hands-on experience in applying theoretical knowledge to solve complex problems in ML, DL, and big data analytics.2024-10-22T06:55:53Z298 pagesSilin ChenZiqian BiJunyu LiuBenji PengSen ZhangXuanhe PanJiawei XuJinlang WangKeyu ChenCaitlyn Heqi YinPohsun FengYizhu WenTianyang WangMing LiJintao RenQian NiuXinyuan SongMing Liu