https://arxiv.org/api/C2+eazZJeA/Wa70lV2SA5BpJFic 2026-06-30T14:43:37Z 9958 1920 15 http://arxiv.org/abs/2501.02138v1 Effective LLM-Driven Code Generation with Pythoness 2025-01-03T23:14:46Z

The advent of large language models (LLMs) has paved the way for a new era of programming tools with both significant capabilities and risks, as the generated code lacks guarantees of correctness and reliability. Developers using LLMs currently face the difficult task of optimizing, integrating, and maintaining code generated by AI. We propose an embedded domain-specific language (DSL), Pythoness, to address those challenges. In Pythoness, developers program with LLMs at a higher level of abstraction. Rather than interacting directly with generated code, developers using Pythoness operate at the level of behavioral specifications when writing functions, classes, or an entire program. These specifications can take the form of unit tests and property-based tests, which may be expressed formally or in natural language. Guided by these specifications, Pythoness generates code that both passes the tests and can be continuously checked during execution. We posit that the Pythoness approach lets developers harness the full potential of LLMs for code generation while substantially mitigating their inherent risks. We describe our current prototype implementation of Pythoness and demonstrate that it can successfully leverage a combination of tests and code generation to yield higher quality code than specifications alone.

2025-01-03T23:14:46Z 5 pages Kyla H. Levin Kyle Gwilt Emery D. Berger Stephen N. Freund http://arxiv.org/abs/2411.02318v3 Evaluating the Ability of Large Language Models to Generate Verifiable Specifications in VeriFast 2025-01-03T02:19:03Z

Static verification is a powerful method for enhancing software quality, but it demands significant human labor and resources. This is particularly true of static verifiers that reason about heap manipulating programs using an ownership logic. LLMs have shown promise in a number of software engineering activities, including code generation, test generation, proof generation for theorem provers, and specification generation for static verifiers. However, prior work has not explored how well LLMs can perform specification generation for specifications based in an ownership logic, such as separation logic. To address this gap, this paper explores OpenAI's GPT-4o model's effectiveness in generating specifications on C programs that are verifiable with VeriFast, a separation logic based static verifier. Our experiment employs three different types of user inputs as well as basic and Chain-of-Thought (CoT) prompting to assess GPT's capabilities. Our results indicate that the specifications generated by GPT-4o preserve functional behavior, but struggle to be verifiable. When the specifications are verifiable they contain redundancies. Future directions are discussed to improve the performance.

2024-11-04T17:44:11Z Wen Fan Marilyn Rego Xin Hu Sanya Dod Zhaorui Ni Danning Xie Jenna DiVincenzo Lin Tan http://arxiv.org/abs/2501.01515v1 DiagrammaticLearning: A Graphical Language for Compositional Training Regimes 2025-01-02T19:44:36Z

Motivated by deep learning regimes with multiple interacting yet distinct model components, we introduce learning diagrams, graphical depictions of training setups that capture parameterized learning as data rather than code. A learning diagram compiles to a unique loss function on which component models are trained. The result of training on this loss is a collection of models whose predictions ``agree" with one another. We show that a number of popular learning setups such as few-shot multi-task learning, knowledge distillation, and multi-modal learning can be depicted as learning diagrams. We further implement learning diagrams in a library that allows users to build diagrams of PyTorch and Flux.jl models. By implementing some classic machine learning use cases, we demonstrate how learning diagrams allow practitioners to build complicated models as compositions of smaller components, identify relationships between workflows, and manipulate models during or after training. Leveraging a category theoretic framework, we introduce a rigorous semantics for learning diagrams that puts such operations on a firm mathematical foundation.

2025-01-02T19:44:36Z Mason Lary Richard Samuelson Alexander Wilentz Alina Zare Matthew Klawonn James P. Fairbanks http://arxiv.org/abs/2501.01512v1 A Primal-Dual Perspective on Program Verification Algorithms (Extended Version) 2025-01-02T19:41:04Z

Many algorithms in verification and automated reasoning leverage some form of duality between proofs and refutations or counterexamples. In most cases, duality is only used as an intuition that helps in understanding the algorithms and is not formalized. In other cases, duality is used explicitly, but in a specially tailored way that does not generalize to other problems. In this paper we propose a unified primal-dual framework for designing verification algorithms that leverage duality. To that end, we generalize the concept of a Lagrangian that is commonly used in linear programming and optimization to capture the domains considered in verification problems, which are usually discrete, e.g., powersets of states, predicates, ranking functions, etc. A Lagrangian then induces a primal problem and a dual problem. We devise an abstract primal-dual procedure that simultaneously searches for a primal solution and a dual solution, where the two searches guide each other. We provide sufficient conditions that ensure that the procedure makes progress under certain monotonicity assumptions on the Lagrangian. We show that many existing algorithms in program analysis, verification, and automated reasoning can be derived from our algorithmic framework with a suitable choice of Lagrangian. The Lagrangian-based formulation sheds new light on various characteristics of these algorithms, such as the ingredients they use to ensure monotonicity and guarantee progress. We further use our framework to develop a new validity checking algorithm for fixpoint logic over quantified linear arithmetic. Our prototype achieves promising results and in some cases solves instances that are not solved by state-of-the-art techniques.

2025-01-02T19:41:04Z 33 pages. Extended version of a paper to be published in POPL 2025. Includes appendices with proofs Takeshi Tsukada Hiroshi Unno Oded Padon Sharon Shoham http://arxiv.org/abs/2401.16277v7 SECOMP: Formally Secure Compilation of Compartmentalized C Programs 2025-01-01T18:50:20Z

Undefined behavior in C often causes devastating security vulnerabilities. One practical mitigation is compartmentalization, which allows developers to structure large programs into mutually distrustful compartments with clearly specified privileges and interactions. In this paper we introduce SECOMP, a compiler for compartmentalized C code that comes with machine-checked proofs guaranteeing that the scope of undefined behavior is restricted to the compartments that encounter it and become dynamically compromised. These guarantees are formalized as the preservation of safety properties against adversarial contexts, a secure compilation criterion similar to full abstraction, and this is the first time such a strong criterion is proven for a mainstream programming language. To achieve this we extend the languages of the CompCert verified C compiler with isolated compartments that can only interact via procedure calls and returns, as specified by cross-compartment interfaces. We adapt the passes and optimizations of CompCert as well as their correctness proofs to this compartment-aware setting. We then use compiler correctness as an ingredient in a larger secure compilation proof that involves several proof engineering novelties, needed to scale formally secure compilation up to a C compiler.

2024-01-29T16:32:36Z CCS'24 version, slightly updated and extended with appendices and a few more references Jérémy Thibault Roberto Blanco Dongjae Lee Sven Argo Arthur Azevedo de Amorim Aïna Linn Georges Catalin Hritcu Andrew Tolmach http://arxiv.org/abs/2501.00655v1 Finding Missed Code Size Optimizations in Compilers using LLMs 2024-12-31T21:47:46Z

Compilers are complex, and significant effort has been expended on testing them. Techniques such as random program generation and differential testing have proved highly effective and have uncovered thousands of bugs in production compilers. The majority of effort has been expended on validating that a compiler produces correct code for a given input, while less attention has been paid to ensuring that the compiler produces performant code. In this work we adapt differential testing to the task of identifying missed optimization opportunities in compilers. We develop a novel testing approach which combines large language models (LLMs) with a series of differential testing strategies and use them to find missing code size optimizations in C / C++ compilers. The advantage of our approach is its simplicity. We offload the complex task of generating random code to an off-the-shelf LLM, and use heuristics and analyses to identify anomalous compiler behavior. Our approach requires fewer than 150 lines of code to implement. This simplicity makes it extensible. By simply changing the target compiler and initial LLM prompt we port the approach from C / C++ to Rust and Swift, finding bugs in both. To date we have reported 24 confirmed bugs in production compilers, and conclude that LLM-assisted testing is a promising avenue for detecting optimization bugs in real world compilers.

2024-12-31T21:47:46Z Accepted to appear in The International Conference on Compiler Construction (CC) 2025 Davide Italiano Chris Cummins http://arxiv.org/abs/2501.00642v1 Enabling New HDLs with Agents 2024-12-31T20:37:20Z

Large Language Models (LLMs) based agents are transforming the programming language landscape by facilitating learning for beginners, enabling code generation, and optimizing documentation workflows. Hardware Description Languages (HDLs), with their smaller user community, stand to benefit significantly from the application of LLMs as tools for learning new HDLs. This paper investigates the challenges and solutions of enabling LLMs for HDLs, particularly for HDLs that LLMs have not been previously trained on. This work introduces HDLAgent, an AI agent optimized for LLMs with limited knowledge of various HDLs. It significantly enhances off-the-shelf LLMs.

2024-12-31T20:37:20Z Mark Zakharov Farzaneh Rabiei Kashanaki Jose Renau http://arxiv.org/abs/2501.14776v1 Green AI: Which Programming Language Consumes the Most? 2024-12-31T13:53:57Z

AI is demanding an evergrowing portion of environmental resources. Despite their potential impact on AI environmental sustainability, the role that programming languages play in AI (in)efficiency is to date still unknown. With this study, we aim to understand the impact that programming languages can have on AI environmental sustainability. To achieve our goal, we conduct a controlled empirical experiment by considering five programming languages (C++, Java, Python, MATLAB, and R), seven AI algorithms (KNN, SVC, AdaBoost, decision tree, logistic regression, naive bayses, and random forest), three popular datasets, and the training and inference phases. The collected results show that programming languages have a considerable impact on AI environmental sustainability. Compiled and semi-compiled languages (C++, Java) consistently consume less than interpreted languages (Python, MATLAB, R), which require up to 54x more energy. Some languages are cumulatively more efficient in training, while others in inference. Which programming language consumes the most highly depends on the algorithm considered. Ultimately, algorithm implementation might be the most determining factor in Green AI, regardless of the language used. As conclusion, while making AI more environmentally sustainable is paramount, a trade-off between energy efficiency and implementation ease should always be considered. Green AI can be achieved without the need of completely disrupting the development practices and technologies currently in place.

2024-12-31T13:53:57Z Accepted at International Workshop on Green and Sustainable Software (GREENS), 2025 Niccolò Marini Leonardo Pampaloni Filippo Di Martino Roberto Verdecchia Enrico Vicario http://arxiv.org/abs/2501.00169v1 DeepLL: Considering Linear Logic for the Analysis of Deep Learning Experiments 2024-12-30T22:38:56Z

Deep Learning experiments have critical requirements regarding the careful handling of their datasets as well as the efficient and correct usage of APIs that interact with hardware accelerators. On the one hand, software mistakes during data handling can contaminate experiments and lead to incorrect results. On the other hand, poorly coded APIs that interact with the hardware can lead to sub-optimal usage and untrustworthy conclusions. In this work we investigate the use of Linear Logic for the analysis of Deep Learning experiments. We show that primitives and operators of Linear Logic can be used to express: (i) an abstract representation of the control flow of an experiment, (ii) a set of available experimental resources, such as API calls to the underlying data-structures and hardware as well as (iii) reasoning rules about the correct consumption of resources during experiments. Our proposed model is not only lightweight but also easy to comprehend having both a symbolic and a visual component. Finally, its artifacts are themselves proofs in Linear Logic that can be readily verified by off-the-shelf reasoners.

2024-12-30T22:38:56Z 8 pages, 3 figures Nick Papoulias http://arxiv.org/abs/2412.20992v1 Verified Lifting of Deep learning Operators 2024-12-30T14:57:32Z

Deep learning operators are fundamental components of modern deep learning frameworks. With the growing demand for customized operators, it has become increasingly common for developers to create their own. However, designing and implementing operators is complex and error-prone, due to hardware-specific optimizations and the need for numerical stability. There is a pressing need for tools that can summarize the functionality of both existing and user-defined operators. To address this gap, this work introduces a novel framework for the verified lifting of deep learning operators, which synthesizes high-level mathematical formulas from low-level implementations. Our approach combines symbolic execution, syntax-guided synthesis, and SMT-based verification to produce readable and formally verified mathematical formulas. In synthesis, we employ a combination of top-down and bottom-up strategies to explore the vast search space efficiently; In verification, we design invariant synthesis patterns and leverage SMT solvers to validate the correctness of the derived summaries; In simplification, we use egraph-based techniques with custom rules to restore complex formulas to their natural, intuitive forms. Evaluated on a dataset of deep learning operators implemented in Triton from the real world, our method demonstrates the effectiveness of synthesis and verification compared to existing techniques. This framework bridges the gap between low-level implementations and high-level abstractions, improving understanding and reliability in deep learning operator development.

2024-12-30T14:57:32Z Qi Zhan Xing Hu Xin Xia Shanping Li http://arxiv.org/abs/2412.20543v1 Automated Auxiliary Qubit Allocation in High-Level Quantum Programming 2024-12-29T18:19:06Z

We present a method for optimizing quantum circuit compilation by automating the allocation of auxiliary qubits for multi-qubit gate decompositions. This approach is implemented and evaluated within the high-level quantum programming platform Ket. Our results indicate that the decomposition of multi-qubit gates is more effectively handled by the compiler, which has access to all circuit parameters, rather than through a quantum programming API. To evaluate the approach, we compared our implementation against Qiskit, a widely used quantum programming platform, by analyzing two quantum algorithms. Using a 16-qubit QPU, we observed a reduction of 87% in the number of CNOT gates in Grover's algorithm for 9 qubits. For a state preparation algorithm with 7 qubits, the number of CNOT gates was reduced from $2.8\times10^7$ to $5.7\times10^3$, leveraging additional Ket optimizations for high-level quantum program constructions. Overall, a quadratic reduction in the number of CNOT gates in the final circuit was observed, with greater improvements achieved when more auxiliary qubits were available. These findings underscore the importance of automatic resource management, such as auxiliary qubit allocation, in optimizing quantum applications and improving their suitability for near-term quantum hardware.

2024-12-29T18:19:06Z Evandro C. R. Rosa Jerusa Marchi Eduardo I. Duzzioni Rafael de Santiago http://arxiv.org/abs/2410.17185v2 The Decision Problem for Regular First-Order Theories 2024-12-29T04:34:42Z

The \emph{Entscheidungsproblem}, or the classical decision problem, asks whether a given formula of first-order logic is satisfiable. In this work, we consider an extension of this problem to regular first-order \emph{theories}, i.e., (infinite) regular sets of formulae. Building on the elegant classification of syntactic classes as decidable or undecidable for the classical decision problem, we show that some classes (specifically, the EPR and Gurevich classes), which are decidable in the classical setting, become undecidable for regular theories. On the other hand, for each of these classes, we identify a subclass that remains decidable in our setting, leaving a complete classification as a challenge for future work. Finally, we observe that our problem generalises prior work on automata-theoretic verification of uninterpreted programs and propose a semantic class of existential formulae for which the problem is decidable.

2024-10-22T17:04:17Z To appear in POPL 2025 (Proceedings of ACM Programming Languages vol 9 POPL) Umang Mathur David Mestel Mahesh Viswanathan 10.1145/3704870 http://arxiv.org/abs/2412.19908v1 Comprehensive Verification of Packet Processing 2024-12-27T19:50:23Z

To prove the functional correctness of a P4 program running in a programmable network switch or smart NIC, prior works have focused mainly on verifiers for the "control block" (match-action pipeline). But to verify that a switch handles packets according to a desired specification, proving the control block is not enough. We demonstrate a new comprehensive framework for formally specifying and proving the additional components of the switch that handle each packet: P4 parsers and deparsers, as well as non-P4 components such as multicast engines, packet generators, and resubmission paths. These are generally triggered by having the P4 program set header or metadata fields, which prompt other switch components -- fixed-function or configurable -- to execute the corresponding actions. Overall behavior is correct only if the "configurable" components are, indeed, configured properly; and we show how to prove that. We demonstrate our framework by verifying the correctness of packet-stream behavior in two classic P4 applications. Our framework is the first to allow the correctness proof of a P4 program to be composed with the correctness proof for these other switch components to verify that the switch programming as a whole accomplishes a specified behavior.

2024-12-27T19:50:23Z Shengyi Wang Mengying Pan Andrew W. Appel http://arxiv.org/abs/2412.19053v1 Flattening subtyping by eta expansion 2024-12-26T04:28:10Z

To design type systems that use subtyping, we have to make tradeoffs. Deep subtyping is more expressive than shallow subtyping, because deep subtyping compares the entire structure of types. However, shallow subtyping is easier to reason about. By eta-expanding source programs, we can get the effect of deep subtyping with less of its complexity. An early paper on filter models (Barendregt et al. 1983) examined two similar intersection type systems. The first included a subsumption rule that used a rich subtyping relation, including multiple rules for the top type and a distributivity rule. Their second type system dropped the subsumption rule, but added a rule that allowed a term to be eta-expanded before typing it. This rule in their second type system compensated for the lack of subsumption: where their first type system used subtyping to manipulate intersections deep inside types, their second type system used introduction and elimination rules directly on the subterms created by eta-expansion. Viewed as a computation, their proof of completeness for the second (shallow) system performs eta-expansion. Thus, we can regard their proof as inventing the application of eta-expansion to avoid deep subtyping. This paper serves as a tutorial on using eta-expansion to obviate deep subtyping, puts the invention of the technique by Barendregt et al. (1983) into context, gives a complete proof of the relevant lemma, and discusses how the technique can be used in type system design.

2024-12-26T04:28:10Z 16 pages, 8 figures Jana Dunfield http://arxiv.org/abs/2412.18885v1 Aspect-oriented Programming with Julia 2024-12-25T11:59:17Z

This paper proposes integrating Aspect-oriented Programming (AOP) into Julia, a language widely used in scientific and High-Performance Computing (HPC). AOP enhances software modularity by encapsulating cross-cutting concerns, such as logging, caching, and parallelizing, into separate, reusable aspects. Leveraging Julia's powerful metaprogramming and abstract syntax tree (AST) manipulation capabilities, we introduce AspectJulia, an AOP framework designed to operate within Julia's runtime environment as a package. AspectJulia enables developers to define and apply aspects seamlessly, leading to more modular, maintainable, and adaptable code. We detail the implementation of AspectJulia and present diverse use cases, ranging from HPC and scientific computing to business applications, demonstrating its effectiveness in managing cross-cutting concerns. This integration simplifies application development and improves the adaptability of existing Julia modules and packages, paving the way for more efficient and maintainable software systems.

2024-12-25T11:59:17Z Osamu Ishimura Yoshihide Yoshimoto