https://arxiv.org/api/FkubxhxDvy6XTEz/g8pb3wDxVZU2026-06-28T06:17:01Z9951148515http://arxiv.org/abs/2403.16354v5ChatDBG: Augmenting Debugging with Large Language Models2025-06-19T19:23:11ZDebugging is a critical but challenging task for programmers. This paper proposes ChatDBG, an AI-powered debugging assistant. ChatDBG integrates large language models (LLMs) to significantly enhance the capabilities and user-friendliness of conventional debuggers. ChatDBG lets programmers engage in a collaborative dialogue with the debugger, allowing them to pose complex questions about program state, perform root cause analysis for crashes or assertion failures, and explore open-ended queries like "why is x null?". To handle these queries, ChatDBG grants the LLM autonomy to "take the wheel": it can act as an independent agent capable of querying and controlling the debugger to navigate through stacks and inspect program state. It then reports its findings and yields back control to the programmer. By leveraging the real-world knowledge embedded in LLMs, ChatDBG can diagnose issues identifiable only through the use of domain-specific reasoning. Our ChatDBG prototype integrates with standard debuggers including LLDB and GDB for native code and Pdb for Python. Our evaluation across a diverse set of code, including C/C++ code with known bugs and a suite of Python code including standalone scripts and Jupyter notebooks, demonstrates that ChatDBG can successfully analyze root causes, explain bugs, and generate accurate fixes for a wide range of real-world errors. For the Python programs, a single query led to an actionable bug fix 67% of the time; one additional follow-up query increased the success rate to 85%. ChatDBG has seen rapid uptake; it has already been downloaded more than 75,000 times.2024-03-25T01:12:57Z22 pages, https://doi.org/10.1145/3729355FSE 2025Kyla H. LevinNicolas van KempenEmery D. BergerStephen N. Freund10.1145/3729355http://arxiv.org/abs/2506.16048v1WAMI: Compilation to WebAssembly through MLIR without Losing Abstraction2025-06-19T06:05:26ZWebAssembly (Wasm) is a portable bytecode format that serves as a compilation target for high-level languages, enabling their secure and efficient execution across diverse platforms, including web browsers and embedded systems. To improve support for high-level languages without incurring significant code size or performance overheads, Wasm continuously evolves by integrating high-level features such as Garbage Collection and Stack Switching. However, existing compilation approaches either lack reusable design -- requiring redundant implementation efforts for each language -- or lose abstraction by lowering high-level constructs into low-level shared representations like LLVM IR, which hinder the adoption of high-level features. MLIR compiler infrastructure provides the compilation pipeline with multiple levels of abstraction, preserving high-level abstractions throughout the compilation pipeline, yet the current MLIR pipeline relies on the LLVM backend for Wasm code generation, thereby inheriting LLVM's limitations.
This paper presents a novel compilation pipeline for Wasm, featuring Wasm dialects explicitly designed to represent high-level Wasm constructs within MLIR. Our approach enables direct generation of high-level Wasm code from corresponding high-level MLIR dialects without losing abstraction, providing a modular and extensible way to incorporate high-level Wasm features. We illustrate this extensibility through a case study that leverages Stack Switching, a recently introduced high-level feature of Wasm. Performance evaluations on PolyBench benchmarks show that our pipeline, benefiting from optimizations within the MLIR and Wasm ecosystems, produces code with at most 7.7\% slower, and faster in some execution environments, compared to LLVM-based compilers.2025-06-19T06:05:26ZByeongjee KangHarsh DesaiLimin JiaBrandon Luciahttp://arxiv.org/abs/2506.15875v1A System Level Compiler for Massively-Parallel, Spatial, Dataflow Architectures2025-06-18T20:44:10ZWe have developed a novel compiler called the Multiple-Architecture Compiler for Advanced Computing Hardware (MACH) designed specifically for massively-parallel, spatial, dataflow architectures like the Wafer Scale Engine. Additionally, MACH can execute code on traditional unified-memory devices. MACH addresses the complexities in compiling for spatial architectures through a conceptual Virtual Machine, a flexible domain-specific language, and a compiler that can lower high-level languages to machine-specific code in compliance with the Virtual Machine concept. While MACH is designed to be operable on several architectures and provide the flexibility for several standard and user-defined data mappings, we introduce the concept with dense tensor examples from NumPy and show lowering to the Wafer Scale Engine by targeting Cerebras' hardware specific languages.2025-06-18T20:44:10Z26 pages, 5 figures, 14 listingsDirk Van EssendelftPatrick WingoTerry JordanRyan SmithWissam Saidihttp://arxiv.org/abs/2506.15424v1PSM: Policy Synchronised Deterministic Memory2025-06-18T12:52:54ZConcurrency and determinacy do not go well with each other when resources must be shared. Haskell provides parallel programming abstractions such as IVar and LVar in the Par monad and concurrent abstractions such as MVar and TVar in the in IO and STM monads, respectively. The former are determinate but have no destructive updates and the latter have destructive updates but do not guarantee determinacy. Programming patterns that are both concurrent and determinate, such as those provided by Kahn or Berry require memory abstractions at a higher level than is currently available. In this paper we describe a new type context PSM for policy synchronised memory in Haskell. Like STM and IO, the computations in PSM can access persistent state and, as a side-effect, update the memory in imperative style. Like the Par and IO monads, PSM supports concurrent threads and shared state. However, in contrast to IO, our PSM contexts are race-free since concurrent accesses are policy coordinated which guarantees determinacy.Well-typed transactions in the PSM context can accommodate abstract data structures that are imperative, concurrently shareable and still behave deterministically, by construction.2025-06-18T12:52:54ZThis report summarises work on coding the theory of policy-synchronised memory (see https://rdcu.be/erBwl) in Haskell. This was developed for a graduate level course on Functional Reactive Programming taught at Bamberg University by the first author during 2020-2023. An early version of the PSM library had been presented at the SYNCHRON Workshop (Aussois, France), November 2019Michael MendlerMarc Pouzet10.1007/978-3-319-89884-1_4http://arxiv.org/abs/2502.04905v2RacerF: Lightweight Static Data Race Detection for C Code2025-06-18T11:49:40ZWe present a novel static analysis for thread-modular data race detection. Our approach exploits static analysis of sequential program behaviour whose results are generalised for multi-threaded programs using a combination of lightweight under- and over-approximating methods. We have implemented this approach in a new tool called RacerF as a plugin of the Frama-C platform. RacerF can leverage several analysis backends, most notably the Frama-C's abstract interpreter EVA. Although our methods are mostly heuristic without providing formal guarantees, our experimental evaluation shows that even for intricate programs, RacerF can provide very precise results competitive with more heavy-weight approaches while being faster than them.2025-02-07T13:21:43Z17 pages, accepted to ECOOP'25Tomáš DacíkTomáš Vojnarhttp://arxiv.org/abs/2506.15174v1A Novel Compiler Transformation for Fast Sparse Matrix Multiplication in GPUs2025-06-18T06:41:35ZSparse data structures are commonly used in neural networks to reduce the memory footprint. These data structures are compact but cause irregularities such as random memory accesses, which prevent efficient use of the memory hierarchy. GPUs are a common platform for machine learning practitioners, but running compact data structures on these devices often leads to slow-downs due to inefficient use of computing and memory resources. This paper proposes a new compiler transformation, enumerate-and-sparse-coarsen, that accelerates sparse matrix-matrix multiplication (SPMM) on GPU devices. The transformation increases data reuse in registers and caches while creating more balanced workloads for GPU computing resources. The transformation is tested on sparse neural networks in convolutional and transformer models. On an A100 GPU and across a columns of matrix B (bCols) in $ A \times B = C$ from range of 32 to 128, the transformation yields a geometric mean speedup of 1.84$\times$ to 2.27$\times$ compared to cuBLAS and cuSPARSE baselines, respectively.2025-06-18T06:41:35ZHossein AlbakriKazem Cheshmihttp://arxiv.org/abs/2506.18923v1Mix-of-Language-Experts Architecture for Multilingual Programming2025-06-18T06:20:51ZLarge language models (LLMs) have demonstrated impressive capabilities in aiding developers with tasks like code comprehension, generation, and translation. Supporting multilingual programming -- i.e., coding tasks across multiple programming languages -- typically requires either (1) finetuning a single LLM across all programming languages, which is cost-efficient but sacrifices language-specific specialization and performance, or (2) finetuning separate LLMs for each programming language, which allows for specialization but is computationally expensive and storage-intensive due to the duplication of parameters. This paper introduces MoLE (Mix-of-Language-Experts), a novel architecture that balances efficiency and specialization for multilingual programming. MoLE is composed of a base model, a shared LoRA (low-rank adaptation) module, and a collection of language-specific LoRA modules. These modules are jointly optimized during the finetuning process, enabling effective knowledge sharing and specialization across programming languages. During inference, MoLE automatically routes to the language-specific LoRA module corresponding to the programming language of the code token being generated. Our experiments demonstrate that MoLE achieves greater parameter efficiency compared to training separate language-specific LoRAs, while outperforming a single shared LLM finetuned for all programming languages in terms of accuracy.2025-06-18T06:20:51ZAccepted at LLM4Code @ ICSE 2025Yifan ZongYuntian DengPengyu Niehttp://arxiv.org/abs/2404.12603v2Qwerty: A Basis-Oriented Quantum Programming Language2025-06-17T17:58:05ZQuantum computers have leaped from the theoretical realm into a race to large-scale implementations. This is due to the promise of revolutionary speedups, where achieving such speedup requires designing an algorithm that harnesses the structure of a problem using quantum mechanics. Yet many quantum programming languages today require programmers to reason at a low level of physics notation and quantum gate circuitry. This presents a significant barrier to entry for programmers who have not yet built up an intuition about quantum gate semantics, and it can prove to be tedious even for those who have. In this paper, we present Qwerty, a new quantum programming language that allows programmers to manipulate qubits more expressively than gates and trace programs without bra-ket notation. Due to its novel basis type and easy interoperability with Python, Qwerty is a powerful framework for high-level quantum-classical computation.2024-04-19T03:13:43Z21 pages, 38 figures; revised syntax and examples, added program-tracing figuresAustin J. AdamsSharjeel KhanArjun S. BhamraRyan R. AbusaadaJeffrey S. YoungThomas M. Contehttp://arxiv.org/abs/2407.05674v3Controllable and Reliable Knowledge-Intensive Task-Oriented Conversational Agents with Declarative Genie Worksheets2025-06-17T17:53:30ZLarge Language Models can carry out human-like conversations in diverse settings, responding to user requests for tasks and knowledge. However, existing conversational agents implemented with LLMs often struggle with hallucination, following instructions with conditional logic, and integrating knowledge from different sources. These shortcomings compromise the agents' effectiveness, rendering them unsuitable for deployment. To address these challenges, we introduce Genie, a programmable framework for creating knowledge-intensive task-oriented conversational agents. Genie can handle involved interactions and answer complex queries. Unlike LLMs, it delivers reliable, grounded responses through advanced dialogue state management and supports controllable agent policies via its declarative specification -- Genie Worksheet. This is achieved through an algorithmic runtime system that implements the developer-supplied policy, limiting LLMs to (1) parse user input using a succinct conversational history, and (2) generate responses according to supplied context. Agents built with Genie outperform SOTA methods on complex logic dialogue datasets. We conducted a user study with 62 participants on three real-life applications: restaurant reservations with Yelp, as well as ticket submission and course enrollment for university students. Genie agents with GPT-4 Turbo outperformed the GPT-4 Turbo agents with function calling, improving goal completion rates from 21.8% to 82.8% across three real-world tasks.2024-07-08T07:17:40ZAccepted at ACL 2025Harshit JoshiShicheng LiuJames ChenRobert WeigleMonica S. Lamhttp://arxiv.org/abs/2506.14606v1Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees2025-06-17T15:06:54ZThe hardware ecosystem is rapidly evolving, with increasing interest in translating low-level programs across different instruction set architectures (ISAs) in a quick, flexible, and correct way to enhance the portability and longevity of existing code. A particularly challenging class of this transpilation problem is translating between complex- (CISC) and reduced- (RISC) hardware architectures, due to fundamental differences in instruction complexity, memory models, and execution paradigms. In this work, we introduce GG (Guaranteed Guess), an ISA-centric transpilation pipeline that combines the translation power of pre-trained large language models (LLMs) with the rigor of established software testing constructs. Our method generates candidate translations using an LLM from one ISA to another, and embeds such translations within a software-testing framework to build quantifiable confidence in the translation. We evaluate our GG approach over two diverse datasets, enforce high code coverage (>98%) across unit tests, and achieve functional/semantic correctness of 99% on HumanEval programs and 49% on BringupBench programs, respectively. Further, we compare our approach to the state-of-the-art Rosetta 2 framework on Apple Silicon, showcasing 1.73x faster runtime performance, 1.47x better energy efficiency, and 2.41x better memory usage for our transpiled code, demonstrating the effectiveness of GG for real-world CISC-to-RISC translation tasks. We will open-source our codes, data, models, and benchmarks to establish a common foundation for ISA-level code translation research.2025-06-17T15:06:54ZProject page: https://ahmedheakl.github.io/Guaranteed-Guess/Ahmed HeaklSarim HashmiChaimaa AbiCeline LeeAbdulrahman Mahmoudhttp://arxiv.org/abs/2506.11209v2A Performance Model for Warp Specialization Kernels2025-06-16T18:20:02ZThis paper presents a performance model tailored for warp specialization kernels, focusing on factors such as warp size, tilling size, input matrix size, memory bandwidth, and thread divergence. Our model offers accurate predictions of execution time by leveraging differential equations validated through simulations and experiments. The insights gained from this model not only enhance our understanding of warp specialization techniques but also have practical implications for optimizing GPU-accelerated applications through compiler optimizations, kernel parameter tuning, and algorithm design.2025-06-12T18:16:04ZZhengyang LiuVinod Groverhttp://arxiv.org/abs/2506.13383v1StacKAT: Infinite State Network Verification2025-06-16T11:47:27ZWe develop StacKAT, a network verification language featuring loops, finite state variables, nondeterminism, and - most importantly - access to a stack with accompanying push and pop operations. By viewing the variables and stack as the (parsed) headers and (to-be-parsed) contents of a network packet, StacKAT can express a wide range of network behaviors including parsing, source routing, and telemetry. These behaviors are difficult or impossible to model using existing languages like NetKAT. We develop a decision procedure for StacKAT program equivalence, based on finite automata. This decision procedure provides the theoretical basis for verifying network-wide properties and is able to provide counterexamples for inequivalent programs. Finally, we provide an axiomatization of StacKAT equivalence and establish its completeness.2025-06-16T11:47:27ZProc. ACM Program. Lang. 9, PLDI, Article 158 (June 2025)Jules JacobsNate FosterTobias KappéDexter KozenLily SaadaAlexandra SilvaJana Wagemaker10.1145/3729257http://arxiv.org/abs/2506.13820v1Structured Program Synthesis using LLMs: Results and Insights from the IPARC Challenge2025-06-15T04:33:00ZThe IPARC Challenge, inspired by ARC, provides controlled program synthesis tasks over synthetic images to evaluate automatic program construction, focusing on sequence, selection, and iteration. This set of 600 tasks has resisted automated solutions. This paper presents a structured inductive programming approach with LLMs that successfully solves tasks across all IPARC categories. The controlled nature of IPARC reveals insights into LLM-based code generation, including the importance of prior structuring, LLMs' ability to aid structuring (requiring human refinement), the need to freeze correct code, the efficiency of code reuse, and how LLM-generated code can spark human creativity. These findings suggest valuable mechanisms for human-LLM collaboration in tackling complex program synthesis.2025-06-15T04:33:00ZShraddha SuranaAshwin SrinivasanMichael Bainhttp://arxiv.org/abs/2405.00229v2Aptly: Making Mobile Apps from Natural Language2025-06-14T22:53:05ZThis paper introduces Aptly, a platform designed to democratize mobile app development, particularly for young learners. Aptly integrates a Large Language Model (LLM) with App Inventor, enabling users to create apps using their natural language. User's description is translated into a programming language that corresponds with App Inventor's visual blocks. A preliminary study with high school students demonstrated the usability and potential of the platform. Prior programming experience influenced how users interact with Aptly. Participants identified areas for improvement and expressed a shift in perspective regarding programming accessibility and AI's role in creative endeavors.2024-04-30T22:33:34Z6 pages, 4 figuresEvan W. PattonDavid Y. J. KimAshley GranquistRobin LiuArianna ScottJennet ZamanovaHarold Abelson10.1145/3706599.3720081http://arxiv.org/abs/2405.11267v2Concurrent Games over Relational Structures: The Origin of Game Comonads2025-06-14T09:01:39ZSpoiler-Duplicator games are used in finite model theory to examine the expressive power of logics. Their strategies have recently been reformulated as coKleisli maps of game comonads over relational structures, providing new results in finite model theory via categorical techniques. We present a novel framework for studying Spoiler-Duplicator games by viewing them as event structures. We introduce a first systematic method for constructing comonads for all one-sided Spoiler-Duplicator games: game comonads are now realised by adjunctions to a category of games, generically constructed from a comonad in a bicategory of game schema (called signature games). Maps of the constructed categories of games are strategies and generalise coKleisli maps of game comonads; in the case of one-sided games they are shown to coincide with suitably generalised homomorphisms. Finally, we provide characterisations of strategies on two-sided Spoiler-Duplicator games; in a common special case they coincide with spans of event structures.2024-05-18T11:34:05ZExtended version of the paper in Logic in Computer Science (LICS) 2024 ProceedingsYoàv MontacuteGlynn Winskel