https://arxiv.org/api/ml9Ykilyhm2wgoUle6FsGkwZ3II
2026-06-26T05:13:51Z
9951
1170
15
http://arxiv.org/abs/2509.20518v1
Enhancing Python Programming Education with an AI-Powered Code Helper: Design, Implementation, and Impact
2025-09-24T19:43:46Z
This is the study that presents an AI-Python-based chatbot that helps students to learn programming by demonstrating solutions to such problems as debugging errors, solving syntax problems or converting abstract theoretical concepts to practical implementations. Traditional coding tools like Integrated Development Environments (IDEs) and static analyzers do not give robotic help while AI-driven code assistants such as GitHub Copilot focus on getting things done. To close this gap, our chatbot combines static code analysis, dynamic execution tracing, and large language models (LLMs) to provide the students with relevant and practical advice, hence promoting the learning process. The chatbots hybrid architecture employs CodeLlama for code embedding, GPT-4 for natural language interactions, and Docker-based sandboxing for secure execution. Evaluated through a mixed-methods approach involving 1,500 student submissions, the system demonstrated an 85% error resolution success rate, outperforming standalone tools like pylint (62%) and GPT-4 (73%). Quantitative results revealed a 59.3% reduction in debugging time among users, with pre- and post-test assessments showing a 34% improvement in coding proficiency, particularly in recursion and exception handling. Qualitative feedback from 120 students highlighted the chatbots clarity, accessibility, and confidence-building impact, though critiques included occasional latency and restrictive code sanitization. By balancing technical innovation with pedagogical empathy, this research provides a blueprint for AI tools that prioritize educational equity and long-term skill retention over mere code completion. The chatbot exemplifies how AI can augment human instruction, fostering deeper conceptual understanding in programming education.
2025-09-24T19:43:46Z
20 pages, 16 figures
Sayed Mahbub Hasan Amiri
Md Mainul Islam
http://arxiv.org/abs/2509.20426v1
Dual-Language General-Purpose Self-Hosted Visual Language and new Textual Programming Language for Applications
2025-09-24T17:43:55Z
Most visual programming languages (VPLs) are domain-specific, with few general-purpose VPLs like Programming Without Coding Technology (PWCT). These general-purpose VPLs are developed using textual programming languages and improving them requires textual programming. In this thesis, we designed and developed PWCT2, a dual-language (Arabic/English), general-purpose, self-hosting visual programming language. Before doing so, we specifically designed a textual programming language called Ring for its development. Ring is a dynamically typed language with a lightweight implementation, offering syntax customization features. It permits the creation of domain-specific languages through new features that extend object-oriented programming, allowing for specialized languages resembling Cascading Style Sheets (CSS) or Supernova language. The Ring Compiler and Virtual Machine are designed using the PWCT visual programming language where the visual implementation is composed of 18,945 components that generate 24,743 lines of C code, which increases the abstraction level and hides unnecessary details. Using PWCT to develop Ring allowed us to realize several issues in PWCT, which led to the development of the PWCT2 visual programming language using the Ring textual programming language. PWCT2 provides approximately 36 times faster code generation and requires 20 times less storage for visual source files. It also allows for the conversion of Ring code into visual code, enabling the creation of a self-hosting VPL that can be developed using itself. PWCT2 consists of approximately 92,000 lines of Ring code and comes with 394 visual components. PWCT2 is distributed to many users through the Steam platform and has received positive feedback, On Steam, 1772 users have launched the software, and the total recorded usage time exceeds 17,000 hours, encouraging further research and development.
2025-09-24T17:43:55Z
PhD thesis
Mahmoud Samir Fayed
http://arxiv.org/abs/2509.15754v2
Hornet Node and the Hornet DSL: A Minimal, Executable Specification for Bitcoin Consensus
2025-09-24T04:04:17Z
Bitcoin's consensus rules are encoded in the implementation of its reference client: "The code is the spec." Yet this code is unsuitable for formal verification due to side effects, mutable state, concurrency, and legacy design. A standalone formal specification would enable verification both across versions of the reference client and against new client implementations, strengthening decentralization by reducing the risk of consensus-splitting bugs. Yet such a specification has long been considered intractable given the complexity of Bitcoin's consensus logic. We demonstrate a compact, executable, declarative C++ specification of Bitcoin consensus rules that syncs mainnet to tip in a few hours on a single thread. We also introduce the Hornet Domain-Specific Language (DSL) specifically designed to encode these rules unambiguously for execution, enabling formal reasoning, consensus code generation, and AI-driven adversarial testing. Our spec-driven client Hornet Node offers a modern and modular complement to the reference client. Its clear, idiomatic style makes it suitable for education, while its performance makes it ideal for experimentation. We highlight architectural contributions such as its layered design, efficient data structures, and strong separation of concerns, supported by production-quality code examples. We argue that Hornet Node and Hornet DSL together provide the first credible path toward a pure, formal, executable specification of Bitcoin consensus.
2025-09-19T08:30:27Z
Toby Sharp
http://arxiv.org/abs/2509.19613v1
Compilation as Multi-Language Semantics
2025-09-23T22:05:36Z
Modeling interoperability between programs in different languages is a key problem when modeling verified and secure compilation, which has been successfully addressed using multi-language semantics. Unfortunately, existing models of compilation using multi-language semantics define two variants of each compiler pass: a syntactic translation on open terms to model compilation, and a run-time translation of closed terms at multi-language boundaries to model interoperability.
In this talk, I discuss work-in-progress approach to uniformly model a compiler entirely as a reduction system on open term in a multi-language semantics, rather than as a syntactic translation. This simultaneously defines the compiler and the interoperability semantics, reducing duplication. It also provides interesting semantic insights. Normalization of the cross-language redexes performs ahead-of-time (AOT) compilation. Evaluation in the multi-language models just-in-time (JIT) compilation. Confluence of multi-language reduction implies compiler correctness, and part of the secure compilation proof (full abstraction), enabling focus on the difficult part of the proof. Subject reduction of the multi-language reduction implies type-preservation of the compiler.
2025-09-23T22:05:36Z
William J. Bowman
http://arxiv.org/abs/2509.19607v1
Macro-embedding Compiler Intermediate Languages in Racket
2025-09-23T21:53:47Z
We present the design and implementation of a macro-embedding of a family of compiler intermediate languages, from a Scheme-like language to x86-64, into Racket. This embedding is used as part of a testing framework for a compilers course to derive interpreters for all the intermediate languages. The embedding implements features including safe, functional abstractions as well as unsafe assembly features, and the interactions between the two at various intermediate stages.
This paper aims to demonstrate language-oriented techniques and abstractions for implementing (1) a large family of languages and (2) interoperability between low- and high-level languages. The primary strength of this approach is the high degree of code reuse and interoperability compared to implementing each interpreter separately. The design emphasizes modularity and compositionality of an open set of language features by local macro expansion into a single host language, rather than implementing a language pre-defined by a closed set of features. This enables reuse from both the host language (Racket) and between intermediate languages, and enables interoperability between high- and low-level features, simplifying development of the intermediate language semantics. It also facilitates extending or redefining individual language features in intermediate languages, and exposing multiple interfaces to the embedded languages.
2025-09-23T21:53:47Z
William J. Bowman
http://arxiv.org/abs/2509.19459v1
Automated Insertion of Flushes and Fences for Persistency
2025-09-23T18:14:21Z
CXL shared memory and persistent memory allow the contents of memory to persist beyond crashes. Stores to persistent or CXL memory are typically not immediately made persistent; developers must manually flush the corresponding cache lines to force the data to be written to the underlying storage. Correctly using flush and fence operations is known to be challenging. While state-of-the-art tools can find missing flush instructions, they often require bug-revealing test cases. No existing tools can ensure the absence of missing flush bugs.
In this paper, we present PMRobust, a compiler that automatically inserts flush and fence operations to ensure that code using persistent memory is free from missing flush and fence bugs. PMRobust employs a novel static analysis with optimizations that target newly allocated objects. We have evaluated PMRobust on persistent memory libraries and several persistent memory data structures and measured a geometric mean overhead of 0.26% relative to the original benchmarks with hand-placed flush and fence operations.
2025-09-23T18:14:21Z
Yutong Guo
Weiyu Luo
Brian Demsky
http://arxiv.org/abs/2212.02754v3
Petri Nets-based Methods on Automatically Detecting for Concurrency Bugs in Rust Programs
2025-09-23T00:55:32Z
Rust's memory safety guarantees, notably ownership and lifetime systems, have driven its widespread adoption. Concurrency bugs still occur in Rust programs, and existing detection approaches exhibit significant limitations: static analyzers suffer from context insensitivity and high false positives, while dynamic methods incur prohibitive runtime costs due to exponential path exploration. This paper presents a Petri net-based method for efficient, precise detection of Rust concurrency bugs. The method rests on three pillars: (1) A syntax-preserving program-to-Petri-net transformation tailored for target bug classes; (2) Semantics-preserving state compression via context-aware slicing; (3) Bug detection through efficient Petri net reachability analysis.
The core innovation is its rigorous, control-flow-driven modeling of Rust's ownership semantics and synchronization primitives within the Petri net structure, with data operations represented as token movements. Integrated pointer analysis automates alias identification during transformation. Experiments on standard Rust concurrency benchmarks demonstrate that our method outperforms the state-of-the-art methods LockBud and Miri that are both tools of detecting concurrency bugs of Rust programs. Compared to LockBud, our approach reduces false positives by 35.7\% and false negatives by 28.3\% , which is obtained through our precise flow-sensitive pointer analysis. Compared with Miri that is a dynamic analysis tool, although Miri can obtain the same detection results, our method achieves 100% faster verification speed since our method takes a state reduce algorithm.
2022-12-06T05:11:21Z
Kaiwen Zhang
Guanjun Liu
http://arxiv.org/abs/2412.12578v2
Enabling the Verification and Formalization of Hybrid Quantum-Classical Computing with OpenQASM 3.0 compatible QASM-TS 2.0
2025-09-22T21:15:13Z
The unique features of the hybrid quantum-classical computing model implied by the specification of OpenQASM 3.0 motivate new approaches to quantum program verification. We implement and thoroughly test a QASM 3.0 parser in TypeScript to enable implementations of verification and validation software, compilers, and more. We aim to help the community to formalize the logic of hybrid quantum-classical computing by providing tools that may help with such efforts.
2024-12-17T06:13:31Z
Accepted version
Journal of Open Source Software (2025), 10(113), 8696
Sean Kim
Marcus Edwards
10.21105/joss.08696
http://arxiv.org/abs/2504.10369v2
SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning
2025-09-22T15:39:22Z
Optimizing Register Transfer Level (RTL) code is crucial for improving the power, performance, and area (PPA) of digital circuits in the early stages of synthesis. Manual rewriting, guided by synthesis feedback, can yield high-quality results but is time-consuming and error-prone. Most existing compiler-based approaches have difficulty handling complex design constraints. Large Language Model (LLM)-based methods have emerged as a promising alternative to address these challenges. However, LLM-based approaches often face difficulties in ensuring alignment between the generated code and the provided prompts. This paper presents SymRTLO, a novel neuron-symbolic RTL optimization framework that seamlessly integrates LLM-based code rewriting with symbolic reasoning techniques. Our method incorporates a retrieval-augmented generation (RAG) system of optimization rules and Abstract Syntax Tree (AST)-based templates, enabling LLM-based rewriting that maintains syntactic correctness while minimizing undesired circuit behaviors. A symbolic module is proposed for analyzing and optimizing finite state machine (FSM) logic, allowing fine-grained state merging and partial specification handling beyond the scope of pattern-based compilers. Furthermore, a fast verification pipeline, combining formal equivalence checks with test-driven validation, further reduces the complexity of verification. Experiments on the RTL-Rewriter benchmark with Synopsys Design Compiler and Yosys show that SymRTLO improves power, performance, and area (PPA) by up to 43.9%, 62.5%, and 51.1%, respectively, compared to the state-of-the-art methods.
2025-04-14T16:15:55Z
NeurIPS 2025
Yiting Wang
Wanghao Ye
Ping Guo
Yexiao He
Ziyao Wang
Bowei Tian
Shwai He
Guoheng Sun
Zheyu Shen
Sihan Chen
Ankur Srivastava
Qingfu Zhang
Gang Qu
Ang Li
http://arxiv.org/abs/2509.17795v1
Efficient Linearizability Monitoring
2025-09-22T13:52:06Z
This paper revisits the fundamental problem of monitoring the linearizability of concurrent stacks, queues, sets, and multisets. Given a history of a library implementing one of these abstract data types, the monitoring problem is to answer whether the given history is linearizable. For stacks, queues, and (multi)sets, we present monitoring algorithms with complexities $\mathcal{O}(n^2)$, $\mathcal{O}(n\; log\, n)$, and $\mathcal{O}{(n)}$, respectively, where $n$ is the number of operations in the input history. For stacks and queues, our results hold under the standard assumption of {\it data-independence}, i.e., the behavior of the library is not sensitive to the actual values stored in the data structure. Past works to solve the same problems have cubic time complexity and (more seriously) have correctness issues: they either (i) lack correctness proofs or (ii) the suggested correctness proofs are erroneous (we present counter-examples), or (iii) have incorrect algorithms. Our improved complexity results rely on substantially different algorithms for which we provide detailed proofs of correctness. We have implemented our stack and queue algorithms in LiMo (Linearizability Monitor). We evaluate LiMo and compare it with the state-of-the-art tool Violin -- whose correctness proofs we have found errors in -- which checks for linearizability violations. Our experimental evaluation confirms that LiMo outperforms Violin regarding both efficiency and scalability.
2025-09-22T13:52:06Z
Parosh Aziz Abdulla
Samuel Grahn
Bengt Jonsson
Shankaranarayanan Krishna
Om Swostik Mishra
http://arxiv.org/abs/2509.20384v1
R1-Fuzz: Specializing Language Models for Textual Fuzzing via Reinforcement Learning
2025-09-21T15:21:43Z
Fuzzing is effective for vulnerability discovery but struggles with complex targets such as compilers, interpreters, and database engines, which accept textual input that must satisfy intricate syntactic and semantic constraints. Although language models (LMs) have attracted interest for this task due to their vast latent knowledge and reasoning potential, their practical adoption has been limited. The major challenges stem from insufficient exploration of deep program logic among real-world codebases, and the high cost of leveraging larger models. To overcome these challenges, we propose R1-Fuzz, the first framework that leverages reinforcement learning (RL) to specialize cost-efficient LMs and integrate them for complex textual fuzzing input generation. R1-Fuzz introduces two key designs: coverage-slicing-based question construction and a distance-based reward calculation. Through RL-based post-training of a model with our constructed dataset, R1-Fuzz designs a fuzzing workflow that tightly integrates LMs to reason deep program semantics during fuzzing. Evaluations on diverse real-world targets show that our design enables a small model, named R1-Fuzz-7B, to rival or even outperform much larger models in real-world fuzzing. Notably, R1-Fuzz achieves up to 75\% higher coverage than state-of-the-art fuzzers and discovers 29 previously unknown vulnerabilities, demonstrating its practicality.
2025-09-21T15:21:43Z
Jiayi Lin
Liangcai Su
Junzhe Li
Chenxiong Qian
http://arxiv.org/abs/2509.16497v1
PrediPrune: Reducing Verification Overhead in Souper with Machine Learning Driven Pruning
2025-09-20T02:00:49Z
Souper is a powerful enumerative superoptimizer that enhances the runtime performance of programs by optimizing LLVM intermediate representation (IR) code. However, its verification process, which relies on a computationally expensive SMT solver to validate optimization candidates, must explore a large search space. This large search space makes the verification process particularly expensive, increasing the burden to incorporate Souper into compilation tools. We propose PrediPrune, a stochastic candidate pruning strategy that effectively reduces the number of invalid candidates passed to the SMT solver. By utilizing machine learning techniques to predict the validity of candidates based on features extracted from the code, PrediPrune prunes unlikely candidates early, decreasing the verification workload. When combined with the state-of-the-art approach (Dataflow), PrediPrune decreases compilation time by 51% compared to the Baseline and by 12% compared to using only Dataflow, emphasizing the effectiveness of the combined approach that integrates a purely ML-based method (PrediPrune) with a purely non-ML based (Dataflow) method. Additionally, PrediPrune offers a flexible interface to trade-off compilation time and optimization opportunities, allowing end users to adjust the balance according to their needs.
2025-09-20T02:00:49Z
Ange-Thierry Ishimwe
Raghuveer Shivakumar
Heewoo Kim
Tamara Lehman
Joseph Izraelevitz
http://arxiv.org/abs/2502.12466v3
EquiBench: Benchmarking Large Language Models' Reasoning about Program Semantics via Equivalence Checking
2025-09-19T22:16:55Z
As large language models (LLMs) become integral to code-related tasks, a central question emerges: Do LLMs truly understand program semantics? We introduce EquiBench, a new benchmark for evaluating LLMs through equivalence checking, i.e., determining whether two programs produce identical outputs for all possible inputs. Unlike prior code generation benchmarks, this task directly tests a model's ability to reason about program semantics. EquiBench consists of 2400 program pairs across four languages and six categories. These pairs are generated through program analysis, compiler scheduling, and superoptimization, ensuring high-confidence labels, nontrivial difficulty, and full automation. We evaluate 19 state-of-the-art LLMs and find that in the most challenging categories, the best accuracies are 63.8% and 76.2%, only modestly above the 50% random baseline. Further analysis reveals that models often rely on syntactic similarity rather than exhibiting robust reasoning about program semantics, highlighting current limitations. Our code and dataset are publicly available at https://github.com/Anjiang-Wei/equibench
2025-02-18T02:54:25Z
Anjiang Wei
Jiannan Cao
Ran Li
Hongyu Chen
Yuhui Zhang
Ziheng Wang
Yuan Liu
Thiago S. F. X. Teixeira
Diyi Yang
Ke Wang
Alex Aiken
http://arxiv.org/abs/2509.16443v1
LightCode: Compiling LLM Inference for Photonic-Electronic Systems
2025-09-19T21:45:26Z
The growing demand for low-latency, energy-efficient inference in large language models (LLMs) has catalyzed interest in heterogeneous architectures. While GPUs remain dominant, they are poorly suited for integration with emerging domain-specific accelerators like the Photonic Tensor Units (PTUs), which offer low-power, high-throughput linear computation. This motivates hybrid compilation strategies that combine photonic and electronic resources. We present LightCode, a compiler framework and simulator for mapping LLM inference workloads across hybrid photonic-electronic systems. LightCode introduces the Stacked Graph, an intermediate representation that encodes multiple hardware-specific realizations of each tensor operation. Hardware assignment is formulated as a constrained subgraph selection problem optimized for latency or energy under parametric cost models. We evaluate LightCode on the prefill stage of GPT-2 and Llama-7B showing that under our workload and hardware assumptions, (i) Photonic hardware reduced energy by up to 50% in our simulated workloads at maximum sequence length; (ii) multiplexing and assignment strategy yielded latency improvements exceeding 10x; and (iii) Optimizing for latency or energy resulted in distinct hardware mappings in our simulations. LightCode offers a module, foundational framework and simulator for compiling LLMs to emerging photonic accelerators.
2025-09-19T21:45:26Z
9 pages, 8 figures
Ryan Tomich
Zhizhen Zhong
Dirk Englund
http://arxiv.org/abs/2509.15834v1
Automatic layout of railroad diagrams
2025-09-19T10:13:03Z
Railroad diagrams (also called "syntax diagrams") are a common, intuitive visualization of grammars, but limited tooling and a lack of formal attention to their layout mostly confines them to hand-drawn documentation. We present the first formal treatment of railroad diagram layout along with a principled, practical implementation. We characterize the problem as compiling a *diagram language* (specifying conceptual components and how they connect and compose) to a *layout language* (specifying basic graphical shapes and their sizes and positions). We then implement a compiler that performs *line wrapping* to meet a target width, as well as vertical *alignment* and horizontal *justification* per user-specified policies. We frame line wrapping as an optimization problem, where we describe principled dimensions of optimality and implement corresponding heuristics. For front-end evaluation, we show that our diagram language is well-suited for common applications by describing how regular expressions and Backus-Naur form can be compiled to it. For back-end evaluation, we argue that our compiler is practical by comparing its output to diagrams laid out by hand and by other tools.
2025-09-19T10:13:03Z
24 pages (+2 appendix, +3 references); 22 figures (+4 appendix); 3 tables
Shardul Chiplunkar
Clément Pit-Claudel