https://arxiv.org/api/ml9Ykilyhm2wgoUle6FsGkwZ3II 2026-06-26T05:13:51Z 9951 1170 15 http://arxiv.org/abs/2509.20518v1 Enhancing Python Programming Education with an AI-Powered Code Helper: Design, Implementation, and Impact 2025-09-24T19:43:46Z

This is the study that presents an AI-Python-based chatbot that helps students to learn programming by demonstrating solutions to such problems as debugging errors, solving syntax problems or converting abstract theoretical concepts to practical implementations. Traditional coding tools like Integrated Development Environments (IDEs) and static analyzers do not give robotic help while AI-driven code assistants such as GitHub Copilot focus on getting things done. To close this gap, our chatbot combines static code analysis, dynamic execution tracing, and large language models (LLMs) to provide the students with relevant and practical advice, hence promoting the learning process. The chatbots hybrid architecture employs CodeLlama for code embedding, GPT-4 for natural language interactions, and Docker-based sandboxing for secure execution. Evaluated through a mixed-methods approach involving 1,500 student submissions, the system demonstrated an 85% error resolution success rate, outperforming standalone tools like pylint (62%) and GPT-4 (73%). Quantitative results revealed a 59.3% reduction in debugging time among users, with pre- and post-test assessments showing a 34% improvement in coding proficiency, particularly in recursion and exception handling. Qualitative feedback from 120 students highlighted the chatbots clarity, accessibility, and confidence-building impact, though critiques included occasional latency and restrictive code sanitization. By balancing technical innovation with pedagogical empathy, this research provides a blueprint for AI tools that prioritize educational equity and long-term skill retention over mere code completion. The chatbot exemplifies how AI can augment human instruction, fostering deeper conceptual understanding in programming education.

2025-09-24T19:43:46Z 20 pages, 16 figures Sayed Mahbub Hasan Amiri Md Mainul Islam http://arxiv.org/abs/2509.20426v1 Dual-Language General-Purpose Self-Hosted Visual Language and new Textual Programming Language for Applications 2025-09-24T17:43:55Z

Most visual programming languages (VPLs) are domain-specific, with few general-purpose VPLs like Programming Without Coding Technology (PWCT). These general-purpose VPLs are developed using textual programming languages and improving them requires textual programming. In this thesis, we designed and developed PWCT2, a dual-language (Arabic/English), general-purpose, self-hosting visual programming language. Before doing so, we specifically designed a textual programming language called Ring for its development. Ring is a dynamically typed language with a lightweight implementation, offering syntax customization features. It permits the creation of domain-specific languages through new features that extend object-oriented programming, allowing for specialized languages resembling Cascading Style Sheets (CSS) or Supernova language. The Ring Compiler and Virtual Machine are designed using the PWCT visual programming language where the visual implementation is composed of 18,945 components that generate 24,743 lines of C code, which increases the abstraction level and hides unnecessary details. Using PWCT to develop Ring allowed us to realize several issues in PWCT, which led to the development of the PWCT2 visual programming language using the Ring textual programming language. PWCT2 provides approximately 36 times faster code generation and requires 20 times less storage for visual source files. It also allows for the conversion of Ring code into visual code, enabling the creation of a self-hosting VPL that can be developed using itself. PWCT2 consists of approximately 92,000 lines of Ring code and comes with 394 visual components. PWCT2 is distributed to many users through the Steam platform and has received positive feedback, On Steam, 1772 users have launched the software, and the total recorded usage time exceeds 17,000 hours, encouraging further research and development.

2025-09-24T17:43:55Z PhD thesis Mahmoud Samir Fayed http://arxiv.org/abs/2509.15754v2 Hornet Node and the Hornet DSL: A Minimal, Executable Specification for Bitcoin Consensus 2025-09-24T04:04:17Z

Bitcoin's consensus rules are encoded in the implementation of its reference client: "The code is the spec." Yet this code is unsuitable for formal verification due to side effects, mutable state, concurrency, and legacy design. A standalone formal specification would enable verification both across versions of the reference client and against new client implementations, strengthening decentralization by reducing the risk of consensus-splitting bugs. Yet such a specification has long been considered intractable given the complexity of Bitcoin's consensus logic. We demonstrate a compact, executable, declarative C++ specification of Bitcoin consensus rules that syncs mainnet to tip in a few hours on a single thread. We also introduce the Hornet Domain-Specific Language (DSL) specifically designed to encode these rules unambiguously for execution, enabling formal reasoning, consensus code generation, and AI-driven adversarial testing. Our spec-driven client Hornet Node offers a modern and modular complement to the reference client. Its clear, idiomatic style makes it suitable for education, while its performance makes it ideal for experimentation. We highlight architectural contributions such as its layered design, efficient data structures, and strong separation of concerns, supported by production-quality code examples. We argue that Hornet Node and Hornet DSL together provide the first credible path toward a pure, formal, executable specification of Bitcoin consensus.

2025-09-19T08:30:27Z Toby Sharp http://arxiv.org/abs/2509.19613v1 Compilation as Multi-Language Semantics 2025-09-23T22:05:36Z

Modeling interoperability between programs in different languages is a key problem when modeling verified and secure compilation, which has been successfully addressed using multi-language semantics. Unfortunately, existing models of compilation using multi-language semantics define two variants of each compiler pass: a syntactic translation on open terms to model compilation, and a run-time translation of closed terms at multi-language boundaries to model interoperability. In this talk, I discuss work-in-progress approach to uniformly model a compiler entirely as a reduction system on open term in a multi-language semantics, rather than as a syntactic translation. This simultaneously defines the compiler and the interoperability semantics, reducing duplication. It also provides interesting semantic insights. Normalization of the cross-language redexes performs ahead-of-time (AOT) compilation. Evaluation in the multi-language models just-in-time (JIT) compilation. Confluence of multi-language reduction implies compiler correctness, and part of the secure compilation proof (full abstraction), enabling focus on the difficult part of the proof. Subject reduction of the multi-language reduction implies type-preservation of the compiler.

2025-09-23T22:05:36Z William J. Bowman http://arxiv.org/abs/2509.19607v1 Macro-embedding Compiler Intermediate Languages in Racket 2025-09-23T21:53:47Z

We present the design and implementation of a macro-embedding of a family of compiler intermediate languages, from a Scheme-like language to x86-64, into Racket. This embedding is used as part of a testing framework for a compilers course to derive interpreters for all the intermediate languages. The embedding implements features including safe, functional abstractions as well as unsafe assembly features, and the interactions between the two at various intermediate stages. This paper aims to demonstrate language-oriented techniques and abstractions for implementing (1) a large family of languages and (2) interoperability between low- and high-level languages. The primary strength of this approach is the high degree of code reuse and interoperability compared to implementing each interpreter separately. The design emphasizes modularity and compositionality of an open set of language features by local macro expansion into a single host language, rather than implementing a language pre-defined by a closed set of features. This enables reuse from both the host language (Racket) and between intermediate languages, and enables interoperability between high- and low-level features, simplifying development of the intermediate language semantics. It also facilitates extending or redefining individual language features in intermediate languages, and exposing multiple interfaces to the embedded languages.

2025-09-23T21:53:47Z William J. Bowman http://arxiv.org/abs/2509.19459v1 Automated Insertion of Flushes and Fences for Persistency 2025-09-23T18:14:21Z

CXL shared memory and persistent memory allow the contents of memory to persist beyond crashes. Stores to persistent or CXL memory are typically not immediately made persistent; developers must manually flush the corresponding cache lines to force the data to be written to the underlying storage. Correctly using flush and fence operations is known to be challenging. While state-of-the-art tools can find missing flush instructions, they often require bug-revealing test cases. No existing tools can ensure the absence of missing flush bugs. In this paper, we present PMRobust, a compiler that automatically inserts flush and fence operations to ensure that code using persistent memory is free from missing flush and fence bugs. PMRobust employs a novel static analysis with optimizations that target newly allocated objects. We have evaluated PMRobust on persistent memory libraries and several persistent memory data structures and measured a geometric mean overhead of 0.26% relative to the original benchmarks with hand-placed flush and fence operations.

2025-09-23T18:14:21Z Yutong Guo Weiyu Luo Brian Demsky http://arxiv.org/abs/2212.02754v3 Petri Nets-based Methods on Automatically Detecting for Concurrency Bugs in Rust Programs 2025-09-23T00:55:32Z

Rust's memory safety guarantees, notably ownership and lifetime systems, have driven its widespread adoption. Concurrency bugs still occur in Rust programs, and existing detection approaches exhibit significant limitations: static analyzers suffer from context insensitivity and high false positives, while dynamic methods incur prohibitive runtime costs due to exponential path exploration. This paper presents a Petri net-based method for efficient, precise detection of Rust concurrency bugs. The method rests on three pillars: (1) A syntax-preserving program-to-Petri-net transformation tailored for target bug classes; (2) Semantics-preserving state compression via context-aware slicing; (3) Bug detection through efficient Petri net reachability analysis. The core innovation is its rigorous, control-flow-driven modeling of Rust's ownership semantics and synchronization primitives within the Petri net structure, with data operations represented as token movements. Integrated pointer analysis automates alias identification during transformation. Experiments on standard Rust concurrency benchmarks demonstrate that our method outperforms the state-of-the-art methods LockBud and Miri that are both tools of detecting concurrency bugs of Rust programs. Compared to LockBud, our approach reduces false positives by 35.7\% and false negatives by 28.3\% , which is obtained through our precise flow-sensitive pointer analysis. Compared with Miri that is a dynamic analysis tool, although Miri can obtain the same detection results, our method achieves 100% faster verification speed since our method takes a state reduce algorithm.

2022-12-06T05:11:21Z Kaiwen Zhang Guanjun Liu http://arxiv.org/abs/2412.12578v2 Enabling the Verification and Formalization of Hybrid Quantum-Classical Computing with OpenQASM 3.0 compatible QASM-TS 2.0 2025-09-22T21:15:13Z

The unique features of the hybrid quantum-classical computing model implied by the specification of OpenQASM 3.0 motivate new approaches to quantum program verification. We implement and thoroughly test a QASM 3.0 parser in TypeScript to enable implementations of verification and validation software, compilers, and more. We aim to help the community to formalize the logic of hybrid quantum-classical computing by providing tools that may help with such efforts.

2024-12-17T06:13:31Z Accepted version Journal of Open Source Software (2025), 10(113), 8696 Sean Kim Marcus Edwards 10.21105/joss.08696 http://arxiv.org/abs/2504.10369v2 SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning 2025-09-22T15:39:22Z

Optimizing Register Transfer Level (RTL) code is crucial for improving the power, performance, and area (PPA) of digital circuits in the early stages of synthesis. Manual rewriting, guided by synthesis feedback, can yield high-quality results but is time-consuming and error-prone. Most existing compiler-based approaches have difficulty handling complex design constraints. Large Language Model (LLM)-based methods have emerged as a promising alternative to address these challenges. However, LLM-based approaches often face difficulties in ensuring alignment between the generated code and the provided prompts. This paper presents SymRTLO, a novel neuron-symbolic RTL optimization framework that seamlessly integrates LLM-based code rewriting with symbolic reasoning techniques. Our method incorporates a retrieval-augmented generation (RAG) system of optimization rules and Abstract Syntax Tree (AST)-based templates, enabling LLM-based rewriting that maintains syntactic correctness while minimizing undesired circuit behaviors. A symbolic module is proposed for analyzing and optimizing finite state machine (FSM) logic, allowing fine-grained state merging and partial specification handling beyond the scope of pattern-based compilers. Furthermore, a fast verification pipeline, combining formal equivalence checks with test-driven validation, further reduces the complexity of verification. Experiments on the RTL-Rewriter benchmark with Synopsys Design Compiler and Yosys show that SymRTLO improves power, performance, and area (PPA) by up to 43.9%, 62.5%, and 51.1%, respectively, compared to the state-of-the-art methods.

2025-04-14T16:15:55Z NeurIPS 2025 Yiting Wang Wanghao Ye Ping Guo Yexiao He Ziyao Wang Bowei Tian Shwai He Guoheng Sun Zheyu Shen Sihan Chen Ankur Srivastava Qingfu Zhang Gang Qu Ang Li http://arxiv.org/abs/2509.17795v1 Efficient Linearizability Monitoring 2025-09-22T13:52:06Z

This paper revisits the fundamental problem of monitoring the linearizability of concurrent stacks, queues, sets, and multisets. Given a history of a library implementing one of these abstract data types, the monitoring problem is to answer whether the given history is linearizable. For stacks, queues, and (multi)sets, we present monitoring algorithms with complexities $\mathcal{O}(n^2)$, $\mathcal{O}(n\; log\, n)$, and $\mathcal{O}{(n)}$, respectively, where $n$ is the number of operations in the input history. For stacks and queues, our results hold under the standard assumption of {\it data-independence}, i.e., the behavior of the library is not sensitive to the actual values stored in the data structure. Past works to solve the same problems have cubic time complexity and (more seriously) have correctness issues: they either (i) lack correctness proofs or (ii) the suggested correctness proofs are erroneous (we present counter-examples), or (iii) have incorrect algorithms. Our improved complexity results rely on substantially different algorithms for which we provide detailed proofs of correctness. We have implemented our stack and queue algorithms in LiMo (Linearizability Monitor). We evaluate LiMo and compare it with the state-of-the-art tool Violin -- whose correctness proofs we have found errors in -- which checks for linearizability violations. Our experimental evaluation confirms that LiMo outperforms Violin regarding both efficiency and scalability.

2025-09-22T13:52:06Z Parosh Aziz Abdulla Samuel Grahn Bengt Jonsson Shankaranarayanan Krishna Om Swostik Mishra http://arxiv.org/abs/2509.20384v1 R1-Fuzz: Specializing Language Models for Textual Fuzzing via Reinforcement Learning 2025-09-21T15:21:43Z

Fuzzing is effective for vulnerability discovery but struggles with complex targets such as compilers, interpreters, and database engines, which accept textual input that must satisfy intricate syntactic and semantic constraints. Although language models (LMs) have attracted interest for this task due to their vast latent knowledge and reasoning potential, their practical adoption has been limited. The major challenges stem from insufficient exploration of deep program logic among real-world codebases, and the high cost of leveraging larger models. To overcome these challenges, we propose R1-Fuzz, the first framework that leverages reinforcement learning (RL) to specialize cost-efficient LMs and integrate them for complex textual fuzzing input generation. R1-Fuzz introduces two key designs: coverage-slicing-based question construction and a distance-based reward calculation. Through RL-based post-training of a model with our constructed dataset, R1-Fuzz designs a fuzzing workflow that tightly integrates LMs to reason deep program semantics during fuzzing. Evaluations on diverse real-world targets show that our design enables a small model, named R1-Fuzz-7B, to rival or even outperform much larger models in real-world fuzzing. Notably, R1-Fuzz achieves up to 75\% higher coverage than state-of-the-art fuzzers and discovers 29 previously unknown vulnerabilities, demonstrating its practicality.

2025-09-21T15:21:43Z Jiayi Lin Liangcai Su Junzhe Li Chenxiong Qian http://arxiv.org/abs/2509.16497v1 PrediPrune: Reducing Verification Overhead in Souper with Machine Learning Driven Pruning 2025-09-20T02:00:49Z

Souper is a powerful enumerative superoptimizer that enhances the runtime performance of programs by optimizing LLVM intermediate representation (IR) code. However, its verification process, which relies on a computationally expensive SMT solver to validate optimization candidates, must explore a large search space. This large search space makes the verification process particularly expensive, increasing the burden to incorporate Souper into compilation tools. We propose PrediPrune, a stochastic candidate pruning strategy that effectively reduces the number of invalid candidates passed to the SMT solver. By utilizing machine learning techniques to predict the validity of candidates based on features extracted from the code, PrediPrune prunes unlikely candidates early, decreasing the verification workload. When combined with the state-of-the-art approach (Dataflow), PrediPrune decreases compilation time by 51% compared to the Baseline and by 12% compared to using only Dataflow, emphasizing the effectiveness of the combined approach that integrates a purely ML-based method (PrediPrune) with a purely non-ML based (Dataflow) method. Additionally, PrediPrune offers a flexible interface to trade-off compilation time and optimization opportunities, allowing end users to adjust the balance according to their needs.

2025-09-20T02:00:49Z Ange-Thierry Ishimwe Raghuveer Shivakumar Heewoo Kim Tamara Lehman Joseph Izraelevitz http://arxiv.org/abs/2502.12466v3 EquiBench: Benchmarking Large Language Models' Reasoning about Program Semantics via Equivalence Checking 2025-09-19T22:16:55Z

As large language models (LLMs) become integral to code-related tasks, a central question emerges: Do LLMs truly understand program semantics? We introduce EquiBench, a new benchmark for evaluating LLMs through equivalence checking, i.e., determining whether two programs produce identical outputs for all possible inputs. Unlike prior code generation benchmarks, this task directly tests a model's ability to reason about program semantics. EquiBench consists of 2400 program pairs across four languages and six categories. These pairs are generated through program analysis, compiler scheduling, and superoptimization, ensuring high-confidence labels, nontrivial difficulty, and full automation. We evaluate 19 state-of-the-art LLMs and find that in the most challenging categories, the best accuracies are 63.8% and 76.2%, only modestly above the 50% random baseline. Further analysis reveals that models often rely on syntactic similarity rather than exhibiting robust reasoning about program semantics, highlighting current limitations. Our code and dataset are publicly available at https://github.com/Anjiang-Wei/equibench

2025-02-18T02:54:25Z Anjiang Wei Jiannan Cao Ran Li Hongyu Chen Yuhui Zhang Ziheng Wang Yuan Liu Thiago S. F. X. Teixeira Diyi Yang Ke Wang Alex Aiken http://arxiv.org/abs/2509.16443v1 LightCode: Compiling LLM Inference for Photonic-Electronic Systems 2025-09-19T21:45:26Z

The growing demand for low-latency, energy-efficient inference in large language models (LLMs) has catalyzed interest in heterogeneous architectures. While GPUs remain dominant, they are poorly suited for integration with emerging domain-specific accelerators like the Photonic Tensor Units (PTUs), which offer low-power, high-throughput linear computation. This motivates hybrid compilation strategies that combine photonic and electronic resources. We present LightCode, a compiler framework and simulator for mapping LLM inference workloads across hybrid photonic-electronic systems. LightCode introduces the Stacked Graph, an intermediate representation that encodes multiple hardware-specific realizations of each tensor operation. Hardware assignment is formulated as a constrained subgraph selection problem optimized for latency or energy under parametric cost models. We evaluate LightCode on the prefill stage of GPT-2 and Llama-7B showing that under our workload and hardware assumptions, (i) Photonic hardware reduced energy by up to 50% in our simulated workloads at maximum sequence length; (ii) multiplexing and assignment strategy yielded latency improvements exceeding 10x; and (iii) Optimizing for latency or energy resulted in distinct hardware mappings in our simulations. LightCode offers a module, foundational framework and simulator for compiling LLMs to emerging photonic accelerators.

2025-09-19T21:45:26Z 9 pages, 8 figures Ryan Tomich Zhizhen Zhong Dirk Englund http://arxiv.org/abs/2509.15834v1 Automatic layout of railroad diagrams 2025-09-19T10:13:03Z

Railroad diagrams (also called "syntax diagrams") are a common, intuitive visualization of grammars, but limited tooling and a lack of formal attention to their layout mostly confines them to hand-drawn documentation. We present the first formal treatment of railroad diagram layout along with a principled, practical implementation. We characterize the problem as compiling a *diagram language* (specifying conceptual components and how they connect and compose) to a *layout language* (specifying basic graphical shapes and their sizes and positions). We then implement a compiler that performs *line wrapping* to meet a target width, as well as vertical *alignment* and horizontal *justification* per user-specified policies. We frame line wrapping as an optimization problem, where we describe principled dimensions of optimality and implement corresponding heuristics. For front-end evaluation, we show that our diagram language is well-suited for common applications by describing how regular expressions and Backus-Naur form can be compiled to it. For back-end evaluation, we argue that our compiler is practical by comparing its output to diagrams laid out by hand and by other tools.

2025-09-19T10:13:03Z 24 pages (+2 appendix, +3 references); 22 figures (+4 appendix); 3 tables Shardul Chiplunkar Clément Pit-Claudel