https://arxiv.org/api/5hIc19UkYW6OIbyqo8ERHo/o+i4 2026-06-23T23:09:20Z 9934 1140 15 http://arxiv.org/abs/2509.22982v1 Efficient Cost Bounds with Linear Maps 2025-09-26T22:29:42Z The Automatic Amortized Resource Analysis (AARA) derives program-execution cost bounds using types. To do so, AARA often makes use of cost-free types, which are critical for the composition of types and cost bounds. However, inferring cost-free types using the current state-of-the-art algorithm is expensive due to recursive dependence on additional cost-free types. Furthermore, that algorithm uses a heuristic only applicable to polynomial cost bounds, and not, e.g., exponential bounds. This paper presents a new approach to these problems by representing the cost-free types of a function in a new way: with a linear map, which can stand for infinitely many cost-free types. Such maps enable an algebraic flavor of reasoning about cost bounds (including non-polynomial bounds) via matrix inequalities. These inequalities can be solved with off-the-shelf linear-programming tools for many programs, so that types can always be efficiently checked and often be efficiently inferred. An experimental evaluation with a prototype implementation shows that-when it is applicable-the inference of linear maps is exponentially more efficient than the state-of-the-art algorithm. 2025-09-26T22:29:42Z David M Kahn Jan Hoffmann Thomas Reps Jessie Grosen http://arxiv.org/abs/2505.00449v4 An approach for modularly verifying the core of Rust's atomic reference counting algorithm against the (Y)C20 memory consistency model 2025-09-26T21:58:19Z We propose an approach for modular verification of programs that use relaxed-consistency atomic memory access primitives and fences. The approach is sufficient for verifying the core of Rust's Atomic Reference Counting (ARC) algorithm. We first argue its soundness, when combined with a simple static analysis and admitting an open sub-problem, with respect to the C20 memory consistency model. We then argue its soundness, even in the absence of any static analysis and without any assumptions, with respect to YC20, a minor strengthening of XC20, itself a recently proposed minor strengthening of C20 that rules out out-of-thin-air behaviors but allows load buffering. In contrast to existing work on verifying ARC, we do not assume acyclicity of the union of the program-order and reads-from relations. We define an interleaving operational semantics, prove its soundness with respect to (Y)C20's axiomatic semantics, and then apply any existing program logic for fine-grained interleaving concurrency, such as Iris. 2025-05-01T10:51:20Z 13 pages, 5 figures Bart Jacobs Justus Fasse http://arxiv.org/abs/2510.15912v1 Latency Based Tiling 2025-09-26T20:54:12Z Latency Based Tiling provides a systems based approach to deriving approximate tiling solution that maximizes locality while maintaining a fast compile time. The method uses triangular loops to characterize miss ratio scaling of a machine avoiding prefetcher distortion. Miss ratio scaling captures the relationship between data access latency and working set size with sharp increases in latency indicating the data footprint exceeds capacity from a cache level. Through these noticeable increases in latency we can determine an approximate location for L1, L2, and L3 memory sizes. These sizes are expected to be under approximations of a systems true memory sizes which is in line with our expectations given the shared nature of cache in a multi process system as described in defensive loop tiling. Unlike auto tuning, which can be effective but prohibitively slow, Latency Based Tiling achieves negligible compile time overhead. The implementation in Rust enables a hardware agnostic approach which combined with a cache timing based techniques, yields a portable, memory safe system running wherever Rust is supported. The tiling strategy is applied to a subset of the polyhedral model, where loop nestings are tiled based on both the derived memory hierarchy and the observed data footprint per iteration. 2025-09-26T20:54:12Z Jack Cashman http://arxiv.org/abs/2509.22908v1 A benchmark for vericoding: formally verified program synthesis 2025-09-26T20:36:20Z We present and test the largest benchmark for vericoding, LLM-generation of formally verified code from formal specifications - in contrast to vibe coding, which generates potentially buggy code from a natural language description. Our benchmark contains 12,504 formal specifications, with 3,029 in Dafny, 2,334 in Verus/Rust and 7,141 in Lean. Of these, 6,174 are new unseen problems. We find vericoding success rates of 27% in Lean, 44% in Verus/Rust and 82% in Dafny using off-the-shelf LLMs. Adding natural-language descriptions does not significantly improve performance. We also find that LLM progress has improved progress on pure Dafny verification from 68% to 96% over the past year. The benchmark and vericoding results are shared at https://github.com/Beneficial-AI-Foundation/vericoding-benchmark 2025-09-26T20:36:20Z 25 pages, 1 figure; data available at https://github.com/Beneficial-AI-Foundation/vericoding-benchmark Sergiu Bursuc BAIF Theodore Ehrenborg BAIF Shaowei Lin BAIF Lacramioara Astefanoaei BAIF Ionel Emilian Chiosa MIT Jure Kukovec BAIF Alok Singh BAIF Oliver Butterley BAIF Adem Bizid BAIF Quinn Dougherty BAIF Miranda Zhao MIT Max Tan MIT Max Tegmark MIT http://arxiv.org/abs/2509.22614v1 Committing to the bit: Relational programming with semiring arrays and SAT solving 2025-09-26T17:40:36Z We propose semiringKanren, a relational programming language where each relation expression denotes a semiring array. We formalize a type system that restricts the arrays to finite size. We then define a semantics that is parameterized by the semiring that the arrays draw their elements from. We compile semiringKanren types to bitstring representations. For the Boolean semiring, this compilation enables us to use an SAT solver to run semiringKanren programs efficiently. We compare the performance of semiringKanren and faster miniKanren for solving Sudoku puzzles. Our experiment shows that semiringKanren can be a more efficient variant of miniKanren. 2025-09-26T17:40:36Z 12 pages, for associated repo see https://github.com/sporkl/semiringkanren Dmitri Volkov Yafei Yang Chung-chieh Shan http://arxiv.org/abs/2411.10659v8 Spineless Traversal for Layout Invalidation 2025-09-26T10:00:54Z Latency is a major concern for web rendering engines like those in Chrome, Safari, and Firefox. These engines reduce latency by using an incremental layout algorithm to redraw the page when the user interacts with it. In such an algorithm, elements that change frame-to-frame are marked dirty, and only those elements are processed to draw the next frame, dramatically reducing latency. However, the standard incremental layout algorithm must search the page for dirty elements, accessing auxiliary elements in the process. These auxiliary elements add cache misses and stalled cycles, and are responsible for a sizable fraction of all layout latency. We introduce a new, faster incremental layout algorithm called Spineless Traversal. Spineless Traversal uses a cache-friendlier priority queue algorithm that avoids accessing auxiliary nodes and thus reduces cache traffic and stalls. This leads to dramatic speedups on the most latency-critical interactions such as hovering, typing, and animation. Moreover, thanks to numerous low-level optimizations, Spineless Traversal is competitive across the whole spectrum of incremental layout workloads. Spineless Traversal is faster than the standard approach on 83.0% of 2216 benchmarks, with a mean speedup of 1.80x concentrated in the most latency-critical interactions. 2024-11-16T01:39:44Z Marisa Kirisame Tiezhi Wang Pavel Panchekha http://arxiv.org/abs/2509.21793v1 Compiling by Proving: Language-Agnostic Automatic Optimization from Formal Semantics 2025-09-26T02:49:08Z Verification proofs encode complete program behavior, yet we discard them after checking correctness. We present compiling by proving, a paradigm that transforms these proofs into optimized execution rules. By constructing All-Path Reachability Proofs through symbolic execution and compiling their graph structure, we consolidate many semantic rewrites into single rules while preserving correctness by construction. We implement this as a language-agnostic extension to the K framework. Evaluation demonstrates performance improvements across different compilation scopes: opcode-level optimizations show consistent speedups, while whole-program compilation achieves orders of magnitude greater performance gains. 2025-09-26T02:49:08Z Jianhong Zhao Everett Hildenbrandt Juan Conejero Yongwang Zhao http://arxiv.org/abs/2509.18583v3 A Verified Compiler for Quantum Simulation 2025-09-26T01:57:18Z Hamiltonian simulation is a central application of quantum computing, with significant potential in modeling physical systems and solving complex optimization problems. Existing compilers for such simulations typically focus on low-level representations based on Pauli operators, limiting programmability and offering no formal guarantees of correctness across the compilation pipeline. We introduce QBlue, a high-level, formally verified framework for compiling Hamiltonian simulations. QBlue is based on the formalism of second quantization, which provides a natural and expressive way to describe quantum particle systems using creation and annihilation operators. To ensure safety and correctness, QBlue includes a type system that tracks particle types and enforces Hermitian structure. The framework supports compilation to both digital and analog quantum circuits and captures multiple layers of semantics, from static constraints to dynamic evolution. All components of QBlue, including its language design, type system, and compilation correctness, are fully mechanized in the Rocq proof framework, making it the first end-to-end verified compiler for second-quantized Hamiltonian simulation. 2025-09-23T03:00:35Z Liyi Li Fenfen An Federico Zahariev Zhi Xiang Chong Amr Sabry Mark S. Gordon http://arxiv.org/abs/2509.20534v2 Efficient Symbolic Computation via Hash Consing 2025-09-26T01:51:02Z Symbolic computation systems suffer from memory inefficiencies due to redundant storage of structurally identical subexpressions, commonly known as expression swell, which degrades performance in both classical computer algebra and emerging AI-driven mathematical reasoning tools. In this paper, we present the first integration of hash consing into JuliaSymbolics, a high-performance symbolic toolkit in Julia, by employing a global weak-reference hash table that canonicalizes expressions and eliminates duplication. This approach reduces memory consumption and accelerates key operations such as differentiation, simplification, and code generation, while seamlessly integrating with Julia's metaprogramming and just-in-time compilation infrastructure. Benchmark evaluations across different computational domains reveal substantial improvements: symbolic computations are accelerated by up to 3.2 times, memory usage is reduced by up to 2 times, code generation is up to 5 times faster, function compilation up to 10 times faster, and numerical evaluation up to 100 times faster for larger models. While certain workloads with fewer duplicate unknown-variable expressions show more modest gains or even slight overhead in initial computation stages, downstream processing consistently benefits significantly. These findings underscore the importance of hash consing in scaling symbolic computation and pave the way for future work integrating hash consing with e-graphs for enhanced equivalence-aware expression sharing in AI-driven pipelines. 2025-09-24T20:06:56Z Bowen Zhu Aayush Sabharwal Songchen Tan Yingbo Ma Alan Edelman Christopher Rackauckas http://arxiv.org/abs/2509.20380v2 ACCeLLiuM: Supervised Fine-Tuning for Automated OpenACC Pragma Generation 2025-09-26T01:37:37Z The increasing ubiquity of GPUs is accompanied by the increasing complexity of their hardware and parallel programming frameworks. Directive-based parallel programming standards like OpenACC simplify GPU programming to some extent by abstracting away low-level complexities, but a fair amount of expertise is still required in order to use those directives effectively. We introduce ACCeLLiuM, two open weights Large Language Models specifically fine-tuned for generating expert OpenACC directives for data-parallel loops, along with the supervised fine-tuning dataset that was used to train them. The ACCeLLiuM SFT dataset contains 4,033 OpenACC pragma-loop pairs mined from public GitHub C/C++ repositories, with 3,223 pairs for training and 810 for testing. Experimental evaluations show a pronounced performance gap in generating correct OpenACC pragmas between base LLMs and our fine-tuned versions. On the held-out test set, base LLMs fail to consistently generate valid pragmas, whereas LLMs fine-tuned on the ACCeLLiuM dataset generate valid pragmas with the correct directive type for $87\%$ of the data-parallel loops, and exact pragmas--including directives, clauses, clause order, and clause variables--for $50\%$ of the cases. Even when not exact, generated pragmas frequently incorporate the correct clauses in a different order than the ground-truth label, or include additional clauses that enable finer control over parallel execution, data movement, and concurrency, offering practical value beyond strict string-matching. By publicly releasing the code, models, and dataset as ACCeLLiuM we hope to establish a reproducible benchmark for LLM-powered OpenACC pragma generation, and lower the barrier to automated GPU offloading of serially written programs. 2025-09-20T20:41:32Z Samyak Jhaveri Vanessa Klotzmann Crista Lopes http://arxiv.org/abs/2509.21039v1 Mojo: MLIR-Based Performance-Portable HPC Science Kernels on GPUs for the Python Ecosystem 2025-09-25T11:45:29Z We explore the performance and portability of the novel Mojo language for scientific computing workloads on GPUs. As the first language based on the LLVM's Multi-Level Intermediate Representation (MLIR) compiler infrastructure, Mojo aims to close performance and productivity gaps by combining Python's interoperability and CUDA-like syntax for compile-time portable GPU programming. We target four scientific workloads: a seven-point stencil (memory-bound), BabelStream (memory-bound), miniBUDE (compute-bound), and Hartree-Fock (compute-bound with atomic operations); and compare their performance against vendor baselines on NVIDIA H100 and AMD MI300A GPUs. We show that Mojo's performance is competitive with CUDA and HIP for memory-bound kernels, whereas gaps exist on AMD GPUs for atomic operations and for fast-math compute-bound kernels on both AMD and NVIDIA GPUs. Although the learning curve and programming requirements are still fairly low-level, Mojo can close significant gaps in the fragmented Python ecosystem in the convergence of scientific computing and AI. 2025-09-25T11:45:29Z Accepted at the IEEE/ACM SC25 Conference WACCPD Workshop. The International Conference for High Performance Computing, Networking, Storage, and Analysis, St. Louis, MO, Nov 16-21, 2025. 15 pages, 7 figures. WFG and TM contributed equally William F. Godoy Tatiana Melnichenko Pedro Valero-Lara Wael Elwasif Philip Fackler Rafael Ferreira Da Silva Keita Teranishi Jeffrey S. Vetter 10.1145/3731599.3767573 http://arxiv.org/abs/2404.08163v2 ViCAR: Visualizing Categories with Automated Rewriting in Coq 2025-09-25T09:18:08Z We present ViCAR, a library for working with monoidal categories in the Coq proof assistant. ViCAR provides definitions for categorical structures that users can instantiate with their own verification projects. Upon verifying relevant coherence conditions, ViCAR gives a set of lemmas and tactics for manipulating categorical structures. We also provide a visualizer that can display any composition and tensor product of morphisms as a string diagram, showing its categorical structure. This enables graphical reasoning and automated rewriting for Coq projects with monoidal structures. 2024-04-11T23:59:14Z In Proceedings ACT 2024, arXiv:2509.18357 EPTCS 429, 2025, pp. 234-248 Bhakti Shah Willam Spencer Laura Zielinski Ben Caldwell Adrian Lehmann Robert Rand 10.4204/EPTCS.429.13 http://arxiv.org/abs/2509.17343v4 Quantum Simulation Programming via Typing 2025-09-25T03:20:07Z Quantum simulations are designed to model quantum systems, and many compilation frameworks have been developed for executing such simulations on quantum computers. Most compilers leverage the capabilities of digital and analog quantum computers by representing quantum particle systems with Pauli strings or digital quantum circuits, making it challenging for users in physics, chemistry, and biology to program simulations effectively. QBLUE is proposed as the first programming language for describing the behaviors of quantum systems in terms of second quantization Hamiltonians. Within QBLUE, a novel type system is proposed to clearly define states across different quantum systems and treat quantum computers as quantum particle systems of specific types. The type system is compatible with the compilation of quantum simulations expressed in QBLUE for digital and analog quantum computers. With QBLUE, users can specify the desired quantum particle system and model the system on quantum computers. 2025-09-22T04:15:18Z accepted talk paper in the 22nd International Conference on Quantum Physics and Logic (QPL 2025) Liyi Li Federico Zahariev Chandeepa Dissanayake Jae Swanepoel Amr Sabry Mark S. Gordon http://arxiv.org/abs/2509.20518v1 Enhancing Python Programming Education with an AI-Powered Code Helper: Design, Implementation, and Impact 2025-09-24T19:43:46Z This is the study that presents an AI-Python-based chatbot that helps students to learn programming by demonstrating solutions to such problems as debugging errors, solving syntax problems or converting abstract theoretical concepts to practical implementations. Traditional coding tools like Integrated Development Environments (IDEs) and static analyzers do not give robotic help while AI-driven code assistants such as GitHub Copilot focus on getting things done. To close this gap, our chatbot combines static code analysis, dynamic execution tracing, and large language models (LLMs) to provide the students with relevant and practical advice, hence promoting the learning process. The chatbots hybrid architecture employs CodeLlama for code embedding, GPT-4 for natural language interactions, and Docker-based sandboxing for secure execution. Evaluated through a mixed-methods approach involving 1,500 student submissions, the system demonstrated an 85% error resolution success rate, outperforming standalone tools like pylint (62%) and GPT-4 (73%). Quantitative results revealed a 59.3% reduction in debugging time among users, with pre- and post-test assessments showing a 34% improvement in coding proficiency, particularly in recursion and exception handling. Qualitative feedback from 120 students highlighted the chatbots clarity, accessibility, and confidence-building impact, though critiques included occasional latency and restrictive code sanitization. By balancing technical innovation with pedagogical empathy, this research provides a blueprint for AI tools that prioritize educational equity and long-term skill retention over mere code completion. The chatbot exemplifies how AI can augment human instruction, fostering deeper conceptual understanding in programming education. 2025-09-24T19:43:46Z 20 pages, 16 figures Sayed Mahbub Hasan Amiri Md Mainul Islam http://arxiv.org/abs/2509.20426v1 Dual-Language General-Purpose Self-Hosted Visual Language and new Textual Programming Language for Applications 2025-09-24T17:43:55Z Most visual programming languages (VPLs) are domain-specific, with few general-purpose VPLs like Programming Without Coding Technology (PWCT). These general-purpose VPLs are developed using textual programming languages and improving them requires textual programming. In this thesis, we designed and developed PWCT2, a dual-language (Arabic/English), general-purpose, self-hosting visual programming language. Before doing so, we specifically designed a textual programming language called Ring for its development. Ring is a dynamically typed language with a lightweight implementation, offering syntax customization features. It permits the creation of domain-specific languages through new features that extend object-oriented programming, allowing for specialized languages resembling Cascading Style Sheets (CSS) or Supernova language. The Ring Compiler and Virtual Machine are designed using the PWCT visual programming language where the visual implementation is composed of 18,945 components that generate 24,743 lines of C code, which increases the abstraction level and hides unnecessary details. Using PWCT to develop Ring allowed us to realize several issues in PWCT, which led to the development of the PWCT2 visual programming language using the Ring textual programming language. PWCT2 provides approximately 36 times faster code generation and requires 20 times less storage for visual source files. It also allows for the conversion of Ring code into visual code, enabling the creation of a self-hosting VPL that can be developed using itself. PWCT2 consists of approximately 92,000 lines of Ring code and comes with 394 visual components. PWCT2 is distributed to many users through the Steam platform and has received positive feedback, On Steam, 1772 users have launched the software, and the total recorded usage time exceeds 17,000 hours, encouraging further research and development. 2025-09-24T17:43:55Z PhD thesis Mahmoud Samir Fayed