https://arxiv.org/api/69PWsWrmyTVqZIzdv2khzLbThDA 2026-06-28T13:11:49Z 9951 1575 15 http://arxiv.org/abs/2505.21514v1 SIMCOPILOT: Evaluating Large Language Models for Copilot-Style Code Generation 2025-05-21T04:59:44Z

We introduce SIMCOPILOT, a benchmark that simulates the role of large language models (LLMs) as interactive, "copilot"-style coding assistants. Targeting both completion (finishing incomplete methods or code blocks) and infill tasks (filling missing segments within existing code), SIMCOPILOT provides a comprehensive framework for evaluating LLM coding capabilities. The benchmark comprises dedicated sub-benchmarks for Java (SIMCOPILOTJ) and Python (SIMCOPILOTP), covering diverse codebases varying in size and complexity. Our key contributions include: (a) establishing a realistic, detailed evaluation environment to assess LLM utility in practical coding scenarios, and (b) providing fine-grained analyses that address critical factors frequently overlooked by existing benchmarks, such as task-specific performance nuances, contextual understanding across code segments, and sensitivity to variable scope. Evaluations conducted across domains-including algorithms, databases, computer vision, and neural networks-offer insights into model strengths and highlight persistent challenges in maintaining logical consistency within complex dependency structures. Beyond benchmarking, our study sheds light on the current limitations of LLM-driven code generation and underscores the ongoing transition of LLMs from merely syntax-aware generators toward reliable, intelligent software development partners.

2025-05-21T04:59:44Z Keywords: Benchmark Dataset, LLM Evaluation, Gen-AI, Program Synthesis; TLDR: SimCoPilot is a benchmark for evaluating LLMs as "copilot"-style interactive coding assistants, testing their ability to integrate and complete code within complex real-world software environments Mingchao Jiang Abhinav Jain Sophia Zorek Chris Jermaine http://arxiv.org/abs/2505.14508v1 Design and Evaluation of a Microservices Cloud Framework for Online Travel Platforms 2025-05-20T15:36:55Z

Handling online travel agents globally requires efficient and flexible software solution architectures. When it needs to handle thousands of agents and billions of clients data globally. Microservices architecture is used to break down a large program into numerous, smaller services which can run individually and perform individual tasks. This paper analyses and integrates a unique Microservices Cloud Framework designed to support Online Travel Platforms (MCF-OTP). MCF-OTPs main goal is to increase the performance, flexibility, and maintenance of online travel platforms via cloud computing and microservice technologies. Large-scale travel apps, including managing numerous data sources, dealing with traffic peaks, and providing fault tolerance, can be addressed by the suggested framework. The framework increases good interpretation between flawless data synchronization, microservices, and dynamic scaling based on demand technology. An organization framework that optimizes service borders and minimizes inter-service dependencies is recommended. Thus, this can result in elevated development adaptability. In this research, the principal goal is to evaluate MCF-OTPs efficiency using the indicators of fault tolerance and response time. It is indicated by the findings that the MCF-OTP structure excels traditional monolithic designs in terms of dependability and scalability, managing traffic spikes seamlessly and decreasing downtime. The cost-effective analysis helps ascertain the net gain attained by the startup fees and the ongoing operational costs. The cloud-based environment is used to reduce the fracture cost which also helps to increase the efficiency of resource allocation, according to the research.

2025-05-20T15:36:55Z 15 pages, 2 figures, 6 tables Biman Barua M. Shamim Kaiser http://arxiv.org/abs/2505.14213v1 Augmented Weak Distance for Fast and Accurate Bounds Checking 2025-05-20T11:17:34Z

This work advances floating-point program verification by introducing Augmented Weak-Distance (AWD), a principled extension of the Weak-Distance (WD) framework. WD is a recent approach that reformulates program analysis as a numerical minimization problem, providing correctness guarantees through non-negativity and zero-target correspondence. It consistently outperforms traditional floating-point analysis, often achieving speedups of several orders of magnitude. However, WD suffers from ill-conditioned optimization landscapes and branching discontinuities, which significantly hinder its practical effectiveness. AWD overcomes these limitations with two key contributions. First, it enforces the Monotonic Convergence Condition (MCC), ensuring a strictly decreasing objective function and mitigating misleading optimization stalls. Second, it extends WD with a per-path analysis scheme, preserving the correctness guarantees of weak-distance theory while integrating execution paths into the optimization process. These enhancements construct a well-conditioned optimization landscape, enabling AWD to handle floating-point programs effectively, even in the presence of loops and external functions. We evaluate AWD on SV-COMP 2024, a widely used benchmark for software verification.Across 40 benchmarks initially selected for bounds checking, AWD achieves 100% accuracy, matching the state-of-the-art bounded model checker CBMC, one of the most widely used verification tools, while running 170X faster on average. In contrast, the static analysis tool Astrée, despite being fast, solves only 17.5% of the benchmarks. These results establish AWD as a highly efficient alternative to CBMC for bounds checking, delivering precise floating-point verification without compromising correctness.

2025-05-20T11:17:34Z Zhoulai Fu Freek Verbeek Binoy Ravindran http://arxiv.org/abs/2505.14092v1 Verifying Tree-Manipulating Programs via CHCs 2025-05-20T08:56:33Z

Programs that manipulate tree-shaped data structures often require complex, specialized proofs that are difficult to generalize and automate. This paper introduces a unified, foundational approach to verifying such programs. Central to our approach is the knitted-tree encoding, modeling each program execution as a tree structure capturing input, output, and intermediate states. Leveraging the compositional nature of knitted-trees, we encode these structures as constrained Horn clauses (CHCs), reducing verification to CHC satisfiability task. To illustrate our approach, we focus on memory safety and show how it naturally leads to simple, modular invariants.

2025-05-20T08:56:33Z Marco Faella Gennaro Parlato http://arxiv.org/abs/2505.13683v1 Genesis: A Compiler Framework for Hamiltonian Simulation on Hybrid CV-DV Quantum Computers 2025-05-19T19:32:06Z

This paper introduces Genesis, the first compiler designed to support Hamiltonian Simulation on hybrid continuous-variable (CV) and discrete-variable (DV) quantum computing systems. Genesis is a two-level compilation system. At the first level, it decomposes an input Hamiltonian into basis gates using the native instruction set of the target hybrid CV-DV quantum computer. At the second level, it tackles the mapping and routing of qumodes/qubits to implement long-range interactions for the gates decomposed from the first level. Rather than a typical implementation that relies on SWAP primitives similar to qubit-based (or DV-only) systems, we propose an integrated design of connectivity-aware gate synthesis and beamsplitter SWAP insertion tailored for hybrid CV-DV systems. We also introduce an OpenQASM-like domain-specific language (DSL) named CVDV-QASM to represent Hamiltonian in terms of Pauli-exponentials and basic gate sequences from the hybrid CV-DV gate set. Genesis has successfully compiled several important Hamiltonians, including the Bose-Hubbard model, $\mathbb{Z}_2-$Higgs model, Hubbard-Holstein model, Heisenberg model and Electron-vibration coupling Hamiltonians, which are critical in domains like quantum field theory, condensed matter physics, and quantum chemistry. Our implementation is available at Genesis-CVDV-Compiler(https://github.com/ruadapt/Genesis-CVDV-Compiler).

2025-05-19T19:32:06Z To appear in ISCA 2025 Zihan Chen Jiakang Li Minghao Guo Henry Chen Zirui Li Joel Bierman Yipeng Huang Huiyang Zhou Yuan Liu Eddy Z. Zhang http://arxiv.org/abs/2502.07966v2 Härpfer's Extended Indispensability Algorithm in Z 2025-05-18T10:46:31Z

Since 1978, Clarence Barlow developed the ``Indispensability Function''. It operates on a metric tree that is bound to the same prime number of branches for all subtrees of each particular level. It assigns to all leaf postions of this tree a numeric value which indicates how important the acoustic presence of an event at this position is for the meter to be recognized as such. Bernd Härpfer extended this concept in 2015 to deal with meters which have arbitrary groupings into two or three at any position of the tree hierarchy. This is called ``Extended Indispensability Algorithm''. This article gives a specification of the Extended Algorithm in a slightly extended version of the Z specification language, and a possible generalization to arbitrary metric trees.

2025-02-11T21:23:35Z Corrections: (1) pg1 col2 "gmsn(-1)", not "GNSM(-1)" (2) table 3, operator precedence round "cyclicSucc" Markus Lepper Bernd Härpfer Baltasar Trancón y Widemann http://arxiv.org/abs/2505.12210v1 Nonmalleable Progress Leakage 2025-05-18T03:02:30Z

Information-flow control systems often enforce progress-insensitive noninterference, as it is simple to understand and enforce. Unfortunately, real programs need to declassify results and endorse inputs, which noninterference disallows, while preventing attackers from controlling leakage, including through progress channels, which progress-insensitivity ignores. This work combines ideas for progress-sensitive security with secure downgrading (declassification and endorsement) to identify a notion of securely downgrading progress information. We use hyperproperties to distill the separation between progress-sensitive and progress-insensitive noninterference and combine it with nonmalleable information flow, an existing (progress-insensitive) definition of secure downgrading, to define nonmalleable progress leakage (NMPL). We present the first information-flow type system to allow some progress leakage while enforcing NMPL, and we show how to infer the location of secure progress downgrades. All theorems are verified in Rocq.

2025-05-18T03:02:30Z Proceedings of the 2025 IEEE Computer Security Foundations Symposium (CSF) Ethan Cecchetti 10.1109/CSF64896.2025.00029 http://arxiv.org/abs/2505.11979v1 Introduction to Analytical Software Engineering Design Paradigm 2025-05-17T12:23:55Z

As modern software systems expand in scale and complexity, the challenges associated with their modeling and formulation grow increasingly intricate. Traditional approaches often fall short in effectively addressing these complexities, particularly in tasks such as design pattern detection for maintenance and assessment, as well as code refactoring for optimization and long-term sustainability. This growing inadequacy underscores the need for a paradigm shift in how such challenges are approached and resolved. This paper presents Analytical Software Engineering (ASE), a novel design paradigm aimed at balancing abstraction, tool accessibility, compatibility, and scalability. ASE enables effective modeling and resolution of complex software engineering problems. The paradigm is evaluated through two frameworks Behavioral-Structural Sequences (BSS) and Optimized Design Refactoring (ODR), both developed in accordance with ASE principles. BSS offers a compact, language-agnostic representation of codebases to facilitate precise design pattern detection. ODR unifies artifact and solution representations to optimize code refactoring via heuristic algorithms while eliminating iterative computational overhead. By providing a structured approach to software design challenges, ASE lays the groundwork for future research in encoding and analyzing complex software metrics.

2025-05-17T12:23:55Z The Conference's autorization to submit a preprint was granted Tarik Houichime Younes El Amrani http://arxiv.org/abs/2505.11849v1 VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation 2025-05-17T05:25:01Z

Automating Register Transfer Level (RTL) code generation using Large Language Models (LLMs) offers substantial promise for streamlining digital circuit design and reducing human effort. However, current LLM-based approaches face significant challenges with training data scarcity, poor specification-code alignment, lack of verification mechanisms, and balancing generalization with specialization. Inspired by DeepSeek-R1, we introduce VeriReason, a framework integrating supervised fine-tuning with Guided Reward Proximal Optimization (GRPO) reinforcement learning for RTL generation. Using curated training examples and a feedback-driven reward model, VeriReason combines testbench evaluations with structural heuristics while embedding self-checking capabilities for autonomous error correction. On the VerilogEval Benchmark, VeriReason delivers significant improvements: achieving 83.1% functional correctness on the VerilogEval Machine benchmark, substantially outperforming both comparable-sized models and much larger commercial systems like GPT-4 Turbo. Additionally, our approach demonstrates up to a 2.8X increase in first-attempt functional correctness compared to baseline methods and exhibits robust generalization to unseen designs. To our knowledge, VeriReason represents the first system to successfully integrate explicit reasoning capabilities with reinforcement learning for Verilog generation, establishing a new state-of-the-art for automated RTL synthesis. The models and datasets are available at: https://huggingface.co/collections/AI4EDA-CASE Code is Available at: https://github.com/NellyW8/VeriReason

2025-05-17T05:25:01Z 11 pages, 2 figures Yiting Wang Guoheng Sun Wanghao Ye Gang Qu Ang Li http://arxiv.org/abs/2503.02477v2 Random Variables, Conditional Independence and Categories of Abstract Sample Spaces 2025-05-16T13:39:11Z

Two high-level "pictures" of probability theory have emerged: one that takes as central the notion of random variable, and one that focuses on distributions and probability channels (Markov kernels). While the channel-based picture has been successfully axiomatized, and widely generalized, using the notion of Markov category, the categorical semantics of the random variable picture remain less clear. Simpson's probability sheaves are a recent approach, in which probabilistic concepts like random variables are allowed vary over a site of sample spaces. Simpson has identified rich structure on these sites, most notably an abstract notion of conditional independence, and given examples ranging from probability over databases to nominal sets. We aim bring this development together with the generality and abstraction of Markov categories: We show that for any suitable Markov category, a category of sample spaces can be defined which satisfies Simpson's axioms, and that a theory of probability sheaves can be developed purely synthetically in this setting. We recover Simpson's examples in a uniform fashion from well-known Markov categories, and consider further generalizations.

2025-03-04T10:42:00Z Dario Stein http://arxiv.org/abs/2505.09363v2 eqsat: An Equality Saturation Dialect for Non-destructive Rewriting 2025-05-15T11:34:04Z

With recent algorithmic improvements and easy-to-use libraries, equality saturation is being picked up for hardware design, program synthesis, theorem proving, program optimization, and more. Existing work on using equality saturation for program optimization makes use of external equality saturation libraries such as egg, typically generating a single optimized expression. In the context of a compiler, such an approach uses equality saturation to replace a small number of passes. In this work, we propose an alternative approach that represents equality saturation natively in the compiler's intermediate representation, facilitating the application of constructive compiler passes that maintain the e-graph state throughout the compilation flow. We take LLVM's MLIR framework and propose a new MLIR dialect named eqsat that represents e-graphs in MLIR code. This not only provides opportunities to rethink e-matching and extraction techniques by orchestrating existing MLIR passes, such as common subexpression elimination, but also avoids translation overhead between the chosen e-graph library and MLIR. Our eqsat intermediate representation (IR) allows programmers to apply equality saturation on arbitrary domain-specific IRs using the same flow as other compiler transformations in MLIR.

2025-05-14T13:12:39Z Accepted as workshop paper at the EGRAPHS 2025 workshop Jules Merckx Alexandre Lopoukhine Samuel Coward Jianyi Cheng Bjorn De Sutter Tobias Grosser http://arxiv.org/abs/2411.14887v2 OMP4Py: a pure Python implementation of OpenMP 2025-05-15T08:52:04Z

Python demonstrates lower performance in comparison to traditional high performance computing (HPC) languages such as C, C++, and Fortran. This performance gap is largely due to Python's interpreted nature and the Global Interpreter Lock (GIL), which hampers multithreading efficiency. However, the latest version of Python includes the necessary changes to make the interpreter thread-safe, allowing Python code to run without the GIL. This important update will enable users to fully exploit multithreading parallelism in Python. In order to facilitate that task, this paper introduces OMP4Py, the first pure Python implementation of OpenMP. We demonstrate that it is possible to bring OpenMP's familiar directive-based parallelization paradigm to Python, allowing developers to write parallel code with the same level of control and flexibility as in C, C++, or Fortran. The experimental evaluation shows that OMP4Py significantly impacts the performance of various types of applications, although the current threading limitation of Python's interpreter (v3.13) reduce its effectiveness for numerical applications.

2024-11-22T12:11:32Z 15 pages, 14 figures César Piñeiro Juan C. Pichel http://arxiv.org/abs/2501.17707v3 vNV-Heap: An Ownership-Based Virtually Non-Volatile Heap for Embedded Systems 2025-05-15T08:01:39Z

The Internet of Batteryless Things might revolutionize our understanding of connected devices by harvesting required operational energy from the environment. These systems come with the system-software challenge that the intermittently powered IoT devices have to checkpoint their state in non-volatile memory to later resume with this state when sufficient energy is available. The scarce energy resources demand that only modified data is persisted before a power failure, which requires precise modification tracking. We present vNV-Heap, the first ownership-based virtually Non-Volatile Heap for intermittently powered systems with guaranteed power-failure resilience. The heap exploits ownership systems, a zero-cost (i.e., compile-time) abstraction for example implemented by Rust, to track modifications and virtualize object persistence. To achieve power-failure resilience, our heap is designed and implemented to guarantee bounded operations by static program code analysis: For example, the heap allows for determining a worst-case energy consumption for the operation of persisting modified and currently volatile objects. The evaluation of our open-source implementation on an embedded hardware platform (i.e., ESP32-C3) shows that using our heap abstraction is more energy efficient than existing approaches while also providing runtime guarantees by static worst-case bounds.

2025-01-29T15:22:19Z Markus Elias Gerber Luis Gerhorst Ishwar Mudraje Kai Vogelgesang Thorsten Herfet Peter Wägemann 10.1145/3735452.3735534 http://arxiv.org/abs/2505.09021v1 AI-Mediated Code Comment Improvement 2025-05-13T23:31:32Z

This paper describes an approach to improve code comments along different quality axes by rewriting those comments with customized Artificial Intelligence (AI)-based tools. We conduct an empirical study followed by grounded theory qualitative analysis to determine the quality axes to improve. Then we propose a procedure using a Large Language Model (LLM) to rewrite existing code comments along the quality axes. We implement our procedure using GPT-4o, then distil the results into a smaller model capable of being run in-house, so users can maintain data custody. We evaluate both our approach using GPT-4o and the distilled model versions. We show in an evaluation how our procedure improves code comments along the quality axes. We release all data and source code in an online repository for reproducibility.

2025-05-13T23:31:32Z Maria Dhakal Chia-Yi Su Robert Wallace Chris Fakhimi Aakash Bansal Toby Li Yu Huang Collin McMillan http://arxiv.org/abs/2505.08906v1 Comparing Parallel Functional Array Languages: Programming and Performance 2025-05-13T18:54:36Z

Parallel functional array languages are an emerging class of programming languages that promise to combine low-effort parallel programming with good performance and performance portability. We systematically compare the designs and implementations of five different functional array languages: Accelerate, APL, DaCe, Futhark, and SaC. We demonstrate the expressiveness of functional array programming by means of four challenging benchmarks, namely N-body simulation, MultiGrid, Quickhull, and Flash Attention. These benchmarks represent a range of application domains and parallel computational models. We argue that the functional array code is much shorter and more comprehensible than the hand-optimized baseline implementations because it omits architecture-specific aspects. Instead, the language implementations generate both multicore and GPU executables from a single source code base. Hence, we further argue that functional array code could more easily be ported to, and optimized for, new parallel architectures than conventional implementations of numerical kernels. We demonstrate this potential by reporting the performance of the five parallel functional array languages on a total of 39 instances of the four benchmarks on both a 32-core AMD EPYC 7313 multicore system and on an NVIDIA A30 GPU. We explore in-depth why each language performs well or not so well on each benchmark and architecture. We argue that the results demonstrate that mature functional array languages have the potential to deliver performance competitive with the best available conventional techniques.

2025-05-13T18:54:36Z David van Balen Tiziano De Matteis Clemens Grelck Troels Henriksen Aaron W. Hsu Gabriele K. Keller Thomas Koopman Trevor L. McDonell Cosmin Oancea Sven-Bodo Scholz Artjoms Sinkarovs Tom Smeding Phil Trinder Ivo Gabe de Wolff Alexandros Nikolaos Ziogas