https://arxiv.org/api/9MEAyaBshp8V9/Lrc5uLycJ67pE2026-06-26T21:08:51Z9951138015http://arxiv.org/abs/2507.18792v1Decompiling Rust: An Empirical Study of Compiler Optimizations and Reverse Engineering Challenges2025-07-24T20:26:42ZDecompiling Rust binaries is challenging due to the language's rich type system, aggressive compiler optimizations, and widespread use of high-level abstractions. In this work, we conduct a benchmark-driven evaluation of decompilation quality across core Rust features and compiler build modes. Our automated scoring framework shows that generic types, trait methods, and error handling constructs significantly reduce decompilation quality, especially in release builds. Through representative case studies, we analyze how specific language constructs affect control flow, variable naming, and type information recovery. Our findings provide actionable insights for tool developers and highlight the need for Rust-aware decompilation strategies.2025-07-24T20:26:42ZZixu Zhouhttp://arxiv.org/abs/2507.18755v1Agentic Program Repair from Test Failures at Scale: A Neuro-symbolic approach with static analysis and test execution feedback2025-07-24T19:12:32ZAim: With the advent of LLMs, sophisticated agentic program repair has become viable at large organizations with large codebases. In this work, we develop an Engineering Agent that fixes the source code based on test failures at scale across diverse software offerings internally.
Method: Using Llama as the base, we employ the ReAct harness to develop an agent. We start with a test failure that was triaged by a rule-based test failure bot. We then set up an agentic harness and allow the agent to reason and run a set of 15 actions from reading a file to generating a patch. We provide feedback to the agent through static analysis and test failures so it can refine its solution. We leverage an LLM-as-a-Judge to ensure that the patch conforms to the standards followed by a human review to land fixes.
Benchmark Findings: We curated offline benchmarks for our patch generator, the Engineering Agent loop, and the LLM-as-a-Judge. In offline evaluations we found that a specialized 70B model is highly competitive with the much larger but vanilla Llama-405B. In an ablation study, we found that the ReAct harness (neural model) benefited from the symbolic information from static analysis tools and test execution traces. A model that strikes a balance between the solve rate and error rate vs the cost and latency has a benchmark solve rate of 42.3% using an average 11.8 feedback iterations.
Production Findings: In a three month period, 80% of the generated fixes were reviewed, of which 31.5% were landed (25.5% of the total number of generated fixes).
Feedback from Engineers: We used open coding to extract qualitative themes from engineers' feedback. We saw positive feedback in the form of quick approvals, gratitude, and surprise. We also found mixed feedback when the Engineering Agent's solution was partially correct and it served as a good starting point.2025-07-24T19:12:32ZChandra MaddilaAdam TaitClaire ChangDaniel ChengNauman AhmadVijayaraghavan MuraliMarshall RochArnaud AvondetAaron MeltzerVictor MontalvaoMichael HopkoChris WatersonParth ThakkarRenuka FernandezKristian KristensenSivan BarzilySherry ChenRui AbreuNachiappan NagappanPayam ShodjaiKillian MurphyJames EveringhamAparna RamaniPeter C. Rigbyhttp://arxiv.org/abs/2507.18268v1Building an Accelerated OpenFOAM Proof-of-Concept Application using Modern C++2025-07-24T10:12:00ZThe modern trend in High-Performance Computing (HPC) involves the use of accelerators such as Graphics Processing Units (GPUs) alongside Central Processing Units (CPUs) to speed up numerical operations in various applications. Leading manufacturers such as NVIDIA, Intel, and AMD are constantly advancing these architectures, augmenting them with features such as mixed precision, enhanced memory hierarchies, and specialised accelerator silicon blocks (e.g., Tensor Cores on GPU or AMX/SME engines on CPU) to enhance compute performance. At the same time, significant efforts in software development are aimed at optimizing the use of these innovations, seeking to improve usability and accessibility. This work contributes to the state-of-the-art of OpenFOAM development by presenting a working Proof-Of-Concept application built using modern ISO C++ parallel constructs. This approach, combined with an appropriate compiler runtime stack, like the one provided by the NVIDIA HPC SDK, makes it possible to accelerate well-defined kernels, allowing multi-core execution and GPU offloading using a single codebase. The study demonstrates that it is possible to increase the performance of the OpenFOAM laplacianFoam application by offloading the computations on NVIDIA GPUs using the C++ parallel construct.2025-07-24T10:12:00ZGiulio MalenzaGiovanni StabileFilippo SpigaRobert BirkeMarco Aldinuccihttp://arxiv.org/abs/2311.04824v6Multi-Relational Algebra for Multi-Granular Data Analytics2025-07-23T21:10:32ZIn modern data analytics, analysts frequently face the challenge of searching for desirable entities by evaluating, for each entity, a collection of its feature relations to derive key analytical properties. This search is challenging because the definitions of both entities and their feature relations may span multiple, varying granularities. Existing constructs such as GROUP BY CUBE, GROUP BY GROUPING SETS, ARRAY AGGREGATE, WINDOW functions, OLAP cube, and various data explanation paradigms aim to facilitate such analyses, but all exhibit limitations in terms of composability, clear specifications, and performance. To address these challenges, we introduce Multi-Relational Algebra (MRA), which generalizes relational algebra with two core data abstractions: RelationSpace, for managing collections of relations, and SliceRelation, which structures data around entities with corresponding relation-valued features. MRA introduces a rich set of operators for transforming data between these representations, enabling complex multi-granular analysis in a modular and declarative way. An early version of MRA is in production at Google, supporting diverse data insight applications. This paper describes the motivation for MRA, its formalism, implementation, and future opportunities.2023-11-08T16:50:15ZXi WuEugene WuZichen ZhuFengan LiJeffrey F. Naughtonhttp://arxiv.org/abs/2507.17453v1Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees2025-07-23T12:20:20ZThe vulnerability of neural networks to adversarial perturbations has necessitated formal verification techniques that can rigorously certify the quality of neural networks. As the state-of-the-art, branch and bound (BaB) is a "divide-and-conquer" strategy that applies off-the-shelf verifiers to sub-problems for which they perform better. While BaB can identify the sub-problems that are necessary to be split, it explores the space of these sub-problems in a naive "first-come-first-serve" manner, thereby suffering from an issue of inefficiency to reach a verification conclusion. To bridge this gap, we introduce an order over different sub-problems produced by BaB, concerning with their different likelihoods of containing counterexamples. Based on this order, we propose a novel verification framework Oliva that explores the sub-problem space by prioritizing those sub-problems that are more likely to find counterexamples, in order to efficiently reach the conclusion of the verification. Even if no counterexample can be found in any sub-problem, it only changes the order of visiting different sub-problem and so will not lead to a performance degradation. Specifically, Oliva has two variants, including $Oliva^{GR}$, a greedy strategy that always prioritizes the sub-problems that are more likely to find counterexamples, and $Oliva^{SA}$, a balanced strategy inspired by simulated annealing that gradually shifts from exploration to exploitation to locate the globally optimal sub-problems. We experimentally evaluate the performance of Oliva on 690 verification problems spanning over 5 models with datasets MNIST and CIFAR10. Compared to the state-of-the-art approaches, we demonstrate the speedup of Oliva for up to 25X in MNIST, and up to 80X in CIFAR10.2025-07-23T12:20:20ZThis is an extended version of the ECOOP 2025 paper, with a comparison with DATE 2025 (Figure 7 of RQ1 in Section 5.2), as well as an in-depth discussion of OOPSLA 2025 in the related work (Section 6)Guanqin ZhangKota FukudaZhenya ZhangH. M. N. Dilum BandaraShiping ChenJianjun ZhaoYulei Sui10.4230/LIPIcs.ECOOP.2025.36http://arxiv.org/abs/2507.00782v2A Diagrammatic Calculus for a Functional Model of Natural Language Semantics2025-07-23T12:07:49ZIn this paper, we study a functional programming approach to natural language semantics, allowing us to increase the expressiveness of a more traditional denotation style. We will formalize a category based type and effect system to represent the semantic difference between syntactically equivalent expressions. We then construct a diagrammatic calculus to model parsing and handling of effects, providing a method to efficiently compute the denotations for sentences.2025-07-01T14:21:20Z15 pages plus one page appendix, submission to CSL 2026Matthieu Pierre Boyerhttp://arxiv.org/abs/2507.16089v2Querying Graph-Relational Data2025-07-23T01:50:49ZFor applications that store structured data in relational databases, there is an impedance mismatch between the flat representations encouraged by relational data models and the deeply nested information that applications expect to receive. In this work, we present the graph-relational database model, which provides a flexible, compositional, and strongly-typed solution to this "object-relational mismatch." We formally define the graph-relational database model and present a static and dynamic semantics for queries. In addition, we discuss the realization of the graph-relational database model in EdgeQL, a general-purpose SQL-style query language, and the Gel system, which compiles EdgeQL schemas and queries into PostgreSQL queries. Gel facilitates the kind of object-shaped data manipulation that is frequently provided inefficiently by object-relational mapping (ORM) technologies, while achieving most of the efficiency that comes from writing complex PostgreSQL queries directly.2025-07-21T22:02:21ZMichael J. SullivanZhibo ChenElvis PranskevichusRobert J. SimmonsVictor PetrovykhAljaž Mur ErženYury Selivanovhttp://arxiv.org/abs/2212.10131v3Hydra: Virtualized Multi-Language Runtime for High-Density Serverless Platforms2025-07-22T16:49:24ZServerless is an attractive computing model that offers seamless scalability and elasticity; it takes the infrastructure management burden away from users and enables a pay-as-you-use billing model. As a result, serverless is becoming increasingly popular to support highly elastic and bursty workloads. However, existing platforms are supported by bloated virtualization stacks, which, combined with bursty and irregular invocations, lead to high memory and latency overheads.
To reduce the virtualization stack bloat, we propose Hydra, a virtualized multi-language runtime and platform capable of hosting multiple sandboxes running concurrently. To fully leverage Hydra's virtualized runtime, we revisit the existing serverless platform design to make it colocation-aware across owners and functions, and to feature a caching layer of pre-allocated Hydra instances that can be used by different functions written in different languages to reduce cold starts. We also propose a snapshotting mechanism to checkpoint and restore individual sandboxes.
By consolidating multiple serverless function invocations through Hydra, we improve the overall function density (ops/GB-sec) by 2.41x on average compared to OpenWhisk runtimes, the state-of-the-art single-language runtimes used in most serverless platforms, and by 1.43x on average compared to Knative runtimes supporting invocation colocation within the same function. When reproducing the Azure Functions trace, our serverless platform operating Hydra instances reduces the overall memory footprint by 21.3-43.9% compared to operating OpenWhisk instances and by 14.5-30% compared to operating Knative instances. Hydra eliminates cold starts thanks to the pool of pre-warmed runtime instances, reducing p99 latency by 45.3-375.5x compared to OpenWhisk and by 1.9-51.4x compared to Knative.2022-12-20T09:58:39ZSerhii IvanenkoVasyl LankoRudi HornVojin JovanovicRodrigo Brunohttp://arxiv.org/abs/2507.16660v1Enhancing Compiler Optimization Efficiency through Grammatical Decompositions of Control-Flow Graphs2025-07-22T14:54:48ZThis thesis addresses the complexities of compiler optimizations, such as register allocation and Lifetime-optimal Speculative Partial Redundancy Elimination (LOSPRE), which are often handled using tree decomposition algorithms. However, these methods frequently overlook important sparsity aspects of Control Flow Graphs (CFGs) and result in high computational costs. We introduce the SPL (Series-Parallel-Loop) decomposition, a novel framework that offers optimal solutions to these challenges. A key contribution is the formulation of a general solution for Partial Constraint Satisfaction Problems (PCSPs) within graph structures, applied to three optimization problems. First, SPL decomposition enhances register allocation by accurately modeling variable interference graphs, leading to efficient register assignments and improved performance across benchmarks. Second, it optimizes LOSPRE by effectively identifying and eliminating redundancies in program execution. Finally, the thesis focuses on optimizing the placement of bank selection instructions to enhance data retrieval efficiency and reduce latency. Extensive experimentation demonstrates significant performance improvements over existing methods, establishing SPL decomposition as a powerful tool for complex compiler optimizations, including register allocation, LOSPRE, and bank selection.2025-07-22T14:54:48ZXuran Caihttp://arxiv.org/abs/2507.15007v2Hear Your Code Fail, Voice-Assisted Debugging for Python2025-07-22T13:49:40ZThis research introduces an innovative voice-assisted debugging plugin for Python that transforms silent runtime errors into actionable audible diagnostics. By implementing a global exception hook architecture with pyttsx3 text-to-speech conversion and Tkinter-based GUI visualization, the solution delivers multimodal error feedback through parallel auditory and visual channels. Empirical evaluation demonstrates 37% reduced cognitive load (p<0.01, n=50) compared to traditional stack-trace debugging, while enabling 78% faster error identification through vocalized exception classification and contextualization. The system achieves sub-1.2 second voice latency with under 18% CPU overhead during exception handling, vocalizing error types and consequences while displaying interactive tracebacks with documentation deep links. Criteria validate compatibility across Python 3.7+ environments on Windows, macOS, and Linux platforms. Needing only two lines of integration code, the plugin significantly boosts availability for aesthetically impaired designers and supports multitasking workflows through hands-free error medical diagnosis. Educational applications show particular promise, with pilot studies indicating 45% faster debugging skill acquisition among novice programmers. Future development will incorporate GPT-based repair suggestions and real-time multilingual translation to further advance auditory debugging paradigms. The solution represents a fundamental shift toward human-centric error diagnostics, bridging critical gaps in programming accessibility while establishing new standards for cognitive efficiency in software development workflows.2025-07-20T15:24:35Z35 pages, 20 figuresSayed Mahbub Hasan AmiriMd. Mainul IslamMohammad Shakhawat HossenSayed Majhab Hasan AmiriMohammad Shawkat Ali MamunSk. Humaun KabirNaznin Akterhttp://arxiv.org/abs/2503.10955v2Bialgebraic Reasoning on Stateful Languages2025-07-21T23:21:46ZReasoning about program equivalence in imperative languages is notoriously challenging, as the presence of states (in the form of variable stores) fundamentally increases the observational power of program terms. The key desideratum for any notion of equivalence is compositionality, guaranteeing that subprograms can be safely replaced by equivalent subprograms regardless of the context. To facilitate compositionality proofs and avoid boilerplate work, one would hope to employ the abstract bialgebraic methods provided by Turi and Plotkin's powerful theory of mathematical operational semantics (a.k.a. abstract GSOS) or its recent extension by Goncharov et al. to higher-order languages. However, multiple attempts to apply abstract GSOS to stateful languages have thus failed. We propose a novel approach to the operational semantics of stateful languages based on the formal distinction between readers (terms that expect an initial input store before being executed), and writers (running terms that have already been provided with a store). In contrast to earlier work, this style of semantics is fully compatible with abstract GSOS, and we can thus leverage the existing theory to obtain coinductive reasoning techniques. We demonstrate that our approach generates non-trivial compositionality results for stateful languages with first-order and higher-order store and that it flexibly applies to program equivalences at different levels of granularity, such as trace, cost, and natural equivalence.2025-03-13T23:42:00ZSergey GoncharovStefan MiliusLutz SchröderStelios TsampasHenning Urbat10.1145/3747513http://arxiv.org/abs/2507.12367v2GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities2025-07-21T21:56:07ZThe rapid evolution of software libraries poses a considerable hurdle for code generation, necessitating continuous adaptation to frequent version updates while preserving backward compatibility. While existing code evolution benchmarks provide valuable insights, they typically lack execution-based evaluation for generating code compliant with specific library versions. To address this, we introduce GitChameleon 2.0, a novel, meticulously curated dataset comprising 328 Python code completion problems, each conditioned on specific library versions and accompanied by executable unit tests. GitChameleon 2.0 rigorously evaluates the capacity of contemporary large language models (LLMs), LLM-powered agents, code assistants, and RAG systems to perform version-conditioned code generation that demonstrates functional accuracy through execution. Our extensive evaluations indicate that state-of-the-art systems encounter significant challenges with this task; enterprise models achieving baseline success rates in the 48-51% range, underscoring the intricacy of the problem. By offering an execution-based benchmark emphasizing the dynamic nature of code libraries, GitChameleon 2.0 enables a clearer understanding of this challenge and helps guide the development of more adaptable and dependable AI code generation methods. We make the dataset and evaluation code publicly available at https://github.com/mrcabbage972/GitChameleonBenchmark.2025-07-16T16:10:42ZVersion 2 of the dataset from: arXiv:2411.05830Diganta MisraNizar IslahVictor MayBrice RaubyZihan WangJustine GehringAntonio OrvietoMuawiz ChaudharyEilif B. MullerIrina RishSamira Ebrahimi KahouMassimo Cacciahttp://arxiv.org/abs/2507.16086v1Understanding Haskell-style Overloading via Open Data and Open Functions2025-07-21T21:46:30ZWe present a new, uniform semantics for Haskell-style overloading. We realize our approach in a new core language, System F$_\mathrm{D}$, whose metatheory we mechanize in the Lean4 interactive theorem prover. System F$_\mathrm{D}$ is distinguished by its open data types and open functions, each given by a collection of instances rather than by a single definition. We show that System F$_\mathrm{D}$ can encode advanced features of Haskell's of type class systems, more expressively than current semantics of these features, and without assuming additional type equality axioms.2025-07-21T21:46:30ZAndrew MarmadukeApoorv IngleJ. Garrett Morrishttp://arxiv.org/abs/2410.11742v2Abstracting Extensible Recursive Functions2025-07-21T21:40:22ZWe explore recursive programming with extensible data types. Row types make the structure of data types first class, and can express a variety of type system features including record subtyping and combination of case branches. Our goal is the modular combination of recursive types and of recursive functions over them. The most significant challenge is in recursive function calls, which may need to account for new cases in a combined type. We introduce extensible histomorphisms, Mendler-style descriptions of recursive functions in which recursive calls can happen at larger types, and show that they provide expressive recursion over extensible data types. We formalize our approach in R$ωμ$, a row type theory with support for recursive terms and types.2024-10-15T16:18:08ZAlex HubersApoorv IngleAndrew MarmadukeJ. Garrett Morrishttp://arxiv.org/abs/2502.15500v2What does it take to certify a conversion checker?2025-07-21T18:27:38ZWe report on a detailed exploration of the properties of conversion (definitional equality) in dependent type theory, with the goal of certifying decision procedures for it. While in that context the property of normalisation has attracted the most light, we instead emphasize the importance of injectivity properties, showing that they alone are both crucial and sufficient to certify most desirable properties of conversion checkers. We also explore the certification of a fully untyped conversion checker, with respect to a typed specification, and show that the story is mostly unchanged, although the exact injectivity properties needed are subtly different.2025-02-21T14:49:51ZIn 10th International Conference on Formal Structures for Computation and Deduction (FSCD 2025)Meven Lennon-Bertrand10.4230/LIPIcs.FSCD.2025.27