https://arxiv.org/api/usKbvgMuhgLyMNj8eIqN3ynTdyg2026-06-30T04:44:23Z9958177015http://arxiv.org/abs/2503.04001v1Binomial Tabulation: A Short Story2025-03-06T01:26:58ZWe reconstruct some of the development in Richard Bird's [2008] paper Zippy Tabulations of Recursive Functions, using dependent types and string diagrams rather than mere simple types. This paper serves as an intuitive introduction to and demonstration of these concepts for the curious functional programmer, who ideally already has some exposure to dependent types and category theory, is not put off by basic concepts like indexed types and functors, and wants to see a more practical example.
The paper is presented in the form of a short story, narrated from the perspective of a functional programmer trying to follow the development in Bird's paper. The first section recaps the original simply typed presentation. The second section explores a series of refinements that can be made using dependent types. The third section uses string diagrams to simplify arguments involving functors and naturality. The short story ends there, but the paper concludes with a discussion and reflection in the afterword.2025-03-06T01:26:58ZHsiang-Shang KoShin-Cheng MuJeremy Gibbonshttp://arxiv.org/abs/2503.03698v1AEGIS: Towards Formalized and Practical Memory-Safe Execution of C programs via MSWASM2025-03-05T17:50:43ZPrograms written in unsafe languages such as C are prone to memory safety errors, which can lead to program compromises and serious real-world security consequences. Recently, Memory-Safe WebAssembly (MSWASM) is introduced as a general-purpose intermediate bytecode with built-in memory safety semantics. Programs written in C can be compiled into MSWASM to get complete memory safety protection. In this paper, we present our extensions on MSWASM, which improve its semantics and practicality. First, we formalize MSWASM semantics in Coq/Iris, extending it with inter-module interaction, showing that MSWASM provides fine-grained isolation guarantees analogous to WASM's coarse-grained isolation via linear memory. Second, we present Aegis, a system to adopt the memory safety of MSWASM for C programs in an interoperable way. Aegis pipeline generates Checked C source code from MSWASM modules to enforce spatial memory safety. Checked C is a recent binary-compatible extension of C which can provide guaranteed spatial safety. Our design allows Aegis to protect C programs that depend on legacy C libraries with no extra dependency and with low overhead. Aegis pipeline incurs 67% runtime overhead and near-zero memory overhead on PolyBenchC programs compared to native.2025-03-05T17:50:43ZShahram EsmaeilsabzaliArayi KhalatyanZhijun MoSruthi VenkatanarayananShengjie Xuhttp://arxiv.org/abs/2405.10089v2Do You Even Lift? Strengthening Compiler Security Guarantees Against Spectre Attacks2025-03-05T17:20:50ZMainstream compilers implement different countermeasures to prevent specific classes of speculative execution attacks. Unfortunately, these countermeasures either lack formal guarantees or come with proofs restricted to speculative semantics capturing only a subset of the speculation mechanisms supported by modern CPUs, thereby limiting their practical applicability. Ideally, these security proofs should target a speculative semantics capturing the effects of all speculation mechanisms implemented in modern CPUs. However, this is impractical and requires new secure compilation proofs to support additional speculation mechanisms. In this paper, we address this problem by proposing a novel secure compilation framework that allows lifting the security guarantees provided by Spectre countermeasures from weaker speculative semantics (ignoring some speculation mechanisms) to stronger ones (accounting for the omitted mechanisms) without requiring new secure compilation proofs. Using our lifting framework, we performed the most comprehensive security analysis of Spectre countermeasures implemented in mainstream compilers to date. Our analysis spans 9 different countermeasures against 5 classes of Spectre attacks, which we proved secure against a speculative semantics accounting for five different speculation mechanisms.2024-05-16T13:34:00ZXaver FabianMarco PatrignaniMarco GuarnieriMichael Backeshttp://arxiv.org/abs/2503.03359v1Iterating Pointers: Enabling Static Analysis for Loop-based Pointers2025-03-05T10:39:23ZPointers are an integral part of C and other programming languages. They enable substantial flexibility from the programmer's standpoint, allowing the user fine, unmediated control over data access patterns. However, accesses done through pointers are often hard to track, and challenging to understand for optimizers, compilers, and sometimes, even for the developers themselves because of the direct memory access they provide. We alleviate this problem by exposing additional information to analyzers and compilers. By separating the concept of a pointer into a data container and an offset, we can optimize C programs beyond what other state-of-the-art approaches are capable of, in some cases even enabling auto-parallelization. Using this process, we are able to successfully analyze and optimize code from OpenSSL, the Mantevo benchmark suite, and the Lempel-Ziv-Oberhumer compression algorithm. We provide the only automatic approach able to find all parallelization opportunities in the HPCCG benchmark from the Mantevo suite the developers identified and even outperform the reference implementation by up to 18%, as well as speed up the PBKDF2 algorithm implementation from OpenSSL by up to 11x.2025-03-05T10:39:23ZAndrea LeporiETH ZurichAlexandru CalotoiuETH ZurichTorsten HoeflerETH Zurich10.1145/3701993http://arxiv.org/abs/2502.04157v2Censor Resistant Instruction Independent Obfuscation for Multiple Programs2025-03-05T07:57:33ZThis work builds upon and optimizes our prior research on obfuscation as instruction decorrelation which achieves multiple program obfuscation. Leveraging this infrastructure, we further achieve the property of sensor-resistant computation.2025-02-06T15:40:58ZAli Ajorianhttp://arxiv.org/abs/2503.03153v1Substructural Parametricity2025-03-05T03:52:59ZOrdered, linear, and other substructural type systems allow us to expose deep properties of programs at the syntactic level of types. In this paper, we develop a family of unary logical relations that allow us to prove consequences of parametricity for a range of substructural type systems. A key idea is to parameterize the relation by an algebra, which we exemplify with a monoid and commutative monoid to interpret ordered and linear type systems, respectively. We prove the fundamental theorem of logical relations and apply it to deduce extensional properties of inhabitants of certain types. Examples include demonstrating that the ordered types for list append and reversal are inhabited by exactly one function, as are types of some tree traversals. Similarly, the linear type of the identity function on lists is inhabited only by permutations of the input. Our most advanced example shows that the ordered type of the list fold function is inhabited only by the fold function.2025-03-05T03:52:59ZC. B. AberléChris MartensFrank Pfenninghttp://arxiv.org/abs/2411.11856v3Automatically Improving LLM-based Verilog Generation using EDA Tool Feedback2025-03-04T21:25:03ZTraditionally, digital hardware designs are written in the Verilog hardware description language (HDL) and debugged manually by engineers. This can be time-consuming and error-prone for complex designs. Large Language Models (LLMs) are emerging as a potential tool to help generate fully functioning HDL code, but most works have focused on generation in the single-shot capacity: i.e., run and evaluate, a process that does not leverage debugging and, as such, does not adequately reflect a realistic development process. In this work, we evaluate the ability of LLMs to leverage feedback from electronic design automation (EDA) tools to fix mistakes in their own generated Verilog. To accomplish this, we present an open-source, highly customizable framework, AutoChip, which combines conversational LLMs with the output from Verilog compilers and simulations to iteratively generate and repair Verilog. To determine the success of these LLMs we leverage the VerilogEval benchmark set. We evaluate four state-of-the-art conversational LLMs, focusing on readily accessible commercial models. EDA tool feedback proved to be consistently more effective than zero-shot prompting only with GPT-4o, the most computationally complex model we evaluated. In the best case, we observed a 5.8% increase in the number of successful designs with a 34.2% decrease in cost over the best zero-shot results. Mixing smaller models with this larger model at the end of the feedback iterations resulted in equally as much success as with GPT-4o using feedback, but incurred 41.9% lower cost (corresponding to an overall decrease in cost over zero-shot by 89.6%).2024-11-01T17:33:28ZAccepted for publication in TODAES Special Issue on Large Language Models for Electronic System Design AutomationJason BlockloveShailja ThakurBenjamin TanHammond PearceSiddharth GargRamesh Karrihttp://arxiv.org/abs/2502.15031v2Notions of Stack-manipulating Computation and Relative Monads (Extended Version)2025-03-04T16:51:32ZMonads provide a simple and concise interface to user-defined computational effects in functional programming languages. This enables equational reasoning about effects, abstraction over monadic interfaces and the development of monad transformer stacks to combine different effects. Compiler implementors and assembly code programmers similarly virtualize effects, and would benefit from similar abstractions if possible. However, the implementation details of effects seem disconnected from the high-level monad interface: at this lower level much of the design is in the layout of the runtime stack, which is not accessible in a high-level programming language.
We demonstrate that the monadic interface can be faithfully adapted from high-level functional programming to a lower level setting with explicit stack manipulation. We use a polymorphic call-by-push-value (CBPV) calculus as a setting that captures the essence of stack-manipulation, with a type system that allows programs to define domain-specific stack structures. Within this setting, we show that the existing category-theoretic notion of a relative monad can be used to model the stack-based implementation of computational effects. To demonstrate generality, we adapt a variety of standard monads to relative monads. Additionally, we show that stack-manipulating programs can benefit from a generalization of do-notation we call "monadic blocks" that allow all CBPV code to be reinterpreted to work with an arbitrary relative monad. As an application, we show that all relative monads extend automatically to relative monad transformers, a process which is not automatic for monads in pure languages.2025-02-20T20:38:40ZYuchen JiangRunze XueMax S. New10.1145/3720434http://arxiv.org/abs/2503.02557v1Mimosa: A Language for Asynchronous Implementation of Embedded Systems Software2025-03-04T12:32:57ZThis paper introduces the Mimosa language, a programming language for the design and implementation of asynchronous reactive systems, describing them as a collection of time-triggered processes which communicate through FIFO buffers. Syntactically, Mimosa builds upon the Lustre data-flow language, augmenting it with a new semantics to allow for the expression of side-effectful computations, and extending it with an asynchronous coordination layer which orchestrates the communication between processes. A formal semantics is given to both the process and coordination layer through a textual and graphical rewriting calculus, respectively, and a prototype interpreter for simulation is provided.2025-03-04T12:32:57ZNikolaus HuberSusanne GrafPhilipp RümmerWang Yi10.1007/978-3-031-95589-1_5http://arxiv.org/abs/2409.14591v3Non-Cartesian Guarded Recursion with Daggers2025-03-04T10:22:03ZGuarded recursion is a framework allowing for a formalisation of streams in classical programming languages. The latter take their semantics in cartesian closed categories. However, some programming paradigms do not take their semantics in a cartesian setting; this is the case for concurrency, reversible and quantum programming for example. In this paper, we focus on reversible programming through its categorical model in dagger categories, which are categories that contain an involutive operator on morphisms. After presenting classical guarded recursion, we show how to introduce this framework into dagger categories with sufficient structure for data types, also called dagger rig categories. First, given an arbitrary category, we build another category shown to be suitable for guarded recursion in multiple ways, via enrichment and fixed point theorems. We then study the interaction between this construction and the structure of dagger rig categories, aiming for reversible programming. Finally, we show that our construction is suitable as a model of higher-order reversible programming languages, such as symmetric pattern-matching.2024-09-22T20:55:30ZLouis Lemonnierhttp://arxiv.org/abs/2503.04811v1Introducing Support for Move Operations in Melda CRDT2025-03-04T09:47:09ZIn this paper, we present an extension to Melda (a library which implements a general purpose delta state JSON CRDT) to support move operations. This enhancement relies on minimal changes to the underlying logic of the data structure, has virtually no runtime overhead and zero storage overhead compared to the original version of the library, ensuring simplicity while addressing multiple use cases. Although concurrent reordering of the elements in a list was already supported in the original version of the library, moving objects between different containers lead to undesired outcomes, namely duplicate entries. To address this problem we revisited the original approach and introduced the necessary changes to support for relocating elements within a JSON structure. We detail those changes and provide some examples.2025-03-04T09:47:09ZAmos Broccohttp://arxiv.org/abs/2207.03418v4Parallel Dual-Numbers Reverse AD2025-03-03T14:33:33ZWhere dual-numbers forward-mode automatic differentiation (AD) pairs each scalar value with its tangent value, dual-numbers reverse-mode AD attempts to achieve reverse AD using a similarly simple idea: by pairing each scalar value with a backpropagator function. Its correctness and efficiency on higher-order input languages have been analysed by Brunel, Mazza and Pagani, but this analysis used a custom operational semantics for which it is unclear whether it can be implemented efficiently. We take inspiration from their use of linear factoring to optimise dual-numbers reverse-mode AD to an algorithm that has the correct complexity and enjoys an efficient implementation in a standard functional language with support for mutable arrays, such as Haskell. Aside from the linear factoring ingredient, our optimisation steps consist of well-known ideas from the functional programming community. We demonstrate the use of our technique by providing a practical implementation that differentiates most of Haskell98. Where previous work on dual numbers reverse AD has required sequentialisation to construct the reverse pass, we demonstrate that we can apply our technique to task-parallel source programs and generate a task-parallel derivative computation.2022-07-07T16:43:14ZThis is the journal version for JFP. For the shorter conference version (POPL'23), see arXiv:2207.03418v2. For an earlier preprint, see arXiv:2205.11368Tom SmedingMatthijs Vákárhttp://arxiv.org/abs/2404.00338v2Polymorphic Records for Dynamic Languages2025-03-03T12:05:48ZWe define and study "row polymorphism" for a type system with set-theoretic types, specifically union, intersection, and negation types. We consider record types that embed row variables and define a subtyping relation by interpreting types into sets of record values and by defining subtyping as the containment of interpretations. We define a functional calculus equipped with operations for field extension, selection, and deletion, its operational semantics, and a type system that we prove to be sound. We provide algorithms for deciding the typing and subtyping relations.
This research is motivated by the current trend of defining static type system for dynamic languages and, in our case, by an ongoing effort of endowing the Elixir programming language with a gradual type system.2024-03-30T12:23:57ZProceedings of the ACM on Programming Languages, Volume 9, Issue OOPSLA1 Article No.: 132, Pages 1464 - 1491, 2025Giuseppe CastagnaLoïc Peyrot10.1145/3720497http://arxiv.org/abs/2503.01390v1Scalable and Accurate Application-Level Crash-Consistency Testing via Representative Testing2025-03-03T10:41:57ZCrash consistency is essential for applications that must persist data. Crash-consistency testing has been commonly applied to find crash-consistency bugs in applications. The crash-state space grows exponentially as the number of operations in the program increases, necessitating techniques for pruning the search space. However, state-of-the-art crash-state space pruning is far from ideal. Some techniques look for known buggy patterns or bound the exploration for efficiency, but they sacrifice coverage and may miss bugs lodged deep within applications. Other techniques eliminate redundancy in the search space by skipping identical crash states, but they still fail to scale to larger applications.
In this work, we propose representative testing: a new crash-state space reduction strategy that achieves high scalability and high coverage. Our key observation is that the consistency of crash states is often correlated, even if those crash states are not identical. We build Pathfinder, a crash-consistency testing tool that implements an update behaviors-based heuristic to approximate a small set of representative crash states.
We evaluate Pathfinder on POSIX-based and MMIO-based applications, where it finds 18 (7 new) bugs across 8 production-ready systems. Pathfinder scales more effectively to large applications than prior works and finds 4x more bugs in POSIX-based applications and 8x more bugs in MMIO-based applications compared to state-of-the-art systems.2025-03-03T10:41:57ZOOPSLA 2025Yile GuIan NealJiexiao XuShaun Christopher LeeAyman SaidMusa HaydarJacob Van GeffenRohan KadekodiAndrew QuinnBaris Kasikci10.1145/3720431http://arxiv.org/abs/2503.00822v1Scalable Memory Recycling for Large Quantum Programs2025-03-02T09:56:39ZAs quantum computing technology advances, the complexity of quantum algorithms increases, necessitating a shift from low-level circuit descriptions to high-level programming paradigms. This paper addresses the challenges of developing a compilation algorithm that optimizes memory management and scales well for bigger, more complex circuits. Our approach models the high-level quantum code as a control flow graph and presents a workflow that searches for a topological sort that maximizes opportunities for qubit reuse. Various heuristics for qubit reuse strategies handle the trade-off between circuit width and depth. We also explore scalability issues in large circuits, suggesting methods to mitigate compilation bottlenecks. By analyzing the structure of the circuit, we are able to identify sub-problems that can be solved separately, without a significant effect on circuit quality, while reducing runtime significantly. This method lays the groundwork for future advancements in quantum programming and compiler optimization by incorporating scalability into quantum memory management.2025-03-02T09:56:39ZIsrael ReichentalRavid AlonLior PremingerMatan VaxAmir Naveh