https://arxiv.org/api/9XDbjqsVSzTZrV2UoO4v4fJhoKc2026-06-30T16:34:26Z9958195015http://arxiv.org/abs/2412.14374v1Scaling Deep Learning Training with MPMD Pipeline Parallelism2024-12-18T22:15:11ZWe present JaxPP, a system for efficiently scaling the training of large deep learning models with flexible pipeline parallelism. We introduce a seamless programming model that allows implementing user-defined pipeline schedules for gradient accumulation. JaxPP automatically distributes tasks, corresponding to pipeline stages, over a cluster of nodes and automatically infers the communication among them. We implement a MPMD runtime for asynchronous execution of SPMD tasks. The pipeline parallelism implementation of JaxPP improves hardware utilization by up to $1.11\times$ with respect to the best performing SPMD configuration.2024-12-18T22:15:11ZAnxhelo XhebrajSean LeeHanfeng ChenVinod Groverhttp://arxiv.org/abs/2412.13996v1Implicit Rankings for Verifying Liveness Properties in First-Order Logic2024-12-18T16:14:12ZLiveness properties are traditionally proven using a ranking function that maps system states to some well-founded set. Carrying out such proofs in first-order logic enables automation by SMT solvers. However, reasoning about many natural ranking functions is beyond reach of existing solvers. To address this, we introduce the notion of implicit rankings - first-order formulas that soundly approximate the reduction of some ranking function without defining it explicitly. We provide recursive constructors of implicit rankings that can be instantiated and composed to induce a rich family of implicit rankings. Our constructors use quantifiers to approximate reasoning about useful primitives such as cardinalities of sets and unbounded sums that are not directly expressible in first-order logic. We demonstrate the effectiveness of our implicit rankings by verifying liveness properties of several intricate examples, including Dijkstra's k-state, 4-state and 3-state self-stabilizing protocols.2024-12-18T16:14:12Z34 pages, 2 figuresRaz LotanSharon Shohamhttp://arxiv.org/abs/2412.19826v1Modular probabilistic programming with algebraic effects (MSc Thesis 2019)2024-12-18T15:05:00ZProbabilistic programming languages, which exist in abundance, are languages that allow users to calculate probability distributions defined by probabilistic programs, by using inference algorithms. However, the underlying inference algorithms are not implemented in a modular fashion, though, the algorithms are presented as a composition of other inference components. This discordance between the theory and the practice of Bayesian machine learning, means that reasoning about the correctness of probabilistic programs is more difficult, and composing inference algorithms together in code may not necessarily produce correct compound inference algorithms. In this dissertation, I create a modular probabilistic programming library, already a nice property as its not a standalone language, called Koka Bayes, that is based off of both the modular design of Monad Bayes -- a probabilistic programming library developed in Haskell -- and its semantic validation. The library is embedded in a recently created programming language, Koka, that supports algebraic effect handlers and expressive effect types -- novel programming abstractions that support modular programming. Effects are generalizations of computational side-effects, and it turns out that fundamental operations in probabilistic programming such as probabilistic choice and conditioning are instances of effects.2024-12-18T15:05:00Z47 PagesOliver GoldsteinOhad Kammar10.7488/era/5485http://arxiv.org/abs/2412.13581v1Verified invertible lexer using regular expressions and DFAs2024-12-18T08:03:17ZIn this project, we explore the concept of invertibility applied to serialisation and lexing frameworks. Recall that, on one hand, serialisation is the process of taking a data structure and writing it to a bit array while parsing is the reverse operation, i.e., reading the bit array and constructing the data structure back. While lexing, on the other hand, is the process of reading a stream of characters and splitting them into tokens, by following a list of given rules. While used in different applications, both are similar in their abstract operation: they both take a list of simple characters and extract a more complex structure. Applications in which these two operations are used are different but they share a need for the invertibility of the process. For example, when tokenising a code file that was prettyprinted by a compiler, one would expect to get the same sequence of tokens. Similarly, when a spacecraft sends scientific data to the ground, one would expect the parsed data to be the same as the one serialised by the spacecraft. The idea of this project is to explore the idea of having a framework capable of generating parser/serialiser or lexer/prettyprinter pairs with a formally verified notion of invertibility. We first explore related works and frameworks. After that, we present our verified lexer framework developed in Scala and verified using the Stainless framework1. We explain the implementation choices we make and present the specifications and their proofs. The code of the lexer with the proofs is available on Github2. The main branch contains the regular expression (called regex from now on) matcher version and the verified Computable Languages while the dfa match branch contains the version using the DFA matcher.2024-12-18T08:03:17ZSamuel ChassotViktor KunĨakhttp://arxiv.org/abs/2412.13398v1Pattern Matching in AI Compilers and its Formalization (Extended Version)2024-12-18T00:29:09ZPyPM is a Python-based domain specific language (DSL) for building rewrite-based optimization passes on machine learning computation graphs. Users define individual optimizations by writing (a) patterns that match subgraphs of a computation graph and (b) corresponding rules which replace a matched subgraph with an optimized kernel. PyPM is distinguished from the many other DSLs for defining rewriting passes by its complex and novel pattern language which borrows concepts from logic programming. PyPM patterns can be recursive, nondeterminstic, and can require checking domain-specific constraints such as the shapes of tensors. The PyPM implementation is thus similarly complicated, consisting of thousands of lines of C++ code. In this paper, we present our work on building PyPM, as well as formalizing and distilling and this complexity to an understandable mathematical core. We have developed a formal core calculus expressing the main operations of the PyPM pattern language. We define both a declarative semantics - describing which patterns match which terms - and an algorithmic semantics - an idealized version of the PyPM pattern interpreter - and prove their equivalence. The development is fully mechanized in the Coq proof assistant.2024-12-18T00:29:09ZTo appear at CGO'25Joseph W. CutlerAlex CollinsBin FanMahesh RavishankarVinod Groverhttp://arxiv.org/abs/2309.11826v3Maximal Simplification of Polyhedral Reductions2024-12-17T21:09:02ZReductions combine collections of input values with an associative and often commutative operator to produce collections of results. When the same input value contributes to multiple outputs, there is an opportunity to reuse partial results, enabling reduction simplification. Simplification often produces a program with lower asymptotic complexity. Typical compiler optimizations yield, at best, a constant fold speedup, but a complexity improvement from, say, cubic to quadratic complexity yields unbounded speedup for sufficiently large problems. It is well known that reductions in polyhedral programs may be simplified automatically, but previous methods cannot exploit all available reuse. This paper resolves this long-standing open problem, thereby attaining minimal asymptotic complexity in the simplified program. We propose extensions to prior work on simplification to support any independent commutative reduction. At the heart of our approach is piece-wise simplification, the notion that we can split an arbitrary reduction into pieces and then independently simplify each piece. However, the difficulty of using such piece-wise transformations is that they typically involve an infinite number of choices. We give constructive proofs to deal with this and select a finite number of pieces for simplification.2023-09-21T06:57:46ZPOPL 2025 preprintLouis NarmourTomofumi YukiSanjay Rajopadhye10.1145/3704839http://arxiv.org/abs/2408.09237v7QEDCartographer: Automating Formal Verification Using Reward-Free Reinforcement Learning2024-12-17T18:37:07ZFormal verification is a promising method for producing reliable software, but the difficulty of manually writing verification proofs severely limits its utility in practice. Recent methods have automated some proof synthesis by guiding a search through the proof space using a theorem prover. Unfortunately, the theorem prover provides only the crudest estimate of progress, resulting in effectively undirected search. To address this problem, we create QEDCartographer, an automated proof-synthesis tool that combines supervised and reinforcement learning to more effectively explore the proof space. QEDCartographer incorporates the proofs' branching structure, enabling reward-free search and overcoming the sparse reward problem inherent to formal verification. We evaluate QEDCartographer using the CoqGym benchmark of 68.5K theorems from 124 open-source Coq projects. QEDCartographer fully automatically proves 21.4% of the test-set theorems. Previous search-based proof-synthesis tools Tok, Tac, ASTactic, Passport, and Proverbot9001, which rely only on supervised learning, prove 9.6%, 9.8%, 10.9%, 12.5%, and 19.8%, respectively. Diva, which combines 62 tools, proves 19.2%. Comparing to the most effective prior tool, Proverbot9001, QEDCartographer produces 34% shorter proofs 29% faster, on average over the theorems both tools prove. Together, QEDCartographer and non-learning-based CoqHammer prove 30.3% of the theorems, while CoqHammer alone proves 26.6%. Our work demonstrates that reinforcement learning is a fruitful research direction for improving proof-synthesis tools' search mechanisms.2024-08-17T16:06:14ZAuthors could not agree on final revision. Please see author websites for individual versions of paperAlex Sanchez-SternAbhishek VargheseZhanna KaufmanDylan ZhangTalia RingerYuriy Brun10.1109/ICSE55347.2025.00033http://arxiv.org/abs/2412.16206v1Information Aware Type Systems and Telescopic Constraint Trees2024-12-17T16:09:28ZCan we use the flow of information to understand type systems? I present two familiar type systems in pursuit of an `Information Aware' style, using information effects to reveal data flow and help in implementing them. I also calculate a general, scoped, constraint-based representation of typechecking problems from the typing rules.2024-12-17T16:09:28ZExtended abstract from MSFP2020Philippa Cowderoyhttp://arxiv.org/abs/2412.10487v2HyperGraphOS: A Modern Meta-Operating System for the Scientific and Engineering Domains2024-12-17T10:35:33ZThis paper presents HyperGraphOS, a significant innovation in the domain of operating systems, specifically designed to address the needs of scientific and engineering domains. This platform aims to combine model-based engineering, graph modeling, data containers, and documents, along with tools for handling computational elements. HyperGraphOS functions as an Operating System offering to users an infinite workspace for creating and managing complex models represented as graphs with customizable semantics. By leveraging a web-based architecture, it requires only a modern web browser for access, allowing organization of knowledge, documents, and content into models represented in a network of workspaces. Elements of the workspace are defined in terms of domain-specific languages (DSLs). These DSLs are pivotal for navigating workspaces, generating code, triggering AI components, and organizing information and processes. The models' dual nature as both visual drawings and data structures allows dynamic modifications and inspections both interactively as well as programaticaly. We evaluated HyperGraphOS's efficiency and applicability across a large set of diverse domains, including the design and development of a virtual Avatar dialog system, a robotic task planner based on large language models (LLMs), a new meta-model for feature-based code development and many others. Our findings show that HyperGraphOS offers substantial benefits in the interaction with a computer as information system, as platoform for experiments and data analysis, as streamlined engineering processes, demonstrating enhanced flexibility in managing data, computation and documents, showing an innovative approaches to persistent desktop environments.2024-12-13T15:18:39Z29 Pages, 10 figures, 1 tableAntonello CeravolaFrank Joublinhttp://arxiv.org/abs/2412.12579v1Scaling Inter-procedural Dataflow Analysis on the Cloud2024-12-17T06:18:56ZApart from forming the backbone of compiler optimization, static dataflow analysis has been widely applied in a vast variety of applications, such as bug detection, privacy analysis, program comprehension, etc. Despite its importance, performing interprocedural dataflow analysis on large-scale programs is well known to be challenging. In this paper, we propose a novel distributed analysis framework supporting the general interprocedural dataflow analysis. Inspired by large-scale graph processing, we devise dedicated distributed worklist algorithms for both whole-program analysis and incremental analysis. We implement these algorithms and develop a distributed framework called BigDataflow running on a large-scale cluster. The experimental results validate the promising performance of BigDataflow -- BigDataflow can finish analyzing the program of millions lines of code in minutes. Compared with the state-of-the-art, BigDataflow achieves much more analysis efficiency.2024-12-17T06:18:56ZZewen SunYujin ZhangDuanchen XuYiyu ZhangYun QiYueyang WangYi LiZhaokang WangYue LiXuandong LiZhiqiang ZuoQingda LuWenwen PengShengjian Guohttp://arxiv.org/abs/2411.19410v2WDD: Weighted Delta Debugging2024-12-16T23:19:36ZDelta Debugging is a widely used family of algorithms (e.g., ddmin and ProbDD) to automatically minimize bug-triggering test inputs, thus to facilitate debugging. It takes a list of elements with each element representing a fragment of the test input, systematically partitions the list at different granularities, identifies and deletes bug-irrelevant partitions.
Prior delta debugging algorithms assume there are no differences among the elements in the list, and thus treat them uniformly during partitioning. However, in practice, this assumption usually does not hold, because the size (referred to as weight) of the fragment represented by each element can vary significantly. For example, a single element representing 50% of the test input is much more likely to be bug-relevant than elements representing only 1%. This assumption inevitably impairs the efficiency or even effectiveness of these delta debugging algorithms.
This paper proposes Weighted Delta Debugging (WDD), a novel concept to help prior delta debugging algorithms overcome the limitation mentioned above. The key insight of WDD is to assign each element in the list a weight according to its size, and distinguish different elements based on their weights during partitioning. We designed two new minimization algorithms, Wddmin and WProbDD, by applying WDD to ddmin and ProbDD respectively. We extensively evaluated Wddmin and WProbDD in two representative applications, HDD and Perses, on 62 benchmarks across two languages. The results strongly demonstrate the value of WDD. We firmly believe that WDD opens up a new dimension to improve test input minimization techniques.2024-11-28T23:31:33ZThis work has been accepted by ICSE'25In Proceedings of the IEEE/ACM 47th International Conference on Software Engineering (2025) 1592-1603Xintong ZhouZhenyang XuMengxiao ZhangYongqiang TianChengnian Sun10.1109/ICSE55347.2025.00071http://arxiv.org/abs/2409.09934v3Coordination-free Collaborative Replication based on Operational Transformation2024-12-15T23:57:19ZWe introduce Coordination-free Collaborative Replication (CCR), a new method for maintaining consistency across replicas in distributed systems without requiring explicit coordination messages. CCR automates conflict resolution, contrasting with traditional Data-sharing systems that typically involve centralized update management or predefined consistency rules.
Operational Transformation (OT), commonly used in collaborative editing, ensures consistency by transforming operations while maintaining document integrity across replicas. However, OT assumes server-based coordination, which is unsuitable for modern, decentralized Peer-to-Peer (P2P) systems.
Conflict-free Replicated Data Type (CRDT), like Two-Phase Sets (2P-Sets), guarantees eventual consistency by allowing commutative and associative operations but often result in counterintuitive behaviors, such as failing to re-add an item to a shopping cart once removed.
In contrast, CCR employs a more intuitive approach to replication. It allows for straightforward updates and conflict resolution based on the current data state, enhancing clarity and usability compared to CRDTs. Furthermore, CCR addresses inefficiencies in messaging by developing a versatile protocol based on data stream confluence, thus providing a more efficient and practical solution for collaborative data sharing in distributed systems.2024-09-16T02:17:58ZMasato Takeichihttp://arxiv.org/abs/2408.11283v2Inference Plans for Hybrid Particle Filtering2024-12-14T16:19:30ZAdvanced probabilistic programming languages (PPLs) using hybrid particle filtering combine symbolic exact inference and Monte Carlo methods to improve inference performance. These systems use heuristics to partition random variables within the program into variables that are encoded symbolically and variables that are encoded with sampled values, and the heuristics are not necessarily aligned with the developer's performance evaluation metrics. In this work, we present inference plans, a programming interface that enables developers to control the partitioning of random variables during hybrid particle filtering. We further present Siren, a new PPL that enables developers to use annotations to specify inference plans the inference system must implement. To assist developers with statically reasoning about whether an inference plan can be implemented, we present an abstract-interpretation-based static analysis for Siren for determining inference plan satisfiability. We prove the analysis is sound with respect to Siren's semantics. Our evaluation applies inference plans to three different hybrid particle filtering algorithms on a suite of benchmarks. It shows that the control provided by inference plans enables speed ups of 1.76x on average and up to 206x to reach a target accuracy, compared to the inference plans implemented by default heuristics; the results also show that inference plans improve accuracy by 1.83x on average and up to 595x with less or equal runtime, compared to the default inference plans. We further show that our static analysis is precise in practice, identifying all satisfiable inference plans in 27 out of the 33 benchmark-algorithm evaluation settings.2024-08-21T02:07:03Zv2: camera-ready versionProc. ACM Program. Lang., Vol. 9, No. POPL, Article 10. Publication date: January 2025Ellie Y. ChengEric AtkinsonGuillaume BaudartLouis MandelMichael Carbin10.1145/3704846http://arxiv.org/abs/2404.18249v3Tenspiler: A Verified Lifting-Based Compiler for Tensor Operations (Extended Version)2024-12-14T08:29:46ZTensor processing infrastructures such as deep learning frameworks and specialized hardware accelerators have revolutionized how computationally intensive code from domains such as deep learning and image processing is executed and optimized. These infrastructures provide powerful and expressive abstractions while ensuring high performance. However, to utilize them, code must be written specifically using the APIs / ISAs of such software frameworks or hardware accelerators. Importantly, given the fast pace of innovation in these domains, code written today quickly becomes legacy as new frameworks and accelerators are developed, and migrating such legacy code manually is a considerable effort.
To enable developers in leveraging such DSLs while preserving their current programming paradigm, we introduce Tenspiler, a verified lifting-based compiler that uses program synthesis to translate sequential programs written in general-purpose programming languages (e.g., C++ or Python code) into tensor operations. Central to Tenspiler is our carefully crafted yet simple intermediate language, named TensIR, that expresses tensor operations. TensIR enables efficient lifting, verification, and code generation.
Currently, Tenspiler already supports $\textbf{six}$ DSLs, spanning a broad spectrum of software and hardware environments. Furthermore, we show that new backends can be easily supported by Tenspiler by adding simple pattern-matching rules for TensIR. Using 10 real-world code benchmark suites, our experimental evaluation shows that by translating code to be executed on $\textbf{6}$ different software frameworks and hardware devices, Tenspiler offers on average 105$\times$ kernel and 9.65$\times$ end-to-end execution time improvement over the fully-optimized sequential implementation of the same benchmarks.2024-04-28T17:10:17ZJie QiuColin CaiSahil BhatiaNiranjan HasabnisSanjit A. SeshiaAlvin Cheunghttp://arxiv.org/abs/2412.10657v1Probabilistic Guarantees for Practical LIA Loop Invariant Automation2024-12-14T02:59:32ZDespite the crucial need for formal safety and security verification of programs, discovering loop invariants remains a significant challenge. Static analysis is a primary technique for inferring loop invariants but often relies on substantial assumptions about underlying theories. Data-driven methods supported by dynamic analysis and machine learning algorithms have shown impressive performance in inferring loop invariants for some challenging programs. However, state-of-the-art data-driven techniques do not offer theoretical guarantees for finding loop invariants. We present a novel technique that leverages the simulated annealing (SA) search algorithm combined with SMT solvers and computational geometry to provide probabilistic guarantees for inferring loop invariants using data-driven methods. Our approach enhances the SA search with real analysis to define the search space and employs parallelism to increase the probability of success. To ensure the convergence of our algorithm, we adapt e-nets, a key concept from computational geometry. Our tool, DLIA2, implements these algorithms and demonstrates competitive performance against state-of-the-art techniques. We also identify a subclass of programs, on which we outperform the current state-of-the-art tool GSpacer.2024-12-14T02:59:32Z52 pages, 27 figures, conferenceAshish KumarThe Pennsylvania State University, USAJilaun ZhangThe Pennsylvania State University, USASaeid Tizpaz-NiariUT El Paso, USAGang TanThe Pennsylvania State University, USA