Portability of Fortran's `do concurrent' on GPUs

2024-12-23T22:03:41Z

There is a continuing interest in using standard language constructs for accelerated computing in order to avoid (sometimes vendor-specific) external APIs. For Fortran codes, the {\tt do concurrent} (DC) loop has been successfully demonstrated on the NVIDIA platform. However, support for DC on other platforms has taken longer to implement. Recently, Intel has added DC GPU offload support to its compiler, as has HPE for AMD GPUs. In this paper, we explore the current portability of using DC across GPU vendors using the in-production solar surface flux evolution code, HipFT. We discuss implementation and compilation details, including when/where using directive APIs for data movement is needed/desired compared to using a unified memory system. The performance achieved on both data center and consumer platforms is shown.

Reimplementing Mizar in Rust

2024-12-23T19:24:33Z

This paper describes a new open-source proof processing tool, mizar-rs, a wholesale reimplementation of core parts of the Mizar proof system, written in Rust. In particular, the "checker" and "analyzer" of Mizar are implemented, which together form the trusted core of Mizar. This is to our knowledge the first and only external implementation of these components. Thanks to the loose coupling of Mizar's passes, it is possible to use the checker as a drop-in replacement for the original, and we have used this to verify the entire MML in 11.8 minutes on 8 cores, a 4.8x speedup over the original Pascal implementation. Since Mizar is not designed to have a small trusted core, checking Mizar proofs entails following Mizar closely, so our ability to detect bugs is limited. Nevertheless, we were able to find multiple memory errors, four soundness bugs in the original (which were not being exploited in MML), in addition to one non-critical bug which was being exploited in 46 different MML articles. We hope to use this checker as a base for proof export tooling, as well as revitalizing development of the language.

Modular Probabilistic Models via Algebraic Effects

2024-12-23T17:16:43Z

Probabilistic programming languages (PPLs) allow programmers to construct statistical models and then simulate data or perform inference over them. Many PPLs restrict models to a particular instance of simulation or inference, limiting their reusability. In other PPLs, models are not readily composable. Using Haskell as the host language, we present an embedded domain specific language based on algebraic effects, where probabilistic models are modular, first-class, and reusable for both simulation and inference. We also demonstrate how simulation and inference can be expressed naturally as composable program transformations using algebraic effect handlers.

Effect Handlers for Programmable Inference

2024-12-23T17:13:46Z

Inference algorithms for probabilistic programming are complex imperative programs with many moving parts. Efficient inference often requires customising an algorithm to a particular probabilistic model or problem, sometimes called inference programming. Most inference frameworks are implemented in languages that lack a disciplined approach to side effects, which can result in monolithic implementations where the structure of the algorithms is obscured and inference programming is hard. Functional programming with typed effects offers a more structured and modular foundation for programmable inference, with monad transformers being the primary structuring mechanism explored to date. This paper presents an alternative approach to inference programming based on algebraic effects. Using effect signatures to specify the key operations of the algorithms, and effect handlers to modularly interpret those operations for specific variants, we develop two abstract algorithms, or inference patterns, representing two important classes of inference: Metropolis-Hastings and particle filtering. We show how our approach reveals the algorithms' high-level structure, and makes it easy to tailor and recombine their parts into new variants. We implement the two inference patterns as a Haskell library, and discuss the pros and cons of algebraic effects vis-a-vis monad transformers as a structuring mechanism for modular imperative algorithm design.

EcoSearch: A Constant-Delay Best-First Search Algorithm for Program Synthesis

2024-12-23T06:48:47Z

Many approaches to program synthesis perform a combinatorial search within a large space of programs to find one that satisfies a given specification. To tame the search space blowup, previous works introduced probabilistic and neural approaches to guide this combinatorial search by inducing heuristic cost functions. Best-first search algorithms ensure to search in the exact order induced by the cost function, significantly reducing the portion of the program space to be explored. We present a new best-first search algorithm called EcoSearch, which is the first constant-delay algorithm for pre-generation cost function: the amount of compute required between outputting two programs is constant, and in particular does not increase over time. This key property yields important speedups: we observe that EcoSearch outperforms its predecessors on two classic domains.

Syzygy: Dual Code-Test C to (safe) Rust Translation using LLMs and Dynamic Analysis

2024-12-21T18:49:38Z

Despite extensive usage in high-performance, low-level systems programming applications, C is susceptible to vulnerabilities due to manual memory management and unsafe pointer operations. Rust, a modern systems programming language, offers a compelling alternative. Its unique ownership model and type system ensure memory safety without sacrificing performance. In this paper, we present Syzygy, an automated approach to translate C to safe Rust. Our technique uses a synergistic combination of LLM-driven code and test translation guided by dynamic-analysis-generated execution information. This paired translation runs incrementally in a loop over the program in dependency order of the code elements while maintaining per-step correctness. Our approach exposes novel insights on combining the strengths of LLMs and dynamic analysis in the context of scaling and combining code generation with testing. We apply our approach to successfully translate Zopfli, a high-performance compression library with ~3000 lines of code and 98 functions. We validate the translation by testing equivalence with the source C program on a set of inputs. To our knowledge, this is the largest automated and test-validated C to safe Rust code translation achieved so far.

Worst-Case Input Generation for Concurrent Programs under Non-Monotone Resource Metrics

2024-12-21T09:38:39Z

Worst-case input generation aims to automatically generate inputs that exhibit the worst-case performance of programs. It has several applications, and can, for example, detect vulnerabilities to denial-of-service (DoS) attacks. However, it is non-trivial to generate worst-case inputs for concurrent programs, particularly for resources like memory where the peak cost depends on how processes are scheduled. This article presents the first sound worst-case input generation algorithm for concurrent programs under non-monotone resource metrics like memory. The key insight is to leverage resource-annotated session types and symbolic execution. Session types describe communication protocols on channels in process calculi. Equipped with resource annotations, resource-annotated session types not only encode cost bounds but also indicate how many resources can be reused and transferred between processes. This information is critical for identifying a worst-case execution path during symbolic execution. The algorithm is sound: if it returns any input, it is guaranteed to be a valid worst-case input. The algorithm is also relatively complete: as long as resource-annotated session types are sufficiently expressive and the background theory for SMT solving is decidable, a worst-case input is guaranteed to be returned. A simple case study of a web server's memory usage demonstrates the utility of the worst-case input generation algorithm.

Formal Foundations for Translational Separation Logic Verifiers (extended version)

2024-12-20T17:44:08Z

Program verification tools are often implemented as front-end translations of an input program into an intermediate verification language (IVL) such as Boogie, GIL, Viper, or Why3. The resulting IVL program is then verified using an existing back-end verifier. A soundness proof for such a translational verifier needs to relate the input program and verification logic to the semantics of the IVL, which in turn needs to be connected with the verification logic implemented in the back-end verifiers. Performing such proofs is challenging due to the large semantic gap between the input and output programs and logics, especially for complex verification logics such as separation logic. This paper presents a formal framework for reasoning about translational separation logic verifiers. At its center is a generic core IVL that captures the essence of different separation logics. We define its operational semantics and formally connect it to two different back-end verifiers, which use symbolic execution and verification condition generation, resp. Crucially, this semantics uses angelic non-determinism to enable the application of different proof search algorithms and heuristics in the back-end verifiers. An axiomatic semantics for the core IVL simplifies reasoning about the front-end translation by performing essential proof steps once and for all in the equivalence proof with the operational semantics rather than for each concrete front-end translation. We illustrate the usefulness of our formal framework by instantiating our core IVL with elements of Viper and connecting it to two Viper back-ends as well as a front-end for concurrent separation logic. All our technical results have been formalized in Isabelle/HOL, including the core IVL and its semantics, the semantics of two back-ends for a subset of Viper, and all proofs.

The State of Julia for Scientific Machine Learning

2024-12-20T08:17:23Z

Julia has been heralded as a potential successor to Python for scientific machine learning and numerical computing, boasting ergonomic and performance improvements. Since Julia's inception in 2012 and declaration of language goals in 2017, its ecosystem and language-level features have grown tremendously. In this paper, we take a modern look at Julia's features and ecosystem, assess the current state of the language, and discuss its viability and pitfalls as a replacement for Python as the de-facto scientific machine learning language. We call for the community to address Julia's language-level issues that are preventing further adoption.

Relaxed exception semantics for Arm-A (extended version)

2024-12-19T18:24:42Z

To manage exceptions, software relies on a key architectural guarantee, precision: that exceptions appear to execute between instructions. However, this definition, dating back over 60 years, fundamentally assumes a sequential programmers model. Modern architectures such as Arm-A with programmer-observable relaxed behaviour make such a naive definition inadequate, and it is unclear exactly what guarantees programmers have on exception entry and exit. In this paper, we clarify the concepts needed to discuss exceptions in the relaxed-memory setting -- a key aspect of precisely specifying the architectural interface between hardware and software. We explore the basic relaxed behaviour across exception boundaries, and the semantics of external aborts, using Arm-A as a representative modern architecture. We identify an important problem, present yet unexplored for decades: pinning down what it means for exceptions to be precise in a relaxed setting. We describe key phenomena that any definition should account for. We develop an axiomatic model for Arm-A precise exceptions, tooling for axiomatic model execution, and a library of tests. Finally we explore the relaxed semantics of software-generated interrupts, as used in sophisticated programming patterns, and sketch how they too could be modelled.

Investigating the Energy Consumption of C++ and Java Solutions Mined from a Programming Contest Site

2024-12-19T14:14:34Z

The concern about global warming increased the interest in the energy efficiency of computer applications. Assuming power is constant, the general trend is that faster programs consume less energy, thus optimizing a program for speed would also improve its energy efficiency. We investigate this tendency in a set of C++ and Java solutions mined from Code Submission Evaluation System (CSES), a popular programming competition site, where each solution must give the correct answer under a given time limit. In such context, we can consider that all correct solutions for a problem were written with a speed concern, but not with energy efficiency in mind. We selected 15 problems from CSES and for each of them we mined at least 30 C++ and Java solutions, evaluating time and energy efficiency of each solution in at least two different machines. In our scenario, where there is a great diversity of programming styles, execution speed, and memory usage, we could confirm the general trend: faster programs consume less energy. Moreover, we were able to use ordinary least squares to fit a linear function, with good precision, that relates energy consumption of a program based on its execution time, as well as to automatically identify programs with abnormal energy consumption. A manual analysis of these programs revealed that often they perform a different amount of allocation and deallocation operations when compared to programs with similar execution times. We also calculated the energy consumption profile of sets of random C++ solutions for these 15 CSES problems, and we tried to associate each set with its corresponding CSES problem by using the energy consumption profiles previously computed for each one of them. With this approach, we restricted, for each set of random C++ solutions, the classification task to a subset of 7 CSES problems, a reduction of more than 50% in the search space.

Cage: Hardware-Accelerated Safe WebAssembly

2024-12-19T12:38:38Z

WebAssembly (WASM) is an immensely versatile and increasingly popular compilation target. It executes applications written in several languages (e.g., C/C++) with near-native performance in various domains (e.g., mobile, edge, cloud). Despite WASM's sandboxing feature, which isolates applications from other instances and the host platform, WASM does not inherently provide any memory safety guarantees for applications written in low-level, unsafe languages. To this end, we propose Cage, a hardware-accelerated toolchain for WASM that supports unmodified applications compiled to WASM and utilizes diverse Arm hardware features aiming to enrich the memory safety properties of WASM. Precisely, Cage leverages Arm's Memory Tagging Extension (MTE) to (i) provide spatial and temporal memory safety for heap and stack allocations and (ii) improve the performance of WASM's sandboxing mechanism. Cage further employs Arm's Pointer Authentication (PAC) to prevent leaked pointers from being reused by other WASM instances, thus enhancing WASM's security properties. We implement our system based on 64-bit WASM. We provide a WASM compiler and runtime with support for Arm's MTE and PAC. On top of that, Cage's LLVM-based compiler toolchain transforms unmodified applications to provide spatial and temporal memory safety for stack and heap allocations and prevent function pointer reuse. Our evaluation on real hardware shows that Cage incurs minimal runtime (<5.8%) and memory (<3.7%) overheads and can improve the performance of WASM's sandboxing mechanism, achieving a speedup of over 5.1%, while offering efficient memory safety guarantees.

A flexible specification approach for verifying total correctness of fine-grained concurrent modules

2024-12-19T10:04:42Z

A well-established approach to proving progress properties such as deadlock-freedom and termination is to associate obligations with threads. For example, in most existing work the proof rule for lock acquisition prescribes a standard usage protocol by burdening the acquiring thread with an obligation to release the lock. The fact that the obligation creation is hardcoded into the acquire operation, however, rules out non-standard clients e.g. where the release happens in a different thread. We overcome this limitation by instead having the blocking operations take the obligation creation operations required for the specific client scenario as arguments. We dub this simple instance of higher-order programming with auxiliary code Sassy. To illustrate Sassy, we extend HeapLang, a simple, higher-order, concurrent programming language with erasable code and state. The resulting language gets stuck if no progress is made. Consequently, we can apply standard safety separation logic to compositionally reason about termination in a fine-grained concurrent setting. We validated Sassy by developing (non-foundational) machine-checked proofs of representative locks -- an unfair Spinlock (competitive succession), a fair Ticketlock (direct handoff succession) and the hierarchically constructed Cohortlock that is starvation-free if the underlying locks are starvation-free -- against our specifications using an encoding of the approach in the VeriFast program verifier for C and Java.

Relational Programming with Foundation Models

2024-12-19T04:26:45Z

Foundation models have vast potential to enable diverse AI applications. The powerful yet incomplete nature of these models has spurred a wide range of mechanisms to augment them with capabilities such as in-context learning, information retrieval, and code interpreting. We propose Vieira, a declarative framework that unifies these mechanisms in a general solution for programming with foundation models. Vieira follows a probabilistic relational paradigm and treats foundation models as stateless functions with relational inputs and outputs. It supports neuro-symbolic applications by enabling the seamless combination of such models with logic programs, as well as complex, multi-modal applications by streamlining the composition of diverse sub-models. We implement Vieira by extending the Scallop compiler with a foreign interface that supports foundation models as plugins. We implement plugins for 12 foundation models including GPT, CLIP, and SAM. We evaluate Vieira on 9 challenging tasks that span language, vision, and structured and vector databases. Our evaluation shows that programs in Vieira are concise, can incorporate modern foundation models, and have comparable or better accuracy than competitive baselines.

Towards Provable Security in Industrial Control Systems Via Dynamic Protocol Attestation

2024-12-19T02:28:35Z

Industrial control systems (ICSs) increasingly rely on digital technologies vulnerable to cyber attacks. Cyber attackers can infiltrate ICSs and execute malicious actions. Individually, each action seems innocuous. But taken together, they cause the system to enter an unsafe state. These attacks have resulted in dramatic consequences such as physical damage, economic loss, and environmental catastrophes. This paper introduces a methodology that restricts actions using protocols. These protocols only allow safe actions to execute. Protocols are written in a domain specific language we have embedded in an interactive theorem prover (ITP). The ITP enables formal, machine-checked proofs to ensure protocols maintain safety properties. We use dynamic attestation to ensure ICSs conform to their protocol even if an adversary compromises a component. Since protocol conformance prevents unsafe actions, the previously mentioned cyber attacks become impossible. We demonstrate the effectiveness of our methodology using an example from the Fischertechnik Industry 4.0 platform. We measure dynamic attestation's impact on latency and throughput. Our approach is a starting point for studying how to combine formal methods and protocol design to thwart attacks intended to cripple ICSs.