https://arxiv.org/api/uzG6lzS6P0Bd8woHGjvTmiV2qrc2026-06-26T18:05:25Z9951133515http://arxiv.org/abs/2403.04618v4Strong Priority and Determinacy in Timed CCS2025-08-06T16:33:12ZBuilding on the standard theory of process algebra with priorities, we identify a new scheduling mechanism, called "constructive reduction" which is designed to capture the essence of synchronous programming. The distinctive property of this evaluation strategy is to achieve determinacy-by-construction for multi-cast concurrent communication with shared memory. In the technical setting of CCS extended by clocks and priorities, we prove for a large class of "coherent" processes a confluence property for constructive reductions. We show that under some restrictions, called "pivotability", coherence is preserved by the operators of prefix, summation, parallel composition, restriction and hiding. Since this permits memory and sharing, we are able to cover a strictly larger class of processes compared to those in Milner's classical confluence theory for CCS without priorities.2024-03-07T16:02:31ZChange Notes (06.08.25): Streamlined the definition of coherence and non-interference; Corrections in Def.~14 for coherence, adding condition on residual transitions; Adjusted coding of Esterel signals (Ex.~11) to match adjusted Def.~14; To reflect changed Def.~14, use the term "c-coherence''; Minor rewrite of Sec.~2.3 and Sec.~4; Further corrections and revisions in AppendicesLuigi LiquoriMichael Mendlerhttp://arxiv.org/abs/2312.08617v5RTLCoder: Outperforming GPT-3.5 in Design RTL Generation with Our Open-Source Dataset and Lightweight Solution2025-08-06T13:02:31ZThe automatic generation of RTL code (e.g., Verilog) using natural language instructions and large language models (LLMs) has attracted significant research interest recently. However, most existing approaches heavily rely on commercial LLMs such as ChatGPT, while open-source LLMs tailored for this specific design generation task exhibit notably inferior performance. The absence of high-quality open-source solutions restricts the flexibility and data privacy of this emerging technique. In this study, we present a new customized LLM solution with a modest parameter count of only 7B, achieving better performance than GPT-3.5 on all representative benchmarks for RTL code generation. Especially, it outperforms GPT-4 in VerilogEval Machine benchmark. This remarkable balance between accuracy and efficiency is made possible by leveraging our new RTL code dataset and a customized LLM algorithm, both of which have been made fully open-source.2023-12-14T02:42:15ZThis is the LAD Conference version of RTLCoder, for the TCAD extension version, please refer to: arXiv:2312.08617v4Shang LiuWenji FangYao LuQijun ZhangHongce ZhangZhiyao Xie10.1109/LAD62341.2024.10691788http://arxiv.org/abs/2506.11794v2ALEA IACTA EST: A Declarative Domain-Specific Language for Manually Performable Random Experiments2025-08-06T08:32:43ZRandom experiments that are simple and clear enough to be performed by human agents feature prominently in the teaching of elementary stochastics as well as in games. We present Alea, a domain-specific language for the specification of random experiments. Alea code can either be analyzed statically to obtain and inspect probability distributions of outcomes, or be executed with a source pseudo-randomness for simulation or as a game assistant. The language is intended for ease of use by non-expert programmers, in particular students of elementary stochastics, and players and designers of games of chance, by focusing on concepts common to functional programming and basic mathematics. Both the design of the language and the implementation of runtime environments are work in progress.
2025-06-13T13:58:26ZIn Proceedings TFPiE 2025, arXiv:2508.02305EPTCS 424, 2025, pp. 67-86Baltasar Trancón y WidemannBrandenburg University of Applied SciencesMarkus Leppersemantics gGmbH10.4204/EPTCS.424.4http://arxiv.org/abs/2508.03978v1Raqlet: Cross-Paradigm Compilation for Recursive Queries2025-08-06T00:07:00ZWe introduce Raqlet, a source-to-source compilation framework that addresses the fragmentation of recursive querying engines spanning relational (recursive SQL), graph (Cypher, GQL), and deductive (Datalog) systems. Recent standards such as SQL:2023's SQL/PGQ and the GQL standard provide a common foundation for querying graph data within relational and graph databases; however, real-world support remains inconsistent across systems. Raqlet bridges this gap by translating recursive queries across paradigms through leveraging intermediate representations (IRs) grounded in well-defined semantics; it translates Cypher or SQL/PGQ to PGIR (inspired by Cypher), then into DLIR (inspired by Datalog), and finally to SQIR (inspired by recursive SQL). Raqlet provides a shared semantic basis that can serve as a golden reference implementation for language standards, while supporting static analysis and transformations (e.g., magic-set transformation) for performance tuning. Our vision is to make Raqlet a robust platform that enables rapid cross-paradigm prototyping, portable recursive queries, and formal reasoning about recursion even when targeting diverse query execution engines.2025-08-06T00:07:00ZAmir ShaikhhaYouning XiaMeisam TarabkhahJazal SaleemAnna Herlihyhttp://arxiv.org/abs/2508.03832v1Generating Inputs for Grammar Mining using Dynamic Symbolic Execution2025-08-05T18:21:43ZA vast number of software systems include components that parse and process structured input. In addition to programming languages, which are analyzed by compilers or interpreters, there are numerous components that process standardized or proprietary data formats of varying complexity. Even if such components were initially developed and tested based on a specification, such as a grammar, numerous modifications and adaptations over the course of software evolution can make it impossible to precisely determine which inputs they actually accept.
In this situation, grammar mining can be used to reconstruct the specification in the form of a grammar. Established approaches already produce useful results, provided that sufficient input data is available to fully cover the input language. However, achieving this completeness is a major challenge. In practice, only input data recorded during the operation of the software systems is available. If this data is used for grammar mining, the resulting grammar reflects only the actual processed inputs but not the complete grammar of the input language accepted by the software component. As a result, edge cases or previously supported features that no longer appear in the available input data are missing from the generated grammar.
This work addresses this challenge by introducing a novel approach for the automatic generation of inputs for grammar mining. Although input generators have already been used for fuzz testing, it remains unclear whether they are also suitable for grammar miners. Building on the grammar miner Mimid, this work presents a fully automated approach to input generation. The approach leverages Dynamic Symbolic Execution (DSE) and extends it with two mechanisms to overcome the limitations of DSE regarding structured input parsers. First, the search for new inputs is guided by an iterative expansion that starts with a single-character input and gradually extends it. Second, input generation is structured into a novel three-phase approach, which separates the generation of inputs for parser functions.
The proposed method was evaluated against a diverse set of eleven benchmark applications from the existing literature. Results demonstrate that the approach achieves precision and recall for extracted grammars close to those derived from state-of-the-art grammar miners such as Mimid. Notably, it successfully uncovers subtle features and edge cases in parsers that are typically missed by such grammar miners. The effectiveness of the method is supported by empirical evidence, showing that it can achieve high performance in various domains without requiring prior input samples.
This contribution is significant for researchers and practitioners in software engineering, offering an automated, scalable, and precise solution for grammar mining. By eliminating the need for manual input generation, the approach not only reduces workload but also enhances the robustness and comprehensiveness of the extracted grammars. Following this approach, software engineers can reconstruct specification from existing (legacy) parsers.2025-08-05T18:21:43ZThe Art, Science, and Engineering of Programming, 2025, Vol. 10, Issue 2, Article 16Andreas PointnerUniversity of Applied Sciences Upper Austria, AustriaJosef PichlerUniversity of Applied Sciences Upper Austria, AustriaHerbert PrähoferJohannes Kepler University Linz, Austria10.22152/programming-journal.org/2025/10/16http://arxiv.org/abs/2508.03831v1A Type System for Data Privacy Compliance in Active Object Languages2025-08-05T18:21:28ZData protection laws such as GDPR aim to give users unprecedented control over their personal data. Compliance with these regulations requires systematically considering information flow and interactions among entities handling sensitive data. Privacy-by-design principles advocate embedding data protection into system architectures as a default. However, translating these abstract principles into concrete, explicit methods remains a significant challenge. This paper addresses this gap by proposing a language-based approach to privacy integration, combining static and runtime techniques. By employing type checking and type inference in an active object language, the framework enables the tracking of authorised data flows and the automatic generation of constraints checked at runtime based on user consent. This ensures that personal data is processed in compliance with GDPR constraints. The key contribution of this work is a type system that gather the compliance checks and the changes to users consent and integrates data privacy compliance verification into system execution. The paper demonstrates the feasibility of this approach through a soundness proof and several examples, illustrating how the proposed language addresses common GDPR requirements, such as user consent, purpose limitation, and data subject rights. This work advances the state of the art in privacy-aware system design by offering a systematic and automated method for integrating GDPR compliance into programming languages. This capability has implications for building trustworthy systems in domains such as healthcare or finance, where data privacy is crucial.2025-08-05T18:21:28ZThe Art, Science, and Engineering of Programming, 2025, Vol. 10, Issue 2, Article 18Chinmayi Prabhu BaramashetruUniversity of Oslo, NorwayPaola GianniniUniversita' del Piemonte Orientale, ItalySilvia Lizeth Tapia TarifaUniversity of Oslo, NorwayOlaf OweUniversity of Oslo, Norway10.22152/programming-journal.org/2025/10/18http://arxiv.org/abs/2508.03830v1If-T: A Benchmark for Type Narrowing2025-08-05T18:21:13Z**Context:**
The design of static type systems that can validate dynamically-typed programs (**gradually**) is an ongoing challenge. A key difficulty is that dynamic code rarely follows datatype-driven design. Programs instead use runtime tests to narrow down the proper usage of incoming data. Type systems for dynamic languages thus need a **type narrowing** mechanism that refines the type environment along individual control paths based on dominating tests, a form of flow-sensitive typing. In order to express refinements, the type system must have some notion of sets and subsets. Since set-theoretic types are computationally and ergonomically complex, the need for type narrowing raises design questions about how to balance precision and performance.
**Inquiry:**
To date, the design of type narrowing systems has been driven by intuition, past experience, and examples from users in various language communities. There is no standard that captures desirable and undesirable behaviors. Prior formalizations of narrowing are also significantly more complex than a standard type system, and it is unclear how the extra complexity pays off in terms of concrete examples. This paper addresses the problems through If-T, a language-agnostic **design benchmark** for type narrowing that characterizes the abilities of implementations using simple programs that draw attention to fundamental questions. Unlike a traditional performance-focused benchmark, If-T measures a narrowing system's ability to validate correct code and reject incorrect code. Unlike a test suite, systems are not required to fully conform to If-T. Deviations are acceptable provided they are justified by well-reasoned design considerations, such as compile-time performance.
**Approach:**
If-T is guided by the literature on type narrowing, the documentation of gradual languages such as TypeScript, and experiments with typechecker implementations. We have identified a set of core technical dimensions for type narrowing. For each dimension, the benchmark contains a set of topics and (at least) two characterizing programs per topic: one that should typecheck and one that should not typecheck.
**Knowledge:**
If-T provides a baseline to measure type narrowing systems. For researchers, it provides criteria to categorize future designs via its collection of positive and negative examples. For language designers, the benchmark demonstrates the payoff of typechecker complexity in terms of concrete examples. Designers can use the examples to decide whether supporting a particular example is worthwhile. Both the benchmark and its implementations are freely available online.
**Grounding:**
We have implemented the benchmark for five typecheckers: TypeScript, Flow, Typed Racket, mypy, and Pyright. The results highlight important differences, such as the ability to track logical implications among program variables and typechecking for user-defined narrowing predicates.
**Importance:**
Type narrowing is essential for gradual type systems, but the tradeoffs between systems with different complexity have been unclear. If-T clarifies these tradeoffs by illustrating the benefits and limitations of each level of complexity. With If-T as a way to assess implementations in a fair, cross-language manner, future type system designs can strive for a better balance among precision, annotation burden, and performance.2025-08-05T18:21:13ZThe Art, Science, and Engineering of Programming, 2025, Vol. 10, Issue 2, Article 17Hanwen GuoUniversity of Utah, USABen GreenmanUniversity of Utah, USA10.22152/programming-journal.org/2025/10/17http://arxiv.org/abs/2508.03678v1More Than a Score: Probing the Impact of Prompt Specificity on LLM Code Generation2025-08-05T17:49:48ZState-of-the-art Large Language Models (LLMs) achieve high pass@1 on general benchmarks like HumanEval but underperform on specialized suites such as ParEval. Is this due to LLMs missing domain knowledge or insufficient prompt detail is given? To answer this, we introduce PartialOrderEval, which augments any code generation benchmark with a partial order of prompts from minimal to maximally detailed. Applying it to HumanEval and both serial and OpenMP subsets of ParEval, we measure how pass@1 scales with prompt specificity. Our experiments with Llama-3.x and Qwen2.5-Coder demonstrate varying degrees of prompt sensitivity across different tasks, and a qualitative analysis highlights explicit I/O specifications, edge-case handling, and stepwise breakdowns as the key drivers of prompt detail improvement.2025-08-05T17:49:48ZYangtian ZiHarshitha MenonArjun Guhahttp://arxiv.org/abs/2508.03641v1Visual Execution and Validation of Finite-State Machines and Pushdown Automata2025-08-05T16:54:01ZIn Formal Languages and Automata Theory courses, students find understanding nondeterministic finite-state and pushdown automata difficult. In many cases, this means that it is challenging for them to comprehend the operational semantics of such machines and, as a consequence, determine why a word is accepted or rejected. This is not entirely surprising, because students are mostly trained to design and implement deterministic programs. Comprehension of pushdown automata is further complicated, because reasoning about the stack is necessary. A common difficulty students face, for example, is understanding that two different computations on the same word may reach the same state with different stack values. To aid student understanding, we present two novel dynamic visualization tools for FSM -- a domain-specific programming language for the Automata Theory classroom -- to support the design of such machines. These tools visualize all computations that may be performed, respectively, by a nondeterministic finite-state machine or by a pushdown automata in a stepwise manner. In addition, these tools aid the machine verification process by allowing users to visually validate whether the properties a state represents hold when a machine transitions into it.2025-08-05T16:54:01ZIn Proceedings TFPiE 2025, arXiv:2508.02305EPTCS 424, 2025, pp. 87-108Marco T. MorazánSeton Hall UniversityDavid Anthony K. FieldsSeton Hall UniversityAndrés M. GarcedSeton Hall UniversityTijana MinićUniversity of Washington10.4204/EPTCS.424.5http://arxiv.org/abs/2508.03640v1Teaching Introductory Functional Programming Using Haskelite2025-08-05T16:53:48ZLearning functional programming requires learning a substitution-based computational model. While substitution should be a familiar concept from high-school algebra, students often have difficulty applying it to new settings, such as recursive definitions, algebraic data types and higher-order functions. Step-by-step interpreters have been shown to help beginners by clarifying misconceptions and improving understanding.
This paper reports on the experience of using a step-by-step tracing interpreter for a subset of Haskell while teaching an introductory functional programming course at the University of Porto. We describe the use of the interpreter, present some feedback obtained from students, reflect on the lessons learned and point directions for further work.
2025-08-05T16:53:48ZIn Proceedings TFPiE 2025, arXiv:2508.02305EPTCS 424, 2025, pp. 49-66Pedro VasconcelosUniversity of Porto10.4204/EPTCS.424.3http://arxiv.org/abs/2508.03639v1A Design Recipe and Recipe-Based Errors for Regular Expressions2025-08-05T16:53:33ZThis article presents a novel framework to provide Formal Languages and Automata Theory students design support for the development of regular expressions. This framework includes a design recipe for regular expressions and a customized error messaging system. The error messaging system produces recipe-based errors that include the step of the design recipe not successfully completed. Furthermore, the error messages follow the established practices of being concise, succinct, jargon-free, and nonprescriptive. In addition, a shorthand syntax developed for writing unit tests is described. The in-class use of the design recipe is illustrated, two debugging sessions using the described system are discussed, and the implementation of the error messaging system is briefly sketched.2025-08-05T16:53:33ZIn Proceedings TFPiE 2025, arXiv:2508.02305EPTCS 424, 2025, pp. 25-48Marco T. MorazánSeton Hall UniversityShamil DzhatdoyevAxoni, USAJosephine Des RosiersPenguin Random HouseTijana MinićUniversity of WashingtonAndrés M. GarcedSeton Hall UniversityDavid Anthony K. FieldsSeton Hall University10.4204/EPTCS.424.2http://arxiv.org/abs/2508.03638v1Design Support for Multitape Turing Machines2025-08-05T16:53:19ZMany Formal Languages and Automata Theory courses introduce students to Turing machine extensions. One of the most widely-used extensions endows Turing machines with multiple tapes. Although multitape Turing machines are an abstraction to simplify Turing machine design, students find them no less challenging. To aid students in understanding these machines, the FSM programming language provides support for their definition and execution. This, however, has proven insufficient for many students to understand the operational semantics of such machines and to understand why such machines accept or reject a word. To address this problem, three visualization tools have been developed. The first is a dynamic visualization tool that simulates machine execution. The second is a static visualization tool that automatically renders a graphic for a multitape Turing machine's transition diagram. The third is a static visualization tool that automatically renders computation graphs for multitape Turing machines. This article presents these tools and illustrates how they are used to help students design and implement multitape Turing machines. In addition, empirical data is presented that suggests these tools are well-received and found useful by students.2025-08-05T16:53:19ZIn Proceedings TFPiE 2025, arXiv:2508.02305EPTCS 424, 2025, pp. 1-24Marco T. MorazánSeton Hall UniversityOliwia KempinskiUniversity of MarylandAndrés M. GarcedSeton Hall University10.4204/EPTCS.424.1http://arxiv.org/abs/2508.03603v1ReFuzzer: Feedback-Driven Approach to Enhance Validity of LLM-Generated Test Programs2025-08-05T16:17:02ZExisting LLM-based compiler fuzzers often produce syntactically or semantically invalid test programs, limiting their effectiveness in exercising compiler optimizations and backend components. We introduce ReFuzzer, a framework for refining LLM-generated test programs by systematically detecting and correcting compilation and runtime violations (e.g. division by zero or array out-of-bounds accesses). ReFuzzer employs a feedback loop with a local LLM to validate and filter erroneous programs before execution, improving fuzzing effectiveness beyond crash detection and enabling the generation of diverse yet valid test programs.
We evaluated ReFuzzer's effectiveness across black-, grey- and white-box fuzzing approaches targeting LLVM/Clang. ReFuzzer improved test programs' validity from 47.0-49.4% to 96.6-97.3%, with an average processing time of 2.9-3.5 s per test program on a dual-GPU machine. Further, refuzzing significantly increased code coverage in critical optimization and IR generation components. For example, vectorization coverage had an absolute improvement of 9.2%, 2.3%, and 7.1% in black-, grey-, and white-box fuzzing, enhancing testing effectiveness.2025-08-05T16:17:02ZIti ShreeKarine Even-MendozaTomasz Radzikhttp://arxiv.org/abs/2508.03558v1SAGE-HLS: Syntax-Aware AST-Guided LLM for High-Level Synthesis Code Generation2025-08-05T15:28:13ZIn today's rapidly evolving field of electronic design automation (EDA), the complexity of hardware designs is increasing, necessitating more sophisticated automation solutions. High-level synthesis (HLS), as a pivotal solution, automates hardware designs from high-level abstractions (e.g., C/C++). However, it faces significant challenges, particularly in design space exploration and optimization. While large language models (LLMs) have shown notable capabilities in code generation, their application to HLS has been limited due to the scarcity of (publicly) available HLS code datasets. Hence, research in this domain has primarily focused on techniques such as prompt engineering and retrieval-augmented generation (RAG). To overcome this limitation, this paper introduces SAGE-HLS, the first-of-its-kind fine-tuned LLM specifically for HLS code generation. Our method includes three key advancements: (i) We implement Verilog-to-C/C++ porting, converting verified and synthesizable Verilog codes into corresponding C, creating a dataset of 16.7K HLS codes; (ii) We implement a fine-tuning strategy, which is based on instruction prompting to code generation guided by abstract syntax tree (AST); (iii) We develop a semi-automated evaluation framework using VerilogEval to assess the functionality of the generated HLS code. Our experiments show that SAGE-HLS, fined-tuned on the QwenCoder (2.5) 7B model, achieves a near 100% success rate in code synthesizability and a 75% success rate in functional correctness.2025-08-05T15:28:13ZAccepted to the IEEE International Conference on Computer Design (ICCD 2025)M Zafir Sadik KhanNowfel MashnoorMohammad AkyashKimia AzarHadi Kamalihttp://arxiv.org/abs/2508.02820v1Automated Code Repair for C/C++ Static Analysis Alerts2025-08-04T18:44:50Z(Note: This work is a preprint.) Static analysis (SA) tools produce many diagnostic alerts indicating that source code in C or C++ may be defective and potentially vulnerable to security exploits. Many of these alerts are false positives. Identifying the true-positive alerts and repairing the defects in the associated code are huge efforts that automated program repair (APR) tools can help with. Our experience showed us that APR can reduce the number of SA alerts significantly and reduce the manual effort of analysts to review code. This engineering experience paper details the application of design, development, and performance testing to an APR tool we built that repairs C/C++ code associated with 3 categories of alerts produced by multiple SA tools. Its repairs are simple and local. Furthermore, our findings convinced the maintainers of the CERT Coding Standards to re-assess and update the metrics used to assess when violations of guidelines are detectable or repairable. We discuss engineering design choices made to support goals of trustworthiness and acceptability to developers. Our APR tool repaired 8718 out of 9234 alerts produced by one SA tool on one codebase. It can repair 3 flaw categories. For 2 flaw categories, 2 SA tools, and 2 codebases, our tool repaired or dismissed as false positives over 80% of alerts, on average. Tests showed repairs did not appreciably degrade the performance of the code or cause new alerts to appear (with the possible exception of sqlite3.c). This paper describes unique contributions that include a new empirical analysis of SA data, our selection method for flaw categories to repair, publication of our APR tool, and a dataset of SA alerts from open-source SA tools run on open-source codebases. It discusses positive and negative results and lessons learned.2025-08-04T18:44:50ZDavid SvobodaLori FlynnWilliam KlieberMichael DugganNicholas ReimerJoseph Sible