https://arxiv.org/api/PIMcN0yRGJRU7pufiatz0KQYJI8 2026-06-21T15:57:12Z 9918 750 15 http://arxiv.org/abs/2601.12385v1 Context-Free Grammar Inference for Complex Programming Languages in Black Box Settings 2026-01-18T12:45:54Z

Grammar inference for complex programming languages remains a significant challenge, as existing approaches fail to scale to real world datasets within practical time constraints. In our experiments, none of the state-of-the-art tools, including Arvada, Treevada and Kedavra were able to infer grammars for complex languages such as C, C++, and Java within 48 hours. Arvada and Treevada perform grammar inference directly on full-length input examples, which proves inefficient for large files commonly found in such languages. While Kedavra introduces data decomposition to create shorter examples for grammar inference, its lexical analysis still relies on the original inputs. Additionally, its strict no-overgeneralization constraint limits the construction of complex grammars. To overcome these limitations, we propose Crucio, which builds a decomposition forest to extract short examples for lexical and grammar inference via a distributional matrix. Experimental results show that Crucio is the only method capable of successfully inferring grammars for complex programming languages (where the number of nonterminals is up to 23x greater than in prior benchmarks) within reasonable time limits. On the prior simple benchmark, Crucio achieves an average recall improvement of 1.37x and 1.19x over Treevada and Kedavra, respectively, and improves F1 scores by 1.21x and 1.13x.

2026-01-18T12:45:54Z Feifei Li Xiao Chen Xiaoyu Sun Xi Xiao Shaohua Wang Yong Ding Sheng Wen Qing Li http://arxiv.org/abs/2601.12270v1 SplittingSecrets: A Compiler-Based Defense for Preventing Data Memory-Dependent Prefetcher Side-Channels 2026-01-18T05:55:46Z

Traditional side-channels take advantage of secrets being used as inputs to unsafe instructions, used for memory accesses, or used in control flow decisions. Constant-time programming, which restricts such code patterns, has been widely adopted as a defense against these vulnerabilities. However, new hardware optimizations in the form of Data Memory-dependent Prefetchers (DMP) present in Apple, Intel, and ARM CPUs have shown such defenses are not sufficient. These prefetchers, unlike classical prefetchers, use the content of memory as well as the trace of prior accesses to determine prefetch targets. An adversary abusing such a prefetcher has been shown to be able to mount attacks leaking data-at-rest; data that is never used by the program, even speculatively, in an unsafe manner. In response, this paper introduces SplittingSecrets, a compiler-based tool that can harden software libraries against side-channels arising from DMPs. SplittingSecrets's approach avoids reasoning about the complex internals of different DMPs and instead relies on one key aspect of all DMPs: activation requires data to resemble addresses. To prevent secret data from leaking, SplittingSecrets transforms memory operations to ensure that secrets are never stored in memory in a manner resembling an address, thereby avoiding DMP activation on those secrets. Rather than disable a DMP entirely, SplittingSecrets can provide targeted hardening for only specific secrets entirely in software. We have implemented SplittingSecrets using LLVM, supporting both source-level memory operations and those generated by the compiler backend for the AArch64 architecture, We have analyzed the performance overhead involved in safeguarding secrets from DMP-induced attacks using common primitives in libsodium, a popular cryptographic library when built for Apple M-series CPUs.

2026-01-18T05:55:46Z Reshabh K Sharma Dan Grossman David Kohlbrenner http://arxiv.org/abs/2510.17429v3 Introducing Linear Implication Types to $λ_{GT}$ for Computing With Incomplete Graphs 2026-01-17T01:00:05Z

Designing programming languages that enable intuitive and safe manipulation of data structures is a critical research challenge. Conventional destructive memory operations using pointers are complex and prone to errors. Existing type systems, such as affine types and shape types, address this problem towards safe manipulation of heaps and pointers, but design of high-level declarative languages that allow us to manipulate complex pointer data structures at a higher level of abstraction is largely an open problem. The $λ_{GT}$ language, a purely functional programming language that treats hypergraphs (hereafter referred to as graphs) as primary data structures, addresses some of these challenges. By abstracting data with shared references and cycles as graphs, it enables declarative operations through pattern matching and leverages its type system to guarantee safety of these operations. Nevertheless, the previously proposed type system of $λ_{GT}$ leaves two significant open challenges. First, the type system does not support \emph{incomplete graphs}, that is, graphs in which some elements are missing from the graphs of user-defined types. Second, the type system relies on dynamic type checking during pattern matching. This study addresses these two challenges by incorporating linear implication into the $λ_{GT}$ type system, while introducing new constraints to ensure its soundness.

2025-10-20T11:15:41Z 36 pages, 14 figures, This paper was accepted at PRO2025-3 and is scheduled to appear in the IPSJ Journal Jin Sano Naoki Yamamoto Kazunori Ueda http://arxiv.org/abs/2601.19941v1 Bench4HLS: End-to-End Evaluation of LLMs in High-Level Synthesis Code Generation 2026-01-16T20:52:42Z

In last two years, large language models (LLMs) have shown strong capabilities in code generation, including hardware design at register-transfer level (RTL). While their use in high-level synthesis (HLS) remains comparatively less mature, the ratio of HLS- to RTL-focused studies has shifted from 1:10 to 2:10 in the past six months, indicating growing interest in leveraging LLMs for high-level design entry while relying on downstream synthesis for optimization. This growing trend highlights the need for a comprehensive benchmarking and evaluation framework dedicated to LLM-based HLS. To address this, We present Bench4HLS for evaluating LLM-generated HLS designs. Bench4HLS comprises 170 manually drafted and validated case studies, spanning small kernels to complex accelerators, curated from widely used public repositories. The framework supports fully automated assessment of compilation success, functional correctness via simulation, and synthesis feasibility/optimization. Crucially, Bench4HLS integrates a pluggable API for power, performance, and area (PPA) analysis across various HLS toolchains and architectures, demonstrated here with Xilinx Vitis HLS and validated on Catapult HLS. By providing a structured, extensible, and plug-and-play testbed, Bench4HLS establishes a foundational methodology for benchmarking LLMs in HLS workflows.

2026-01-16T20:52:42Z Accepted to the Design, Automation and Test in Europe Conference (DATE 2026) M Zafir Sadik Khan Kimia Azar Hadi Kamali http://arxiv.org/abs/2601.11408v1 Qihe: A General-Purpose Static Analysis Framework for Verilog 2026-01-16T16:26:36Z

In the past decades, static analysis has thrived in software, facilitating applications in bug detection, security, and program understanding. These advanced analyses are largely underpinned by general-purpose static analysis frameworks, which offer essential infrastructure to streamline their development. Conversely, hardware lacks such a framework, which overshadows the promising opportunities for sophisticated static analysis in hardware, hindering achievements akin to those witnessed in software. We thus introduce Qihe, the first general-purpose static analysis framework for Verilog -- a highly challenging endeavor given the absence of precedents in hardware. Qihe features an analysis-oriented front end, a Verilog-specific IR, and a suite of diverse fundamental analyses that capture essential hardware-specific characteristics -- such as bit-vector arithmetic, register synchronization, and digital component concurrency -- and enable the examination of intricate hardware data and control flows. These fundamental analyses are designed to support a wide array of hardware analysis clients. To validate Qihe's utility, we further developed a set of clients spanning bug detection, security, and program understanding. Our preliminary experimental results are highly promising; for example, Qihe uncovered 9 previously unknown bugs in popular real-world hardware projects (averaging 1.5K+ GitHub stars), all of which were confirmed by developers; moreover, Qihe successfully identified 18 bugs beyond the capabilities of existing static analyses for Verilog bug detection (i.e., linters), and detected 16 vulnerabilities in real-world hardware programs. By open-sourcing Qihe, which comprises over 100K lines of code, we aim to inspire further innovation and applications of sophisticated static analysis for hardware, aspiring to foster a similarly vibrant ecosystem that software analysis enjoys.

2026-01-16T16:26:36Z Qinlin Chen Nairen Zhang Jinpeng Wang Jiacai Cui Tian Tan Xiaoxing Ma Chang Xu Jian Lu Yue Li http://arxiv.org/abs/2601.11358v1 Cutting Corners on Uncertainty: Zonotope Abstractions for Stream-based Runtime Monitoring 2026-01-16T15:14:04Z

Stream-based monitoring assesses the health of safety-critical systems by transforming input streams of sensor measurements into output streams that determine a verdict. These inputs are often treated as accurate representations of the physical state, although real sensors introduce calibration and measurement errors. Such errors propagate through the monitor's computations and can distort the final verdict. Affine arithmetic with symbolic slack variables can track these errors precisely, but independent measurement noise introduces a fresh slack variable upon each measurement event, causing the monitor's state representation to grow without bound over time. Therefore, any bounded-memory monitoring algorithm must unify slack variables at runtime in a way that generates a sound approximation. This paper introduces zonotopes as an abstract domain for online monitoring of RLola specifications. We demonstrate that zonotopes precisely capture the affine state of the monitor and that their over-approximation produces a sound bounded-memory monitor. We present a comparison of different zonotope over-approximation strategies in the context of runtime monitoring, evaluating their performance and false-positive rates.

2026-01-16T15:14:04Z Bernd Finkbeiner Martin Fränzle Florian Kohn Paul Kröger http://arxiv.org/abs/2602.06976v1 Bridging the Knowledge Void: Inference-time Acquisition of Unfamiliar Programming Languages for Coding Tasks 2026-01-16T09:06:47Z

The proficiency of Large Language Models (LLMs) in coding tasks is often a reflection of their extensive pre-training corpora, which typically collapses when confronted with previously unfamiliar programming languages. Departing from data-intensive finetuning, we investigate the paradigm of Inference-time Language Acquisition (ILA), where an LLM masters an unfamiliar language through dynamic interaction with limited external resources. In this paper, we propose ILA-agent, a general ILA framework that equips LLMs with a set of behavioral primitives. By modeling essential human-like behaviors as a suite of tools, ILA-agent enables LLMs to incrementally explore, apply, and verify language knowledge through structured interactions with the official documentation and execution environment. To provide a rigorous evaluation in a low-resource setting, we construct Cangjie-bench, a multi-task benchmark based on the novel statically-typed language Cangjie. We instantiate ILA-agent for Cangjie and evaluate its performance across code generation, translation, and program repair tasks. Results using diverse LLMs demonstrate that ILA-agent significantly outperforms retrieval-augmented baselines. Further analysis of agent trajectories characterizes the emergent behavior patterns while highlighting persisting performance gaps.

2026-01-16T09:06:47Z Chen Shen Wei Cheng Jingyue Yang Huan Zhang Yuhan Wu Wei Hu http://arxiv.org/abs/2501.13633v5 Representing Molecules with Algebraic Data Types: Beyond SMILES and SELFIES 2026-01-16T00:46:05Z

Benchmarks of molecular machine learning models often treat the molecular representation as a neutral input format, yet the representation defines the syntax of validity, edit operations, and invariances that models implicitly learn. We propose MolADT, a typed intermediate representation (IR) for molecules expressed as a family of algebraic data types that separates (i) constitution via Dietz-style bonding systems, (ii) 3D geometry and stereochemistry, and (iii) optional electronic annotations. By shifting from string edits to operations over structured values, MolADT makes representational assumptions explicit, supports deterministic validation and localized transformations, and provides hooks for symmetry-aware and Bayesian workflows. We provide a reference implementation in Haskell (open-source, archived with DOI) and worked examples demonstrating delocalised/multicentre bonding, validation invariants, reaction extensions, and group actions relevant to geometric learning.

2025-01-23T13:06:35Z 3 Figures Oliver Goldstein Samuel March http://arxiv.org/abs/2509.13699v2 Multi-Threaded Software Model Checking via Parallel Trace Abstraction Refinement 2026-01-15T07:46:21Z

Automatic software verification is a valuable means for software quality assurance. However, automatic verification and in particular software model checking can be time-consuming, which hinders their practical applicability e.g., the use in continuous integration. One solution to address the issue is to reduce the response time of the verification procedure by leveraging today's multi-core CPUs. In this paper, we propose a solution to parallelize trace abstraction, an abstraction-based approach to software model checking. The underlying idea of our approach is to parallelize the abstraction refinement. More concretely, our approach analyzes different traces (syntactic program paths) that could violate the safety property in parallel. We realize our parallelized version of trace abstraction in the verification tool Ulti mate Automizer and perform a thorough evaluation. Our evaluation shows that our parallelization is more effective than sequential trace abstraction and can provide results significantly faster on many time-consuming tasks. Also, our approach is more effective than DSS, a recent parallel approach to abstraction-based software model checking.

2025-09-17T05:05:16Z Max Barth Marie-Christine Jakobs http://arxiv.org/abs/2601.09839v1 Lazy Evaluation: A Comparative Analysis of SAS MACROs and R Functions 2026-01-14T19:55:29Z

Lazy evaluation is a powerful technique that can optimize code execution by deferring evaluations until their results are required, thus enhancing efficiency. In most modern programming languages, like R, lazy evaluation is commonly applied to function arguments. However, the application of lazy evaluation in SAS has not been extensively explored. This paper focuses on the mechanisms of lazy evaluation in SAS MACROs and R functions, offering a comparative analysis of the underlying principles that drive these processes. R's lazy evaluation is driven by a data structure called Promise, which postpones evaluation and does not occupy memory until the value is needed, utilizing a call-by-need strategy. SAS, on the other hand, achieves lazy evaluation through its symbol tables, employing memory to store parameters, and operates on a call-by-name basis. These discrepancies in lazy evaluation strategies can notably impact the results of R functions and SAS MACROs. By examining these distinct approaches, the paper illuminates the impact of lazy evaluation on programming efficiency, supported by illustrative examples. As the shift from SAS to R becomes increasingly prevalent in the pharmaceutical industry, understanding these techniques enables programmers to optimize their code for greater efficacy. This exploration serves as a guide to enhance programming capabilities and performance in both languages.

2026-01-14T19:55:29Z This paper was originally published in SESUG 2025 Conference Proceedings. Cary, NC: SouthEast SAS Users Group Chen Ling Yachen Wang http://arxiv.org/abs/2601.09808v1 From Dynamic to Lexical: A Comparative Exploration of Scoping Rules in SAS and R 2026-01-14T19:15:07Z

Variable scoping dictates how and where variables are accessible within programming languages, playing a crucial role in code efficiency and organization. This paper examines the distinct scoping rules in SAS and R, focusing on SAS's dynamic scoping and R's lexical scoping. In SAS, dynamic scoping utilizes symbol tables, resolving variables at runtime by dynamically searching through active macro layers. R, in contrast, employs lexical scoping, using environments to resolve variables based on the structure in which functions are defined. Illustrative examples highlight the differences between these scoping strategies, showcasing their impact on code behavior. Additionally, the paper outlines methods for inspecting variables in SAS's symbol tables and R's environments, offering practical insights for debugging and optimization. Strategies for controlling variable scope in both languages are discussed, enhancing code precision and reliability. This exploration equips programmers with critical understanding to optimize variable management, improving their programming practices in SAS and R.

2026-01-14T19:15:07Z This paper was originally published in the SESUG 2025 Conference Proceedings. Cary, NC Chen Ling Yachen Wang http://arxiv.org/abs/2601.09583v1 MLIR-Forge: A Modular Framework for Language Smiths 2026-01-14T15:52:15Z

Optimizing compilers are essential for the efficient and correct execution of software across various scientific fields. Domain-specific languages (DSL) typically use higher level intermediate representations (IR) in their compiler pipelines for domain-specific optimizations. As these IRs add to complexity, it is crucial to test them thoroughly. Random program generators have proven to be an effective tool to test compilers through differential and fuzz testing. However, developing specialized program generators for compiler IRs is not straightforward and demands considerable resources. We introduce MLIR-Forge, a novel random program generator framework that leverages the flexibility of MLIR, aiming to simplify the creation of specialized program generators. MLIR-Forge achieves this by splitting the generation process into fundamental building blocks that are language specific, and reusable program creation logic that constructs random programs from these building blocks. This hides complexity and furthermore, even the language specific components can be defined using a set of common tools. We demonstrate MLIR-Forge's capabilities by generating MLIR with built-in dialects, WebAssembly, and a data-centric program representation, DaCe -- requiring less than a week of development time in total for each of them. Using the generated programs we conduct differential testing and find 9 MLIR, 15 WebAssembly, and 774 DaCe groups of bugs with the corresponding program generators, after running them until the rate of new bugs stagnates.

2026-01-14T15:52:15Z Berke Ates Philipp Schaad Timo Schneider Alexandru Calotoiu Torsten Hoefler http://arxiv.org/abs/2601.04085v2 CSSG: Measuring Code Similarity with Semantic Graphs 2026-01-14T13:20:24Z

Existing code similarity metrics, such as BLEU, CodeBLEU, and TSED, largely rely on surface-level string overlap or abstract syntax tree structures, and often fail to capture deeper semantic relationships between programs.We propose CSSG (Code Similarity using Semantic Graphs), a novel metric that leverages program dependence graphs to explicitly model control dependencies and variable interactions, providing a semantics-aware representation of code.Experiments on the CodeContests+ dataset show that CSSG consistently outperforms existing metrics in distinguishing more similar code from less similar code under both monolingual and cross-lingual settings, demonstrating that dependency-aware graph representations offer a more effective alternative to surface-level or syntax-based similarity measures.

2026-01-07T16:54:02Z Yiyang Lu Jingwen Xu Changze Lv Zisu Huang Zhengkang Guo Zhengyuan Wang Muzhao Tian Xuanjing Huang Xiaoqing Zheng http://arxiv.org/abs/2304.08848v2 Forward Symbolic Execution for Trustworthy Automation of Binary Code Verification 2026-01-13T20:56:31Z

Control flow in unstructured programs can be complex and dynamic, which makes static analysis difficult. Yet, automated reasoning about unstructured control flow is important when certifying properties of binary (machine) code in trustworthy systems, e.g., cryptographic routines. We present a theory of forward symbolic execution for unstructured programs suitable for use in theorem provers that enables automated verification of both functional and non-functional program properties. The theory's foundation is a set of inference rules where each member corresponds to an operation in a symbolic execution engine. The rules are designed to give control over the tradeoff between the preservation of precision and introduction of overapproximation. We instantiate our theory for BIR, a previously proposed intermediate language for binary analysis. We demonstrate how symbolic executors can be constructed for BIR with common optimizations such as pruning of infeasible symbolic states. We implemented our theory in the HOL4 theorem prover using the HolBA binary analysis library, obtaining machine-checked proofs of soundness of symbolic execution for BIR. We practically evaluated two applications of our theory: verification of functional properties of RISC-V binaries and verification of execution time bounds of programs running on the ARM Cortex-M0 processor. The evaluation shows that such verification can be automated with moderate overhead on medium-sized programs.

2023-04-18T09:31:10Z Andreas Lindner Karl Palmskog Scott Constable Mads Dam Roberto Guanciale Hamed Nemati 10.1007/978-3-032-15700-3_8 http://arxiv.org/abs/2601.08529v1 Formalization and Implementation of Safe Destination Passing in Pure Functional Programming Settings 2026-01-13T13:13:38Z

Destination-passing style programming introduces destinations, which represent the address of a write-once memory cell. These destinations can be passed as function parameters, allowing the caller to control memory management: the callee simply fills the cell instead of allocating space for a return value. While typically used in systems programming, destination passing also has applications in pure functional programming, where it enables programs that were previously unexpressible using usual immutable data structures. In this thesis, we develop a core λ-calculus with destinations, {λ_d}. Our new calculus is more expressive than similar existing systems, with destination passing designed to be as flexible as possible. This is achieved through a modal type system combining linear types with a system of ages to manage scopes, in order to make destination-passing safe. Type safety of our core calculus was proved formally with the Coq proof assistant. Then, we see how this core calculus can be adapted into an existing pure functional language, Haskell, whose type system is less powerful than our custom theoretical one. Retaining safety comes at the cost of removing some flexibility in the handling of destinations. We later refine the implementation to recover much of this flexibility, at the cost of increased user complexity. The prototype implementation in Haskell shows encouraging results for adopting destination-passing style programming when traversing or mapping over large data structures such as lists or data trees.

2026-01-13T13:13:38Z PhD Manuscript, 148 pages. Sources: https://github.com/tweag/tbagrel-phd-manuscript/ Computer Science [cs]. LORIA (Université de Lorraine, CNRS, INRIA), 2025. English. <tel-05455981> Thomas Bagrel