https://arxiv.org/api/PIMcN0yRGJRU7pufiatz0KQYJI82026-06-21T15:57:12Z991875015http://arxiv.org/abs/2601.12385v1Context-Free Grammar Inference for Complex Programming Languages in Black Box Settings2026-01-18T12:45:54ZGrammar inference for complex programming languages remains a significant challenge, as existing approaches fail to scale to real world datasets within practical time constraints. In our experiments, none of the state-of-the-art tools, including Arvada, Treevada and Kedavra were able to infer grammars for complex languages such as C, C++, and Java within 48 hours. Arvada and Treevada perform grammar inference directly on full-length input examples, which proves inefficient for large files commonly found in such languages. While Kedavra introduces data decomposition to create shorter examples for grammar inference, its lexical analysis still relies on the original inputs. Additionally, its strict no-overgeneralization constraint limits the construction of complex grammars.
To overcome these limitations, we propose Crucio, which builds a decomposition forest to extract short examples for lexical and grammar inference via a distributional matrix. Experimental results show that Crucio is the only method capable of successfully inferring grammars for complex programming languages (where the number of nonterminals is up to 23x greater than in prior benchmarks) within reasonable time limits. On the prior simple benchmark, Crucio achieves an average recall improvement of 1.37x and 1.19x over Treevada and Kedavra, respectively, and improves F1 scores by 1.21x and 1.13x.2026-01-18T12:45:54ZFeifei LiXiao ChenXiaoyu SunXi XiaoShaohua WangYong DingSheng WenQing Lihttp://arxiv.org/abs/2601.12270v1SplittingSecrets: A Compiler-Based Defense for Preventing Data Memory-Dependent Prefetcher Side-Channels2026-01-18T05:55:46ZTraditional side-channels take advantage of secrets being used as inputs to unsafe instructions, used for memory accesses, or used in control flow decisions. Constant-time programming, which restricts such code patterns, has been widely adopted as a defense against these vulnerabilities. However, new hardware optimizations in the form of Data Memory-dependent Prefetchers (DMP) present in Apple, Intel, and ARM CPUs have shown such defenses are not sufficient. These prefetchers, unlike classical prefetchers, use the content of memory as well as the trace of prior accesses to determine prefetch targets. An adversary abusing such a prefetcher has been shown to be able to mount attacks leaking data-at-rest; data that is never used by the program, even speculatively, in an unsafe manner.
In response, this paper introduces SplittingSecrets, a compiler-based tool that can harden software libraries against side-channels arising from DMPs. SplittingSecrets's approach avoids reasoning about the complex internals of different DMPs and instead relies on one key aspect of all DMPs: activation requires data to resemble addresses. To prevent secret data from leaking, SplittingSecrets transforms memory operations to ensure that secrets are never stored in memory in a manner resembling an address, thereby avoiding DMP activation on those secrets. Rather than disable a DMP entirely, SplittingSecrets can provide targeted hardening for only specific secrets entirely in software.
We have implemented SplittingSecrets using LLVM, supporting both source-level memory operations and those generated by the compiler backend for the AArch64 architecture, We have analyzed the performance overhead involved in safeguarding secrets from DMP-induced attacks using common primitives in libsodium, a popular cryptographic library when built for Apple M-series CPUs.2026-01-18T05:55:46ZReshabh K SharmaDan GrossmanDavid Kohlbrennerhttp://arxiv.org/abs/2510.17429v3Introducing Linear Implication Types to $λ_{GT}$ for Computing With Incomplete Graphs2026-01-17T01:00:05ZDesigning programming languages that enable intuitive and safe manipulation of data structures is a critical research challenge. Conventional destructive memory operations using pointers are complex and prone to errors. Existing type systems, such as affine types and shape types, address this problem towards safe manipulation of heaps and pointers, but design of high-level declarative languages that allow us to manipulate complex pointer data structures at a higher level of abstraction is largely an open problem. The $λ_{GT}$ language, a purely functional programming language that treats hypergraphs (hereafter referred to as graphs) as primary data structures, addresses some of these challenges. By abstracting data with shared references and cycles as graphs, it enables declarative operations through pattern matching and leverages its type system to guarantee safety of these operations. Nevertheless, the previously proposed type system of $λ_{GT}$ leaves two significant open challenges. First, the type system does not support \emph{incomplete graphs}, that is, graphs in which some elements are missing from the graphs of user-defined types. Second, the type system relies on dynamic type checking during pattern matching. This study addresses these two challenges by incorporating linear implication into the $λ_{GT}$ type system, while introducing new constraints to ensure its soundness.2025-10-20T11:15:41Z36 pages, 14 figures, This paper was accepted at PRO2025-3 and is scheduled to appear in the IPSJ JournalJin SanoNaoki YamamotoKazunori Uedahttp://arxiv.org/abs/2601.19941v1Bench4HLS: End-to-End Evaluation of LLMs in High-Level Synthesis Code Generation2026-01-16T20:52:42ZIn last two years, large language models (LLMs) have shown strong capabilities in code generation, including hardware design at register-transfer level (RTL). While their use in high-level synthesis (HLS) remains comparatively less mature, the ratio of HLS- to RTL-focused studies has shifted from 1:10 to 2:10 in the past six months, indicating growing interest in leveraging LLMs for high-level design entry while relying on downstream synthesis for optimization. This growing trend highlights the need for a comprehensive benchmarking and evaluation framework dedicated to LLM-based HLS. To address this, We present Bench4HLS for evaluating LLM-generated HLS designs. Bench4HLS comprises 170 manually drafted and validated case studies, spanning small kernels to complex accelerators, curated from widely used public repositories. The framework supports fully automated assessment of compilation success, functional correctness via simulation, and synthesis feasibility/optimization. Crucially, Bench4HLS integrates a pluggable API for power, performance, and area (PPA) analysis across various HLS toolchains and architectures, demonstrated here with Xilinx Vitis HLS and validated on Catapult HLS. By providing a structured, extensible, and plug-and-play testbed, Bench4HLS establishes a foundational methodology for benchmarking LLMs in HLS workflows.2026-01-16T20:52:42ZAccepted to the Design, Automation and Test in Europe Conference (DATE 2026)M Zafir Sadik KhanKimia AzarHadi Kamalihttp://arxiv.org/abs/2601.11408v1Qihe: A General-Purpose Static Analysis Framework for Verilog2026-01-16T16:26:36ZIn the past decades, static analysis has thrived in software, facilitating applications in bug detection, security, and program understanding. These advanced analyses are largely underpinned by general-purpose static analysis frameworks, which offer essential infrastructure to streamline their development. Conversely, hardware lacks such a framework, which overshadows the promising opportunities for sophisticated static analysis in hardware, hindering achievements akin to those witnessed in software. We thus introduce Qihe, the first general-purpose static analysis framework for Verilog -- a highly challenging endeavor given the absence of precedents in hardware. Qihe features an analysis-oriented front end, a Verilog-specific IR, and a suite of diverse fundamental analyses that capture essential hardware-specific characteristics -- such as bit-vector arithmetic, register synchronization, and digital component concurrency -- and enable the examination of intricate hardware data and control flows. These fundamental analyses are designed to support a wide array of hardware analysis clients. To validate Qihe's utility, we further developed a set of clients spanning bug detection, security, and program understanding. Our preliminary experimental results are highly promising; for example, Qihe uncovered 9 previously unknown bugs in popular real-world hardware projects (averaging 1.5K+ GitHub stars), all of which were confirmed by developers; moreover, Qihe successfully identified 18 bugs beyond the capabilities of existing static analyses for Verilog bug detection (i.e., linters), and detected 16 vulnerabilities in real-world hardware programs. By open-sourcing Qihe, which comprises over 100K lines of code, we aim to inspire further innovation and applications of sophisticated static analysis for hardware, aspiring to foster a similarly vibrant ecosystem that software analysis enjoys.2026-01-16T16:26:36ZQinlin ChenNairen ZhangJinpeng WangJiacai CuiTian TanXiaoxing MaChang XuJian LuYue Lihttp://arxiv.org/abs/2601.11358v1Cutting Corners on Uncertainty: Zonotope Abstractions for Stream-based Runtime Monitoring2026-01-16T15:14:04ZStream-based monitoring assesses the health of safety-critical systems by transforming input streams of sensor measurements into output streams that determine a verdict. These inputs are often treated as accurate representations of the physical state, although real sensors introduce calibration and measurement errors. Such errors propagate through the monitor's computations and can distort the final verdict. Affine arithmetic with symbolic slack variables can track these errors precisely, but independent measurement noise introduces a fresh slack variable upon each measurement event, causing the monitor's state representation to grow without bound over time. Therefore, any bounded-memory monitoring algorithm must unify slack variables at runtime in a way that generates a sound approximation.
This paper introduces zonotopes as an abstract domain for online monitoring of RLola specifications. We demonstrate that zonotopes precisely capture the affine state of the monitor and that their over-approximation produces a sound bounded-memory monitor. We present a comparison of different zonotope over-approximation strategies in the context of runtime monitoring, evaluating their performance and false-positive rates.2026-01-16T15:14:04ZBernd FinkbeinerMartin FränzleFlorian KohnPaul Krögerhttp://arxiv.org/abs/2602.06976v1Bridging the Knowledge Void: Inference-time Acquisition of Unfamiliar Programming Languages for Coding Tasks2026-01-16T09:06:47ZThe proficiency of Large Language Models (LLMs) in coding tasks is often a reflection of their extensive pre-training corpora, which typically collapses when confronted with previously unfamiliar programming languages. Departing from data-intensive finetuning, we investigate the paradigm of Inference-time Language Acquisition (ILA), where an LLM masters an unfamiliar language through dynamic interaction with limited external resources. In this paper, we propose ILA-agent, a general ILA framework that equips LLMs with a set of behavioral primitives. By modeling essential human-like behaviors as a suite of tools, ILA-agent enables LLMs to incrementally explore, apply, and verify language knowledge through structured interactions with the official documentation and execution environment. To provide a rigorous evaluation in a low-resource setting, we construct Cangjie-bench, a multi-task benchmark based on the novel statically-typed language Cangjie. We instantiate ILA-agent for Cangjie and evaluate its performance across code generation, translation, and program repair tasks. Results using diverse LLMs demonstrate that ILA-agent significantly outperforms retrieval-augmented baselines. Further analysis of agent trajectories characterizes the emergent behavior patterns while highlighting persisting performance gaps.2026-01-16T09:06:47ZChen ShenWei ChengJingyue YangHuan ZhangYuhan WuWei Huhttp://arxiv.org/abs/2501.13633v5Representing Molecules with Algebraic Data Types: Beyond SMILES and SELFIES2026-01-16T00:46:05ZBenchmarks of molecular machine learning models often treat the molecular representation as a neutral input format, yet the representation defines the syntax of validity, edit operations, and invariances that models implicitly learn. We propose MolADT, a typed intermediate representation (IR) for molecules expressed as a family of algebraic data types that separates (i) constitution via Dietz-style bonding systems, (ii) 3D geometry and stereochemistry, and (iii) optional electronic annotations. By shifting from string edits to operations over structured values, MolADT makes representational assumptions explicit, supports deterministic validation and localized transformations, and provides hooks for symmetry-aware and Bayesian workflows. We provide a reference implementation in Haskell (open-source, archived with DOI) and worked examples demonstrating delocalised/multicentre bonding, validation invariants, reaction extensions, and group actions relevant to geometric learning.2025-01-23T13:06:35Z3 FiguresOliver GoldsteinSamuel Marchhttp://arxiv.org/abs/2509.13699v2Multi-Threaded Software Model Checking via Parallel Trace Abstraction Refinement2026-01-15T07:46:21ZAutomatic software verification is a valuable means for software quality assurance. However, automatic verification and in particular software model checking can be time-consuming, which hinders their practical applicability e.g., the use in continuous integration. One solution to address the issue is to reduce the response time of the verification procedure by leveraging today's multi-core CPUs.
In this paper, we propose a solution to parallelize trace abstraction, an abstraction-based approach to software model checking. The underlying idea of our approach is to parallelize the abstraction refinement. More concretely, our approach analyzes different traces (syntactic program paths) that could violate the safety property in parallel. We realize our parallelized version of trace abstraction in the verification tool Ulti mate Automizer and perform a thorough evaluation. Our evaluation shows that our parallelization is more effective than sequential trace abstraction and can provide results significantly faster on many time-consuming tasks. Also, our approach is more effective than DSS, a recent parallel approach to abstraction-based software model checking.2025-09-17T05:05:16ZMax BarthMarie-Christine Jakobshttp://arxiv.org/abs/2601.09839v1Lazy Evaluation: A Comparative Analysis of SAS MACROs and R Functions2026-01-14T19:55:29ZLazy evaluation is a powerful technique that can optimize code execution by deferring evaluations until their results are required, thus enhancing efficiency. In most modern programming languages, like R, lazy evaluation is commonly applied to function arguments. However, the application of lazy evaluation in SAS has not been extensively explored. This paper focuses on the mechanisms of lazy evaluation in SAS MACROs and R functions, offering a comparative analysis of the underlying principles that drive these processes.
R's lazy evaluation is driven by a data structure called Promise, which postpones evaluation and does not occupy memory until the value is needed, utilizing a call-by-need strategy. SAS, on the other hand, achieves lazy evaluation through its symbol tables, employing memory to store parameters, and operates on a call-by-name basis. These discrepancies in lazy evaluation strategies can notably impact the results of R functions and SAS MACROs. By examining these distinct approaches, the paper illuminates the impact of lazy evaluation on programming efficiency, supported by illustrative examples. As the shift from SAS to R becomes increasingly prevalent in the pharmaceutical industry, understanding these techniques enables programmers to optimize their code for greater efficacy. This exploration serves as a guide to enhance programming capabilities and performance in both languages.2026-01-14T19:55:29ZThis paper was originally published in SESUG 2025 Conference Proceedings. Cary, NC: SouthEast SAS Users GroupChen LingYachen Wanghttp://arxiv.org/abs/2601.09808v1From Dynamic to Lexical: A Comparative Exploration of Scoping Rules in SAS and R2026-01-14T19:15:07ZVariable scoping dictates how and where variables are accessible within programming languages, playing a crucial role in code efficiency and organization. This paper examines the distinct scoping rules in SAS and R, focusing on SAS's dynamic scoping and R's lexical scoping. In SAS, dynamic scoping utilizes symbol tables, resolving variables at runtime by dynamically searching through active macro layers. R, in contrast, employs lexical scoping, using environments to resolve variables based on the structure in which functions are defined. Illustrative examples highlight the differences between these scoping strategies, showcasing their impact on code behavior. Additionally, the paper outlines methods for inspecting variables in SAS's symbol tables and R's environments, offering practical insights for debugging and optimization. Strategies for controlling variable scope in both languages are discussed, enhancing code precision and reliability. This exploration equips programmers with critical understanding to optimize variable management, improving their programming practices in SAS and R.2026-01-14T19:15:07ZThis paper was originally published in the SESUG 2025 Conference Proceedings. Cary, NCChen LingYachen Wanghttp://arxiv.org/abs/2601.09583v1MLIR-Forge: A Modular Framework for Language Smiths2026-01-14T15:52:15ZOptimizing compilers are essential for the efficient and correct execution of software across various scientific fields. Domain-specific languages (DSL) typically use higher level intermediate representations (IR) in their compiler pipelines for domain-specific optimizations. As these IRs add to complexity, it is crucial to test them thoroughly. Random program generators have proven to be an effective tool to test compilers through differential and fuzz testing. However, developing specialized program generators for compiler IRs is not straightforward and demands considerable resources. We introduce MLIR-Forge, a novel random program generator framework that leverages the flexibility of MLIR, aiming to simplify the creation of specialized program generators. MLIR-Forge achieves this by splitting the generation process into fundamental building blocks that are language specific, and reusable program creation logic that constructs random programs from these building blocks. This hides complexity and furthermore, even the language specific components can be defined using a set of common tools. We demonstrate MLIR-Forge's capabilities by generating MLIR with built-in dialects, WebAssembly, and a data-centric program representation, DaCe -- requiring less than a week of development time in total for each of them. Using the generated programs we conduct differential testing and find 9 MLIR, 15 WebAssembly, and 774 DaCe groups of bugs with the corresponding program generators, after running them until the rate of new bugs stagnates.2026-01-14T15:52:15ZBerke AtesPhilipp SchaadTimo SchneiderAlexandru CalotoiuTorsten Hoeflerhttp://arxiv.org/abs/2601.04085v2CSSG: Measuring Code Similarity with Semantic Graphs2026-01-14T13:20:24ZExisting code similarity metrics, such as BLEU, CodeBLEU, and TSED, largely rely on surface-level string overlap or abstract syntax tree structures, and often fail to capture deeper semantic relationships between programs.We propose CSSG (Code Similarity using Semantic Graphs), a novel metric that leverages program dependence graphs to explicitly model control dependencies and variable interactions, providing a semantics-aware representation of code.Experiments on the CodeContests+ dataset show that CSSG consistently outperforms existing metrics in distinguishing more similar code from less similar code under both monolingual and cross-lingual settings, demonstrating that dependency-aware graph representations offer a more effective alternative to surface-level or syntax-based similarity measures.2026-01-07T16:54:02ZYiyang LuJingwen XuChangze LvZisu HuangZhengkang GuoZhengyuan WangMuzhao TianXuanjing HuangXiaoqing Zhenghttp://arxiv.org/abs/2304.08848v2Forward Symbolic Execution for Trustworthy Automation of Binary Code Verification2026-01-13T20:56:31ZControl flow in unstructured programs can be complex and dynamic, which makes static analysis difficult. Yet, automated reasoning about unstructured control flow is important when certifying properties of binary (machine) code in trustworthy systems, e.g., cryptographic routines. We present a theory of forward symbolic execution for unstructured programs suitable for use in theorem provers that enables automated verification of both functional and non-functional program properties. The theory's foundation is a set of inference rules where each member corresponds to an operation in a symbolic execution engine. The rules are designed to give control over the tradeoff between the preservation of precision and introduction of overapproximation. We instantiate our theory for BIR, a previously proposed intermediate language for binary analysis. We demonstrate how symbolic executors can be constructed for BIR with common optimizations such as pruning of infeasible symbolic states. We implemented our theory in the HOL4 theorem prover using the HolBA binary analysis library, obtaining machine-checked proofs of soundness of symbolic execution for BIR. We practically evaluated two applications of our theory: verification of functional properties of RISC-V binaries and verification of execution time bounds of programs running on the ARM Cortex-M0 processor. The evaluation shows that such verification can be automated with moderate overhead on medium-sized programs.2023-04-18T09:31:10ZAndreas LindnerKarl PalmskogScott ConstableMads DamRoberto GuancialeHamed Nemati10.1007/978-3-032-15700-3_8http://arxiv.org/abs/2601.08529v1Formalization and Implementation of Safe Destination Passing in Pure Functional Programming Settings2026-01-13T13:13:38ZDestination-passing style programming introduces destinations, which represent the address of a write-once memory cell. These destinations can be passed as function parameters, allowing the caller to control memory management: the callee simply fills the cell instead of allocating space for a return value. While typically used in systems programming, destination passing also has applications in pure functional programming, where it enables programs that were previously unexpressible using usual immutable data structures.
In this thesis, we develop a core λ-calculus with destinations, {λ_d}. Our new calculus is more expressive than similar existing systems, with destination passing designed to be as flexible as possible. This is achieved through a modal type system combining linear types with a system of ages to manage scopes, in order to make destination-passing safe. Type safety of our core calculus was proved formally with the Coq proof assistant.
Then, we see how this core calculus can be adapted into an existing pure functional language, Haskell, whose type system is less powerful than our custom theoretical one. Retaining safety comes at the cost of removing some flexibility in the handling of destinations. We later refine the implementation to recover much of this flexibility, at the cost of increased user complexity.
The prototype implementation in Haskell shows encouraging results for adopting destination-passing style programming when traversing or mapping over large data structures such as lists or data trees.2026-01-13T13:13:38ZPhD Manuscript, 148 pages. Sources: https://github.com/tweag/tbagrel-phd-manuscript/Computer Science [cs]. LORIA (Université de Lorraine, CNRS, INRIA), 2025. English. <tel-05455981>Thomas Bagrel