https://arxiv.org/api/2nx31u+/W9jwtmpAf69tGIOHFZI2026-06-28T22:12:37Z9951169515http://arxiv.org/abs/2503.15556v2Fully Automated Generation of Combinatorial Optimisation Systems Using Large Language Models2025-04-04T17:13:59ZOver the last few decades, researchers have made considerable efforts to make decision support more accessible for small and medium enterprises by reducing the cost of designing, developing and maintaining automated decision support systems. However, due to the diversity of the underlying combinatorial optimisation problems, reusability of such systems has been limited; in most cases, expensive expertise has been required to implement bespoke software components.
We explore the feasibility of fully automated generation of combinatorial optimisation systems using large language models (LLMs). An LLM will be responsible for interpreting the user-provided problem description in natural language and designing and implementing problem-specific software components. We discuss the principles of fully automated LLM-based optimisation system generation, and evaluate several proof-of-concept generators, comparing their performance on four optimisation problems.2025-03-18T20:23:51ZDaniel Karapetyanhttp://arxiv.org/abs/2504.03182v1Graphiti: Bridging Graph and Relational Database Queries2025-04-04T05:18:09ZThis paper presents an automated reasoning technique for checking equivalence between graph database queries written in Cypher and relational queries in SQL. To formalize a suitable notion of equivalence in this setting, we introduce the concept of database transformers, which transform database instances between graph and relational models. We then propose a novel verification methodology that checks equivalence modulo a given transformer by reducing the original problem to verifying equivalence between a pair of SQL queries. This reduction is achieved by embedding a subset of Cypher into SQL through syntax-directed translation, allowing us to leverage existing research on automated reasoning for SQL while obviating the need for reasoning simultaneously over two different data models. We have implemented our approach in a tool called Graphiti and used it to check equivalence between graph and relational queries. Our experiments demonstrate that Graphiti is useful both for verification and refutation and that it can uncover subtle bugs, including those found in Cypher tutorials and academic papers.2025-04-04T05:18:09Z44 pages, 23 figures, 5 tables, PLDI 2025Yang HeRuijie FangIsil DilligYuepeng Wanghttp://arxiv.org/abs/2504.03109v1Extending Data Spatial Semantics for Scale Agnostic Programming2025-04-04T01:33:34ZWe introduce extensions to Data Spatial Programming (DSP) that enable scale-agnostic programming for application development. Building on DSP's paradigm shift from data-to-compute to compute-to-data, we formalize additional intrinsic language constructs that abstract persistent state, multi-user contexts, multiple entry points, and cross-machine distribution for applications. By introducing a globally accessible root node and treating walkers as potential entry points, we demonstrate how programs can be written once and executed across scales, from single-user to multi-user, from local to distributed, without modification. These extensions allow developers to focus on domain logic while delegating runtime concerns of persistence, multi-user support, distribution, and API interfacing to the execution environment. Our approach makes scale-agnostic programming a natural extension of the topological semantics of DSP, allowing applications to seamlessly transition from single-user to multi-user scenarios, from ephemeral to persistent execution contexts, and from local to distributed execution environments.2025-04-04T01:33:34Z16 pagesJason Marshttp://arxiv.org/abs/2504.02975v1Functional Meaning for Parallel Streaming2025-04-03T18:51:21ZNondeterminism introduced by race conditions and message reorderings makes parallel and distributed programming hard. Nevertheless, promising approaches such as LVars and CRDTs address this problem by introducing a partial order structure on shared state that describes how the state evolves over time. Monotone programs that respect the order are deterministic. Datalog-inspired languages incorporate this idea of monotonicity in a first-class way but they are not general-purpose. We would like parallel and distributed languages to be as natural to use as any functional language, without sacrificing expressivity, and with a formal basis of study as appealing as the lambda calculus.
This paper presents $λ_\vee$, a core language for deterministic parallelism that embodies the ideas above. In $λ_\vee$, values may increase over time according to a streaming order and all computations are monotone with respect to that order. The streaming order coincides with the approximation order found in Scott semantics and so unifies the foundations of functional programming with the foundations of deterministic distributed computation. The resulting lambda calculus has a computationally adequate model rooted in domain theory. It integrates the compositionality and power of abstraction characteristic of functional programming with the declarative nature of Datalog.
This version of the paper includes extended exposition and appendices with proofs.2025-04-03T18:51:21ZExtended version of paper to appear in PLDI 2025Nick RiouxSteve Zdancewichttp://arxiv.org/abs/2411.08206v2Lua API and benchmark design using 3n+1 sequences: Comparing API elegance and raw speed in Redis and YottaDB databases2025-04-03T04:52:47ZElegance of a database API matters. Frequently, database APIs suit the database designer, rather than the programmer's desire for elegance and efficiency. This article pursues both: firstly, by comparing the Lua APIs for two separate databases, Redis and YottaDB. Secondly, it looks under the API covers at how object orientation can help to retain API efficiency. Finally, it benchmarks both databases using each API to implement a 3n+1 sequence generator (of Collatz Conjecture fame). It covers the eccentricities of the Lua APIs, the databases, and the nifty choice of benchmark tool, presenting benchmark results of each database's unique design.2024-11-12T21:53:13Z18 pages, 1 figures. Submitted for publication in Journal of Computer Languages special edition on LuaBerwyn Hoythttp://arxiv.org/abs/2504.02246v1C*: Unifying Programming and Verification in C2025-04-03T03:22:22ZEnsuring the correct functionality of systems software, given its safety-critical and low-level nature, is a primary focus in formal verification research and applications. Despite advances in verification tooling, conventional programmers are rarely involved in the verification of their own code, resulting in higher development and maintenance costs for verified software. A key barrier to programmer participation in verification practices is the disconnect of environments and paradigms between programming and verification practices, which limits accessibility and real-time verification.
We introduce C*, a proof-integrated language design for C programming. C* extends C with verification capabilities, powered by a symbolic execution engine and an LCF-style proof kernel. It enables real-time verification by allowing programmers to embed proof-code blocks alongside implementation code, facilitating interactive updates to the current proof state. Its expressive and extensible proof support allows users to build reusable libraries of logical definitions, theorems, and programmable proof automation. Crucially, C* unifies implementation and proof code development by using C as the common language.
We implemented a prototype of C* and evaluated it on a representative benchmark of small C programs and a challenging real-world case study: the attach function of pKVM's buddy allocator. Our results demonstrate that C* supports the verification of a broad subset of C programming idioms and effectively handles complex reasoning tasks in real-world scenarios.2025-04-03T03:22:22ZYiyuan CaoJiayi ZhuangHoujin ChenJinkai FanWenbo XuZhiyi WangDi WangQinxiang CaoYingfei XiongHaiyan ZhaoZhenjiang Huhttp://arxiv.org/abs/2504.02170v1Example-Free Learning of Regular Languages with Prefix Queries2025-04-02T23:05:10ZLanguage learning refers to the problem of inferring a mathematical model which accurately represents a formal language. Many language learning algorithms learn by asking certain types of queries about the language being modeled. Language learning is of practical interest in the field of cybersecurity, where it is used to model the language accepted by a program's input parser (also known as its input processor). In this setting, a learner can only query a string of its choice by executing the parser on it, which limits the language learning algorithms that can be used. Most practical parsers can indicate not only whether the string is valid or not, but also where the parsing failed. This extra information can be leveraged into producing a type of query we call the prefix query. Notably, no existing language learning algorithms make use of prefix queries, though some ask membership queries i.e., they ask whether or not a given string is valid. When these approaches are used to learn the language of a parser, the prefix information provided by the parser remains unused.
In this work, we present PL*, the first known language learning algorithm to make use of the prefix query, and a novel modification of the classical L* algorithm. We show both theoretically and empirically that PL* is able to learn more efficiently than L* due to its ability to exploit the additional information given by prefix queries over membership queries. Furthermore, we show how PL* can be used to learn the language of a parser, by adapting it to a more practical setting in which prefix queries are the only source of information available to it; that is, it does not have access to any labelled examples or any other types of queries. We demonstrate empirically that, even in this more constrained setting, PL* is still capable of accurately learning a range of languages of practical interest.2025-04-02T23:05:10ZEve FernandoSasha RubinRahul Gopinathhttp://arxiv.org/abs/2504.01558v1Scalable Equivalence Checking and Verification of Shallow Quantum Circuits2025-04-02T09:58:45ZThis paper concerns the problem of checking if two shallow (i.e., constant-depth) quantum circuits perform equivalent computations. Equivalence checking is a fundamental correctness question -- needed, e.g., for ensuring that transformations applied to a quantum circuit do not alter its behavior. For quantum circuits, the problem is challenging because a straightforward representation on a classical computer of each circuit's quantum state can require time and space that are exponential in the number of qubits $n$.
The paper presents decision procedures for two variants of the equivalence-checking problem. Both can be carried out on a classical computer in time and space that, for any fixed depth, is linear in $n$. Our critical insight is that local projections are precise enough to completely characterize the output state of a shallow quantum circuit. Instead of explicitly computing the output state of a circuit, we generate a set of local projections that serve as constraints on the output state. Moreover, the circuit's output state is the unique quantum state that satisfies all the constraints.
Beyond equivalence checking, we show how to use the constraint representation to check a class of assertions, both statically and at run time. Our assertion-checking methods are sound and complete for assertions expressed as conjunctions of local projections.
Our experiments show that on a server equipped with 2 x Intel\textsuperscript{\textregistered} Xeon\textsuperscript{\textregistered} Gold 6338 CPUs (128 threads total) and 1.0~TiB of RAM, running Ubuntu 20.04.6 LTS, the constraint representation of a random 100-qubit circuit of depth 6 can be computed in 19.8 seconds. For fixed inputs $\ket{0}^{\otimes 100}$, equivalence checking of {random} 100-qubit circuits of depth 3 takes 4.46 seconds; for arbitrary inputs, it takes no more than 31.96 seconds.2025-04-02T09:58:45ZComments are welcomeNengkun YuXuan Du TrinhThomas Repshttp://arxiv.org/abs/2503.23157v2Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL2025-04-01T12:53:16ZText-to-SQL is a challenging task involving multiple reasoning-intensive subtasks, including natural language understanding, database schema comprehension, and precise SQL query formulation. Existing approaches often rely on handcrafted reasoning paths with inductive biases that can limit their overall effectiveness. Motivated by the recent success of reasoning-enhanced models such as DeepSeek R1 and OpenAI o1, which effectively leverage reward-driven self-exploration to enhance reasoning capabilities and generalization, we propose a novel set of partial rewards tailored specifically for the Text-to-SQL task. Our reward set includes schema-linking, AI feedback, n-gram similarity, and syntax check, explicitly designed to address the reward sparsity issue prevalent in reinforcement learning (RL). Leveraging group relative policy optimization (GRPO), our approach explicitly encourages large language models (LLMs) to develop intrinsic reasoning skills necessary for accurate SQL query generation. With models of different sizes, we demonstrate that RL-only training with our proposed rewards consistently achieves higher accuracy and superior generalization compared to supervised fine-tuning (SFT). Remarkably, our RL-trained 14B-parameter model significantly outperforms larger proprietary models, e.g. o3-mini by 4% and Gemini-1.5-Pro-002 by 3% on the BIRD benchmark. These highlight the efficacy of our proposed RL-training framework with partial rewards for enhancing both accuracy and reasoning capabilities in Text-to-SQL tasks.2025-03-29T17:29:30ZMohammadreza Pourreza and Shayan Talaei contributed equally to this workMohammadreza PourrezaShayan TalaeiRuoxi SunXingchen WanHailong LiAzalia MirhoseiniAmin SaberiSercan "O. Arikhttp://arxiv.org/abs/2504.00065v1Assessing Code Understanding in LLMs2025-03-31T16:08:58ZWe present an empirical evaluation of Large Language Models in code understanding associated with non-trivial, semantic-preserving program transformations such as copy propagation or constant folding. Our findings show that LLMs fail to judge semantic equivalence in approximately 41\% of cases when no context is provided and in 29\% when given a simple generic context. To improve accuracy, we advocate integrating LLMs with code-optimization tools to enhance training and facilitate more robust program understanding.2025-03-31T16:08:58Z22 page, 7 tables, submitted at FORTE 2025Cosimo LaneveAlvise SpanòDalila RessiSabina RossiMichele Bugliesihttp://arxiv.org/abs/2503.23985v1An Empirical Study of Rust-Specific Bugs in the rustc Compiler2025-03-31T11:55:04ZRust is gaining popularity for its well-known memory safety guarantees and high performance, distinguishing it from C/C++ and JVM-based languages. Its compiler, rustc, enforces these guarantees through specialized mechanisms such as trait solving, borrow checking, and specific optimizations. However, Rust's unique language mechanisms introduce complexity to its compiler, leading to Rust-specific compiler bugs that are less common in traditional compilers. With Rust's increasing adoption in safety-critical domains, understanding these language mechanisms and their impact on compiler bugs is essential for improving the reliability of both rustc and Rust programs. Yet, we still lack a large-scale, detailed, and in-depth study of Rust-specific bugs in rustc.
To bridge this gap, this work conducts a comprehensive and systematic study of Rust-specific bugs in rustc, with a particular focus on the components that support its unique language features. Our analysis examines issues and fixes reported between 2022 and 2024, with a manual review of 301 valid issues. We categorize these bugs based on their causes, symptoms, affected compilation stages, and test case characteristics. Additionally, we evaluate existing rustc testing tools to assess their effectiveness and limitations. Our key findings include: (1) rustc bugs primarily arise from Rust's type system and lifetime model, with frequent errors in the High-Level Intermediate Representation (HIR) and Mid-Level Intermediate Representation (MIR) modules due to complex checkers and optimizations; (2) bug-revealing test cases often involve unstable features, advanced trait usages, lifetime annotations, standard APIs, and specific optimization levels; (3) while both valid and invalid programs can trigger bugs, existing testing tools struggle to detect non-crash errors, underscoring the need for further advancements in rustc testing.2025-03-31T11:55:04ZZixi LiuYang FengYunbo NiShaohua LiXizhe YinQingkai ShiBaowen XuZhendong Suhttp://arxiv.org/abs/2503.23904v1System $F^ω$ with Coherent Implicit Resolution2025-03-31T09:54:32ZWe propose a calculus for modeling implicit programming that supports first-class, overlapping, locally scoped, and higher-order instances with higher-kinded types. We propose a straightforward generalization of the well-established System $F^ω$ to implicit parameters, with a uniform treatment of type and term abstractions. Unlike previous works, we give a declarative specification of unambiguous, and thus coherent, resolution without introducing restrictions motivated by an algorithmic formulation of resolution.2025-03-31T09:54:32ZPOPL 2025 Student Research Competition (Graduate)Eugène Flessellehttp://arxiv.org/abs/2503.23791v1LLMigrate: Transforming "Lazy" Large Language Models into Efficient Source Code Migrators2025-03-31T07:09:07ZRewriting C code in Rust provides stronger memory safety, yet migrating large codebases such as the 32-million-line Linux kernel remains challenging. While rule-based translators (e.g., C2Rust) provide accurate yet largely unsafe Rust programs, recent Large Language Model (LLM) approaches produce more idiomatic, safe Rust programs but frequently exhibit "laziness", omitting significant portions of the target code. To address the issue, in this paper, we present LLMigrate, an LLM-based C-to-Rust translation tool that splits modules into discrete functions, translating them individually, and then reintegrating them. LLMigrate uses static analysis to retain necessary context, pairs GPT-4o (a state-of-the-art LLM) with compiler-driven translation and program-repair techniques for complex core functions, and leverages call-graph-guided translation to ensure consistent interfaces. Evaluations on three representative Linux kernel modules (math, sort, and ramfs) show that LLMigrate requires modifying less than 15\% of the target code, significantly outperforming a pure GPT-4o-based migration.2025-03-31T07:09:07ZYuchen LiuJunhao HuYingdi ShanGe LiYanzhen ZouYihong DongTao Xiehttp://arxiv.org/abs/2312.15960v5MoTCoder: Elevating Large Language Models with Modular of Thought for Challenging Programming Tasks2025-03-30T09:32:03ZLarge Language Models (LLMs) have showcased impressive capabilities in handling straightforward programming tasks. However, their performance tends to falter when confronted with more challenging programming problems. We observe that conventional models often generate solutions as monolithic code blocks, restricting their effectiveness in tackling intricate questions. To overcome this limitation, we present Module-of-Thought Coder (MoTCoder). We introduce a framework for MoT instruction tuning, designed to promote the decomposition of tasks into logical sub-tasks and sub-modules. Our investigations reveal that, through the cultivation and utilization of sub-modules, MoTCoder significantly improves both the modularity and correctness of the generated solutions, leading to substantial pass@1 improvements of 5.9% on APPS and 5.8% on CodeContests. MoTCoder also achieved significant improvements in self-correction capabilities, surpassing the current SOTA by 3.3%. Additionally, we provide an analysis of between problem complexity and optimal module decomposition and evaluate the maintainability index, confirming that the code generated by MoTCoder is easier to understand and modify, which can be beneficial for long-term code maintenance and evolution. Our codes are available at https://github.com/dvlab-research/MoTCoder.2023-12-26T08:49:57ZData: https://huggingface.co/datasets/JingyaoLi/MoTCode-Data,MoTCoder-32B: https://huggingface.co/JingyaoLi/MoTCoder-32B-V1.5,MoTCoder-7B: https://huggingface.co/JingyaoLi/MoTCoder-7B-v1.5,Code: https://github.com/dvlab-research/MoTCoder, Paper: arXiv:2312.15960Jingyao LiPengguang ChenBin XiaHong XuJiaya Jiahttp://arxiv.org/abs/2503.20849v2An Algebraic Approach to Weighted Answer-set Programming2025-03-28T10:05:27ZLogic programs, more specifically, Answer-set programs, can be annotated with probabilities on facts to express uncertainty. We address the problem of propagating weight annotations on facts (eg probabilities) of an ASP to its standard models, and from there to events (defined as sets of atoms) in a dataset over the program's domain. We propose a novel approach which is algebraic in the sense that it relies on an equivalence relation over the set of events. Uncertainty is then described as polynomial expressions over variables. We propagate the weight function in the space of models and events, rather than doing so within the syntax of the program. As evidence that our approach is sound, we show that certain facts behave as expected. Our approach allows us to investigate weight annotated programs and to determine how suitable a given one is for modeling a given dataset containing events.2025-03-26T16:21:34ZFrancisco CoelhoBruno DinisDietmar SeipelSalvador Abreu