https://arxiv.org/api/jGnB3mKf0GN2pX+W5wc6SFZTj9A 2026-06-30T08:33:57Z 9958 1830 15 http://arxiv.org/abs/2502.15795v1 Lean-ing on Quality: How High-Quality Data Beats Diverse Multilingual Data in AutoFormalization 2025-02-18T19:16:54Z

Autoformalization, the process of transforming informal mathematical language into formal specifications and proofs remains a difficult task for state-of-the-art (large) language models. Existing works point to competing explanations for the performance gap. To this end, we introduce a novel methodology that leverages back-translation with hand-curated prompts to enhance the mathematical capabilities of language models, particularly addressing the challenge posed by the scarcity of labeled data. Specifically, we evaluate three primary variations of this strategy: (1) on-the-fly (online) backtranslation, (2) distilled (offline) backtranslation with few-shot amplification, and (3) line-by-line proof analysis integrated with proof state information. Each variant is designed to optimize data quality over quantity, focusing on the high fidelity of generated proofs rather than sheer data scale. Our findings provide evidence that employing our proposed approaches to generate synthetic data, which prioritizes quality over volume, improves the Autoformalization performance of LLMs as measured by standard benchmarks such as ProofNet. Crucially, our approach outperforms pretrained models using a minimal number of tokens. We also show, through strategic prompting and backtranslation, that our approaches surpass the performance of fine-tuning with extensive multilingual datasets such as MMA on ProofNet with only 1/150th of the tokens. Taken together, our methods show a promising new approach to significantly reduce the resources required to formalize proofs, thereby accelerating AI for math.

2025-02-18T19:16:54Z Willy Chan Michael Souliman Jakob Nordhagen Brando Miranda Elyas Obbad Kai Fronsdal Sanmi Koyejo http://arxiv.org/abs/2502.11848v1 RustSFQ: A Domain-Specific Language for SFQ Circuit Design 2025-02-17T14:43:22Z

Cell-based design of a single-flux-quantum (SFQ) digital circuit requires input-output consistency; every output signal must be consumed only once by the input of the following component, which is a unique constraint, unlike the traditional CMOS digital circuit design. While there are some cell libraries and simulation tools for SFQ circuit development, they do not verify the input-output consistency, and designers have significant responsibilities to ensure it manually. Additionally, designers have to carefully manage net names without unintended duplication and correct connectivity among nets in a netlist for simulations. We propose RustSFQ, a domain-specific language (DSL) embedded in Rust that automatically ensures the input-output consistency in the SFQ circuit by leveraging the ownership system of Rust. Each SFQ circuit element is represented as a function while wires are represented as instances, and the Rust compiler verifies that multiple elements do not share a single wire through the ownership system. Circuit descriptions in the RustSFQ are successfully compiled into low-level netlists for both analog and digital circuit simulations, and the DSL provides higher productivity than the conventional design flow. Using the RustSFQ, we developed an SFQ-based Reed-Solomon encoder with a 4-bit width for the first time as a case study. We confirmed that the circuit operated correctly at 10 GHz through circuit simulations.

2025-02-17T14:43:22Z Mebuki Oishi Sun Tanaka Shinya Takamaeda-Yamazaki http://arxiv.org/abs/2502.11344v1 A Coq implementation of a Theory of Tagged Objects 2025-02-17T01:54:30Z

We present a first step towards the Coq implementation of the Theory of Tagged Objects formalism. The concept of tagged types is encoded, and the soundness proofs are discussed with some future work suggestions.

2025-02-17T01:54:30Z Matthew Gates Alex Potanin http://arxiv.org/abs/2502.10811v1 A Unified Framework for Initial Semantics 2025-02-15T14:33:29Z

Initial semantics aims to capture inductive structures and their properties as initial objects in suitable categories. We focus on the initial semantics aiming to model the syntax and substitution structure of programming languages with variable binding as initial objects. Three distinct yet similar approaches to initial semantics have been proposed. An initial semantics result was first proved by Fiore, Plotkin, and Turi using Σ-monoids in their seminal paper published at LICS'99. Alternative frameworks were later introduced by Hirschowitz and Maggesi using modules over monads, and by Matthes and Uustalu using heterogeneous substitution systems. Since then, all approaches have been significantly developed by numerous researchers. While similar, the links between this different approaches remain unclear. This is especially the case as the literature is difficult to access, since it was mostly published in (short) conference papers without proofs, and contains many technical variations and evolutions. In this work, we introduce a framework based on monoidal categories that unifies these three distinct approaches to initial semantics, by suitably generalizing and combining them. Doing so we show that modules over monoids provide an abstract and easy to manipulate framework, that Σ-monoids and strengths naturally arise when stating and proving an initiality theorem, and that heterogeneous substitution systems enable us to prove the initiality theorem modularly. Moreover, to clarify the literature, we provide an extensive overview of related work using our framework as a cornerstone to explain the links between the different approaches and their variations.

2025-02-15T14:33:29Z Shorter version of arXiv:2401.09366 Thomas Lamiaux Benedikt Ahrens http://arxiv.org/abs/2502.10299v1 Open-Source AI-Powered Optimization in Scalene: Advancing Python Performance Profiling with DeepSeek-R1 and LLaMA 3.2 2025-02-14T17:01:06Z

Python's flexibility and ease of use come at the cost of performance inefficiencies, requiring developers to rely on profilers to optimize execution. SCALENE, a high-performance CPU, GPU, and memory profiler, provides fine-grained insights into Python applications while running significantly faster than traditional profilers. Originally, SCALENE integrated OpenAI's API to generate AI-powered optimization suggestions, but its reliance on a proprietary API limited accessibility. This study explores the feasibility of using opensource large language models (LLMs), such as DeepSeek-R1 and Llama 3.2, to generate optimization recommendations within SCALENE. Our evaluation reveals that DeepSeek-R1 provides effective code optimizations comparable to proprietary models. We integrate DeepSeek-R1 into SCALENE to automatically analyze performance bottlenecks and suggest improvements, enhancing SCALENE's utility while maintaining its open-source nature. This study demonstrates that open-source LLMs can be viable alternatives for AI-driven code optimization, paving the way for more accessible and cost-effective performance analysis tools.

2025-02-14T17:01:06Z Saem Hasan Sanju Basak http://arxiv.org/abs/2308.04741v3 Local Reasoning about Probabilistic Behaviour for Classical-Quantum Programs 2025-02-14T16:05:29Z

Verifying the functional correctness of programs with both classical and quantum constructs is a challenging task. The presence of probabilistic behaviour entailed by quantum measurements and unbounded while loops complicate the verification task greatly. We propose a new quantum Hoare logic for local reasoning about probabilistic behaviour by introducing distribution formulas to specify probabilistic properties. We show that the proof rules in the logic are sound with respect to a denotational semantics. To demonstrate the effectiveness of the logic, we formally verify the correctness of non-trivial quantum algorithms including the HHL and Shor's algorithms. Moreover, we embed our logic into the proof assistant Coq. The resulting logical framework, called CoqQLR, can facilitate semi-automated reasoning about classical--quantum programs.

2023-08-09T07:23:22Z 34 pages. arXiv admin note: text overlap with arXiv:2107.00804 Huiling Wu Yuxin Deng Ming Xu http://arxiv.org/abs/2503.04771v1 Building Bridges: Julia as an MLIR Frontend 2025-02-14T13:18:51Z

Driven by increasing compute requirements for deep learning models, compiler developers have been looking for ways to target specialised hardware and heterogeneous systems more efficiently. The MLIR project has the goal to offer infrastructure that can be used to develop new compilers and represent code at different levels of abstractions. While MLIR excels at offering developers a way to write new IR and transformations, there is no easy way for end users to generate code in these IR. In this work, we explore using the Julia programming language as a high-level input language for generating MLIR code. Most importantly, we focus on extensibility, allowing package developers to implement bindings to MLIR dialects in an intuitive and easy-to-use manner. By building on the Julia programming language, and its expressive features such as multiple dispatch and its extensible compiler, we design and implement a framework to generate MLIR code. Additionally, we evaluate this framework in three case studies. Ranging from developing a small DSL for einsum expressions, to specifying transformations on MLIR code and programming kernels to be run on GPU.

2025-02-14T13:18:51Z This is the extended abstract of a master's thesis, hosted at https://lib.ugent.be/en/catalog/rug01:003212846?i=0 \nSupervised by: Prof. Bjorn De Sutter with counselling from: Dr. Tim Besard and Thomas Faingnaert Jules Merckx http://arxiv.org/abs/2502.10100v1 Statistical data analysis for Tourism in Poland in R Programming Environment 2025-02-14T11:51:28Z

This study utilises the R programming language for statistical data analysis to understand Tourism dynamics in Poland. It focuses on methods for data visualisation, multivariate statistics, and hypothesis testing. To investigate the expenditure behavior of tourist, spending patterns, correlations, and associations among variables were analysed in the dataset. The results revealed a significant relationship between accommodation type and the purpose of trip, showing that the purpose of a trip impacts the selection of accommodation. A strong correlation was observed between organizer expenditure and private expenditure, indicating that individual spending are more when the spending on organizing the trip are higher. However, no significant difference was observed in total expenditure across different accommodation types and purpose of the trip revealing that travelers tend to spend similar amounts regardless of their reason for travel or choice of accommodation. Although significant relationships were observed among certain variables, ANOVA could not be applied because the dataset was not able to hold on the normality assumption. In future, the dataset can be explored further to find more meaningful insights. The developed code is available on GitHub: https://github.com/SaadAhmedJamal/DataAnalysis RProgEnv.

2025-02-14T11:51:28Z 5 pages Saad Ahmed Jamal http://arxiv.org/abs/2502.09819v1 A Solver-Aided Hierarchical Language for LLM-Driven CAD Design 2025-02-13T23:31:30Z

Large language models (LLMs) have been enormously successful in solving a wide variety of structured and unstructured generative tasks, but they struggle to generate procedural geometry in Computer Aided Design (CAD). These difficulties arise from an inability to do spatial reasoning and the necessity to guide a model through complex, long range planning to generate complex geometry. We enable generative CAD Design with LLMs through the introduction of a solver-aided, hierarchical domain specific language (DSL) called AIDL, which offloads the spatial reasoning requirements to a geometric constraint solver. Additionally, we show that in the few-shot regime, AIDL outperforms even a language with in-training data (OpenSCAD), both in terms of generating visual results closer to the prompt and creating objects that are easier to post-process and reason about.

2025-02-13T23:31:30Z Benjamin T. Jones Felix Hähnlein Zihan Zhang Maaz Ahmad Vladimir Kim Adriana Schulz http://arxiv.org/abs/2502.09230v1 Relating Answer Set Programming and Many-sorted Logics for Formal Verification 2025-02-13T11:52:40Z

Answer Set Programming (ASP) is an important logic programming paradigm within the field of Knowledge Representation and Reasoning. As a concise, human-readable, declarative language, ASP is an excellent tool for developing trustworthy (especially, artificially intelligent) software systems. However, formally verifying ASP programs offers some unique challenges, such as 1. a lack of modularity (the meanings of rules are difficult to define in isolation from the enclosing program), 2. the ground-and-solve semantics (the meanings of rules are dependent on the input data with which the program is grounded), and 3. limitations of existing tools. My research agenda has been focused on addressing these three issues with the intention of making ASP verification an accessible, routine task that is regularly performed alongside program development. In this vein, I have investigated alternative semantics for ASP based on translations into the logic of here-and-there and many-sorted first-order logic. These semantics promote a modular understanding of logic programs, bypass grounding, and enable us to use automated theorem provers to automatically verify properties of programs.

2025-02-13T11:52:40Z In Proceedings ICLP 2024, arXiv:2502.08453 EPTCS 416, 2025, pp. 332-344 Zachary Hansen University of Nebraska Omaha 10.4204/EPTCS.416.33 http://arxiv.org/abs/2502.09223v1 A Prolog Program for Bottom-up Evaluation 2025-02-13T11:51:07Z

This short paper describes a simple and intuitive Prolog program, a metainterpreter, that computes the bottom up meaning of a simple positive Horn clause definition. It involves a simple transformation of the object program rules into metarules, which are then used by a metainterpreter to compute bottom up the model of the original program. The resulting algorithm is a form of semi-naive bottom-up evaluation. We discuss various reasons why this Prolog program is particularly interesting. In particular, this is perhaps the only Prolog program for which I find the use of Prolog's assert/1 to be intrinsic, easily understood, and the best, most perspicuous, way to program an algorithm. This short paper might be best characterized as a Prolog programming pearl.

2025-02-13T11:51:07Z In Proceedings ICLP 2024, arXiv:2502.08453 EPTCS 416, 2025, pp. 229-235 David S. Warren Stony Brook University 10.4204/EPTCS.416.20 http://arxiv.org/abs/2502.09221v1 Pearce's Characterisation in an Epistemic Domain 2025-02-13T11:50:36Z

Answer-set programming (ASP) is a successful problem-solving approach in logic-based AI. In ASP, problems are represented as declarative logic programs, and solutions are identified through their answer sets. Equilibrium logic (EL) is a general-purpose nonmonotonic reasoning formalism, based on a monotonic logic called here-and-there logic. EL was basically proposed by Pearce as a foundational framework of ASP. Epistemic specifications (ES) are extensions of ASP-programs with subjective literals. These new modal constructs in the ASP-language make it possible to check whether a regular literal of ASP is true in every (or some) answer-set of a program. ES-programs are interpreted by world-views, which are essentially collections of answer-sets. (Reflexive) autoepistemic logic is a nonmonotonic formalism, modeling self-belief (knowledge) of ideally rational agents. A relatively new semantics for ES is based on a combination of EL and (reflexive) autoepistemic logic. In this paper, we first propose an overarching framework in the epistemic ASP domain. We then establish a correspondence between existing (reflexive) (auto)epistemic equilibrium logics and our easily-adaptable comprehensive framework, building on Pearce's characterisation of answer-sets as equilibrium models. We achieve this by extending Ferraris' work on answer sets for propositional theories to the epistemic case and reveal the relationship between some ES-semantic proposals.

2025-02-13T11:50:36Z In Proceedings ICLP 2024, arXiv:2502.08453 EPTCS 416, 2025, pp. 201-214 Ezgi Iraz Su Sinop University 10.4204/EPTCS.416.18 http://arxiv.org/abs/2402.05980v3 Do Large Code Models Understand Programming Concepts? Counterfactual Analysis for Code Predicates 2025-02-12T15:40:35Z

Large Language Models' success on text generation has also made them better at code generation and coding tasks. While a lot of work has demonstrated their remarkable performance on tasks such as code completion and editing, it is still unclear as to why. We help bridge this gap by exploring to what degree auto-regressive models understand the logical constructs of the underlying programs. We propose Counterfactual Analysis for Programming Concept Predicates (CACP) as a counterfactual testing framework to evaluate whether Large Code Models understand programming concepts. With only black-box access to the model, we use CACP to evaluate ten popular Large Code Models for four different programming concepts. Our findings suggest that current models lack understanding of concepts such as data flow and control flow.

2024-02-08T06:48:01Z Ashish Hooda Mihai Christodorescu Miltiadis Allamanis Aaron Wilson Kassem Fawaz Somesh Jha http://arxiv.org/abs/2502.08497v1 Foundations of Digital Circuits: Denotation, Operational, and Algebraic Semantics 2025-02-12T15:32:52Z

This thesis details a project to define a fully compositional theory of synchronous sequential circuits built from primitive components, motivated by applying techniques successfully used in programming languages to hardware. The first part of the thesis defines the syntactic foundations of sequential circuit morphisms, and then builds three different semantic theories: denotational, operational and algebraic. We characterise the denotational semantics of sequential circuits as certain causal stream functions, as well as providing a link to existing circuit methodologies by mapping between circuit morphisms, stream functions and Mealy machines. The operational semantics is defined as a strategy for applying some global transformations followed by local reductions to demonstrate how a circuit processes a value, leading to a notion of observational equivalence. The algebraic semantics consists of equations for bringing circuits into a pseudo-normal form, and then encoding between different state sets. This part of the thesis concludes with a discussion of some novel applications, such as those for using partial evaluation for digital circuits. While mathematically rigorous, the categorical string diagram formalism is not suited for reasoning computationally. The second part of this thesis details an extension of string diagram rewriting with hypergraphs so that it is compatible with the traced comonoid structure present in the category of digital circuits. We identify the properties that characterise cospans of hypergraphs corresponding to traced comonoid terms, and demonstrate how to identify rewriting contexts valid for rewriting modulo traced comonoid structure. We apply the graph rewriting framework to fixed point operators as well as the operational semantics from the first part, and present a new hardware description language based on these theoretical developments.

2025-02-12T15:32:52Z PhD thesis, 272 pages George Kaye http://arxiv.org/abs/2404.18874v2 Useful Evaluation: Syntax and Semantics (Technical Report) 2025-02-12T09:07:45Z

This work provides the first inductive definition of useful CBV evaluation. For that, we first restrict the substitution operation in the Value Substitution Calculus to be linear, yielding the LCBV strategy. We then further restrict substitution in LCBV, so that substitution contributes to the progress of the computation. This optimisation is the UCBV strategy, and its notion of substitution is sensitive to the surrounding evaluation context, so it is non-trivial to capture it inductively. Moreover, we show that UCBV is a sound and complete implementation of LCBV, optimised to implement useful evaluation. As a further contribution, we show that an existing notion of usefulness in the literature, namely the GLAMoUr abstract machine, implements the UCBV strategy with polynomial overhead in time. This establishes that UCBV is time-invariant, i.e., that the number of reduction steps to normal form in UCBV can be used as a measure of time complexity. Defining UCBV leads us to the first semantic model of useful CBV evaluation through system U, a non-idempotent intersection type system. Our main result is a characterisation of termination for useful CBV evaluation via system U: a term is typable in system U if and only if it terminates in UCBV. Additionally, system U provides a quantitative interpretation for UCBV, offering exact step-count information for program evaluation. Even though the specification of the operational semantics of UCBV is highly complex, system U is notably simple. As far as we know, system U is one of the scarce quantitative type systems capturing exactly the substitution step-count for a call-by-value strategy.

2024-04-29T17:08:15Z Pablo Barenbaum Delia Kesner Mariana Milicich