https://arxiv.org/api/V5+Vl96r+XXC9bAHMZ/WcYQW0S4 2026-06-23T16:49:31Z 9934 1050 15 http://arxiv.org/abs/2510.19850v1 Prompt Decorators: A Declarative and Composable Syntax for Reasoning, Formatting, and Control in LLMs 2025-10-21T17:35:49Z

Large Language Models (LLMs) are central to reasoning, writing, and decision-support workflows, yet users lack consistent control over how they reason and express outputs. Conventional prompt engineering relies on verbose natural-language instructions, limiting reproducibility, modularity, and interpretability. This paper introduces Prompt Decorators, a declarative, composable syntax that governs LLM behavior through compact control tokens such as +++Reasoning, +++Tone(style=formal), and +++Import(topic="Systems Thinking"). Each decorator modifies a behavioral dimension, such as reasoning style, structure, or tone, without changing task content. The framework formalizes twenty core decorators organized into two functional families (Cognitive & Generative and Expressive & Systemic), each further decomposed into subcategories that govern reasoning, interaction, expression, and session-control. It defines a unified syntax, scoping model, and deterministic processing pipeline enabling predictable and auditable behavior composition. By decoupling task intent from execution behavior, Prompt Decorators create a reusable and interpretable interface for prompt design. Illustrative use cases demonstrate improved reasoning transparency, reduced prompt complexity, and standardized model behavior across domains. The paper concludes with implications for interoperability, behavioral consistency, and the development of declarative interfaces for scalable AI systems.

2025-10-21T17:35:49Z Mostapha Kalami Heris http://arxiv.org/abs/2509.23686v2 Evaluating Program Semantics Reasoning with Type Inference in System F 2025-10-20T22:38:43Z

Large Language Models (LLMs) are increasingly integrated into the software engineering ecosystem. Their test-time compute (TTC) reasoning capabilities show significant potential for understanding program logic and semantics beyond mere token recognition. However, current benchmarks for code reasoning lack a formal, program-centric deductive framework to ensure sound evaluation, and are incapable of assessing whether models genuinely reason about program semantics or merely exploit superficial associations between natural language and code tokens. To bridge this gap, we introduce TF-Bench, a benchmark designed to evaluate LLM reasoning based on type inference in System F, a task we refer to as program semantics reasoning. By employing verified transformations to remove semantically irrelevant natural language, we construct TF-Bench_pure, a purely semantics-driven variant of TF-Bench. Our analysis reveals substantial limitations in state-of-the-art LLMs, with the best-performing LLM (Claude-3.7-sonnet) achieving only 55.85% accuracy on TF-Bench_pure. Additionally, we propose two novel metrics to assess robustness and the effectiveness of test-time reasoning, underscoring critical limitations in current LLM capabilities and highlighting essential directions for future research.

2025-09-28T06:57:42Z NeurIPS '25, package released at: https://github.com/SecurityLab-UCD/TF-Bench Yifeng He Luning Yang Christopher Castro Gaw Gonzalo Hao Chen http://arxiv.org/abs/1402.4843v7 Range Algebra for Safe Array Splits 2025-10-20T21:20:57Z

We present a language-agnostic range algebra that derives correct index intervals for splitting arrays and implementing binary search, eliminating off-by-one and empty-range bugs. From two primitives -- $\lfloor n/2\rfloor$ and $\lceil n/2\rceil = \lfloor (n + 1)/2\rfloor$ -- we obtain four canonical splits (Natural, Left+, Right+, Center-cut) with proofs of coverage and balance for all $n \ge 0$ and any base index $b \in \mathbb{Z}$. We record invariants (normalization to left-closed/right-open intervals $[u, w)$ with len $= w - u$), document cross-language division quirks, and give drop-in code patterns and property tests. The result is a small, memorable spec that can be copied verbatim into C/C++/Java/C\#/Go/Rust/Swift/JS/Python.

2014-02-19T22:50:29Z After I compressed the previous, much longer version, this one became an array of tautologies, which is difficult to justify as an article in any sense. The value is no different from a Boolean algebra, which is nonsensical to restate these days. It was even difficult to actually write in LaTeX without repeating the same symbols. It is not working as an article Aleksandar Perisic http://arxiv.org/abs/2509.20020v3 The Syntax and Semantics of einsum 2025-10-20T13:29:52Z

In 2011, einsum was introduced to NumPy as a practical and convenient notation for tensor expressions in machine learning, quantum circuit simulation, and other fields. It has since been implemented in additional Python frameworks such as PyTorch and TensorFlow, as well as in other programming languages such as Julia. Despite its practical success, the einsum notation still lacks a solid theoretical basis, and is not unified across the different frameworks, limiting opportunities for formal reasoning and systematic optimization. In this work, we discuss the terminology of tensor expressions and provide a formal definition of the einsum language. Based on this definition, we formalize and prove important equivalence rules for tensor expressions and highlight their relevance in practical applications.

2025-09-24T11:36:02Z 21 pages, 1 figure. Includes formal definitions, proofs of algebraic properties, and nesting/denesting rules for the einsum notation Maurice Wenig Paul G. Rump Mark Blacher Joachim Giesen http://arxiv.org/abs/2510.17505v1 Insum: Sparse GPU Kernels Simplified and Optimized with Indirect Einsums 2025-10-20T13:02:17Z

Programming high-performance sparse GPU kernels is notoriously difficult, requiring both substantial effort and deep expertise. Sparse compilers aim to simplify this process, but existing systems fall short in two key ways. First, they are primarily designed for CPUs and rarely produce high-performance GPU code. Second, when computations involve both sparse and dense regions, these compilers often fail to optimize the dense portions effectively. In this paper, we propose a new approach for expressing sparse computations. We start from format-agnostic Einsums over sparse tensors and rewrite them into format-conscious indirect Einsums, which explicitly encode format information by mapping sparse data and metadata onto dense tensor operations through indirect indexing. To execute indirect Einsums, we introduce the Insum compiler, which generates efficient GPU code for these Einsums by lowering to the PyTorch compiler, extended to better support Tensor Core-enabled indirect Einsums. We also present two fixed-length sparse formats, GroupCOO and BlockGroupCOO, designed to fit naturally with indirect Einsums. Our approach achieves 1.14x to 3.81x speedups across a range of sparse GPU applications while reducing lines of code by 202x to 4491x compared to hand-written implementations.

2025-10-20T13:02:17Z Jaeyeon Won Willow Ahrens Joel S. Emer Saman Amarasinghe http://arxiv.org/abs/2212.06656v2 A programming language characterizing quantum polynomial time 2025-10-20T08:50:48Z

We introduce a first-order quantum programming language, named FOQ, whose terminating programs are reversible. We restrict FOQ to a strict and tractable subset, named PFOQ, of terminating programs with bounded width, that provides a first programming language-based characterization of the quantum complexity class FBQP. Finally, we present a tractable semantics-preserving algorithm compiling a PFOQ program to a quantum circuit of size polynomial in the number of input qubits.

2022-12-13T15:38:59Z International Conference on Foundations of Software Science and Computation Structures, Orna Kupferman; Pawel Sobocinski, Apr 2023, Paris, France. pp.156-175 Emmanuel Hainry MOCQUA Romain Péchoux MOCQUA Mário Silva MOCQUA, LORIA http://arxiv.org/abs/2510.17220v1 Exploiting the Potential of Linearity in Automatic Differentiation and Computational Cryptography 2025-10-20T07:02:48Z

The concept of linearity plays a central role in both mathematics and computer science, with distinct yet complementary meanings. In mathematics, linearity underpins functions and vector spaces, forming the foundation of linear algebra and functional analysis. In computer science, it relates to resource-sensitive computation. Linear Logic (LL), for instance, models assumptions that must be used exactly once, providing a natural framework for tracking computational resources such as time, memory, or data access. This dual perspective makes linearity essential to programming languages, type systems, and formal models that express both computational complexity and composability. Bridging these interpretations enables rigorous yet practical methodologies for analyzing and verifying complex systems. This thesis explores the use of LL to model programming paradigms based on linearity. It comprises two parts: ADLL and CryptoBLL. The former applies LL to Automatic Differentiation (AD), modeling linear functions over the reals and the transposition operation. The latter uses LL to express complexity constraints on adversaries in computational cryptography. In AD, two main approaches use linear type systems: a theoretical one grounded in proof theory, and a practical one implemented in JAX, a Python library developed by Google for machine learning research. In contrast, frameworks like PyTorch and TensorFlow support AD without linear types. ADLL aims to bridge theory and practice by connecting JAX's type system to LL. In modern cryptography, several calculi aim to model cryptographic proofs within the computational paradigm. These efforts face a trade-off between expressiveness, to capture reductions, and simplicity, to abstract probability and complexity. CryptoBLL addresses this tension by proposing a framework for the automatic analysis of protocols in computational cryptography.

2025-10-20T07:02:48Z Giulia Giusti http://arxiv.org/abs/2410.14835v2 Towards Automated Verification of LLM-Synthesized C Programs 2025-10-20T01:57:32Z

We present \synver{}, a novel synthesis and verification framework for C programs, that deploys a Large Language Model (LLM) to search for a candidate program that satisfies the given specification. Our key idea is to impose syntactic and semantic biases on programs generated by LLMs, such that the synthesized program is more amenable to automated verification. Based on this idea, we propose a novel specification-verification tool, built on top of Verified Software Toolchain, that help automate the process. Our experiments on a diverse set of benchmarks drawn from the deductive program synthesis community, shows that this approach is scalable and extensible. The benchmarks constitute of specifications comprising of basic coding examples, Separation Logic based assertions, and API specifications.

2024-10-18T19:13:44Z Prasita Mukherjee Benjamin Delaware http://arxiv.org/abs/2510.16594v1 SimpliPy: A Source-Tracking Notional Machine for Simplified Python 2025-10-18T17:48:25Z

Misconceptions about program execution hinder many novice programmers. We introduce SimpliPy, a notional machine designed around a carefully chosen Python subset to clarify core control flow and scoping concepts. Its foundation is a precise operational semantics that explicitly tracks source code line numbers for each execution step, making the link between code and behavior unambiguous. Complementing the dynamic semantics, SimpliPy uses static analysis to generate Control Flow Graphs (CFGs) and identify lexical scopes, helping students build a structural understanding before tracing. We also present an interactive web-based debugger built on these principles. This tool embodies the formal techniques, visualizing the operational state (environments, stack) and using the static CFG to animate control flow directly on the graph during step-by-step execution. SimpliPy thus integrates formal semantics, program analysis, and visualization to offer both a pedagogical approach and a practical demonstration of applying formal methods to program understanding.

2025-10-18T17:48:25Z 15 pages, 1 figure, 1 table. Accepted at the 4th Workshop on Research Highlights in Programming Languages (RHPL 2025), co-located with FSTTCS 2025. Code available at: https://github.com/PraneethJain/simplipy Moida Praneeth Jain Venkatesh Choppella http://arxiv.org/abs/2510.17889v1 Hey Pentti, We Did It!: A Fully Vector-Symbolic Lisp 2025-10-18T14:42:36Z

Kanerva (2014) suggested that it would be possible to construct a complete Lisp out of a vector-symbolic architecture. We present the general form of a vector-symbolic representation of the five Lisp elementary functions, lambda expressions, and other auxiliary functions, found in the Lisp 1.5 specification McCarthy (1960), which is near minimal and sufficient for Turing-completeness. Our specific implementation uses holographic reduced representations Plate (1995), with a lookup table cleanup memory. Lisp, as all Turing-complete languages, is a Cartesian closed category, unusual in its proximity to the mathematical abstraction. We discuss the mathematics, the purpose, and the significance of demonstrating vector-symbolic architectures' Cartesian-closure, as well as the importance of explicitly including cleanup memories in the specification of the architecture.

2025-10-18T14:42:36Z Tomkins Flanagan, E., & Kelly, M. A. (2024, July). Hey Pentti, We Did It!: A Fully Vector-Symbolic Lisp. Abstract published at MathPsych / ICCM 2024. Via mathpsych.org/presentation/1541 Eilene Tomkins-Flanagan Department of Cognitive Science, Carleton University Mary A. Kelly Department of Cognitive Science, Carleton University http://arxiv.org/abs/2510.16357v1 MLCPD: A Unified Multi-Language Code Parsing Dataset with Universal AST Schema 2025-10-18T05:31:14Z

We introduce the MultiLang Code Parser Dataset (MLCPD), a large-scale, language-agnostic dataset unifying syntactic and structural representations of code across ten major programming languages. MLCPD contains over seven million parsed source files normalized under our proposed universal Abstract Syntax Tree (AST) schema, enabling consistent cross-language reasoning, structural learning, and multilingual software analysis. Unlike existing corpora that focus purely on token-level code or isolated parsers, MLCPD provides both hierarchical tree representations and rich metadata for every file, ensuring lossless syntactic coverage and structural uniformity. Each entry includes a normalized schema, language-level metadata, and abstracted node semantics stored in Parquet format for scalable retrieval. Empirical analyses reveal strong cross-language structural regularities-demonstrating that syntactic graphs from languages as diverse as Python, Java, and Go can be aligned under a shared schema. We release the dataset publicly on Hugging Face and the accompanying codebase on GitHub, which includes complete pipelines for dataset reproduction, grammar compilation, and a visualization tool for exploring the unified AST across languages. Together, these resources establish MLCPD as an open, reproducible foundation for future research in cross-language representation learning and program analysis.

2025-10-18T05:31:14Z 12 pages, 7 figures, 4 tables, 2 algorithms, and 34 references. HuggingFace: https://huggingface.co/datasets/jugalgajjar/MultiLang-Code-Parser-Dataset GitHub: https://github.com/JugalGajjar/MultiLang-Code-Parser-Dataset Jugal Gajjar Kamalasankari Subramaniakuppusamy http://arxiv.org/abs/2504.05482v2 Imperative vs. Declarative Programming Paradigms for Open-Universe Scene Generation 2025-10-17T15:41:29Z

Current methods for generating 3D scene layouts from text predominantly follow a declarative paradigm, where a Large Language Model (LLM) specifies high-level constraints that are then resolved by a separate solver. This paper challenges that consensus by introducing a more direct, imperative approach. We task an LLM with generating a step-by-step program that iteratively places each object relative to those already in the scene. This paradigm simplifies the underlying scene specification language, enabling the creation of more complex, varied, and highly structured layouts that are difficult to express declaratively. To improve the robustness, we complement our method with a novel, LLM-free error correction mechanism that operates directly on the generated code, iteratively adjusting parameters within the program to resolve collisions and other inconsistencies. In forced-choice perceptual studies, human participants overwhelmingly preferred our imperative layouts, choosing them over those from two state-of-the-art declarative systems 82% and 94% of the time, demonstrating the significant potential of this alternative paradigm. Finally, we present a simple automated evaluation metric for 3D scene layout generation that correlates strongly with human judgment.

2025-04-07T20:24:24Z Maxim Gumin Do Heon Han Seung Jean Yoo Aditya Ganeshan R. Kenny Jones Rio Aguina-Kang Stewart Morris Daniel Ritchie http://arxiv.org/abs/2510.15700v1 ProofOptimizer: Training Language Models to Simplify Proofs without Human Demonstrations 2025-10-17T14:45:30Z

Neural theorem proving has advanced rapidly in the past year, reaching IMO gold-medalist capabilities and producing formal proofs that span thousands of lines. Although such proofs are mechanically verified by formal systems like Lean, their excessive length renders them difficult for humans to comprehend and limits their usefulness for mathematical insight. Proof simplification is therefore a critical bottleneck. Yet, training data for this task is scarce, and existing methods -- mainly agentic scaffolding with off-the-shelf LLMs -- struggle with the extremely long proofs generated by RL-trained provers. We introduce ProofOptimizer, the first language model trained to simplify Lean proofs without requiring additional human supervision. ProofOptimizer is trained via expert iteration and reinforcement learning, using Lean to verify simplifications and provide training signal. At inference time, it operates within an iterative proof-shortening workflow, progressively reducing proof length. Experiments show that ProofOptimizer substantially compresses proofs generated by state-of-the-art RL-trained provers on standard benchmarks, reducing proof length by 87% on miniF2F, 57% on PutnamBench, and 49% on Seed-Prover's IMO 2025 proofs. Beyond conciseness, the simplified proofs check faster in Lean and further improve downstream prover performance when reused as training data for supervised finetuning.

2025-10-17T14:45:30Z 52 pages, 16 figures, website: http://proof-optimizer.github.io/ Alex Gu Bartosz Piotrowski Fabian Gloeckle Kaiyu Yang Aram H. Markosyan http://arxiv.org/abs/2510.15178v1 Visualizing miniKanren Search with a Fine-Grained Small-Step Semantics 2025-10-16T22:34:36Z

We present a deterministic small-step operational semantics for miniKanren that explicitly represents the evolving search tree during execution. This semantics models interleaving and goal scheduling at fine granularity, allowing each evaluation step-goal activation, suspension, resumption, and success -- to be visualized precisely. Building on this model, we implement an interactive visualizer that renders the search tree as it develops and lets users step through execution. The tool acts as a pedagogical notional machine for reasoning about miniKanren's fair search behavior, helping users understand surprising answer orders and operational effects. Our semantics and tool are validated through property-based testing and illustrated with several examples.

2025-10-16T22:34:36Z 2025 miniKanren Workshop Brysen Pfingsten Jason Hemann http://arxiv.org/abs/2510.14972v1 TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar 2025-10-16T17:59:45Z

Large language models (LLMs) for code rely on subword tokenizers, such as byte-pair encoding (BPE), learned from mixed natural language text and programming language code but driven by statistics rather than grammar. As a result, semantically identical code snippets can be tokenized differently depending on superficial factors such as whitespace or identifier naming. To measure the impact of this misalignment, we introduce TokDrift, a framework that applies semantic-preserving rewrite rules to create code variants differing only in tokenization. Across nine code LLMs, including large ones with over 30B parameters, even minor formatting changes can cause substantial shifts in model behavior. Layer-wise analysis shows that the issue originates in early embeddings, where subword segmentation fails to capture grammar token boundaries. Our findings identify misaligned tokenization as a hidden obstacle to reliable code understanding and generation, highlighting the need for grammar-aware tokenization for future code LLMs.

2025-10-16T17:59:45Z Yinxi Li Yuntian Deng Pengyu Nie