https://arxiv.org/api/V5+Vl96r+XXC9bAHMZ/WcYQW0S42026-06-23T16:49:31Z9934105015http://arxiv.org/abs/2510.19850v1Prompt Decorators: A Declarative and Composable Syntax for Reasoning, Formatting, and Control in LLMs2025-10-21T17:35:49ZLarge Language Models (LLMs) are central to reasoning, writing, and decision-support workflows, yet users lack consistent control over how they reason and express outputs. Conventional prompt engineering relies on verbose natural-language instructions, limiting reproducibility, modularity, and interpretability. This paper introduces Prompt Decorators, a declarative, composable syntax that governs LLM behavior through compact control tokens such as +++Reasoning, +++Tone(style=formal), and +++Import(topic="Systems Thinking"). Each decorator modifies a behavioral dimension, such as reasoning style, structure, or tone, without changing task content. The framework formalizes twenty core decorators organized into two functional families (Cognitive & Generative and Expressive & Systemic), each further decomposed into subcategories that govern reasoning, interaction, expression, and session-control. It defines a unified syntax, scoping model, and deterministic processing pipeline enabling predictable and auditable behavior composition. By decoupling task intent from execution behavior, Prompt Decorators create a reusable and interpretable interface for prompt design. Illustrative use cases demonstrate improved reasoning transparency, reduced prompt complexity, and standardized model behavior across domains. The paper concludes with implications for interoperability, behavioral consistency, and the development of declarative interfaces for scalable AI systems.2025-10-21T17:35:49ZMostapha Kalami Herishttp://arxiv.org/abs/2509.23686v2Evaluating Program Semantics Reasoning with Type Inference in System F2025-10-20T22:38:43ZLarge Language Models (LLMs) are increasingly integrated into the software engineering ecosystem. Their test-time compute (TTC) reasoning capabilities show significant potential for understanding program logic and semantics beyond mere token recognition. However, current benchmarks for code reasoning lack a formal, program-centric deductive framework to ensure sound evaluation, and are incapable of assessing whether models genuinely reason about program semantics or merely exploit superficial associations between natural language and code tokens. To bridge this gap, we introduce TF-Bench, a benchmark designed to evaluate LLM reasoning based on type inference in System F, a task we refer to as program semantics reasoning. By employing verified transformations to remove semantically irrelevant natural language, we construct TF-Bench_pure, a purely semantics-driven variant of TF-Bench. Our analysis reveals substantial limitations in state-of-the-art LLMs, with the best-performing LLM (Claude-3.7-sonnet) achieving only 55.85% accuracy on TF-Bench_pure. Additionally, we propose two novel metrics to assess robustness and the effectiveness of test-time reasoning, underscoring critical limitations in current LLM capabilities and highlighting essential directions for future research.2025-09-28T06:57:42ZNeurIPS '25, package released at: https://github.com/SecurityLab-UCD/TF-BenchYifeng HeLuning YangChristopher Castro Gaw GonzaloHao Chenhttp://arxiv.org/abs/1402.4843v7Range Algebra for Safe Array Splits2025-10-20T21:20:57ZWe present a language-agnostic range algebra that derives correct index intervals for splitting arrays and implementing binary search, eliminating off-by-one and empty-range bugs. From two primitives -- $\lfloor n/2\rfloor$ and $\lceil n/2\rceil = \lfloor (n + 1)/2\rfloor$ -- we obtain four canonical splits (Natural, Left+, Right+, Center-cut) with proofs of coverage and balance for all $n \ge 0$ and any base index $b \in \mathbb{Z}$. We record invariants (normalization to left-closed/right-open intervals $[u, w)$ with len $= w - u$), document cross-language division quirks, and give drop-in code patterns and property tests. The result is a small, memorable spec that can be copied verbatim into C/C++/Java/C\#/Go/Rust/Swift/JS/Python.2014-02-19T22:50:29ZAfter I compressed the previous, much longer version, this one became an array of tautologies, which is difficult to justify as an article in any sense. The value is no different from a Boolean algebra, which is nonsensical to restate these days. It was even difficult to actually write in LaTeX without repeating the same symbols. It is not working as an articleAleksandar Perisichttp://arxiv.org/abs/2509.20020v3The Syntax and Semantics of einsum2025-10-20T13:29:52ZIn 2011, einsum was introduced to NumPy as a practical and convenient notation for tensor expressions in machine learning, quantum circuit simulation, and other fields. It has since been implemented in additional Python frameworks such as PyTorch and TensorFlow, as well as in other programming languages such as Julia. Despite its practical success, the einsum notation still lacks a solid theoretical basis, and is not unified across the different frameworks, limiting opportunities for formal reasoning and systematic optimization. In this work, we discuss the terminology of tensor expressions and provide a formal definition of the einsum language. Based on this definition, we formalize and prove important equivalence rules for tensor expressions and highlight their relevance in practical applications.2025-09-24T11:36:02Z21 pages, 1 figure. Includes formal definitions, proofs of algebraic properties, and nesting/denesting rules for the einsum notationMaurice WenigPaul G. RumpMark BlacherJoachim Giesenhttp://arxiv.org/abs/2510.17505v1Insum: Sparse GPU Kernels Simplified and Optimized with Indirect Einsums2025-10-20T13:02:17ZProgramming high-performance sparse GPU kernels is notoriously difficult, requiring both substantial effort and deep expertise. Sparse compilers aim to simplify this process, but existing systems fall short in two key ways. First, they are primarily designed for CPUs and rarely produce high-performance GPU code. Second, when computations involve both sparse and dense regions, these compilers often fail to optimize the dense portions effectively. In this paper, we propose a new approach for expressing sparse computations. We start from format-agnostic Einsums over sparse tensors and rewrite them into format-conscious indirect Einsums, which explicitly encode format information by mapping sparse data and metadata onto dense tensor operations through indirect indexing. To execute indirect Einsums, we introduce the Insum compiler, which generates efficient GPU code for these Einsums by lowering to the PyTorch compiler, extended to better support Tensor Core-enabled indirect Einsums. We also present two fixed-length sparse formats, GroupCOO and BlockGroupCOO, designed to fit naturally with indirect Einsums. Our approach achieves 1.14x to 3.81x speedups across a range of sparse GPU applications while reducing lines of code by 202x to 4491x compared to hand-written implementations.2025-10-20T13:02:17ZJaeyeon WonWillow AhrensJoel S. EmerSaman Amarasinghehttp://arxiv.org/abs/2212.06656v2A programming language characterizing quantum polynomial time2025-10-20T08:50:48ZWe introduce a first-order quantum programming language, named FOQ, whose terminating programs are reversible. We restrict FOQ to a strict and tractable subset, named PFOQ, of terminating programs with bounded width, that provides a first programming language-based characterization of the quantum complexity class FBQP. Finally, we present a tractable semantics-preserving algorithm compiling a PFOQ program to a quantum circuit of size polynomial in the number of input qubits.2022-12-13T15:38:59ZInternational Conference on Foundations of Software Science and Computation Structures, Orna Kupferman; Pawel Sobocinski, Apr 2023, Paris, France. pp.156-175Emmanuel HainryMOCQUARomain PéchouxMOCQUAMário SilvaMOCQUA, LORIAhttp://arxiv.org/abs/2510.17220v1Exploiting the Potential of Linearity in Automatic Differentiation and Computational Cryptography2025-10-20T07:02:48ZThe concept of linearity plays a central role in both mathematics and computer science, with distinct yet complementary meanings. In mathematics, linearity underpins functions and vector spaces, forming the foundation of linear algebra and functional analysis. In computer science, it relates to resource-sensitive computation. Linear Logic (LL), for instance, models assumptions that must be used exactly once, providing a natural framework for tracking computational resources such as time, memory, or data access. This dual perspective makes linearity essential to programming languages, type systems, and formal models that express both computational complexity and composability. Bridging these interpretations enables rigorous yet practical methodologies for analyzing and verifying complex systems.
This thesis explores the use of LL to model programming paradigms based on linearity. It comprises two parts: ADLL and CryptoBLL. The former applies LL to Automatic Differentiation (AD), modeling linear functions over the reals and the transposition operation. The latter uses LL to express complexity constraints on adversaries in computational cryptography.
In AD, two main approaches use linear type systems: a theoretical one grounded in proof theory, and a practical one implemented in JAX, a Python library developed by Google for machine learning research. In contrast, frameworks like PyTorch and TensorFlow support AD without linear types. ADLL aims to bridge theory and practice by connecting JAX's type system to LL.
In modern cryptography, several calculi aim to model cryptographic proofs within the computational paradigm. These efforts face a trade-off between expressiveness, to capture reductions, and simplicity, to abstract probability and complexity. CryptoBLL addresses this tension by proposing a framework for the automatic analysis of protocols in computational cryptography.2025-10-20T07:02:48ZGiulia Giustihttp://arxiv.org/abs/2410.14835v2Towards Automated Verification of LLM-Synthesized C Programs2025-10-20T01:57:32ZWe present \synver{}, a novel synthesis and verification framework for C programs, that deploys a Large Language Model (LLM) to search for a candidate program that satisfies the given specification. Our key idea is to impose syntactic and semantic biases on programs generated by LLMs, such that the synthesized program is more amenable to automated verification. Based on this idea, we propose a novel specification-verification tool, built on top of Verified Software Toolchain, that help automate the process. Our experiments on a diverse set of benchmarks drawn from the deductive program synthesis community, shows that this approach is scalable and extensible. The benchmarks constitute of specifications comprising of basic coding examples, Separation Logic based assertions, and API specifications.2024-10-18T19:13:44ZPrasita MukherjeeBenjamin Delawarehttp://arxiv.org/abs/2510.16594v1SimpliPy: A Source-Tracking Notional Machine for Simplified Python2025-10-18T17:48:25ZMisconceptions about program execution hinder many novice programmers. We introduce SimpliPy, a notional machine designed around a carefully chosen Python subset to clarify core control flow and scoping concepts. Its foundation is a precise operational semantics that explicitly tracks source code line numbers for each execution step, making the link between code and behavior unambiguous. Complementing the dynamic semantics, SimpliPy uses static analysis to generate Control Flow Graphs (CFGs) and identify lexical scopes, helping students build a structural understanding before tracing. We also present an interactive web-based debugger built on these principles. This tool embodies the formal techniques, visualizing the operational state (environments, stack) and using the static CFG to animate control flow directly on the graph during step-by-step execution. SimpliPy thus integrates formal semantics, program analysis, and visualization to offer both a pedagogical approach and a practical demonstration of applying formal methods to program understanding.2025-10-18T17:48:25Z15 pages, 1 figure, 1 table. Accepted at the 4th Workshop on Research Highlights in Programming Languages (RHPL 2025), co-located with FSTTCS 2025. Code available at: https://github.com/PraneethJain/simplipyMoida Praneeth JainVenkatesh Choppellahttp://arxiv.org/abs/2510.17889v1Hey Pentti, We Did It!: A Fully Vector-Symbolic Lisp2025-10-18T14:42:36ZKanerva (2014) suggested that it would be possible to construct a complete Lisp out of a vector-symbolic architecture. We present the general form of a vector-symbolic representation of the five Lisp elementary functions, lambda expressions, and other auxiliary functions, found in the Lisp 1.5 specification McCarthy (1960), which is near minimal and sufficient for Turing-completeness. Our specific implementation uses holographic reduced representations Plate (1995), with a lookup table cleanup memory. Lisp, as all Turing-complete languages, is a Cartesian closed category, unusual in its proximity to the mathematical abstraction. We discuss the mathematics, the purpose, and the significance of demonstrating vector-symbolic architectures' Cartesian-closure, as well as the importance of explicitly including cleanup memories in the specification of the architecture.2025-10-18T14:42:36ZTomkins Flanagan, E., & Kelly, M. A. (2024, July). Hey Pentti, We Did It!: A Fully Vector-Symbolic Lisp. Abstract published at MathPsych / ICCM 2024. Via mathpsych.org/presentation/1541Eilene Tomkins-FlanaganDepartment of Cognitive Science, Carleton UniversityMary A. KellyDepartment of Cognitive Science, Carleton Universityhttp://arxiv.org/abs/2510.16357v1MLCPD: A Unified Multi-Language Code Parsing Dataset with Universal AST Schema2025-10-18T05:31:14ZWe introduce the MultiLang Code Parser Dataset (MLCPD), a large-scale, language-agnostic dataset unifying syntactic and structural representations of code across ten major programming languages. MLCPD contains over seven million parsed source files normalized under our proposed universal Abstract Syntax Tree (AST) schema, enabling consistent cross-language reasoning, structural learning, and multilingual software analysis. Unlike existing corpora that focus purely on token-level code or isolated parsers, MLCPD provides both hierarchical tree representations and rich metadata for every file, ensuring lossless syntactic coverage and structural uniformity. Each entry includes a normalized schema, language-level metadata, and abstracted node semantics stored in Parquet format for scalable retrieval. Empirical analyses reveal strong cross-language structural regularities-demonstrating that syntactic graphs from languages as diverse as Python, Java, and Go can be aligned under a shared schema. We release the dataset publicly on Hugging Face and the accompanying codebase on GitHub, which includes complete pipelines for dataset reproduction, grammar compilation, and a visualization tool for exploring the unified AST across languages. Together, these resources establish MLCPD as an open, reproducible foundation for future research in cross-language representation learning and program analysis.2025-10-18T05:31:14Z12 pages, 7 figures, 4 tables, 2 algorithms, and 34 references. HuggingFace: https://huggingface.co/datasets/jugalgajjar/MultiLang-Code-Parser-Dataset GitHub: https://github.com/JugalGajjar/MultiLang-Code-Parser-DatasetJugal GajjarKamalasankari Subramaniakuppusamyhttp://arxiv.org/abs/2504.05482v2Imperative vs. Declarative Programming Paradigms for Open-Universe Scene Generation2025-10-17T15:41:29ZCurrent methods for generating 3D scene layouts from text predominantly follow a declarative paradigm, where a Large Language Model (LLM) specifies high-level constraints that are then resolved by a separate solver. This paper challenges that consensus by introducing a more direct, imperative approach. We task an LLM with generating a step-by-step program that iteratively places each object relative to those already in the scene. This paradigm simplifies the underlying scene specification language, enabling the creation of more complex, varied, and highly structured layouts that are difficult to express declaratively. To improve the robustness, we complement our method with a novel, LLM-free error correction mechanism that operates directly on the generated code, iteratively adjusting parameters within the program to resolve collisions and other inconsistencies. In forced-choice perceptual studies, human participants overwhelmingly preferred our imperative layouts, choosing them over those from two state-of-the-art declarative systems 82% and 94% of the time, demonstrating the significant potential of this alternative paradigm. Finally, we present a simple automated evaluation metric for 3D scene layout generation that correlates strongly with human judgment.2025-04-07T20:24:24ZMaxim GuminDo Heon HanSeung Jean YooAditya GaneshanR. Kenny JonesRio Aguina-KangStewart MorrisDaniel Ritchiehttp://arxiv.org/abs/2510.15700v1ProofOptimizer: Training Language Models to Simplify Proofs without Human Demonstrations2025-10-17T14:45:30ZNeural theorem proving has advanced rapidly in the past year, reaching IMO gold-medalist capabilities and producing formal proofs that span thousands of lines. Although such proofs are mechanically verified by formal systems like Lean, their excessive length renders them difficult for humans to comprehend and limits their usefulness for mathematical insight. Proof simplification is therefore a critical bottleneck. Yet, training data for this task is scarce, and existing methods -- mainly agentic scaffolding with off-the-shelf LLMs -- struggle with the extremely long proofs generated by RL-trained provers. We introduce ProofOptimizer, the first language model trained to simplify Lean proofs without requiring additional human supervision. ProofOptimizer is trained via expert iteration and reinforcement learning, using Lean to verify simplifications and provide training signal. At inference time, it operates within an iterative proof-shortening workflow, progressively reducing proof length. Experiments show that ProofOptimizer substantially compresses proofs generated by state-of-the-art RL-trained provers on standard benchmarks, reducing proof length by 87% on miniF2F, 57% on PutnamBench, and 49% on Seed-Prover's IMO 2025 proofs. Beyond conciseness, the simplified proofs check faster in Lean and further improve downstream prover performance when reused as training data for supervised finetuning.2025-10-17T14:45:30Z52 pages, 16 figures, website: http://proof-optimizer.github.io/Alex GuBartosz PiotrowskiFabian GloeckleKaiyu YangAram H. Markosyanhttp://arxiv.org/abs/2510.15178v1Visualizing miniKanren Search with a Fine-Grained Small-Step Semantics2025-10-16T22:34:36ZWe present a deterministic small-step operational semantics for miniKanren that explicitly represents the evolving search tree during execution. This semantics models interleaving and goal scheduling at fine granularity, allowing each evaluation step-goal activation, suspension, resumption, and success -- to be visualized precisely. Building on this model, we implement an interactive visualizer that renders the search tree as it develops and lets users step through execution. The tool acts as a pedagogical notional machine for reasoning about miniKanren's fair search behavior, helping users understand surprising answer orders and operational effects. Our semantics and tool are validated through property-based testing and illustrated with several examples.2025-10-16T22:34:36Z2025 miniKanren WorkshopBrysen PfingstenJason Hemannhttp://arxiv.org/abs/2510.14972v1TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar2025-10-16T17:59:45ZLarge language models (LLMs) for code rely on subword tokenizers, such as byte-pair encoding (BPE), learned from mixed natural language text and programming language code but driven by statistics rather than grammar. As a result, semantically identical code snippets can be tokenized differently depending on superficial factors such as whitespace or identifier naming. To measure the impact of this misalignment, we introduce TokDrift, a framework that applies semantic-preserving rewrite rules to create code variants differing only in tokenization. Across nine code LLMs, including large ones with over 30B parameters, even minor formatting changes can cause substantial shifts in model behavior. Layer-wise analysis shows that the issue originates in early embeddings, where subword segmentation fails to capture grammar token boundaries. Our findings identify misaligned tokenization as a hidden obstacle to reliable code understanding and generation, highlighting the need for grammar-aware tokenization for future code LLMs.2025-10-16T17:59:45ZYinxi LiYuntian DengPengyu Nie