https://arxiv.org/api/wVkwpDOL1TlSLb3YxDavhtzaAFs 2026-06-21T17:00:41Z 9918 765 15 http://arxiv.org/abs/2508.11665v2 StackPilot: Autonomous Function Agents for Scalable and Environment-Free Code Execution 2026-01-13T05:50:35Z

Recent advances in large language models (LLMs) have substantially enhanced automated code generation across a wide range of programming languages. Nonetheless, verifying the correctness and executability of LLM-generated code remains a significant challenge, as traditional methods rely on language-specific compilers and environment-dependent runtimes. To overcome these limitations, we introduce StackPilot, an LLM-native, multi-agent framework designed for language-agnostic code verification and execution, which operates independently of conventional toolchains. StackPilot offers three principal innovations: (1) a Function-as-Agents paradigm, in which each function is modeled as an autonomous agent capable of fine-grained reasoning and collaborative verification; (2) an LLM-as-Executor strategy, which enables scalable verification via stack-based scheduling; and (3) a novel snapshot mechanism that preserves complete execution contexts, facilitating deterministic and lossless context switching during verification. Empirical evaluations demonstrate that StackPilot achieves framework reliability rates between 89% and 97%, substantially outperforming baseline approaches. These results indicate that StackPilot can reliably verify and execute a significantly larger proportion of LLM-generated code across diverse programming tasks compared to existing methods.

2025-08-06T10:55:00Z This method needs to be reconsidered and there is something wrong with experiment Xinkui Zhao Yifan Zhang Zhengyi Zhou Yueshen Xu http://arxiv.org/abs/2601.08071v1 S4 modal sequent calculus as intermediate logic and intermediate language 2026-01-12T23:30:21Z

In this short paper, we advocate for the idea that continuation-based intermediate languages correspond to intermediate logics. The goal of intermediate languages is to serve as a basis for compiler intermediate representations, allowing to represent expressive program transformations for optimisation and compilation, while preserving the properties that make programs compilable efficiently in the first place, such as the "stackability" of continuations. Intermediate logics are logics between intuitionistic and classical logic in terms of provability. Second-class continuations used in CPS-based intermediate languages correspond to a classical modal logic S4 with the added restriction that implications may only return modal types. This indeed corresponds to an intermediate logic, owing to the Gödel-McKinsey-Tarski theorem which states the intuitionistic nature of the modal fragment of S4. We introduce a three-kinded polarised sequent calculus for S4, together with an operational machine model that separates a heap from a stack. With this model we study a stackability property for the modal fragment of S4.

2026-01-12T23:30:21Z 14 pages, paper accepted for PEPM 2026 in the "short paper" category Jean Caspar Guillaume Munch-Maccagnoni http://arxiv.org/abs/2601.06719v1 FO-Complete Program Verification for Heap Logics 2026-01-10T23:39:28Z

We develop the first two heap logics that have implicit heaplets and that admit FO-complete program verification. The notion of FO-completeness is a theoretical guarantee that all theorems that are valid when recursive definitions are interpreted as fixpoint definitions (instead of least fixpoint) are guaranteed to be eventually proven by the system. The logics we develop are a frame logic ($\textit{FL}$) and a separation logic ($\textit{SL-FL}$) that has an alternate semantics inspired by frame logic. We show verification condition generation for FL that is amenable to FO-complete reasoning using quantifier instantiation and SMT solvers. We show $\textit{SL-FL}$ can be translated to FL in order to obtain FO-complete reasoning. We implement tools that realize our technique and show the expressiveness of our logics and the efficacy of the verification technique on a suite of benchmarks that manipulate data structures.

2026-01-10T23:39:28Z Appeared in OOPSLA '25 Proc. ACM Program. Lang. 9, OOPSLA1, Article 106, 2025 Adithya Murali Hrishikesh Balakrishnan Aaron Councilman P. Madhusudan 10.1145/3720447 http://arxiv.org/abs/2601.06419v1 Lightweight Yet Secure: Secure Scripting Language Generation via Lightweight LLMs 2026-01-10T04:00:56Z

The security of scripting languages such as PowerShell is critical given their powerful automation and administration capabilities, often exercised with elevated privileges. Today, securing these languages still demands substantial human effort to craft and enforce rules, imposing heavy burdens on typical administrators and creating critical production risks (e.g., misoperations that shut down servers).Large language models (LLMs) have demonstrated strong capabilities in code generation, vulnerability detection, and automated repair for languages like Python and JavaScript. However, their ability to assist with generating secure scripting-language code remains largely underexplored. In this paper, we present SecGenEval-PS, a benchmark designed to systematically evaluate LLMs on secure scripting generation, security analysis, and automated repair. Our results show that both proprietary and open-source models fall short in these areas. For instance, over 60% of PowerShell scripts produced by GPT-4o and o3-mini are insecure without structured guidance.To bridge this gap, we propose PSSec, a framework that combines data synthesis with fine-tuning to enhance model security capabilities. We develop a self-debugging agent that integrates static analyzers with the reasoning abilities of advanced LLMs to synthesize large-scale structured triplets of insecure scripts, violation analyses, and corresponding repairs. We then fine-tune lightweight LLMs (as small as 1.7B parameters) using supervised fine-tuning (SFT) and reinforcement learning (RL), enabling security-aware reasoning and the generation of secure PowerShell code.Across multiple LLM families, including GPT and Qwen, \textit{PSSec}-trained models match or surpass general-purpose large models on PowerShell security tasks while reducing inference cost by more than an order of magnitude.

2026-01-10T04:00:56Z 19 pages,8 figures,conference Keyang Zhang Zeyu Chen Xuan Feng Dongliang Fang Yaowen Zheng Zhi Li Limin Sun http://arxiv.org/abs/2501.15156v2 Quantifier Elimination and Craig Interpolation, Quantitatively 2026-01-09T21:27:45Z

Quantifier elimination (QE) and Craig interpolation (CI) are central to various state-of-the-art automated approaches to hardware and software verification. They are rooted in the Boolean setting and are successful for, e.g., first-order theories such as linear rational arithmetic. What about their applicability in the quantitative setting where formulae evaluate to numbers and quantitative supremum/infimum quantifiers are the natural counterparts of Boolean quantifiers? Applications include establishing quantitative properties of programs, such as bounds on expected outcomes of probabilistic programs featuring nondeterminism, and analyzing the flow of information through programs. In this paper, we present, to the best of our knowledge, the first QE algorithm for possibly unbounded, $\infty$- or $-\infty$-valued, or discontinuous piecewise linear quantities. They are the quantitative counterpart to linear rational arithmetic, and they are a popular quantitative assertion language for probabilistic program verification. We provide rigorous soundness proofs as well as upper space complexity bounds. Moreover, we present two applications of our QE algorithm. First, our algorithm yields a quantitative CI theorem: given arbitrary piecewise linear quantities $f$ and $g$ with $f \models g$, both the strongest and the weakest Craig interpolant of $f$ and $g$ are quantifier-free and effectively constructible. Second, we apply our QE algorithm to compute minimal and maximal expected outcomes of loop-free probabilistic programs featuring unbounded nondeterminism.

2025-01-25T09:38:50Z Kevin Batz Joost-Pieter Katoen Nora Orhan http://arxiv.org/abs/2601.05972v1 Categorical Foundations for CuTe Layouts 2026-01-09T17:42:46Z

NVIDIA's CUTLASS library provides a robust and expressive set of methods for describing and manipulating multi-dimensional tensor data on the GPU. These methods are conceptually grounded in the abstract notion of a CuTe layout and a rich algebra of such layouts, including operations such as composition, logical product, and logical division. In this paper, we present a categorical framework for understanding this layout algebra by focusing on a naturally occurring class of tractable layouts. To this end, we define two categories Tuple and Nest whose morphisms give rise to layouts. We define a suite of operations on morphisms in these categories and prove their compatibility with the corresponding layout operations. Moreover, we give a complete characterization of the layouts which arise from our construction. Finally, we provide a Python implementation of our categorical constructions, along with tests that demonstrate alignment with CUTLASS behavior. This implementation can be found at our git repository https://github.com/ColfaxResearch/layout-categories.

2026-01-09T17:42:46Z 174 pages Jack Carlisle Jay Shah Reuben Stern Paul VanKoughnett http://arxiv.org/abs/2510.18418v2 A Lazy, Concurrent Convertibility Checker 2026-01-09T08:58:00Z

Convertibility checking - determining whether two lambda-terms are equal up to reductions - is a crucial component of proof assistants and dependently-typed languages. Practical implementations often use heuristics to quickly conclude that two terms are or are not convertible without reducing them to normal form. However, these heuristics can backfire, triggering huge amounts of unnecessary computation. This paper presents a novel convertibility-checking algorithm that relies crucially on laziness and concurrency} Laziness is used to share computations, while concurrency is used to explore multiple convertibility subproblems in parallel or via fair interleaving. Unlike heuristics-based approaches, our algorithm always finds an easy solution to the convertibility problem, if one exists. The paper presents the algorithm in process calculus style and discusses its mechanized proof of partial correctness, its complexity, and its lightweight experimental evaluation.

2025-10-21T08:50:12Z Proceedings of the ACM on Programming Languages, 2026, 10 (POPL), pp.53:1-53:27 Nathanaëlle Courant CAMBIUM Xavier Leroy CAMBIUM http://arxiv.org/abs/2510.16133v2 Typing Strictness (Extended Version) 2026-01-08T23:56:30Z

Strictness analysis is critical to efficient implementation of languages with non-strict evaluation, mitigating much of the performance overhead of laziness. However, reasoning about strictness at the source level can be challenging and unintuitive. We propose a new definition of strictness that refines the traditional one by describing variable usage more precisely. We lay type-theoretic foundations for this definition in both call-by-name and call-by-push-value settings, drawing inspiration from the literature on type systems tracking effects and coeffects. We prove via a logical relation that the strictness attributes computed by our type systems accurately describe the use of variables at runtime, and we offer a strictness-annotation-preserving translation from the call-by-name system to the call-by-push-value one. All our results are mechanized in Rocq.

2025-10-17T18:19:27Z 30 pages, 22 figures, extended version of a paper at POPL 2026 Daniel Sainati Joseph W. Cutler Benjamin C. Pierce Stephanie Weirich http://arxiv.org/abs/2412.06994v5 Phaedrus: Predicting Dynamic Application Behavior with Lightweight Generative Models and LLMs 2026-01-08T22:57:49Z

Application profiling is essential for software optimization tasks such as code layout and memory placement, where optimization decisions depend on program behavior. However, modern applications exhibit significant input-dependent variability, limiting the effectiveness of conventional profiling approaches that rely on a single representative execution. We present Phaedrus, a compiler-assisted deep learning framework that predicts dynamic program behavior across diverse execution instances, with a focus on dynamic function call prediction. These predicted call sequences are used to guide input-specific compiler optimizations, enabling code specialization without requiring program execution. Phaedrus introduces two complementary techniques. Application Behavior Synthesis (Dynamis) is a profile-less approach in which large language models infer dynamic behavior directly from source code and static compiler analysis, bypassing traditional profiling. Application Profile Generalization (Morpheus) employs generative models trained on compressed and augmented Whole Program Path (WPP) function profiles to predict application behavior for unseen inputs. Experimental results show that Phaedrus accurately identifies frequently executed and runtime-dominated hotspot functions, covering up to 85-99% of total execution time. Using these predictions, Phaedrus enables superior profile-guided optimizations, achieving an average performance improvement of 6% (upto 25%) and a binary size reduction of 5.19% (upto 19%), without executing the target program. Additionally, Phaedrus reduces WPP function profile sizes by up to $10^{7} \times $.

2024-12-09T21:01:45Z OOPSLA 2026 Bodhisatwa Chatterjee Neeraj Jadhav Santosh Pande http://arxiv.org/abs/2601.05012v1 The Squirrel Parser: A Linear-Time PEG Packrat Parser Capable of Left Recursion and Optimal Error Recovery 2026-01-08T15:13:27Z

We present the squirrel parser, a PEG packrat parser that directly handles all forms of left recursion with optimal error recovery, while maintaining linear time complexity in the length of the input even in the presence of an arbitrary number of errors. Traditional approaches to handling left recursion in a recursive descent parser require grammar rewriting or complex algorithmic extensions. We derive a minimal algorithm from first principles: cycle detection via per-position state tracking and $O(1)$-per-LR-cycle communication from descendant to ancestor recursion frames, and fixed-point search via iterative expansion. For error recovery, we derived a set of four axioms and twelve constraints that must be imposed upon an optimal error recovery design to ensure completeness, correctness, optimality of performance, and intuitiveness of behavior. We utilized a constraint satisfaction mechanism to search the space of all possibilities, arriving at a provably optimal and robust error recovery strategy that maintains perfect performance linearity.

2026-01-08T15:13:27Z Luke A. D. Hutchison http://arxiv.org/abs/2310.14611v3 Predictive Monitoring against Pattern Regular Languages 2026-01-08T08:04:20Z

In this paper, we focus on the problem of dynamically analysing concurrent software against high-level temporal specifications. Existing techniques for runtime monitoring against such specifications are primarily designed for sequential software and remain inadequate in the presence of concurrency -- violations may be observed only in intricate thread interleavings, requiring many re-runs of the underlying software. Towards this, we study the problem of predictive runtime monitoring, inspired by the analogous problem of predictive data race detection studied extensively recently. The predictive runtime monitoring question asks, given an execution $σ$, if it can be soundly reordered to expose violations of a specification. In this paper, we focus on specifications that are given in regular languages. Our notion of reorderings is trace equivalence, where an execution is considered a reordering of another if it can be obtained from the latter by successively commuting adjacent independent actions. We first show that the problem of predictive admits a super-linear lower bound of $O(n^α)$, where $n$ is the number of events in the execution, and $α$ is a parameter describing the degree of commutativity. As a result, predictive runtime monitoring even in this setting is unlikely to be efficiently solvable. Towards this, we identify a sub-class of regular languages, called pattern languages (and their extension generalized pattern languages). Pattern languages can naturally express specific ordering of some number of (labelled) events, and have been inspired by popular empirical hypotheses, the `small bug depth' hypothesis. More importantly, we show that for pattern (and generalized pattern) languages, the predictive monitoring problem can be solved using a constant-space streaming linear-time algorithm.

2023-10-23T06:41:55Z The proof of Theorem 3.2 has been revised. Results unchanged Zhendong Ang Umang Mathur 10.1145/3632915 http://arxiv.org/abs/2601.04573v1 Lenses for Partially-Specified States (Extended Version) 2026-01-08T04:03:55Z

A bidirectional transformation is a pair of transformations satisfying certain well-behavedness properties: one maps source data into view data, and the other translates changes on the view back to the source. However, when multiple views share a source, an update on one view may affect the others, making it hard to maintain correspondence while preserving the user's update, especially when multiple views are changed at once. Ensuring these properties within a compositional framework is even more challenging. In this paper, we propose partial-state lenses, which allow source and view states to be partially specified to precisely represent the user's update intentions. These intentions are partially ordered, providing clear semantics for merging intentions of updates coming from multiple views and a refined notion of update preservation compatible with this merging. We formalize partial-state lenses, together with partial-specifiedness-aware well-behavedness that supports compositional reasoning and ensures update preservation. In addition, we demonstrate the utility of the proposed system through examples.

2026-01-08T04:03:55Z Extended version of our paper to appear in ESOP 2026 Kazutaka Matsuda Minh Nguyen Meng Wang http://arxiv.org/abs/2601.04523v1 Sharded Elimination and Combining for Highly-Efficient Concurrent Stacks 2026-01-08T02:42:48Z

We present a new blocking linearizable stack implementation which utilizes sharding and fetch&increment to achieve significantly better performance than all existing concurrent stacks. The proposed implementation is based on a novel elimination mechanism and a new combining approach that are efficiently blended to gain high performance. Our implementation results in enhanced parallelism and low contention when accessing the shared stack. Experiments show that the proposed stack implementation outperforms all existing concurrent stacks by up to 2X in most workloads. It is particularly efficient in systems supporting a large number of threads and in high contention scenarios.

2026-01-08T02:42:48Z extended version of paper in PPoPP 2026 Ajay Singh Nikos Metaxakis Panagiota Fatourou http://arxiv.org/abs/2601.04492v1 Scalable Floating-Point Satisfiability via Staged Optimization 2026-01-08T01:51:46Z

This work introduces StageSAT, a new approach to solving floating-point satisfiability that bridges SMT solving with numerical optimization. StageSAT reframes a floating-point formula as a series of optimization problems in three stages of increasing precision. It begins with a fast, projection-aided descent objective to guide the search toward a feasible region, proceeding to bit-level accuracy with ULP$^2$ optimization and a final $n$-ULP lattice refinement. By construction, the final stage uses a representing function that is zero if and only if a candidate satisfies all constraints. Thus, when optimization drives the objective to zero, the resulting assignment is a valid solution, providing a built-in guarantee of soundness. To improve search, StageSAT introduces a partial monotone descent property on linear constraints via orthogonal projection, preventing the optimizer from stalling on flat or misleading landscapes. Critically, this solver requires no heavy bit-level reasoning or specialized abstractions; it treats complex arithmetic as a black-box, using runtime evaluations to navigate the input space. We implement StageSAT and evaluate it on extensive benchmarks, including SMT-COMP'25 suites and difficult cases from prior work. StageSAT proved more scalable and accurate than state-of-the-art optimization-based alternatives. It solved strictly more formulas than any competing solver under the same time budget, finding most satisfiable instances without producing spurious models. This amounts to 99.4% recall on satisfiable cases with 0% false SAT, exceeding the reliability of prior optimization-based solvers. StageSAT also delivered significant speedups (often 5--10$\times$) over traditional bit-precise SMT and numeric solvers. These results demonstrate that staged optimization significantly improves performance and correctness of floating-point satisfiability solving.

2026-01-08T01:51:46Z Yuanzhuo Zhang Zhoulai Fu Binoy Ravindran http://arxiv.org/abs/2401.01114v2 Static Deadlock Detection for Rust Programs 2026-01-07T13:33:39Z

Rust relies on its unique ownership mechanism to ensure thread and memory safety. However, numerous potential security vulnerabilities persist in practical applications. New language features in Rust pose new challenges for vulnerability detection. This paper proposes a static deadlock detection method tailored for Rust programs, aiming to identify various deadlock types, including double lock, conflict lock, and deadlock associated with conditional variables. With due consideration for Rust's ownership and lifetimes, we first complete the pointer analysis. Then, based on the obtained points-to information, we analyze dependencies among variables to identify potential deadlocks. We develop a tool and conduct experiments based on the proposed method. The experimental results demonstrate that our method outperforms existing deadlock detection methods in precision.

2024-01-02T09:09:48Z Yu Zhang Kaiwen Zhang Guanjun Liu