https://arxiv.org/api/wVkwpDOL1TlSLb3YxDavhtzaAFs2026-06-21T17:00:41Z991876515http://arxiv.org/abs/2508.11665v2StackPilot: Autonomous Function Agents for Scalable and Environment-Free Code Execution2026-01-13T05:50:35ZRecent advances in large language models (LLMs) have substantially enhanced automated code generation across a wide range of programming languages. Nonetheless, verifying the correctness and executability of LLM-generated code remains a significant challenge, as traditional methods rely on language-specific compilers and environment-dependent runtimes. To overcome these limitations, we introduce StackPilot, an LLM-native, multi-agent framework designed for language-agnostic code verification and execution, which operates independently of conventional toolchains. StackPilot offers three principal innovations: (1) a Function-as-Agents paradigm, in which each function is modeled as an autonomous agent capable of fine-grained reasoning and collaborative verification; (2) an LLM-as-Executor strategy, which enables scalable verification via stack-based scheduling; and (3) a novel snapshot mechanism that preserves complete execution contexts, facilitating deterministic and lossless context switching during verification. Empirical evaluations demonstrate that StackPilot achieves framework reliability rates between 89% and 97%, substantially outperforming baseline approaches. These results indicate that StackPilot can reliably verify and execute a significantly larger proportion of LLM-generated code across diverse programming tasks compared to existing methods.2025-08-06T10:55:00ZThis method needs to be reconsidered and there is something wrong with experimentXinkui ZhaoYifan ZhangZhengyi ZhouYueshen Xuhttp://arxiv.org/abs/2601.08071v1S4 modal sequent calculus as intermediate logic and intermediate language2026-01-12T23:30:21ZIn this short paper, we advocate for the idea that continuation-based intermediate languages correspond to intermediate logics. The goal of intermediate languages is to serve as a basis for compiler intermediate representations, allowing to represent expressive program transformations for optimisation and compilation, while preserving the properties that make programs compilable efficiently in the first place, such as the "stackability" of continuations. Intermediate logics are logics between intuitionistic and classical logic in terms of provability. Second-class continuations used in CPS-based intermediate languages correspond to a classical modal logic S4 with the added restriction that implications may only return modal types. This indeed corresponds to an intermediate logic, owing to the Gödel-McKinsey-Tarski theorem which states the intuitionistic nature of the modal fragment of S4. We introduce a three-kinded polarised sequent calculus for S4, together with an operational machine model that separates a heap from a stack. With this model we study a stackability property for the modal fragment of S4.2026-01-12T23:30:21Z14 pages, paper accepted for PEPM 2026 in the "short paper" categoryJean CasparGuillaume Munch-Maccagnonihttp://arxiv.org/abs/2601.06719v1FO-Complete Program Verification for Heap Logics2026-01-10T23:39:28ZWe develop the first two heap logics that have implicit heaplets and that admit FO-complete program verification. The notion of FO-completeness is a theoretical guarantee that all theorems that are valid when recursive definitions are interpreted as fixpoint definitions (instead of least fixpoint) are guaranteed to be eventually proven by the system. The logics we develop are a frame logic ($\textit{FL}$) and a separation logic ($\textit{SL-FL}$) that has an alternate semantics inspired by frame logic. We show verification condition generation for FL that is amenable to FO-complete reasoning using quantifier instantiation and SMT solvers. We show $\textit{SL-FL}$ can be translated to FL in order to obtain FO-complete reasoning. We implement tools that realize our technique and show the expressiveness of our logics and the efficacy of the verification technique on a suite of benchmarks that manipulate data structures.2026-01-10T23:39:28ZAppeared in OOPSLA '25Proc. ACM Program. Lang. 9, OOPSLA1, Article 106, 2025Adithya MuraliHrishikesh BalakrishnanAaron CouncilmanP. Madhusudan10.1145/3720447http://arxiv.org/abs/2601.06419v1Lightweight Yet Secure: Secure Scripting Language Generation via Lightweight LLMs2026-01-10T04:00:56ZThe security of scripting languages such as PowerShell is critical given their powerful automation and administration capabilities, often exercised with elevated privileges. Today, securing these languages still demands substantial human effort to craft and enforce rules, imposing heavy burdens on typical administrators and creating critical production risks (e.g., misoperations that shut down servers).Large language models (LLMs) have demonstrated strong capabilities in code generation, vulnerability detection, and automated repair for languages like Python and JavaScript. However, their ability to assist with generating secure scripting-language code remains largely underexplored. In this paper, we present SecGenEval-PS, a benchmark designed to systematically evaluate LLMs on secure scripting generation, security analysis, and automated repair. Our results show that both proprietary and open-source models fall short in these areas. For instance, over 60% of PowerShell scripts produced by GPT-4o and o3-mini are insecure without structured guidance.To bridge this gap, we propose PSSec, a framework that combines data synthesis with fine-tuning to enhance model security capabilities. We develop a self-debugging agent that integrates static analyzers with the reasoning abilities of advanced LLMs to synthesize large-scale structured triplets of insecure scripts, violation analyses, and corresponding repairs. We then fine-tune lightweight LLMs (as small as 1.7B parameters) using supervised fine-tuning (SFT) and reinforcement learning (RL), enabling security-aware reasoning and the generation of secure PowerShell code.Across multiple LLM families, including GPT and Qwen, \textit{PSSec}-trained models match or surpass general-purpose large models on PowerShell security tasks while reducing inference cost by more than an order of magnitude.2026-01-10T04:00:56Z19 pages,8 figures,conferenceKeyang ZhangZeyu ChenXuan FengDongliang FangYaowen ZhengZhi LiLimin Sunhttp://arxiv.org/abs/2501.15156v2Quantifier Elimination and Craig Interpolation, Quantitatively2026-01-09T21:27:45ZQuantifier elimination (QE) and Craig interpolation (CI) are central to various state-of-the-art automated approaches to hardware and software verification. They are rooted in the Boolean setting and are successful for, e.g., first-order theories such as linear rational arithmetic. What about their applicability in the quantitative setting where formulae evaluate to numbers and quantitative supremum/infimum quantifiers are the natural counterparts of Boolean quantifiers? Applications include establishing quantitative properties of programs, such as bounds on expected outcomes of probabilistic programs featuring nondeterminism, and analyzing the flow of information through programs.
In this paper, we present, to the best of our knowledge, the first QE algorithm for possibly unbounded, $\infty$- or $-\infty$-valued, or discontinuous piecewise linear quantities. They are the quantitative counterpart to linear rational arithmetic, and they are a popular quantitative assertion language for probabilistic program verification. We provide rigorous soundness proofs as well as upper space complexity bounds. Moreover, we present two applications of our QE algorithm. First, our algorithm yields a quantitative CI theorem: given arbitrary piecewise linear quantities $f$ and $g$ with $f \models g$, both the strongest and the weakest Craig interpolant of $f$ and $g$ are quantifier-free and effectively constructible. Second, we apply our QE algorithm to compute minimal and maximal expected outcomes of loop-free probabilistic programs featuring unbounded nondeterminism.2025-01-25T09:38:50ZKevin BatzJoost-Pieter KatoenNora Orhanhttp://arxiv.org/abs/2601.05972v1Categorical Foundations for CuTe Layouts2026-01-09T17:42:46ZNVIDIA's CUTLASS library provides a robust and expressive set of methods for describing and manipulating multi-dimensional tensor data on the GPU. These methods are conceptually grounded in the abstract notion of a CuTe layout and a rich algebra of such layouts, including operations such as composition, logical product, and logical division. In this paper, we present a categorical framework for understanding this layout algebra by focusing on a naturally occurring class of tractable layouts. To this end, we define two categories Tuple and Nest whose morphisms give rise to layouts. We define a suite of operations on morphisms in these categories and prove their compatibility with the corresponding layout operations. Moreover, we give a complete characterization of the layouts which arise from our construction. Finally, we provide a Python implementation of our categorical constructions, along with tests that demonstrate alignment with CUTLASS behavior. This implementation can be found at our git repository https://github.com/ColfaxResearch/layout-categories.2026-01-09T17:42:46Z174 pagesJack CarlisleJay ShahReuben SternPaul VanKoughnetthttp://arxiv.org/abs/2510.18418v2A Lazy, Concurrent Convertibility Checker2026-01-09T08:58:00ZConvertibility checking - determining whether two lambda-terms are equal up to reductions - is a crucial component of proof assistants and dependently-typed languages. Practical implementations often use heuristics to quickly conclude that two terms are or are not convertible without reducing them to normal form. However, these heuristics can backfire, triggering huge amounts of unnecessary computation. This paper presents a novel convertibility-checking algorithm that relies crucially on laziness and concurrency} Laziness is used to share computations, while concurrency is used to explore multiple convertibility subproblems in parallel or via fair interleaving. Unlike heuristics-based approaches, our algorithm always finds an easy solution to the convertibility problem, if one exists. The paper presents the algorithm in process calculus style and discusses its mechanized proof of partial correctness, its complexity, and its lightweight experimental evaluation.2025-10-21T08:50:12ZProceedings of the ACM on Programming Languages, 2026, 10 (POPL), pp.53:1-53:27Nathanaëlle CourantCAMBIUMXavier LeroyCAMBIUMhttp://arxiv.org/abs/2510.16133v2Typing Strictness (Extended Version)2026-01-08T23:56:30ZStrictness analysis is critical to efficient implementation of languages with non-strict evaluation, mitigating much of the performance overhead of laziness. However, reasoning about strictness at the source level can be challenging and unintuitive. We propose a new definition of strictness that refines the traditional one by describing variable usage more precisely. We lay type-theoretic foundations for this definition in both call-by-name and call-by-push-value settings, drawing inspiration from the literature on type systems tracking effects and coeffects. We prove via a logical relation that the strictness attributes computed by our type systems accurately describe the use of variables at runtime, and we offer a strictness-annotation-preserving translation from the call-by-name system to the call-by-push-value one. All our results are mechanized in Rocq.2025-10-17T18:19:27Z30 pages, 22 figures, extended version of a paper at POPL 2026Daniel SainatiJoseph W. CutlerBenjamin C. PierceStephanie Weirichhttp://arxiv.org/abs/2412.06994v5Phaedrus: Predicting Dynamic Application Behavior with Lightweight Generative Models and LLMs2026-01-08T22:57:49ZApplication profiling is essential for software optimization tasks such as code layout and memory placement, where optimization decisions depend on program behavior. However, modern applications exhibit significant input-dependent variability, limiting the effectiveness of conventional profiling approaches that rely on a single representative execution. We present Phaedrus, a compiler-assisted deep learning framework that predicts dynamic program behavior across diverse execution instances, with a focus on dynamic function call prediction. These predicted call sequences are used to guide input-specific compiler optimizations, enabling code specialization without requiring program execution.
Phaedrus introduces two complementary techniques. Application Behavior Synthesis (Dynamis) is a profile-less approach in which large language models infer dynamic behavior directly from source code and static compiler analysis, bypassing traditional profiling. Application Profile Generalization (Morpheus) employs generative models trained on compressed and augmented Whole Program Path (WPP) function profiles to predict application behavior for unseen inputs. Experimental results show that Phaedrus accurately identifies frequently executed and runtime-dominated hotspot functions, covering up to 85-99% of total execution time. Using these predictions, Phaedrus enables superior profile-guided optimizations, achieving an average performance improvement of 6% (upto 25%) and a binary size reduction of 5.19% (upto 19%), without executing the target program. Additionally, Phaedrus reduces WPP function profile sizes by up to $10^{7} \times $.2024-12-09T21:01:45ZOOPSLA 2026Bodhisatwa ChatterjeeNeeraj JadhavSantosh Pandehttp://arxiv.org/abs/2601.05012v1The Squirrel Parser: A Linear-Time PEG Packrat Parser Capable of Left Recursion and Optimal Error Recovery2026-01-08T15:13:27ZWe present the squirrel parser, a PEG packrat parser that directly handles all forms of left recursion with optimal error recovery, while maintaining linear time complexity in the length of the input even in the presence of an arbitrary number of errors. Traditional approaches to handling left recursion in a recursive descent parser require grammar rewriting or complex algorithmic extensions. We derive a minimal algorithm from first principles: cycle detection via per-position state tracking and $O(1)$-per-LR-cycle communication from descendant to ancestor recursion frames, and fixed-point search via iterative expansion. For error recovery, we derived a set of four axioms and twelve constraints that must be imposed upon an optimal error recovery design to ensure completeness, correctness, optimality of performance, and intuitiveness of behavior. We utilized a constraint satisfaction mechanism to search the space of all possibilities, arriving at a provably optimal and robust error recovery strategy that maintains perfect performance linearity.2026-01-08T15:13:27ZLuke A. D. Hutchisonhttp://arxiv.org/abs/2310.14611v3Predictive Monitoring against Pattern Regular Languages2026-01-08T08:04:20ZIn this paper, we focus on the problem of dynamically analysing concurrent software against high-level temporal specifications. Existing techniques for runtime monitoring against such specifications are primarily designed for sequential software and remain inadequate in the presence of concurrency -- violations may be observed only in intricate thread interleavings, requiring many re-runs of the underlying software. Towards this, we study the problem of predictive runtime monitoring, inspired by the analogous problem of predictive data race detection studied extensively recently. The predictive runtime monitoring question asks, given an execution $σ$, if it can be soundly reordered to expose violations of a specification.
In this paper, we focus on specifications that are given in regular languages. Our notion of reorderings is trace equivalence, where an execution is considered a reordering of another if it can be obtained from the latter by successively commuting adjacent independent actions. We first show that the problem of predictive admits a super-linear lower bound of $O(n^α)$, where $n$ is the number of events in the execution, and $α$ is a parameter describing the degree of commutativity. As a result, predictive runtime monitoring even in this setting is unlikely to be efficiently solvable.
Towards this, we identify a sub-class of regular languages, called pattern languages (and their extension generalized pattern languages). Pattern languages can naturally express specific ordering of some number of (labelled) events, and have been inspired by popular empirical hypotheses, the `small bug depth' hypothesis. More importantly, we show that for pattern (and generalized pattern) languages, the predictive monitoring problem can be solved using a constant-space streaming linear-time algorithm.2023-10-23T06:41:55ZThe proof of Theorem 3.2 has been revised. Results unchangedZhendong AngUmang Mathur10.1145/3632915http://arxiv.org/abs/2601.04573v1Lenses for Partially-Specified States (Extended Version)2026-01-08T04:03:55ZA bidirectional transformation is a pair of transformations satisfying certain well-behavedness properties: one maps source data into view data, and the other translates changes on the view back to the source. However, when multiple views share a source, an update on one view may affect the others, making it hard to maintain correspondence while preserving the user's update, especially when multiple views are changed at once. Ensuring these properties within a compositional framework is even more challenging. In this paper, we propose partial-state lenses, which allow source and view states to be partially specified to precisely represent the user's update intentions. These intentions are partially ordered, providing clear semantics for merging intentions of updates coming from multiple views and a refined notion of update preservation compatible with this merging. We formalize partial-state lenses, together with partial-specifiedness-aware well-behavedness that supports compositional reasoning and ensures update preservation. In addition, we demonstrate the utility of the proposed system through examples.2026-01-08T04:03:55ZExtended version of our paper to appear in ESOP 2026Kazutaka MatsudaMinh NguyenMeng Wanghttp://arxiv.org/abs/2601.04523v1Sharded Elimination and Combining for Highly-Efficient Concurrent Stacks2026-01-08T02:42:48ZWe present a new blocking linearizable stack implementation which utilizes sharding and fetch&increment to achieve significantly better performance than all existing concurrent stacks. The proposed implementation is based on a novel elimination mechanism and a new combining approach that are efficiently blended to gain high performance. Our implementation results in enhanced parallelism and low contention when accessing the shared stack. Experiments show that the proposed stack implementation outperforms all existing concurrent stacks by up to 2X in most workloads. It is particularly efficient in systems supporting a large number of threads and in high contention scenarios.2026-01-08T02:42:48Zextended version of paper in PPoPP 2026Ajay SinghNikos MetaxakisPanagiota Fatourouhttp://arxiv.org/abs/2601.04492v1Scalable Floating-Point Satisfiability via Staged Optimization2026-01-08T01:51:46ZThis work introduces StageSAT, a new approach to solving floating-point satisfiability that bridges SMT solving with numerical optimization. StageSAT reframes a floating-point formula as a series of optimization problems in three stages of increasing precision. It begins with a fast, projection-aided descent objective to guide the search toward a feasible region, proceeding to bit-level accuracy with ULP$^2$ optimization and a final $n$-ULP lattice refinement.
By construction, the final stage uses a representing function that is zero if and only if a candidate satisfies all constraints. Thus, when optimization drives the objective to zero, the resulting assignment is a valid solution, providing a built-in guarantee of soundness.
To improve search, StageSAT introduces a partial monotone descent property on linear constraints via orthogonal projection, preventing the optimizer from stalling on flat or misleading landscapes. Critically, this solver requires no heavy bit-level reasoning or specialized abstractions; it treats complex arithmetic as a black-box, using runtime evaluations to navigate the input space.
We implement StageSAT and evaluate it on extensive benchmarks, including SMT-COMP'25 suites and difficult cases from prior work. StageSAT proved more scalable and accurate than state-of-the-art optimization-based alternatives. It solved strictly more formulas than any competing solver under the same time budget, finding most satisfiable instances without producing spurious models. This amounts to 99.4% recall on satisfiable cases with 0% false SAT, exceeding the reliability of prior optimization-based solvers. StageSAT also delivered significant speedups (often 5--10$\times$) over traditional bit-precise SMT and numeric solvers. These results demonstrate that staged optimization significantly improves performance and correctness of floating-point satisfiability solving.2026-01-08T01:51:46ZYuanzhuo ZhangZhoulai FuBinoy Ravindranhttp://arxiv.org/abs/2401.01114v2Static Deadlock Detection for Rust Programs2026-01-07T13:33:39ZRust relies on its unique ownership mechanism to ensure thread and memory safety. However, numerous potential security vulnerabilities persist in practical applications. New language features in Rust pose new challenges for vulnerability detection. This paper proposes a static deadlock detection method tailored for Rust programs, aiming to identify various deadlock types, including double lock, conflict lock, and deadlock associated with conditional variables. With due consideration for Rust's ownership and lifetimes, we first complete the pointer analysis. Then, based on the obtained points-to information, we analyze dependencies among variables to identify potential deadlocks. We develop a tool and conduct experiments based on the proposed method. The experimental results demonstrate that our method outperforms existing deadlock detection methods in precision.2024-01-02T09:09:48ZYu ZhangKaiwen ZhangGuanjun Liu