https://arxiv.org/api/gOquLMZBwirPsc6/UHUy7BhOnKs2026-06-23T20:53:22Z9934111015http://arxiv.org/abs/2510.04994v1concurrentKanren: miniKanren for parallel execution2025-10-06T16:30:30ZConcurrent logic programming predates miniKanren, but concurrent implementations of miniKanren have remained largely unexplored. In this work we present a parallel implementation of miniKanren in Go, demonstrating its feasibility and potential for performance improvements. Our approach leverages implicit parallelism allowing legacy programs to benefit from parallel execution. We discuss implementation strategies and evaluate the impact of parallelism, laying groundwork for future language-agnostic models.2025-10-06T16:30:30Z13 pages, 1 figure, for associated repo see https://github.com/deosjr/concurrentKanrenSjoerd Dosthttp://arxiv.org/abs/2504.05424v4Speculative Automated Refactoring of Imperative Deep Learning Programs to Graph Execution2025-10-06T15:57:48ZEfficiency is essential to support ever-growing datasets, especially for Deep Learning (DL) systems. DL frameworks have traditionally embraced deferred execution-style DL code -- supporting symbolic, graph-based Deep Neural Network (DNN) computation. While scalable, such development is error-prone, non-intuitive, and difficult to debug. Consequently, more natural, imperative DL frameworks encouraging eager execution have emerged but at the expense of run-time performance. Though hybrid approaches aim for the "best of both worlds," using them effectively requires subtle considerations. Our key insight is that, while DL programs typically execute sequentially, hybridizing imperative DL code resembles parallelizing sequential code in traditional systems. Inspired by this, we present an automated refactoring approach that assists developers in determining which otherwise eagerly-executed imperative DL functions could be effectively and efficiently executed as graphs. The approach features novel static imperative tensor and side-effect analyses for Python. Due to its inherent dynamism, analyzing Python may be unsound; however, the conservative approach leverages a speculative (keyword-based) analysis for resolving difficult cases that informs developers of any assumptions made. The approach is: (i) implemented as a plug-in to the PyDev Eclipse IDE that integrates the WALA Ariadne analysis framework and (ii) evaluated on nineteen DL projects consisting of 132 KLOC. The results show that 326 of 766 candidate functions (42.56%) were refactorable, and an average relative speedup of 2.16x on performance tests was observed with negligible differences in model accuracy. The results indicate that the approach is useful in optimizing imperative DL code to its full potential.2025-04-07T18:48:43Z2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE)Raffi KhatchadourianTatiana Castro VélezMehdi BagherzadehNan JiaAnita Rajahttp://arxiv.org/abs/2510.04890v1Retrofitting Control Flow Graphs in LLVM IR for Auto Vectorization2025-10-06T15:11:41ZModern processors increasingly rely on SIMD instruction sets, such as AVX and RVV, to significantly enhance parallelism and computational performance. However, production-ready compilers like LLVM and GCC often fail to fully exploit available vectorization opportunities due to disjoint vectorization passes and limited extensibility. Although recent attempts in heuristics and intermediate representation (IR) designs have attempted to address these problems, efficiently simplifying control flow analysis and accurately identifying vectorization opportunities remain challenging tasks.
To address these issues, we introduce a novel vectorization pipeline featuring two specialized IR extensions: SIR, which encodes high-level structural information, and VIR, which explicitly represents instruction dependencies through data dependency analysis. Leveraging the detailed dependency information provided by VIR, we develop a flexible and extensible vectorization framework. This approach substantially improves interoperability across vectorization passes and expands the search space for identifying isomorphic instructions, ultimately enhancing both the scope and efficiency of automatic vectorization. Experimental evaluations demonstrate that our proposed vectorization pipeline achieves significant performance improvements, delivering speedups of up to 53% and 58% compared to LLVM and GCC, respectively.2025-10-06T15:11:41ZShihan FangWenxin Zhenghttp://arxiv.org/abs/2508.00244v3Functional vs. Object-Oriented: Comparing How Programming Paradigms Affect the Architectural Characteristics of Systems2025-10-06T12:44:34ZThis study compares the impact of adopting object-oriented programming (OOP) or functional programming (FP) on the architectural characteristics of software systems. For that, it examines the design and implementation of a Digital Wallet system developed in Kotlin (for OOP) and Scala (for FP). The comparison is made through a mixed-method approach. The self-ethnographic qualitative analysis provides a side-by-side comparison of both implementations, revealing the perspective of those writing such code. The survey-based quantitative analysis gathers feedback from developers with diverse backgrounds, showing their impressions of those reading this code. Hopefully, these results may be useful for developers seeking to decide which paradigm is best suited for their next project.2025-08-01T01:06:06Z11 pages, 15 figures (1 table, 3 diagrams, 4 graphics, 7 listings), accepted to the CTICQS capstone project competition at SBQS 2025Proceedings of the 24th Brazilian Symposium on Software Quality (SBQS 2025), São José dos Campos/SP, 2025, pp. 584-594Briza Mel Dias de SousaUniversity of São PauloRenato Cordeiro FerreiraUniversity of São PauloJheronimus Academy of Data ScienceTechnical University of EindhovenTilburg UniversityAlfredo GoldmanUniversity of São Paulo10.5753/sbqs.2025.15170http://arxiv.org/abs/2508.18115v2Compositional Verification in Concurrent Separation Logic with Permissions Regions2025-10-06T09:01:33ZConcurrent separation logic with fractional permissions (CSLPerm) provides a promising reasoning system to verify most complex sequential and concurrent fine-grained programs. The logic with strong and weak separating conjunctions offers a solid foundation for producing concise and precise proofs. However, it lacks automation and compositionality support. This paper addresses this limitation by introducing a compositional verification system for concurrent programs that manipulate regions of shared memory. The centre of our system is novel logical principles and an entailment procedure that can infer the residual heaps in the frame rule for a fragment of CSL-Perm with explicit arithmetical constraints for memory heaps' disjointness. This procedure enables the compositional reasoning for concurrent threads and function calls. We have implemented the proposal in a prototype tool called CoSl, tested it with 10 challenging concurrent programs, including those beyond the state-of-the-art, and confirmed the advantage of our approach.2025-08-25T15:22:42ZQuang Loc Lehttp://arxiv.org/abs/2510.04081v1Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning2025-10-05T07:59:24ZReasoning capability is pivotal for Large Language Models (LLMs) to solve complex tasks, yet achieving reliable and scalable reasoning remains challenging. While Chain-of-Thought (CoT) prompting has become a mainstream approach, existing methods often suffer from uncontrolled generation, insufficient quality, and limited diversity in reasoning paths. Recent efforts leverage code to enhance CoT by grounding reasoning in executable steps, but such methods are typically constrained to predefined mathematical problems, hindering scalability and generalizability. In this work, we propose Caco (Code-Assisted Chain-of-ThOught), a novel framework that automates the synthesis of high-quality, verifiable, and diverse instruction-CoT reasoning data through code-driven augmentation. Unlike prior work, Caco first fine-tunes a code-based CoT generator on existing math and programming solutions in a unified code format, then scales the data generation to a large amount of diverse reasoning traces. Crucially, we introduce automated validation via code execution and rule-based filtering to ensure logical correctness and structural diversity, followed by reverse-engineering filtered outputs into natural language instructions and language CoTs to enrich task adaptability. This closed-loop process enables fully automated, scalable synthesis of reasoning data with guaranteed executability. Experiments on our created Caco-1.3M dataset demonstrate that Caco-trained models achieve strong competitive performance on mathematical reasoning benchmarks, outperforming existing strong baselines. Further analysis reveals that Caco's code-anchored verification and instruction diversity contribute to superior generalization across unseen tasks. Our work establishes a paradigm for building self-sustaining, trustworthy reasoning systems without human intervention.2025-10-05T07:59:24ZAccepted by NeurIPS2025Honglin LinQizhi PeiXin GaoZhuoshi PanYu LiJuntao LiConghui HeLijun Wuhttp://arxiv.org/abs/2510.04049v1Encoding Numeric Computations and Infusing Heuristic Knowledge Using Integrity Constraints in stableKanren2025-10-05T06:03:27ZThis paper presents examples of using integrity constraints in stableKanren to encode numeric computations for problem solving. Then, we use one of the examples to introduce multiple ways to infuse heuristic knowledge and reduce solving time. stableKanren is an extension of miniKanren that supports normal logic programs under stable model semantics. stableKanren further supports numeric computation by constructing a constraint store for integrity constraints. There are three ways to extend a relational programming language with numeric computations: relational number representation, grounding numbers to symbols, and constraint store construction. We demonstrate that the numeric computations in stableKanren have a straightforward numerical representation compared to relational number representations. More importantly, stableKanren balances symbolic and numeric computation in relational programming by avoiding the grounding of all numbers to symbols. Lastly, it also has simpler syntax compared to other constraint store construction approaches. stableKanren supports combinatorial search problem solving under a declarative generate and test paradigm. Such a paradigm generates all possible combinations of solutions to the problem, then applies a set of constraints to prune out the unwanted solutions. We demonstrate that different approaches to writing programs or queries affect the solver's performance in the SEND+MORE=MONEY puzzle. The performance gradually improves as more heuristic knowledge is infused through the programs or queries. Additionally, we show how to use an external function to achieve a hybrid solution.2025-10-05T06:03:27Z12 pages, 2 figures, ICFP '25 The miniKanren and Relational Programming WorkshopXiangyu GuoAjay Bansalhttp://arxiv.org/abs/2510.03789v1An Empirical Study of Rational Tree Unification for miniKanren2025-10-04T11:49:24ZWe present a study of unification for rational trees in the context of miniKanren. We give the definition of rational trees, specify the unification algorithm and prove some of its properties. We also introduce a number of heuristic optimizations and evaluate them for a number of relevant benchmarks. Finally we discuss the relations between rational and conventional unification algorithms and possible scenarios of their coexistence in the context of relational programming.2025-10-04T11:49:24ZEridan DomoratskiyDmitrii KosarevDmitry Boulytchevhttp://arxiv.org/abs/2404.01903v3Understanding How CodeLLMs (Mis)Predict Types with Activation Steering2025-10-03T17:11:21ZLarge Language Models (LLMs) are widely used by software engineers for programming tasks. However, research shows that LLMs often lack a deep understanding of program semantics. Even minor changes to syntax, such as renaming variables, can significantly degrade performance across various tasks. In this work, we examine the task of type prediction: given a partially typed program, can a model predict a missing type annotations such that the resulting program is more typed? We construct a dataset of adversarial examples where models initially predict the correct types, but begin to fail after semantically irrelevant edits. This is problematic, as models should ideally generalize across different syntactic forms of semantically equivalent code. This lack of robustness suggests that models may have a shallow understanding of code semantics. Despite this, we provide evidence that LLMs do, in fact, learn robust mechanisms for type prediction-though these mechanisms often fail to activate in adversarial scenarios. By using activation steering, a method that manipulates a model's internal activations to guide it toward using latent knowledge, we restore accurate predictions on adversarial inputs. We show that steering successfully activates a type prediction mechanism that is shared by both Python and TypeScript, and is more effective than prompting with in-context examples. Across five different models, our comprehensive evaluation demonstrates that LLMs can learn generalizable representations of code semantics that transfer across programming languages.2024-04-02T12:44:44Z40 pages, 67 figures. To be published at BlackBoxNLP 2025Francesca LucchettiArjun Guhahttp://arxiv.org/abs/2510.03170v1Beyond Cons: Purely Relational Data Structures2025-10-03T16:44:03ZWe present {Kanren} (read: set-Kanren), an extension to miniKanren with constraints for reasoning about sets and association lists. {Kanren} includes first-class set objects, a functionally complete family of set-theoretic constraints (including membership, union, and disjointedness), and new constraints for reasoning about association lists with shadowing and scoped lookup. These additions allow programmers to describe collections declaratively and lazily, without relying on structural encodings and eager search over representation spaces. The result is improved expressiveness and operational behavior in programs that manipulate abstract data -- particularly interpreters -- by supporting set equality based on contents, enabling finite failure. We describe the design and implementation of {Kanren} in a constraint-enabled miniKanren system and illustrate its use in representative examples.2025-10-03T16:44:03Z17 pages, 6 figures, Source code available at https://www.github.com/rvs314/faster-clpset-minikanren . To be published in the 7th Workshop on miniKanren and Relational Programming (miniKanren'25)Rafaello SannaWilliam E. ByrdNada Aminhttp://arxiv.org/abs/2410.05460v4It's Not Easy Being Green: On the Energy Efficiency of Programming Languages2025-10-03T15:17:20ZDoes the choice of programming language affect energy consumption? Previous highly visible studies have established associations between certain programming languages and energy consumption. A causal misinterpretation of this work has led academics and industry leaders to use or support certain languages based on their claimed impact on energy consumption. This paper tackles this causal question directly: it develops a detailed causal model capturing the complex relationship between programming language choice and energy consumption. This model identifies and incorporates several critical but previously overlooked factors that affect energy usage. These factors, such as distinguishing programming languages from their implementations, the impact of the application implementations themselves, the number of active cores, and memory activity, can significantly skew energy consumption measurements if not accounted for. We show -- via empirical experiments, improved methodology, and careful examination of anomalies -- that when these factors are controlled for, notable discrepancies in prior work vanish. Our analysis suggests that the choice of programming language implementation has no significant impact on energy consumption beyond execution time.2024-10-07T19:46:35Z12 pages, to appear at ASE'25Nicolas van KempenHyuk-Je KwonDung Tuan NguyenEmery D. Bergerhttp://arxiv.org/abs/2505.12380v3Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward2025-10-03T06:15:50ZReinforcement learning (RL) has been widely adopted to enhance the performance of large language models (LLMs) on Text-to-SQL tasks. However, existing methods often rely on execution-based or LLM-based Bradley-Terry reward models. The former suffers from high execution latency caused by repeated database calls, whereas the latter imposes substantial GPU memory overhead, both of which significantly hinder the efficiency and scalability of RL pipelines. To this end, we propose a novel reward model framework for RL-based Text-to-SQL named Graph-Reward-SQL, which employs the GMNScore outcome reward model. We leverage SQL graph representations to provide accurate reward signals while significantly reducing time cost and GPU memory usage. Building on this foundation, we further introduce StepRTM, a stepwise reward model that provides intermediate supervision over Common Table Expression (CTE) subqueries. This encourages both functional correctness and readability of SQL. Extensive comparative and ablation experiments on standard benchmarks, including Spider and BIRD, demonstrate that our method consistently outperforms existing reward models.2025-05-18T11:53:01ZHan WengPuzhen WuLongjie CuiYi ZhanBoyi LiuYuanfeng SongDun ZengYingxiang YangQianru ZhangDong HuangXiaoming YinYang SunXing Chenhttp://arxiv.org/abs/2408.14706v5Galley: Modern Query Optimization for Sparse Tensor Programs2025-10-02T20:19:05ZThe tensor programming abstraction is a foundational paradigm which allows users to write high performance programs via a high-level imperative interface. Recent work on sparse tensor compilers has extended this paradigm to sparse tensors (i.e., tensors where most entries are not explicitly represented). With these systems, users define the semantics of the program and the algorithmic decisions in a concise language that can be compiled to efficient low-level code. However, these systems still require users to make complex decisions about program structure and memory layouts to write efficient programs.
This work presents Galley, a system for declarative tensor programming that allows users to write efficient tensor programs without making complex algorithmic decisions. Galley is the first system to perform cost based lowering of sparse tensor algebra to the imperative language of sparse tensor compilers, and the first to optimize arbitrary operators beyond sum and product. First, it decomposes the input program into a sequence of aggregation steps through a novel extension of the FAQ framework. Second, Galley optimizes and converts each aggregation step to a concrete program, which is compiled and executed with a sparse tensor compiler. We show that Galley produces programs that are 1-300x faster than competing methods for machine learning over joins and 5-20x faster than a state-of-the-art relational database for subgraph counting workloads with a minimal optimization overhead.2024-08-27T00:21:26ZKyle DeedsWillow AhrensMagda BalazinskaDan Suciuhttp://arxiv.org/abs/2509.21499v2On Code-Induced Reasoning in LLMs2025-10-02T16:45:24ZCode data has been shown to enhance the reasoning capabilities of large language models (LLMs), but it remains unclear which aspects of code are most responsible. We investigate this question with a systematic, data-centric framework. We construct parallel instruction datasets in ten programming languages and apply controlled perturbations that selectively disrupt structural or semantic properties of code. We then finetune LLMs from five model families and eight scales on each variant and evaluate their performance on natural language, math, and code tasks. Across 3,331 experiments, our results show that LLMs are more vulnerable to structural perturbations than semantic ones, particularly on math and code tasks. Appropriate abstractions like pseudocode and flowcharts can be as effective as code, while encoding the same information with fewer tokens without adhering to original syntax can often retain or even improve performance. Remarkably, even corrupted code with misleading signals remains competitive when surface-level regularities persist. Finally, syntactic styles also shape task-specific gains with Python favoring natural language reasoning and lower-level languages such as Java and Rust favoring math. Through our systematic framework, we aim to provide insight into how different properties of code influence reasoning and inform the design of training data for enhancing LLM reasoning capabilities.2025-09-25T19:57:36ZAbdul WaheedZhen WuCarolyn RoséDaphne Ippolitohttp://arxiv.org/abs/2401.11212v6Programming Distributed Collective Processes in the eXchange Calculus2025-10-02T13:35:00ZRecent trends like the Internet of Things (IoT) suggest a vision of dense and multi-scale deployments of computing devices in nearly all kinds of environments. A prominent engineering challenge revolves around programming the collective adaptive behaviour of such computational ecosystems. This requires abstractions able to capture concepts like ensembles (dynamic groups of cooperating devices) and collective tasks (joint activities carried out by ensembles). In this work, we consider collections of devices interacting with neighbours and that execute in nearly-synchronised sense-compute-interact rounds, where the computation is given by a single program mapping sensing values and incoming messages to output and outcoming messages. To support programming whole computational collectives, we propose the abstraction of a distributed collective process, which can be used to define at once the ensemble formation logic and its collective task. We formalise the abstraction in the eXchange Calculus (XC), a core functional language based on neighbouring values (maps from neighbours to values) where state and interaction is handled through a single primitive, exchange, and provide a corresponding implementation in the FCPP language. Then, we exercise distributed collective processes using two case studies: multi-hop message propagation and distributed monitoring of spatial properties. Finally, we discuss the features of the abstraction and its suitability for different kinds of distributed computing applications.2024-01-20T11:37:44ZLogical Methods in Computer Science, Volume 21, Issue 4 (October 3, 2025) lmcs:12917Giorgio AudritoRoberto CasadeiFerruccio DamianiGianluca TortaMirko Viroli10.46298/lmcs-21(4:3)2025