http://arxiv.org/abs/2507.01664v1 Globality and Regions 2025-07-02T12:45:50Z

We obtain a characterization of global variables by unifying abstraction with region abstraction in a region-based language. More precisely, a previous work presented a language called global, whose virtue is to provide a conceptually clear way of introducing imperative operations in a functional language. Memory safety is provided by the concept of linear protection, which connects the global system to a linear one. In this paper we show that the concept of global variable provided by the global language arises from Tofte and Talpin's region language through the unification of abstraction and region abstraction.

2025-07-02T12:45:50Z Hector Gramaglia http://arxiv.org/abs/2507.01272v1 Advanced LPeg techniques: A dual case study approach 2025-07-02T01:16:42Z

This paper presents advanced optimization techniques for Lua Parsing Expression Grammars (LPeg) through two complementary case studies: a high-performance JSON parser and a sophisticated Glob-to-LPeg pattern converter. We demonstrate how strategic grammar construction can dramatically improve parsing performance without modifying the underlying LPeg library. For the JSON parser, we implement substitution capture and table construction optimization to reduce memory allocation overhead and improve object processing. For the Glob converter, we introduce segment-boundary separation, implement Cox's flattened search strategy, and develop optimized braced condition handling to prevent exponential backtracking. Comprehensive benchmarks demonstrate that our JSON parser achieves processing speeds up to 125 MB/s on complex documents, consistently outperforming dkjson and showing competitive results against rxi_json across most test cases. Our Glob-to-LPeg converter exhibits 14-92% better performance than Bun.Glob and runs 3-14 times faster than Minimatch across diverse pattern matching scenarios. This research provides practical optimization techniques for LPeg-based parsers, contributing valuable strategies to the text processing ecosystem.

2025-07-02T01:16:42Z Journal of Computer Languages 84, 101343 (2025) Zixuan Zhu 10.1016/j.cola.2025.101343 http://arxiv.org/abs/2507.00488v1 Have Object-Oriented Languages Missed a Trick with Class Function and its Subclasses? 2025-07-01T07:04:12Z

Compared to functions in mathematics, functions in programming languages seem to be under-classified. Functional programming languages based on the lambda calculus famously treat functions as first-class values. Object-oriented languages have adopted ``lambdas'', notably for call-back routines in event-based programming. Typically a programming language has functions, a function has a type, and some functions act on other functions and/or return functions, but there is generally a lack of (i) ``class Function'' in the OO sense of the word class and particularly (ii) subclasses of Function for functions having specific properties. Some such classes are presented here and programmed in some popular programming languages as an experimental investigation into OO languages missing this opportunity.

2025-07-01T07:04:12Z Lloyd Allison http://arxiv.org/abs/2507.00264v1 Rust vs. C for Python Libraries: Evaluating Rust-Compatible Bindings Toolchains 2025-06-30T21:14:20Z

The Python programming language is best known for its syntax and scientific libraries, but it is also notorious for its slow interpreter. Optimizing critical sections in Python entails special knowledge of the binary interactions between programming languages, and can be cumbersome to interface manually, with implementers often resorting to convoluted third-party libraries. This comparative study evaluates the performance and ease of use of the PyO3 Python bindings toolchain for Rust against ctypes and cffi. By using Rust tooling developed for Python, we can achieve state-of-the-art performance with no concern for API compatibility.

2025-06-30T21:14:20Z 10 pages, 27 figures (1 diagram, 4 graphs, 9 tables, 13 code listings), submitted to SBAC-PAD 2025 Isabella Basso do Amaral University of São Paulo Renato Cordeiro Ferreira University of São Paulo Jheronimus Academy of Data Science Technical University of Eindhoven Tilburg University Alfredo Goldman University of São Paulo http://arxiv.org/abs/1306.1870v2 The Cyan Language 2025-06-30T17:56:16Z

This is the manual of Cyan, a prototype-based object-oriented language. Cyan supports gradual typing (both static and dynamic typing), single inheritance, anonymous functions, generic prototypes with concepts, non-nullable types, partially safe object initialization, an object-oriented exception handling system, and a powerful Metaobject Protocol for metaprogramming at compile-time (this is described elsewhere).

2013-06-08T02:22:42Z 244 pages José de Oliveira Guimarães http://arxiv.org/abs/2507.00108v1 Teaching Programming in the Age of Generative AI: Insights from Literature, Pedagogical Proposals, and Student Perspectives 2025-06-30T17:38:27Z

Computer programming is undergoing a true transformation driven by powerful new tools for automatic source code generation based on large language models. This transformation is also manifesting in introductory programming courses at universities around the world, generating an in-depth debate about how programming content should be taught, learned, and assessed in the context of generative artificial intelligence. This article aims, on the one hand, to review the most relevant studies on this issue, highlighting the advantages and disadvantages identified in the specialized literature. On the other hand, it proposes enriching teaching and learning methodologies by focusing on code comprehension and execution rather than on mere coding or program functionality. In particular, it advocates for the use of visual representations of code and visual simulations of its execution as effective tools for teaching, learning, and assessing programming, thus fostering a deeper understanding among students. Finally, the opinions of students who took the object-oriented programming course are presented to provide preliminary context supporting the incorporation of visual simulations in Java (or other languages) as part of the training process.

2025-06-30T17:38:27Z Clemente Rubio-Manzano Jazna Meza Rodolfo Fernandez-Santibanez Christian Vidal-Castro http://arxiv.org/abs/2405.05118v4 Full Version: (De/Re)-Composition of Data-Parallel Computations via Multi-Dimensional Homomorphisms 2025-06-30T16:23:35Z

We formally introduce a systematic (de/re)-composition approach, based on the algebraic formalism of "Multi-Dimensional Homomorphisms (MDHs)". Our approach is designed as general enough to be applicable to a wide range of data-parallel computations and for various kinds of target parallel architectures. To efficiently target the deep and complex memory and core hierarchies of contemporary architectures, we exploit our introduced (de/re)-composition approach for a correct-by-construction, parametrized cache blocking and parallelization strategy. We show that our approach is powerful enough to express, in the same formalism, the (de/re)-composition strategies of different classes of state-of-the-art approaches (scheduling-based, polyhedral, etc), and we demonstrate that the parameters of our strategies enable systematically generating code that can be fully automatically optimized (auto-tuned) for the particular target architecture and characteristics of the input and output data (e.g., their sizes and memory layouts). Particularly, our experiments confirm that via auto-tuning, we achieve higher performance than state-of-the-art approaches, including hand-optimized solutions provided by vendors (such as NVIDIA cuBLAS/cuDNN and Intel oneMKL/oneDNN), on real-world data sets and for a variety of data-parallel computations, including: linear algebra routines, stencil and quantum chemistry computations, data mining algorithms, and computations that recently gained high attention due to their relevance for deep learning.

2024-05-08T15:16:02Z A short version of this paper is published at ACM TOPLAS and presented at PLDI'24 ACM Trans. Program. Lang. Syst. (May 2024) Ari Rasch 10.1145/3665643 http://arxiv.org/abs/2501.18377v3 Using Read Promotion and Mixed Isolation Levels for Performant Yet Serializable Execution of Transaction Programs 2025-06-30T15:16:20Z

We propose a theory that can determine the lowest isolation level that can be allocated to each transaction program in an application in a mixed-isolation-level setting, to guarantee that all executions will be serializable and thus preserve all integrity constraints, even those that are not explicitly declared. This extends prior work applied to completely known transactions, to deal with the realistic situation where transactions are generated by running programs with parameters that are not known in advance. Using our theory, we propose an optimization method that allows for high throughput while ensuring that all executions are serializable. Our method is based on searching for application code modifications that are semantics-preserving while improving the isolation level allocation. We illustrate our approach on the SmallBank benchmark.

2025-01-30T14:29:45Z Brecht Vandevoort Alan Fekete Bas Ketsman Frank Neven Stijn Vansummeren http://arxiv.org/abs/2507.00094v1 Efficient Conformance Checking of Rich Data-Aware Declare Specifications (Extended) 2025-06-30T10:16:21Z

Despite growing interest in process analysis and mining for data-aware specifications, alignment-based conformance checking for declarative process models has focused on pure control-flow specifications, or mild data-aware extensions limited to numerical data and variable-to-constant comparisons. This is not surprising: finding alignments is computationally hard, even more so in the presence of data dependencies. In this paper, we tackle this problem in the case where the reference model is captured using data-aware Declare with general data types and data conditions. We show that, unexpectedly, it is possible to compute data-aware optimal alignments in this rich setting, enjoying at once efficiency and expressiveness. This is achieved by carefully combining the two best-known approaches to deal with control flow and data dependencies when computing alignments, namely A* search and SMT solving. Specifically, we introduce a novel algorithmic technique that efficiently explores the search space, generating descendant states through the application of repair actions aiming at incrementally resolving constraint violations. We prove the correctness of our algorithm and experimentally show its efficiency. The evaluation shows that our approach matches or surpasses the performance of the state of the art while also supporting significantly more expressive data dependencies, showcasing its potential to support real-world applications.

2025-06-30T10:16:21Z Extended version of the paper of the same title accepted at the 23rd International Conference on Business Process Management (BPM 2025) Jacobo Casas-Ramos Sarah Winkler Alessandro Gianola Marco Montali Manuel Mucientes Manuel Lama http://arxiv.org/abs/2507.22065v1 Fuzzing: Randomness? Reasoning! Efficient Directed Fuzzing via Large Language Models 2025-06-30T04:33:52Z

Fuzzing is highly effective in detecting bugs due to the key contribution of randomness. However, randomness significantly reduces the efficiency of fuzzing, causing it to take days or weeks to expose bugs. Even though directed fuzzing reduces randomness by guiding fuzzing towards target buggy locations, the dilemma of randomness still challenges directed fuzzers. Two critical components, seeds and mutators, contain randomness and are closely tied to the conditions required for triggering bugs. Therefore, to address the challenge of randomness, we propose to use large language models (LLMs) to remove the randomness in seeds and reduce the randomness in mutators. With their strong reasoning and code generation capabilities, LLMs can be used to generate reachable seeds that target pre-determined locations and to construct bug-specific mutators tailored for specific bugs. We propose RandLuzz, which integrates LLMs and directed fuzzing, to improve the quality of seeds and mutators, resulting in efficient bug exposure. RandLuzz analyzes function call chains or functionality to guide LLMs in generating reachable seeds. To construct bug-specific mutators, RandLuzz uses LLMs to perform bug analysis, obtaining information such as bug causes and mutation suggestions, which further help generate code that performs bug-specific mutations. We evaluate RandLuzz by comparing it with four state-of-the-art directed fuzzers: AFLGo, Beacon, WindRanger, and SelectFuzz. With RandLuzz-generated seeds, the fuzzers achieve an average speedup ranging from 2.1$\times$ to 4.8$\times$ compared to using widely-used initial seeds. Additionally, when evaluated on individual bugs, RandLuzz achieves up to a 2.7$\times$ speedup compared to the second-fastest exposure. On 8 bugs, RandLuzz can even expose them within 60 seconds.

2025-06-30T04:33:52Z Xiaotao Feng Xiaogang Zhu Kun Hu Jincheng Wang Yingjie Cao Guang Gong Jianfeng Pan http://arxiv.org/abs/2506.23320v1 A Denotational Semantics for Quantum Loops 2025-06-29T16:30:29Z

Programming a quantum computer, i.e., implementing quantum algorithms on a quantum processor-based computer architecture, is a task that can be addressed (just as for classical computers) at different levels of abstraction. This paper proposes a denotational semantics for high-level quantum programming constructs, focusing on the conceptual meaning of quantum-controlled branching and iteration. We introduce a denotational domain where a mathematical meaning of a quantum control flow with loops can be defined, which reflects the coherent evolution of the quantum system implementing the program.

2025-06-29T16:30:29Z 17 pages Nicola Assolini Alessandra Di Pierro http://arxiv.org/abs/2506.23058v1 Verifying Properties of Index Arrays in a Purely-Functional Data-Parallel Language 2025-06-29T02:10:25Z

This paper presents a novel approach to automatically verify properties of pure data-parallel programs with non-linear indexing -- expressed as pre- and post-conditions on functions. Programs consist of nests of second-order array combinators (e.g., map, scan, and scatter) and loops. The key idea is to represent arrays as index functions: programs are index function transformations over which properties are propagated and inferred. Our framework proves properties on index functions by distilling them into algebraic (in)equalities and discharging them to a Fourier-Motzkin-based solver. The framework is practical and accessible: properties are not restricted to a decidable logic, but instead are carefully selected to express practically useful guarantees that can be automatically reasoned about and inferred. These guarantees extend beyond program correctness and can be exploited by the entire compiler pipeline for optimization. We implement our system in the pure data-parallel language Futhark and demonstrate its practicality on seven applications, reporting an average verification time of 1 second. Two case studies show how eliminating dynamic verification in GPU programs results in significant speedups.
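As a rough illustration of the "arrays as index functions" idea described in the abstract above, the following Python sketch models an array as a size plus a function from index to value, so that combinators become function transformations. This is not the paper's system: every name here (`IndexFn`, `iota`, `map_fn`, `gather`, `in_bounds`) is hypothetical, and the bounds check enumerates indices by brute force, whereas the abstract describes discharging algebraic (in)equalities to a Fourier-Motzkin-based solver.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class IndexFn:
    """Hypothetical model: an array is a size plus a function index -> value."""
    size: int
    at: Callable[[int], int]


def iota(n: int) -> IndexFn:
    """The array [0, 1, ..., n-1], represented by the identity function."""
    return IndexFn(n, lambda i: i)


def map_fn(f: Callable[[int], int], xs: IndexFn) -> IndexFn:
    """map becomes function composition; no elements are computed here."""
    return IndexFn(xs.size, lambda i: f(xs.at(i)))


def gather(xs: IndexFn, idx: IndexFn) -> IndexFn:
    """Non-linear indexing: result[i] = xs[idx[i]]."""
    return IndexFn(idx.size, lambda i: xs.at(idx.at(i)))


def in_bounds(idx: IndexFn, n: int) -> bool:
    """Assumption: brute-force enumeration stands in for a symbolic proof
    that every index produced by idx lies in the range [0, n)."""
    return all(0 <= idx.at(i) < n for i in range(idx.size))


def materialize(xs: IndexFn) -> List[int]:
    """Evaluate the index function at every position to get a concrete list."""
    return [xs.at(i) for i in range(xs.size)]


squares = map_fn(lambda v: v * v, iota(5))     # represents [0, 1, 4, 9, 16]
reverse = map_fn(lambda i: 4 - i, iota(5))     # represents [4, 3, 2, 1, 0]
safe = in_bounds(reverse, squares.size)        # precondition for the gather
result = materialize(gather(squares, reverse))
```

Checking `in_bounds` once, before the gather, is the kind of property that, if proved statically, would let a compiler drop per-access dynamic checks.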

2025-06-29T02:10:25Z Nikolaj Hey Hinnerskov Robert Schenck Cosmin E. Oancea http://arxiv.org/abs/2506.22776v1 Smaller = Weaker? Benchmarking Robustness of Quantized LLMs in Code Generation 2025-06-28T06:32:25Z

Quantization has emerged as a mainstream method for compressing Large Language Models (LLMs), reducing memory requirements and accelerating inference without architectural modifications. While existing research primarily focuses on evaluating the effectiveness of quantized LLMs compared to their original counterparts, the impact on robustness remains largely unexplored. In this paper, we present the first systematic investigation of how quantization affects the robustness of LLMs in code generation tasks. Through extensive experiments across four prominent LLM families (LLaMA, DeepSeek, CodeGen, and StarCoder) with parameter scales ranging from 350M to 33B, we evaluate robustness from dual perspectives: adversarial attacks on input prompts and noise perturbations on model architecture. Our findings challenge conventional wisdom by demonstrating that quantized LLMs often exhibit superior robustness compared to their full-precision counterparts, with 51.59% versus 42.86% of our adversarial experiments showing better resilience in quantized LLMs. Similarly, our noise perturbation experiments also confirm that LLMs after quantization generally withstand higher levels of weight disturbances. These results suggest that quantization not only reduces computational requirements but can actually enhance LLMs' reliability in code generation tasks, providing valuable insights for developing more robust and efficient LLM deployment strategies.

2025-06-28T06:32:25Z 13 pages, 6 figures Sen Fang Weiyuan Ding Antonio Mastropaolo Bowen Xu http://arxiv.org/abs/2310.02003v6 L2MAC: Large Language Model Automatic Computer for Extensive Code Generation 2025-06-27T17:28:14Z

Transformer-based large language models (LLMs) are constrained by the fixed context window of the underlying transformer architecture, hindering their ability to produce long and coherent outputs. Memory-augmented LLMs are a promising solution, but current approaches cannot handle long output generation tasks since they (1) only focus on reading memory and reduce its evolution to the concatenation of new memories or (2) use very specialized memories that cannot adapt to other domains. This paper presents L2MAC, the first practical LLM-based general-purpose stored-program automatic computer (von Neumann architecture) framework, an LLM-based multi-agent system, for long and consistent output generation. Its memory has two components: the instruction registry, which is populated with a prompt program to solve the user-given task, and a file store, which will contain the final and intermediate outputs. Each instruction in turn is executed by a separate LLM agent, whose context is managed by a control unit capable of precise memory reading and writing to ensure effective interaction with the file store. These components enable L2MAC to generate extensive outputs, bypassing the constraints of the finite context window while producing outputs that fulfill a complex user-specified task. We empirically demonstrate that L2MAC achieves state-of-the-art performance in generating large codebases for system design tasks, significantly outperforming other coding methods in implementing the detailed user-specified task; we show that L2MAC works for general-purpose extensive text-based tasks, such as writing an entire book; and we provide valuable insights into L2MAC's performance improvement over existing methods.

2023-10-02T16:55:19Z Published in The Twelfth International Conference on Learning Representations (ICLR), 2024. Copyright 2023 by the author(s) Samuel Holt Max Ruiz Luyten Mihaela van der Schaar http://arxiv.org/abs/2506.22323v1 Under the Hood of BlotchyQuasar: DLL-Based RAT Campaigns Against Latin America 2025-06-27T15:36:10Z

A sophisticated malspam campaign was recently uncovered targeting Latin American countries, with a particular focus on Brazil. This operation utilizes a highly deceptive phishing email to trick users into executing a malicious MSI file, initiating a multi-stage infection. The core of the attack leverages DLL side-loading, where a legitimate executable from Valve Corporation is used to load a trojanized DLL, thereby bypassing standard security defenses. Once active, the malware, a variant of QuasarRAT known as BlotchyQuasar, is capable of a wide range of malicious activities. It is designed to steal sensitive browser-stored credentials and banking information, the latter through fake login windows mimicking well-known Brazilian banks. The threat establishes persistence by modifying the Windows registry, captures user keystrokes through keylogging, and exfiltrates stolen data to a Command-and-Control (C2) server using encrypted payloads. Despite its advanced capabilities, the malware code exhibits signs of rushed development, with inefficiencies and poor error handling that suggest the threat actors prioritized rapid deployment over meticulous design. Nonetheless, the campaign's extensive reach and sophisticated mechanisms pose a serious and immediate threat to the targeted regions, underscoring the need for robust cybersecurity defenses.

2025-06-27T15:36:10Z Alessio Di Santo