https://arxiv.org/api/hcL0s3rCSxaUGFqwkmhCrDXuQq8
2026-06-26T19:03:03Z
9951
1350
15
http://arxiv.org/abs/2508.02305v1
Proceedings 14th International Workshop on Trends in Functional Programming in Education
2025-08-04T11:18:19Z
The goal of TFPIE is to gather researchers, teachers and professionals that use, or are interested in the use of, functional programming in education. TFPIE aims to be a venue where novel ideas, classroom-tested ideas and work-in-progress on the use of functional programming in education are discussed. The one-day workshop will foster a spirit of open discussion by having a review process for publication after the workshop.
2025-08-04T11:18:19Z
EPTCS 424, 2025
Rose Bohrer
AIST, Tokyo, JP
10.4204/EPTCS.424
http://arxiv.org/abs/2508.01974v1
Flow Sensitivity without Control Flow Graph: An Efficient Andersen-Style Flow-Sensitive Pointer Analysis
2025-08-04T01:20:54Z
Flow-sensitive pointer analysis constitutes an essential component of precise program analysis for accurately modeling pointer behaviors by incorporating control flows. Flow-sensitive pointer analysis is extensively used in alias analysis, taint analysis, program understanding, compiler optimization, etc. Existing flow-sensitive pointer analysis approaches, which are conducted based on control flow graphs, have significantly advanced the precision of pointer analysis via sophisticated techniques to leverage control flow information. However, they inevitably suffer from computational inefficiencies when resolving points-to information due to the inherent complex structures of control flow graphs. We present CG-FSPTA, a Flow-Sensitive Constraint Graph (FSConsG) based flow-sensitive pointer analysis to overcome the inefficiency of control-flow-graph-based analysis. CG-FSPTA uses a flow-sensitive variant to leverage the structural advantages of set-constraint graphs (which are commonly used in flow-insensitive pointer analysis) while keeping the flow sensitivity of variable definitions and uses, allowing the incorporation of sophisticated graph optimization and dynamic solving techniques. In this way, CG-FSPTA achieves significant efficiency improvements while keeping the precision of flow-sensitive analysis. Experimental evaluations on benchmark programs demonstrate that CG-FSPTA, significantly reduces both memory usage and execution time while maintaining precision. In particular, by solving in the FSConsG, CG-FSPTA achieves an average memory reduction of 33.05\% and accelerates flow-sensitive pointer analysis by 7.27x compared to the state-of-art method. These experimental results underscore the efficacy of CG-FSPTA as a scalable solution to analyze large-scale software systems, establishing a robust foundation for future advancements in efficient program analysis frameworks.
2025-08-04T01:20:54Z
Jiahao Zhang
Xiao Cheng
Yuxiang Lei
http://arxiv.org/abs/2508.01199v1
Efficient compilation and execution of synchronous programs via type-state programming
2025-08-02T05:10:21Z
Synchronous programs are used extensively in implementation of safety critical embedded software. Imperative synchronous programming languages model multiple Finite State Machines (FSMs) executing in lockstep at logical clock ticks. The synchronous view of time along with the FSM based design enables easier formal verification. The synchronous composition of multiple FSMs, during compilation, results in the well known state space explosion problem. Hence, efficiently compiling imperative synchronous programs into small and fast executables is challenging. This paper introduces a novel linear time compilation technique for automata based compilation of synchronous programs. Graph based rewrite rules for kernel programming constructs are introduced. A linear time algorithm applies these rules to produce a FSM. The FSM is then encoded into a type-state program using template meta-programming in C++. Experimental results show that the compilation time and generated binary size is comparable, while the execution times are on average 31-60% faster than current state-of-the-art compilers.
2025-08-02T05:10:21Z
Avinash Malik
http://arxiv.org/abs/2508.00772v1
From Code to Career: Assessing Competitive Programmers for Industry Placement
2025-08-01T16:52:44Z
In today's fast-paced tech industry, there is a growing need for tools that evaluate a programmer's job readiness based on their coding performance. This study focuses on predicting the potential of Codeforces users to secure various levels of software engineering jobs. The primary objective is to analyze how a user's competitive programming activity correlates with their chances of obtaining positions, ranging from entry-level roles to jobs at major tech companies. We collect user data using the Codeforces API, process key performance metrics, and build a prediction model using a Random Forest classifier. The model categorizes users into four levels of employability, ranging from those needing further development to those ready for top-tier tech jobs. The system is implemented using Flask and deployed on Render for real-time predictions. Our evaluation demonstrates that the approach effectively distinguishes between different skill levels based on coding proficiency and participation. This work lays a foundation for the use of machine learning in career assessment and could be extended to predict job readiness in broader technical fields.
2025-08-01T16:52:44Z
Md Imranur Rahman Akib
Fathima Binthe Muhammed
Umit Saha
Md Fazlul Karim Patwary
Mehrin Anannya
Md Alomgeer Hussein
Md Biplob Hosen
http://arxiv.org/abs/2508.00534v1
Towards a unified framework for programming paradigms: A systematic review of classification formalisms and methodological foundations
2025-08-01T11:19:40Z
The rise of multi-paradigm languages challenges traditional classification methods, leading to practical software engineering issues like interoperability defects. This systematic literature review (SLR) maps the formal foundations of programming paradigms. Our objective is twofold: (1) to assess the state of the art of classification formalisms and their limitations, and (2) to identify the conceptual primitives and mathematical frameworks for a more powerful, reconstructive approach.
Based on a synthesis of 74 primary studies, we find that existing taxonomies lack conceptual granularity, a unified formal basis, and struggle with hybrid languages. In response, our analysis reveals a strong convergence toward a compositional reconstruction of paradigms. This approach identifies a minimal set of orthogonal, atomic primitives and leverages mathematical frameworks, predominantly Type theory, Category theory and Unifying Theories of Programming (UTP), to formally guarantee their compositional properties.
We conclude that the literature reflects a significant intellectual shift away from classification towards these promising formal, reconstructive frameworks. This review provides a map of this evolution and proposes a research agenda for their unification.
2025-08-01T11:19:40Z
Preprint submitted to the Journal of Object Technology on July 29, 2025. Data available upon request until peer-review is completed
Mikel Vandeloise
http://arxiv.org/abs/2508.00508v1
Desyan: A Platform for Seamless Value-Flow and Symbolic Analysis
2025-08-01T10:39:09Z
Over the past two decades, two different types of static analyses have emerged as dominant paradigms both in academia and industry: value-flow analysis (e.g., data-flow analysis or points-to analysis) and symbolic analysis (e.g., symbolic execution). Despite their individual successes in numerous application fields, the two approaches have remained largely separate; an artifact of the simple reality that there is no broadly adopted unifying platform for effortless and efficient integration of symbolic techniques with high-performance data-flow reasoning.
To bridge this gap, we introduce Desyan: a platform for writing program analyses with seamless integration of value-flow and symbolic reasoning. Desyan expands a production-ready Datalog fixpoint engine (Soufflé) with full-fledged SMT solving invoking industry-leading SMT engines. Desyan provides constructs for automatically (and efficiently!) handling typical patterns that come up in program analysis. At the same time, the integration is agnostic with respect to the solving technology, and supports Datalog-native symbolic reasoning, via a bottom-up algebraic reasoning module.
The result is an engine that allows blending different kinds of reasoning, as needed for the underlying analysis. For value-flow analysis, the engine is the best-in-class Datalog evaluator (often by a factor of over 20x in execution time); for applications that require full SMT (e.g., a concolic execution engine or other symbolic evaluator that needs to solve arbitrarily complex conditions), the engine is leveraging the leading SMT solvers; for lightweight symbolic evaluation (e.g., solving simple conditionals in the context of a path-sensitive analysis), the engine can use Datalog-native symbolic reasoning, achieving large speedups (often of over 2x) compared to eagerly appealing to an SMT solver.
2025-08-01T10:39:09Z
Panagiotis Diamantakis
Thanassis Avgerinos
Yannis Smaragdakis
http://arxiv.org/abs/2411.16544v3
Float Self-Tagging
2025-08-01T10:23:19Z
Dynamic and polymorphic languages attach information, such as types, to run time objects, and therefore adapt the memory layout of values to include space for this information. This makes it difficult to efficiently implement IEEE754 floating-point numbers as this format does not leave an easily accessible space to store type information. The three main floating-point number encodings in use today, tagged pointers, NaN-boxing, and NuN-boxing, have drawbacks. Tagged pointers entail a heap allocation of all float objects, and NaN/NuN-boxing puts additional run time costs on type checks and the handling of other objects.
This paper introduces self-tagging, a new approach to object tagging that uses an invertible bitwise transformation to map floating-point numbers to tagged values that contain the correct type information at the correct position in their bit pattern, superimposing both their value and type information in a single machine word. Such a transformation can only map a subset of all floats to correctly typed tagged values, hence self-tagging takes advantage of the non-uniform distribution of floating point numbers used in practice to avoid heap allocation of the most frequently encountered floats.
Variants of self-tagging were implemented in two distinct Scheme compilers and evaluated on four microarchitectures to assess their performance and compare them to tagged pointers, NaN-boxing, and NuN-boxing. Experiments demonstrate that, in practice, the approach eliminates heap allocation of nearly all floating-point numbers and provides good execution speed of float-intensive benchmarks in Scheme with a negligible performance impact on other benchmarks, making it an attractive alternative to tagged pointers, alongside NaN-boxing and NuN-boxing.
2024-11-25T16:29:43Z
Olivier Melançon
Manuel Serrano
Marc Feeley
10.1145/3763108
http://arxiv.org/abs/2508.00482v1
Semantic Subtyping for Maps in Erlang
2025-08-01T09:57:59Z
In this paper we will construct a set-theoretic model of types featuring type variables, base types, set-theoretic types and map types. Syntax of map types spans all the map types available in Erlang. The model of types is used to define a semantic subtyping relation based on set containment. The novelty of this work is the definition of subtyping over parameteric map types.
2025-08-01T09:57:59Z
Erdem Yildirim
Albert Schimpf
Stefan Wehr
Annette Bieniusa
http://arxiv.org/abs/2508.00422v1
Automated Type Annotation in Python Using Large Language Models
2025-08-01T08:24:14Z
Type annotations in Python enhance maintainability and error detection. However, generating these annotations manually is error prone and requires extra effort. Traditional automation approaches like static analysis, machine learning, and deep learning struggle with limited type vocabularies, behavioral over approximation, and reliance on large labeled datasets. In this work, we explore the use of LLMs for generating type annotations in Python. We develop a generate check repair pipeline: the LLM proposes annotations guided by a Concrete Syntax Tree representation, a static type checker (Mypy) verifies them, and any errors are fed back for iterative refinement. We evaluate four LLM variants: GPT 4oMini, GPT 4.1mini (general-purpose), and O3Mini, O4Mini (reasoning optimized), on 6000 code snippets from the ManyTypes4Py benchmark. We first measure the proportion of code snippets annotated by LLMs for which MyPy reported no errors (i.e., consistent results): GPT 4oMini achieved consistency on 65.9% of cases (34.1% inconsistent), while GPT 4.1mini, O3Mini, and O4Mini each reached approximately 88.6% consistency (around 11.4% failures). To measure annotation quality, we then compute exact-match and base-type match accuracies over all 6000 snippets: GPT 4.1mini and O3Mini perform the best, achieving up to 70.5% exact match and 79.1% base type accuracy, requiring under one repair iteration on average. Our results demonstrate that general-purpose and reasoning optimized LLMs, without any task specific fine tuning or additional training can be effective in generating consistent type annotations.They perform competitively with traditional deep learning techniques which require large labeled dataset for training. While our work focuses on Python, the pipeline can be extended to other optionally typed imperative languages like Ruby
2025-08-01T08:24:14Z
Under Review
Varun Bharti
Shashwat Jha
Dhruv Kumar
Pankaj Jalote
http://arxiv.org/abs/2508.00419v1
Loop Invariant Generation: A Hybrid Framework of Reasoning optimised LLMs and SMT Solvers
2025-08-01T08:15:15Z
Loop invariants are essential for proving the correctness of programs with loops. Developing loop invariants is challenging, and fully automatic synthesis cannot be guaranteed for arbitrary programs. Some approaches have been proposed to synthesize loop invariants using symbolic techniques and more recently using neural approaches. These approaches are able to correctly synthesize loop invariants only for subsets of standard benchmarks. In this work, we investigate whether modern, reasoning-optimized large language models can do better. We integrate OpenAI's O1, O1-mini, and O3-mini into a tightly coupled generate-and-check pipeline with the Z3 SMT solver, using solver counterexamples to iteratively guide invariant refinement. We use Code2Inv benchmark, which provides C programs along with their formal preconditions and postconditions. On this benchmark of 133 tasks, our framework achieves 100% coverage (133 out of 133), outperforming the previous best of 107 out of 133, while requiring only 1-2 model proposals per instance and 14-55 seconds of wall-clock time. These results demonstrate that LLMs possess latent logical reasoning capabilities which can help automate loop invariant synthesis. While our experiments target C-specific programs, this approach should be generalizable to other imperative languages.
2025-08-01T08:15:15Z
Under Review
Varun Bharti
Shashwat Jha
Dhruv Kumar
Pankaj Jalote
http://arxiv.org/abs/2507.23186v2
NaN-Propagation: A Novel Method for Sparsity Detection in Black-Box Computational Functions
2025-08-01T03:11:12Z
When numerically evaluating a function's gradient, sparsity detection can enable substantial computational speedups through Jacobian coloring and compression. However, sparsity detection techniques for black-box functions are limited, and existing finite-difference-based methods suffer from false negatives due to coincidental zero gradients. These false negatives can silently corrupt gradient calculations, leading to difficult-to-diagnose errors. We introduce NaN-propagation, which exploits the universal contamination property of IEEE 754 Not-a-Number values to trace input-output dependencies through floating-point numerical computations. By systematically contaminating inputs with NaN and observing which outputs become NaN, the method reconstructs conservative sparsity patterns that eliminate a major source of false negatives. We demonstrate this approach on an aerospace wing weight model, achieving a 1.52x speedup while uncovering dozens of dependencies missed by conventional methods -- a significant practical improvement since gradient computation is often the bottleneck in optimization workflows. The technique leverages IEEE 754 compliance to work across programming languages and math libraries without requiring modifications to existing black-box codes. Furthermore, advanced strategies such as NaN payload encoding via direct bit manipulation enable faster-than-linear time complexity, yielding speed improvements over existing black-box sparsity detection methods. Practical algorithms are also proposed to mitigate challenges from branching code execution common in engineering applications.
2025-07-31T01:48:56Z
Peter Sharpe
http://arxiv.org/abs/2508.00952v1
Academic Vibe Coding: Opportunities for Accelerating Research in an Era of Resource Constraint
2025-08-01T00:42:44Z
Academic laboratories face mounting resource constraints: budgets are tightening, grant overheads are potentially being capped, and the market rate for data-science talent significantly outstrips university compensation. Vibe coding, which is structured, prompt-driven code generation with large language models (LLMs) embedded in reproducible workflows, offers one pragmatic response. It aims to compress the idea-to-analysis timeline, reduce staffing pressure on specialized data roles, and maintain rigorous, version-controlled outputs. This article defines the vibe coding concept, situates it against the current academic resourcing crisis, details a beginner-friendly toolchain for its implementation, and analyzes inherent limitations that necessitate governance and mindful application.
2025-08-01T00:42:44Z
Matthew G Crowson
Leo Celi A. Celi
http://arxiv.org/abs/2507.22069v2
A Compute-Matched Re-Evaluation of TroVE on MATH
2025-07-31T07:33:11Z
Reusing established theorems and formulas is central to mathematical problem solving, serving as essential building blocks for tackling increasingly complex challenges. Recent work, TroVE, argues that code-generating Large Language Models (LLMs) can benefit similarly on the MATH benchmark by inducing and reusing higher-level toolboxes. By allocating computational budget across an ensemble of three modes -- directly generating code, creating tools, and reusing tools -- TroVE claims to outperform a PRIMITIVE baseline that only performs direct generation. However, recent analysis (Berlot-Attwell et al., 2024) casts doubt on these gains, noting that the tools created are often trivial or rarely reused, suggesting that improvements may stem from self-consistency or self-correction. In this work, we re-evaluate TroVE on MATH, analyze the impact of each of its modes, and show that its benefit does not come from these mechanisms, but simply from a higher computational budget spent for TroVE compared to PRIMITIVE. To this end, we also perform a small correction in the original implementation of TroVE's selection mechanism, boosting TroVE's performance on MATH by 3\% in accuracy. After matching for compute, the benefit of TroVE reduces to a marginal improvement of 1\%, suggesting that this toolbox approach does not provide a significant benefit on MATH.
2025-07-16T03:11:43Z
Tobias Sesterhenn
Ian Berlot-Attwell
Janis Zenkner
Christian Bartelt
http://arxiv.org/abs/2507.23292v1
SequenceLayers: Sequence Processing and Streaming Neural Networks Made Easy
2025-07-31T07:10:39Z
We introduce a neural network layer API and library for sequence modeling, designed for easy creation of sequence models that can be executed both layer-by-layer (e.g., teacher-forced training) and step-by-step (e.g., autoregressive sampling). To achieve this, layers define an explicit representation of their state over time (e.g., a Transformer KV cache, a convolution buffer, an RNN hidden state), and a step method that evolves that state, tested to give identical results to a stateless layer-wise invocation. This and other aspects of the SequenceLayers contract enables complex models to be immediately streamable, mitigates a wide range of common bugs arising in both streaming and parallel sequence processing, and can be implemented in any deep learning library. A composable and declarative API, along with a comprehensive suite of layers and combinators, streamlines the construction of production-scale models from simple streamable components while preserving strong correctness guarantees. Our current implementations of SequenceLayers (JAX, TensorFlow 2) are available at https://github.com/google/sequence-layers.
2025-07-31T07:10:39Z
RJ Skerry-Ryan
Julian Salazar
Soroosh Mariooryad
David Kao
Daisy Stanton
Eric Battenberg
Matt Shannon
Ron J. Weiss
Robin Scheibler
Jonas Rothfuss
Tom Bagby
http://arxiv.org/abs/2507.23205v1
Kernel-FFI: Transparent Foreign Function Interfaces for Interactive Notebooks
2025-07-31T02:58:11Z
Foreign Function Interfaces (FFIs) are essential for enabling interoperability between programming languages, yet existing FFI solutions are ill-suited for the dynamic, interactive workflows prevalent in modern notebook environments such as Jupyter. Current approaches require extensive manual configuration, introduce significant boilerplate, and often lack support for recursive calls and object-oriented programming (OOP) constructs-features critical for productive, multi-language development.
We present Kernel-FFI, a transparent, language-agnostic framework that enables seamless cross-language function calls and object manipulation within interactive notebooks. Kernel-FFI employs source-level transformation to automatically rewrite cross-language invocations, eliminating the need for manual bindings or boilerplate. Kernel-FFI provides robust support for OOP by enabling foreign object referencing and automatic resource management across language boundaries. Furthermore, to address the blocking nature of Jupyter kernels and support recursive and asynchronous foreign calls, we introduce a novel side-channel communication mechanism. Our tool will be open-sourced and available at https://codepod.io/docs/kernel-ffi
2025-07-31T02:58:11Z
Hebi Li
Forrest Sheng Bao
Qi Xiao
Jin Tian