https://arxiv.org/api/UV3BfHNpQN2LPHXMfXY5fYXxp1g 2026-06-10T23:52:15Z 183838 390 15 http://arxiv.org/abs/2606.06656v2 A Study of Parallel Continuous Local Search 2026-06-08T15:19:28Z

We study parallel Continuous Local Search (CLS) as a solution approach for Boolean satisfiability problems with symmetric pseudo-Boolean (PB) constraints. Here, the $n$-variable PB-satisfiability problem is relaxed to a continuous optimisation problem with a differentiable objective function on an $n$-dimensional hypercube. For satisfiable instances, the global minimisers of this optimisation problem correspond to satisfying assignments of the SAT problem at hand. We present several novel findings via empirical experiments: (i) redundant constraints can inhibit rather than accelerate convergence; (ii) CLS shows promise as a sub-solver in hybridised settings, quickly completing partial assignments; and (iii) local search rapidly converges to a stable distribution of solution quality (i.e., degree of satisfaction), due to saddle-dense objectives where additional solver steps yield diminishing returns. Our findings inform practical uses of CLS for SAT on modern accelerator hardware.

2026-06-04T19:03:32Z Cody J Christopher Charles Gretton http://arxiv.org/abs/2606.09610v1 Shape Formation for the Cooperative Transportation of Arbitrary Objects Using Multi-Agent Reinforcement Learning 2026-06-08T15:18:14Z

Cooperative object transportation is essential in numerous domains, including industrial to domestic services. A popular transportation strategy is to carry objects on top of multi-robot systems. The corresponding task is typically solved by decomposing it into three interconnected subproblems: formation control, cooperative navigation, and collision avoidance. A particular challenge posed by real-world objects is their potentially arbitrary shape and non-uniform mass distribution, necessitating robot formations that securely support the object. In this work, we address the challenge of pattern formation control for transporting such real-world objects by proposing a novel multi-agent reinforcement learning approach. Our approach enables a multi-robot system to autonomously position itself underneath an object to support its weight while avoiding obstacles during the formation process. Our evaluations with diverse environments and varying numbers of robots show that our approach leads to policies that reliably produce balanced formations and generalize to cluttered scenes and objects with complex geometry and non-uniform mass distribution.

2026-06-08T15:18:14Z Mohamed Sayed Wolfram Burgard Tanja Katharina Kaiser http://arxiv.org/abs/2606.09607v1 Closure-Validated Circuit Discovery in Attention Heads: Co-activation Proposes, Ablation Disposes 2026-06-08T15:17:54Z

Interpretability increasingly treats groups of components, not individual units, as the basic object, and proposes to find them by clustering co-activation statistics. We ask whether such a cheap signal actually identifies an attention-head circuit. Adapting a sparse-autoencoder clustering recipe to attention heads -- but validating by causal ablation rather than reconstruction -- we cluster heads and then run a closure test: ablate the discovered community and compare per-example damage to matched-random controls. Across two dense 1B-scale models (Pythia 1B, OLMo 1B) and two input distributions, the communities pass closure. In a Mixture-of-Experts model (OLMoE-1B-7B), route-conditional clustering recovers a statistically real signal that nonetheless does not survive closure -- ablation improves loss, the wrong direction. Extending closure across training, attention-target selectivity and participation ratio decouple from function in both directions. We conclude that a cheap signal is a circuit proposal, not a confirmed circuit; closure is what separates them.

2026-06-08T15:17:54Z 22 pages, 3 figures Yongzhong Xu http://arxiv.org/abs/2312.02873v2 Toward autocorrection of chemical process flowsheets using large language models 2026-06-08T15:15:44Z

The process engineering domain widely uses Process Flow Diagrams (PFDs) and Process and Instrumentation Diagrams (P&IDs) to represent process flows and equipment configurations. However, the P&IDs and PFDs, hereafter called flowsheets, can contain errors causing safety hazards, inefficient operation, and unnecessary expenses. Correcting and verifying flowsheets is a tedious, manual process. We propose a novel generative AI methodology for automatically identifying errors in flowsheets and suggesting corrections to the user, i.e., autocorrecting flowsheets. Inspired by the breakthrough of Large Language Models (LLMs) for grammatical autocorrection of human language, we investigate LLMs for the autocorrection of flowsheets. The input to the model is a potentially erroneous flowsheet and the output of the model are suggestions for a corrected flowsheet. We train our autocorrection model on a synthetic dataset in a supervised manner. The model achieves a top-1 accuracy of 80% and a top-5 accuracy of 84% on an independent test dataset of synthetically generated flowsheets. The results suggest that the model can learn to autocorrect the synthetic flowsheets. We envision that flowsheet autocorrection will become a useful tool for chemical engineers.

2023-12-05T16:39:41Z Computer Aided Chemical Engineering, Volume 53, 2024, Pages 3109-3114 Lukas Schulze Balhorn Marc Caballero Artur M. Schweidtmann 10.1016/B978-0-443-28824-1.50519-6 http://arxiv.org/abs/2606.09605v1 Next-Token Prediction Learns Generalisable Representations of Sleep Physiology 2026-06-08T15:13:57Z

Foundation models offer a promising route to compress multi-modal physiological signals into compact representations of human health, with broad applications across sleep medicine, cardiology, neurology and other healthcare domains. Existing models have typically been trained with masked-reconstruction or contrastive objectives. However, masked reconstruction may be poorly suited to the stochastic nature of these signals, while contrastive approaches rely on positive-pair definitions despite the semantic invariances of physiological signals being poorly understood. In this work, we show that next-token prediction is a simple and scalable alternative. We develop Hypnos, a multi-modal sleep foundation model trained using eight different sensing modalities (e.g. EEG, ECG, respiratory signals) drawn from over 20,000 overnight polysomnography recordings. We tokenize each modality into streams of discrete tokens using residual vector quantization, then train a large auto-regressive RQ-Transformer to jointly predict the next token across all modalities in parallel. After training, Hypnos can be applied to continuous streams of sensor data from any subset of supported modalities, generating embeddings for downstream tasks. Across a range of benchmarks, Hypnos significantly outperforms existing foundation models. In sleep stage classification, we match the performance of strong supervised baselines on held-out test sets whilst using $100\times$ less labelled data. Hypnos even generalises to daytime physiology, surpassing a dedicated ECG foundation model at detecting atrial fibrillation. Our results demonstrate that next-token prediction is a strong self-supervised objective for representation learning from multi-modal physiological signals.

2026-06-08T15:13:57Z Jonathan F. Carter Lionel Tarassenko http://arxiv.org/abs/2604.07349v10 Descent Before Hardness: Orbit-Gap Obstructions in Exact Certification 2026-06-08T15:01:49Z

Exact certification has a quotient: states are equivalent when they have the same correct outputs. A tractability proxy must first define a predicate on this quotient before ordinary hardness or algorithmic questions arise. Raw syntactic proxies can fail at that earlier step, because correctness-preserving presentation moves may change the statistics they inspect while preserving the exact-certification problem. Orbit gaps are the complete obstruction. An orbit gap occurs when one closure orbit contains both positive and negative presentations of a target. Exact closure-invariant classification is possible if and only if the positive and negative orbit hulls are disjoint. When the hulls are disjoint, the closure hull is the least exact classifier. With computable orbit representatives, this hull classifier becomes a quotient-level algorithm. These are predicate-level results: they establish when a proxy defines a property of the certification problem at all, a precondition logically prior to class lower bounds on the resulting recovery task and deliberately not a substitute for them. The structural transfer applies to every fixed correctness relation, independent of whether that relation is polynomial-time accessible. In the direct finite-local regime, where local routing tests are computed from raw pairwise syntax, three binary-pairwise proxy families and one offset-normalization witness exhibit same-orbit disagreement. Positive results arise from quotient-preserving normalizations, computable orbit catalogues whose descended predicates compose under Boolean operations, and predicates defined directly on the correctness quotient. The result complements the Rice-analog line of Borchert, Stephan, Hemaspaandra, and Rothe. All numbered results are mechanized in Lean 4; the supplementary ledger maps each claim to its formal identifier.

2026-04-08T17:59:47Z Main PDF: 46 pages, 5 tables. Supplementary: 17 pages, 2 tables. Lean 4 formalization available at https://doi.org/10.5281/zenodo.19457896 Tristan Simas http://arxiv.org/abs/2606.09589v1 I Was Scrolling and Then I Saw a Pregnant Strawberry 2026-06-08T15:01:11Z

AI minidramas (also known as fruit dramas) are short, algorithmically distributed generative AI video series featuring anthropomorphized characters that have recently emerged as a widespread phenomenon on social media platforms. This paper argues that despite their seemingly innocuous aesthetic, these videos reproduce deeply gendered narrative structures in which female characters are systematically associated with moral transgression, sexual betrayal, and reproductive capacity, and that several plots also encode the logic of racialization, i.e., the process by which visible bodily difference is morally loaded. Drawing on feminist film theory, critical race theory, and platform studies, it further argues that the generative AI aesthetic of these videos, characterized by softness, roundness, and visual cuteness, functions as a mechanism of aesthetic laundering, neutralizing the ideological weight of these narratives and enabling their circulation despite content moderation systems. This paper approaches these questions through personal observation and close reading, reflecting on the specific affordances of generative AI that make this phenomenon both possible and culturally consequential for the field of computational creativity.

2026-06-08T15:01:11Z Piera Riccio http://arxiv.org/abs/2606.09587v1 Seeing the Hivemind: A Consensus-Aware Interaction Technique for Mitigating AI Homogenization 2026-06-08T14:59:50Z

People are increasingly using AI for creative tasks such as writing. While adoption continues to grow, this form of use risks undermining individual creativity locally and reducing the heterogeneity of creative output at scale. In response, we introduce the Semantic Repulsion Technique (SRT) and evaluate it both computationally and through a study with 16 participants who regularly use AI for creative tasks. Our computational assessment reveals that SRT increases semantic diversity by 85--167\% while reducing consensus phrases by 43--95\% across task modes. In the user study, SRT outputs received higher usefulness ($p = .019$, $W = .208$) and coherence ratings ( $p = .006$, $W = .260$); 68.8\% of participants were willing to use SRT-Strong for multiple tasks versus 18.8\% for baselines. Originality and coherence ratings were positively correlated across all systems ($ρ= +.40$ to $+.67$), suggesting that divergence need not compromise readability. Taken together, these preliminary findings can inform the design of AI systems that aim to support everyday creativity without contributing to homogenization.

2026-06-08T14:59:50Z In review Muhammad Haris Khan Joel wester http://arxiv.org/abs/2606.09585v1 Optical Reasoning: Rethinking Images as an Expressive Reasoning Medium Beyond Text 2026-06-08T14:58:59Z

Chain-of-Thought (CoT) improves the performance of Large Language Models (LLMs) and has been extended to Multimodal Large Language Models (MLLMs). More recent work further moves from text-based multimodal reasoning toward interleaved-modal reasoning, where intermediate steps can incorporate both textual rationales and visual evidence. In this work, we propose a bolder and more ambitious idea: could images alone serve as the reasoning medium for both language and multimodal tasks? To explore this, we propose optical reasoning, which treats images as a standalone reasoning medium. We instantiate this concept with two variants: typographic-based optical reasoning, which optimizes visual layouts for compact rationale rendering, and graphical-based optical reasoning, which composes text and graphical elements into structured visual rationales. Across mathematical, scientific, and interleaved-modal reasoning benchmarks, optical reasoning can match or even exceed traditional text reasoning while reducing reasoning tokens by an average of 28.57% on language tasks and 16% on multimodal tasks, achieving 1.96 times the token efficiency of text reasoning. These results show that images can effectively and efficiently encode rationales while providing a unified visual canvas for reasoning.

2026-06-08T14:58:59Z Yutong Bian Dongjie Cheng Heming Xia Yongqi Li Wenjie Li http://arxiv.org/abs/2603.21563v4 Counterfactual Credit Policy Optimization for Multi-Agent Collaboration 2026-06-08T14:57:56Z

Collaborative multi-agent large language models (LLMs) can solve complex reasoning tasks by decomposing roles, but reinforcement learning for such systems is limited by credit assignment: shared terminal rewards obscure individual contributions and can encourage free-riding. We introduce Collaborative Credit Policy Optimization (CCPO), an optimizer-agnostic credit assignment layer that converts team-level outcomes into agent-specific learning signals. CCPO provides two complementary allocators. Counterfactual credit estimates an agent's marginal contribution by comparing the realized team outcome with a counterfactual outcome where that agent is removed. Verifier-anchored LLM self-evaluation is an exploratory allocator that uses constrained self- and peer-evaluations to redistribute credit while keeping the external verifier outcome dominant. The resulting role-specific rewards can be consumed by GRPO-style updates or other policy-gradient optimizers such as GSPO and REINFORCE++. We instantiate CCPO in a sequential Think--Solve setting and evaluate it on mathematical reasoning benchmarks. Results show that explicit credit assignment often improves dual-agent reasoning, especially on MATH500 and several out-of-distribution settings, while gains vary across models and datasets.

2026-03-23T04:35:02Z Zhongyi Li Wan Tian Yikun Ban Jinju Chen Huiming Zhang Yang Liu Fuzhen Zhuang http://arxiv.org/abs/2606.09578v1 TABVERSE: Benchmarking Cross-Format Table Understanding in LLMs and VLMs 2026-06-08T14:52:46Z

Large Language Models (LLMs) and Vision-Language Models (VLMs) are increasingly evaluated on table reasoning tasks, but the role of table representation remains under-explored. In practice, the same table content may appear in different structural formats, such as HTML, Markdown, and LaTeX, or as rendered images. However, existing evaluations often let content, format, layout, and modality vary together, making it difficult to isolate representation effects. We introduce TABVERSE, a controlled multimodal table benchmark that aligns the same table content across multiple structural formats and rendered images, with question category and difficulty tags. This design enables systematic evaluation of representation effects while holding table content fixed. We evaluate LLMs and VLMs across three tasks: Question Answering (QA), Structural Understanding Capability (SUC), and Structure Reconstruction (SR). Our results show that representation choice substantially affects table understanding. Models generally perform better with structured text than with rendered images, but the size of this gap depends on the task, model, and format. HTML is often the most robust text format, while row-sensitive structural tasks and syntactically usable LaTeX reconstruction remain challenging. These findings show that table representation is a key factor in reliable table evaluation.

2026-06-08T14:52:46Z 24 pages, 18 tables, 16 figures, Submitted to ARR May 2026 Momina Ahsan Sarfraz Ahmad Ming Shan Hee Roy Ka-Wei Lee Preslav Nakov http://arxiv.org/abs/2603.04177v2 CodeTaste: Can LLMs Generate Human-Level Code Refactorings? 2026-06-08T14:51:08Z

LLM coding agents can generate working code, but their solutions often accumulate complexity, duplication, and architectural debt. Human developers address such issues through refactoring: behavior-preserving program transformations that improve structure and maintainability. We investigate whether agents (i) can execute refactorings reliably and (ii) identify the refactorings that human developers actually chose in real codebases. To this end, we construct CodeTaste, a benchmark mined from large multi-file open-source refactorings. To score solutions, we combine repository test suites that measure functional correctness with tailored static checks that verify removal of undesired and introduction of desired code patterns using dataflow reasoning. Our results show a clear gap: agents perform well at implementing refactorings that are specified in detail, but often fail to discover the human refactoring choices when given a focus area for changes. A propose-then-implement decomposition improves alignment, and selecting the best-aligned proposal before implementation can yield further gains. CodeTaste provides an evaluation target and a potential preference signal for aligning coding agents with human refactoring decisions in realistic codebases. We release the benchmark, leaderboard, and code.

2026-03-04T15:34:18Z Alex Thillen Niels Mündler Veselin Raychev Martin Vechev http://arxiv.org/abs/2606.09572v1 CT-VAM: A Cerebello-Thalamic-Inspired Vision-Action Model for Efficient Visuomotor Control 2026-06-08T14:46:43Z

Vision-language-action models have shown strong promise for robot manipulation, yet raw language is primarily needed to specify task intent rather than to be repeatedly processed during high-frequency low-level execution. Motivated by this separation, we propose a cerebello-thalamic-inspired vision-action model (CT-VAM) for efficient task-conditioned visuomotor control. CT-VAM acts as a compact local execution policy that predicts action chunks from dualview visual observations, proprioception, and a lightweight task condition, potentially enabling a practical cloud-edge paradigm in which high-level semantic reasoning can be handled by large models while fast closed-loop control runs on local hardware. To fuse heterogeneous inputs effectively, CT-VAM introduces TARS (Thalamic Action Routing Stream), a stream-separated conditional attention decoder that independently routes action, visual and task streams, preventing dense sensory tokens from overwhelming compact task-relevant conditions. With only 68M parameters, CT-VAM achieves LIBERO success rates competitive with substantially larger VLA models, while reducing inference latency. Together with flow-consistent inpainting for asynchronous chunk execution, CT-VAM supports high-frequency control and demonstrates robust realworld deployment on resource-constrained robotic platforms.

2026-06-08T14:46:43Z Jiacheng Li Yize Guo Jiabin Guo Qingchen Liu Jiahu Qin http://arxiv.org/abs/2606.09963v1 Geometry-Aware Anisotropic Boundary Correction for Aerodynamic Simulation 2026-06-08T14:45:07Z

Aerodynamic simulation is a key component of engineering shape design, where core quantities such as the surface pressure coefficient strongly depend on flow dynamics near solid boundaries. Neural operators provide an efficient alternative to expensive Computational Fluid Dynamics (CFD) solvers. However, conventional methods treat the boundary region isotropically, failing to account for the distinct physical behaviors along the boundaries. In reality, the aerodynamic process exhibits anisotropy: along the tangential direction, flow propagates along the wall; along the normal direction, physical quantities are constrained by the wall. To explicitly model the distinct physical behaviors, we propose GeoABC, a geometry-conditioned anisotropic boundary correction framework. GeoABC leverages the boundary geometries to introduce direction-aware boundary correction into the intermediate representations of neural operators, transforming boundary geometry from static input features into a structural prior that modulates physical prediction. On 2D airfoil and 3D car tasks, GeoABC consistently adapts to multiple neural operator backbones, reducing near-boundary relative $L_2$ error by $\sim$38\% on average, narrowing the structural near-wall gap shared by mainstream neural operators, and advancing neural operators toward high-fidelity aerodynamic simulation.

2026-06-08T14:45:07Z Xin Zhang Yipeng Huang Shu Jiang Zhenzhong Wang Min Jiang http://arxiv.org/abs/2606.09568v1 Self-Explainability in Self-Adaptive and Self-Organising Systems: Status and Research Directions 2026-06-08T14:42:21Z

The growing complexity of self-adaptive and self-organising systems, fuelled by advances in Artificial Intelligence (AI), has made them increasingly difficult to understand and trust. While Explainable AI aims to provide insight into AI decision-making, a more advanced goal is for systems to explain themselves - an ability referred to as Self-Explainability (SX). This article presents a systematic literature review on SX, analysing existing approaches, including their domains, targets, and evaluation methods. The review develops a unified definition and taxonomy of SX and introduces Levels of Self-Explainability, providing a framework for positioning current and future research. Our results show that most SX approaches remain conceptual, with few practical implementations. Moreover, there is currently no formal or de facto standard for evaluating SX, highlighting a major research gap. This work thus establishes a foundation and roadmap for advancing Self-Explainability in complex systems.

2026-06-08T14:42:21Z Under review as a regular paper at ACM Transactions on Autonomous and Adaptive Systems (TAAS) Tom Beyer Svea Wisy Sven Tomforde