Neuro-Symbolic Injection of LTLf Constraints in Autoregressive Reinforcement Learning Policies

2026-06-06T19:55:54Z

In this work we study offline reinforcement learning (RL) under temporally extended task constraints expressed in Linear Temporal Logic over finite traces (LTLf). Recently, transformer-based approaches such as Trajectory Transformers and Decision Transformers have been adopted to address RL as a sequence modeling problem. However, these methods optimize purely for reward and do not account for high-level temporal requirements. Here, we introduce a neurosymbolic framework that injects LTLf background knowledge into such transformer-based RL policies. Our approach compiles LTLf formulas into deterministic finite automata (DFAs) and integrates them into the learning process through a differentiable representation and a logic-based loss function. In particular, we derive differentiable satisfaction signals from DFA progression and use them as a regularization term during training. The resulting method is architecture-agnostic. We evaluate the proposed framework on navigation environments with specification suites covering combinations of safety and reachability temporal properties. Experimental results show that incorporating background knowledge improves constraint satisfaction while maintaining competitive return compared to vanilla baselines.
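
The idea of a differentiable satisfaction signal derived from DFA progression can be illustrated with a small sketch. This is not the authors' implementation: the three-state DFA (hand-written for "eventually goal, and never hazard"), the symbol names, and the negative-log-likelihood loss are all assumptions chosen for illustration. The sketch propagates a probability distribution over DFA states through per-step symbol probabilities, which is one common way to relax automaton progression.

```python
import math

# Hypothetical DFA for the LTLf formula  F goal  &  G !hazard.
# State 0: goal not yet seen, 1: goal seen (accepting), 2: violated (sink).
DELTA = {
    0: {"none": 0, "goal": 1, "hazard": 2},
    1: {"none": 1, "goal": 1, "hazard": 2},
    2: {"none": 2, "goal": 2, "hazard": 2},
}
ACCEPTING = {1}


def soft_progress(symbol_probs):
    """Propagate a distribution over DFA states through a trace.

    symbol_probs is a list with one dict per time step, mapping each
    symbol to its probability under the policy (assumed to sum to 1).
    """
    dist = [1.0, 0.0, 0.0]  # all mass on the initial state 0
    for probs in symbol_probs:
        nxt = [0.0, 0.0, 0.0]
        for q, mass in enumerate(dist):
            for sym, p in probs.items():
                nxt[DELTA[q][sym]] += mass * p
        dist = nxt
    return dist


def satisfaction_loss(symbol_probs, eps=1e-8):
    """Negative log-probability of ending in an accepting state."""
    dist = soft_progress(symbol_probs)
    p_accept = sum(dist[q] for q in ACCEPTING)
    return -math.log(p_accept + eps)
```

In a training loop, the same computation would be written with an autodiff framework so that the loss can be added to the reward-driven objective as a regularizer; the pure-Python version here only shows the arithmetic.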

TempoBench: Evaluating Temporal Causal Reasoning in Large Language Models

2026-06-06T10:28:21Z

Temporal reasoning involves understanding how systems evolve over time through input-driven state transitions. A key aspect is temporal causal reasoning: determining which prior inputs were necessary to cause an observed outcome. While large language models (LLMs) perform well at forward simulation, predicting outputs from inputs, they struggle to identify the minimal causal inputs of outcomes. To study this distinction, we define two tasks: \textit{trace simulation} (SIM), which requires models to simulate system execution, and \textit{minimal causal attribution} (MIN), which requires models to identify the minimal set of inputs necessary for a given outcome. We introduce \textsc{TempoBench}, the first formally verified benchmark for temporal causal reasoning, built from synthesized Mealy machines with controllable complexity and provably correct causal labels. Across frontier models, we observe that despite achieving up to 96\% accuracy on the SIM task, performance on the MIN task drops below 25\%; models fail to reason about causal necessity. Over 94\% of causal errors involve overspecification, where models perform retrieval and list all possible inputs rather than reasoning about the minimal causal subset. Fine-tuning on the \textsc{TempoBench} training corpus improves causal reasoning and generalizes better than math, code, or instruction training, with gains across standard reasoning benchmarks.

Persistent Permutability in Choice Petri Nets

2026-06-05T17:58:17Z

Persistence is a strong, global, behavioural property of a Petri net, meaning that no activity can disable a different activity. Persistent permutability is a weaker property, pertaining to individual interleavings of a Petri net and stating that a non-persistent sequence can be permuted into a persistent one. We identify Petri net classes for which persistent permutability already suffices to imply overall persistence. These classes generalise free-choice nets and are related to Petri's concept of ``confusion'', while they are distinguished from each other by diverse restrictions on the choice structure of a net. We prove Ochmanski's conjecture to be correct for these classes.

Earliest query answering over streamed trees

2026-06-05T15:53:36Z

Streaming allows executing queries over massive JSON or XML documents whose size makes it infeasible to fully parse them into a tree. Earliest query answering is a radical approach to reducing latency and memory footprint. To minimize latency, a document node must be returned as soon as the node is guaranteed to be an answer regardless of how the document ends. Similarly, to minimize memory footprint, a node must be discarded as soon as it cannot become an answer regardless of how the document ends. For simple queries that select nodes based on the path from the root, the decision for each node can be made on the spot, but practical languages such as XPath or JSONpath support filters, which allow selecting nodes based on information collected from various parts of the document, possibly further down the stream. This makes earliest query answering a challenging task, as candidate nodes must be kept in memory until it becomes clear that they can be safely returned or discarded. We show that this can be done for all unary queries expressible in monadic second order logic (MSO), while ensuring constant update time -- provided that nodes are returned by passing a suitable iterator, rather than one by one.

An Algebraic View of the Expressivity of Recurrent Language Models

2026-06-05T15:53:13Z

What formal languages can a recurrent neural language model recognize? Formal results in the literature conflict: some authors report Turing-completeness, while others show equivalence to regular languages. The reason for this discrepancy is that the underlying arithmetic model differs. The paper develops a unified algebraic account of the expressivity of recurrent neural networks, starting with a formal account of various arithmetic models. This account reduces expressivity to an algebraic question, e.g., whether a network's syntactic monoid divides a certain wreath product. As a case study, the paper revisits diagonal state-space models: the same architecture cannot implement an even-modulus counter once floating-point recurrences are enforced, yet realizes every even-modulus counter under unsigned-integer quantization.

A Held-Out Transition-Pair Falsifier for Long-Horizon Non-Abelian State Tracking

2026-06-05T13:26:41Z

State tracking exposes a sharp limitation of sequence models: the relevant signal is often not a summary of observed tokens, but an ordered latent state that evolves through non-commutative transformations. We introduce a held-out transition-pair falsifier for finite non-Abelian group tracking. The protocol forbids selected ordered generator pairs during training and requires the same local patterns during evaluation, blocking one direct local-transition memorization pathway. In a controlled $S_3 \times S_3$ benchmark, a projected recurrent state model trained only on length-8 sequences produces error-free final-state predictions (perfect 250/250 per horizon) through evaluation horizons up to 1,048,576 tokens across five seeds. Matched native-readout baselines, including bag, GRU, and a single-configuration structured state-space model, remain near floor under the same protocol. Projection-matched GRU, structured SSM, and bag baselines equipped with analogous finite-group prototype readouts also remain near chance under the same split. Mechanism diagnostics show that hard projection coincides with low homomorphism error, low state-consistency drift, and non-trivial commutator separation, while softened projection collapses final-state accuracy. Clean-split audits verify zero verbatim reduced-word overlap and zero structural-template overlap between training and evaluation partitions. The evidence is scoped to this controlled finite-group falsifier rather than to a general architecture ranking. Within that regime, explicit projected non-commutative state composition acts as a useful inductive bias for long-horizon hidden-state tracking.

A remark on diagnosability verification

2026-06-05T08:58:59Z

We point out three inaccuracies in the paper [M.V. Moreira, T.C. Jesus, and J.C. Basilio. Polynomial time verification of decentralized diagnosability of discrete event systems. IEEE Transactions on Automatic Control, 56(7):1679-1684, July 2011]. First, the authors wrongly claimed that their algorithm for verifying (co-)diagnosability of labeled finite-state automata (LFSAs) did not depend on assumptions. We give an LFSA that is not deadlock-free or divergence-free such that their algorithm cannot correctly verify its diagnosability. Because diagnosability is a special case of co-diagnosability, their algorithm cannot correctly verify co-diagnosability either when LFSAs are not deadlock-free or divergence-free. Second, they claimed that adding an unobservable self-loop at each dead state can help verify diagnosability for an LFSA that is not deadlock-free or divergence-free; this is wrong, because such a modification sometimes changes the diagnosability of an LFSA. Third, they wrongly claimed that their algorithm for verifying co-diagnosability ran in polynomial time. A polynomial-time algorithm is unlikely to exist, because the problem of verifying co-diagnosability of LFSAs is PSPACE-hard.

Passive Learning of Symbolic Automata over Monotonic Algebras

2026-06-04T11:45:39Z

Symbolic automata extend classical finite-state automata to handle large or infinite alphabets by labeling transitions with predicates from a Boolean algebra. Many results from automata theory have been lifted to this model, and it has proved useful, for example, in several software verification applications. Here, we tackle the passive learning problem of identification in the limit, i.e. learning a model from a sample without access to an oracle to query. We provide an algorithm, SAI, that efficiently identifies in the limit symbolic automata over any monotonic algebra where predicates labeling transitions are of the form a <= x < b. The algorithm extends the RPNI framework for passive learning of finite-state automata to symbolic automata thanks to a new splitting operation inspired by RTI, a passive learning algorithm for deterministic real-time automata, a subclass of timed automata. The learning algorithm combines merging and splitting of states, which allows the predicates on transitions to be inferred in a top-down fashion. We prove that SAI admits polynomial-size characteristic samples.
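
To make the model concrete, here is a minimal sketch of a deterministic symbolic automaton whose transitions carry predicates of the form a <= x < b. It shows only how such an automaton reads a word; it is not the SAI learning algorithm. The representation (a dict from states to lists of `(lo, hi, target)` triples) and the function name `accepts` are assumptions made for illustration.

```python
def accepts(transitions, initial, accepting, word):
    """Run a deterministic symbolic automaton on a word of numbers.

    transitions maps each state to a list of (lo, hi, target) triples,
    each standing for the predicate  lo <= x < hi.  The intervals leaving
    one state are assumed to be pairwise disjoint.
    """
    state = initial
    for x in word:
        for lo, hi, target in transitions.get(state, []):
            if lo <= x < hi:
                state = target
                break
        else:
            # No predicate matched: the run is stuck, so reject.
            return False
    return state in accepting


# Hypothetical automaton: accept once some letter >= 10 has been read,
# over the non-negative numbers.
INF = float("inf")
EXAMPLE = {
    0: [(0, 10, 0), (10, INF, 1)],
    1: [(0, INF, 1)],
}
```

For example, `accepts(EXAMPLE, 0, {1}, [3, 12, 5])` follows the transitions 0, 1, 1 and accepts, whereas a word containing a negative number has no matching predicate and is rejected.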

On the complexity of computing Strahler numbers

2026-06-03T20:16:28Z

It is shown that the problem of computing the Strahler number of a binary tree given as a term is complete for the circuit complexity class uniform $\mathsf{NC}^1$. For several variants, where the binary tree is given by a pointer structure or in a succinct form by a directed acyclic graph or a tree straight-line program, the complexity of computing the Strahler number is determined as well. We show that the problem of deciding whether a given context-free grammar in Chomsky normal form produces a derivation tree with a Strahler number of at least $k$ is $\mathsf{P}$-complete. If the derivation tree is restricted to be acyclic, the problem becomes $\mathsf{PSPACE}$-complete.
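
For reference, the Strahler number itself is defined by a short recursion; the sketch below computes it sequentially and says nothing about the parallel complexity discussed above. Two assumptions are made for illustration: a tree is encoded as `None` for a leaf and a pair `(left, right)` for an internal node, and a leaf has Strahler number 0 (some authors start at 1, which shifts every value by one).

```python
def strahler(tree):
    """Strahler number of a binary tree.

    A leaf is None; an internal node is a pair (left, right).
    Convention assumed here: a leaf has Strahler number 0.
    """
    if tree is None:
        return 0
    left, right = tree
    a, b = strahler(left), strahler(right)
    # Equal children force one extra unit; otherwise the larger child wins.
    return a + 1 if a == b else max(a, b)


# A left-leaning "comb" stays at 1 however deep it is,
# while a complete binary tree of height h has Strahler number h.
comb = (((None, None), None), None)
complete = ((None, None), (None, None))
```

Under this convention `strahler(comb)` is 1 and `strahler(complete)` is 2.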

Synchronization of strongly connected partial DFAs and prefix codes

2026-06-03T19:06:47Z

We study synchronizing partial DFAs, which extend the classical concept of synchronizing complete DFAs and are a special case of synchronizing unambiguous NFAs. A partial DFA is called synchronizing if it has a word (called a \emph{reset word}) whose action brings a non-empty subset of states to a unique state and is undefined for all other states. The class of strongly connected partial DFAs is precisely the class of DFAs recognizing the Kleene star of prefix codes. While in the general case the problem of checking whether a partial DFA is synchronizing is PSPACE-complete, we show that in the strongly connected case, this problem can be efficiently reduced to the same problem for a complete DFA. Using combinatorial, algebraic, and formal languages methods, we develop techniques that relate main synchronization problems for strongly connected partial DFAs to the same problems for complete DFAs. In particular, this includes the Černý and the rank conjectures, the problem of finding a reset word, and upper bounds on the length of the shortest reset words of literal automata of finite prefix codes. We conclude that solving fundamental synchronization problems is equally hard in both models, as an essential improvement of the results for one model implies an improvement for the other.
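
The definition of a reset word for a partial DFA can be checked by brute force on small examples. The sketch below is a plain breadth-first search over subsets of states, so it takes exponential time in the worst case and does not reflect the reduction described in the abstract. The encoding of the transition function as a dict keyed by `(state, letter)` and the function name `find_reset_word` are assumptions for illustration.

```python
from collections import deque


def find_reset_word(states, alphabet, delta):
    """Return a shortest reset word of a partial DFA, or None.

    delta maps (state, letter) to a state; missing keys mean the
    transition is undefined.  A word is a reset word if it sends a
    non-empty set of states to one state and is undefined elsewhere,
    i.e. the image of the full state set is a singleton.
    """
    start = frozenset(states)
    seen = {start}
    queue = deque([(start, "")])
    while queue:
        subset, word = queue.popleft()
        if len(subset) == 1:
            return word
        for a in alphabet:
            image = frozenset(
                delta[(q, a)] for q in subset if (q, a) in delta
            )
            # An empty image means the word is undefined everywhere.
            if image and image not in seen:
                seen.add(image)
                queue.append((image, word + a))
    return None


# Hypothetical strongly connected partial DFA on states 0, 1, 2.
EXAMPLE_DELTA = {(0, "a"): 1, (1, "a"): 2, (2, "b"): 0, (0, "b"): 0}
```

On `EXAMPLE_DELTA` the letter `b` maps states 0 and 2 to state 0 and is undefined on state 1, so the search returns `"b"`.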

The memory of $ω$-regular and BC($Σ_2^0$) objectives

2026-06-03T12:00:38Z

In the context of 2-player zero-sum infinite-duration games played on (potentially infinite) graphs, the memory of an objective is the smallest integer $k$ such that in any game won by Eve, she has a strategy with $\le k$ states of memory. For $ω$-regular objectives, checking whether the memory equals a given number $k$ was not known to be decidable. In this work, we focus on objectives in BC($Σ_2^0$), i.e. recognised by a potentially infinite deterministic parity automaton. We provide a class of automata that recognise objectives with memory $\le k$, leading to the following results: (1) For $ω$-regular objectives, the memory over finite and infinite games coincides and can be computed in NP. (2) Given two objectives $W_1$ and $W_2$ in BC($Σ_2^0$) and assuming $W_1$ is prefix-independent, the memory of $W_1 \cup W_2$ is at most the product of the memories of $W_1$ and $W_2$. Our results also apply to chromatic memory, the variant where strategies can update their memory state only depending on which colour is seen.

Half-flips are 5-avoidable

2026-06-03T11:59:12Z

A word contains a \emph{half-flip} if it contains non-empty factors $uv$ and $vu$ where $|u|=|v|$. Fici reports a non-constructive proof of the existence of an infinite word over a finite alphabet avoiding half-flips and asks for the size of the smallest alphabet over which half-flips may be avoided. Currie and Rampersad have proposed a pure morphic word over 8 letters and a morphic word over 5 letters and conjecture that they avoid half-flips. We present a pure morphic word over 5 letters that avoids half-flips. We also show that half-flips with $|u|\ge2$ are 3-avoidable and that half-flips with $|u|\ge4$ are 2-avoidable.
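
The definition can be tested directly on finite words with a short brute-force check. The sketch below is only an illustration and is unrelated to the morphic constructions in the abstract. The function name and both parameters are assumptions: `min_half` restricts the search to $|u| \ge$ `min_half`, and `allow_equal` controls whether the case $u = v$ (a square $uu$) is counted, since the abstract does not say how that case is treated.

```python
def has_half_flip(word, min_half=1, allow_equal=True):
    """Check whether a finite word contains factors uv and vu, |u| = |v|.

    min_half is the smallest half-length |u| considered.  When
    allow_equal is True, a square uu counts as a half-flip (u = v).
    """
    n = len(word)
    for h in range(min_half, n // 2 + 1):
        # All factors of length 2h occurring in the word.
        factors = {word[i:i + 2 * h] for i in range(n - 2 * h + 1)}
        for f in factors:
            u, v = f[:h], f[h:]
            if u == v and not allow_equal:
                continue
            if v + u in factors:
                return True
    return False
```

For instance, `abcba` contains both `ab` and `ba`, so it has a half-flip with $|u| = 1$, but none with $|u| \ge 2$.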

Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning

2026-06-01T21:37:50Z

We study learning multi-task, multi-agent policies for cooperative, temporal objectives, under centralized training, decentralized execution. In this setting, using automata to represent tasks assigned to agents enables breaking down a team-level objective into simpler, smaller sub-tasks. However, existing approaches remain sample-inefficient and are limited to the single-task case, requiring retraining policies for each new task. In this work, we present Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning (ACC-MARL), a framework for learning task-conditioned, decentralized team policies. We identify challenges to the feasibility of ACC-MARL, propose solutions, and prove that our approach is optimal. We further show that learned value functions can be used to assign tasks optimally at test time. Experiments demonstrate emergent task-aware, multi-step coordination among agents, such as pressing a button to unlock a door, holding the door, and short-circuiting tasks.

Why Are Linear RNNs More Parallelizable?

2026-06-01T21:33:41Z

The community is increasingly exploring linear RNNs (LRNNs) as language models, motivated by their expressive power and parallelizability. While prior work establishes the expressivity benefits of LRNNs over transformers, it is unclear what makes LRNNs -- but not traditional, nonlinear RNNs -- as easy to parallelize in practice as transformers. We answer this question by providing a tight connection between types of RNNs and standard complexity classes. We show that LRNNs can be viewed as log-depth (bounded fan-in) arithmetic circuits, which represents only a slight depth overhead relative to log-depth boolean circuits that transformers admit. Furthermore, we show that nonlinear RNNs can solve $\mathsf{L}$-complete problems (and even $\mathsf{P}$-complete ones, under polynomial precision), revealing a fundamental barrier to parallelizing them as efficiently as transformers. Our theory also identifies fine-grained expressivity differences between recent popular LRNN variants: permutation-diagonal LRNNs are $\mathsf{NC}^1$-complete whereas diagonal-plus-low-rank LRNNs are more expressive ($\mathsf{PNC}^1$-complete). We provide further insight by associating each type of RNN with a corresponding automata-theoretic model that it can simulate. Together, our results reveal fundamental tradeoffs between nonlinear RNNs and different variants of LRNNs, providing a foundation for designing LLM architectures that achieve an optimal balance between expressivity and parallelism.

On gapped repeats in a cyclic Fibonacci word

2026-06-01T11:00:09Z

In this article, we consider words with cyclic indices. For a given $s$, we consider the pairs $(ι,κ)$ of indices such that the factor of length $s$ starting at $ι$ is equal to the factor of length $s$ starting at $κ$. We give a characterization of such pairs for a cyclic Fibonacci word, and count them.
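
The pairs in question can be enumerated by brute force for small cases, which is useful for checking a characterization by hand. The sketch below rests on several assumptions not stated in the abstract: finite Fibonacci words are generated by $S_0 = 0$, $S_1 = 01$, $S_n = S_{n-1}S_{n-2}$, indices start at 0 and wrap around modulo the word length, and only unordered pairs with $ι < κ$ are listed.

```python
def fibonacci_word(n):
    """Finite Fibonacci word S_n with S_0 = '0', S_1 = '01'."""
    a, b = "0", "01"
    for _ in range(n):
        a, b = b, b + a
    return a


def cyclic_factor(w, start, s):
    """Factor of length s starting at index start, indices taken mod len(w)."""
    n = len(w)
    return "".join(w[(start + j) % n] for j in range(s))


def equal_factor_pairs(w, s):
    """All index pairs (i, k) with i < k whose cyclic factors of length s agree."""
    groups = {}
    for i in range(len(w)):
        groups.setdefault(cyclic_factor(w, i, s), []).append(i)
    return sorted(
        (i, k)
        for idxs in groups.values()
        for pos, i in enumerate(idxs)
        for k in idxs[pos + 1:]
    )
```

With `w = fibonacci_word(3)`, which is `01001`, the call `equal_factor_pairs(w, 2)` yields `[(0, 3), (1, 4)]`: the cyclic factors `01` and `10` each occur twice.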