Pneuma-Seeker: A Relational Reification Mechanism to Align AI Agents with Human Work over Relational Data

2026-03-11T13:20:16Z

When faced with data problems, many data workers cannot articulate their information need precisely enough for software to help. Although LLMs interpret natural-language requests, they behave brittly when intent is under-specified, e.g., hallucinating fields, assuming join paths, or producing ungrounded answers. We present Pneuma-Seeker, a system built around a central idea: relational reification. Pneuma-Seeker represents a user's evolving information need as a relational schema: a concrete, analysis-ready data model shared between user and system. Rather than answering prompts directly, Pneuma-Seeker iteratively refines this schema, then discovers and prepares relevant sources to construct a relation and executable program that compute the answer. Pneuma-Seeker employs an LLM-powered agentic architecture with conductor-style planning and macro- and micro-level context management to operate effectively over heterogeneous relational corpora. We evaluate Pneuma-Seeker across multiple domains against state-of-the-art academic and industrial baselines, demonstrating higher answer accuracy. Deployment in a real organization highlights trust and inspectability as essential requirements for LLM-mediated data systems.

EvoSchema: Towards Text-to-SQL Robustness Against Schema Evolution

2026-03-11T12:10:03Z

Neural text-to-SQL models, which translate natural language questions (NLQs) into SQL queries given a database schema, have achieved remarkable performance. However, database schemas frequently evolve to meet new requirements. Such schema evolution often leads to performance degradation for models trained on static schemas. Existing work either mainly focuses on simply paraphrasing some syntactic or semantic mappings among NLQ, DB and SQL, or lacks a comprehensive and controllable way to investigate model robustness under schema evolution, which is insufficient when facing the increasingly complex and rich database schema changes in reality, especially in the LLM era. To address the challenges posed by schema evolution, we present EvoSchema, a comprehensive benchmark designed to assess and enhance the robustness of text-to-SQL systems under real-world schema changes. EvoSchema introduces a novel schema evolution taxonomy, encompassing ten perturbation types across column-level and table-level modifications, systematically simulating the dynamic nature of database schemas. Through EvoSchema, we conduct an in-depth evaluation spanning different open-source and closed-source LLMs, revealing that table-level perturbations have a significantly greater impact on model performance than column-level changes. Furthermore, EvoSchema inspires the development of more resilient text-to-SQL systems, in terms of both model training and database design. Training on EvoSchema's diverse schema designs forces a model to distinguish schema differences for the same question and so avoid learning spurious patterns; on average, the resulting models are markedly more robust than those trained on unperturbed data. This benchmark offers valuable insights into model behavior and a path forward for designing systems capable of thriving in dynamic, real-world environments.

Tursio for Credit Unions: Structured Data Search with Automated Context Graphs

2026-03-11T12:02:03Z

Extracting actionable insights from structured databases in regulated industries, such as credit unions, is often hindered by complex schemas, legacy systems, and stringent data governance requirements. We present Tursio, a secure, on-premises, database search platform that enables business users to query enterprise databases using natural language. Tursio automatically infers a context graph -- a schema-level metadata structure that captures join paths, column semantics, and domain annotations -- and uses it to systematically generate accurate query plans through LLM-assisted compilation, grounding, and rewriting. Unlike existing AI/BI tools that require extensive manual context curation, Tursio automates this end-to-end and deploys entirely on-premises. We demonstrate Tursio through realistic scenarios in the credit union domain, and discuss its applicability to other regulated settings.

A Hypergraph-Based Framework for Exploratory Business Intelligence

2026-03-11T10:36:48Z

Business Intelligence (BI) analysis is evolving towards Exploratory BI, an iterative, multi-round exploration paradigm where analysts progressively refine their understanding. However, traditional BI systems impose critical limits for Exploratory BI: heavy reliance on expert knowledge, high computational costs, static schemas, and lack of reusability. We present ExBI, a novel system that introduces the hypergraph data model with operators, including Source, Join, and View, to enable dynamic schema evolution and materialized view reuse. Using sampling-based algorithms with provable estimation guarantees, ExBI addresses the computational bottlenecks, while maintaining analytical accuracy. Experiments on LDBC datasets demonstrate that ExBI achieves significant speedups over existing systems: on average 16.21x (up to 146.25x) compared to Neo4j and 46.67x (up to 230.53x) compared to MySQL, while maintaining high accuracy with an average error rate of only 0.27% for COUNT, enabling efficient and accurate large-scale exploratory BI workflows.

Trajectory-Informed Memory Generation for Self-Improving Agent Systems

2026-03-11T09:54:09Z

LLM-powered agents face a persistent challenge: learning from their execution experiences to improve future performance. While agents can successfully complete many tasks, they often repeat inefficient patterns, fail to recover from similar errors, and miss opportunities to apply successful strategies from past executions. We present a novel framework for automatically extracting actionable learnings from agent execution trajectories and utilizing them to improve future performance through contextual memory retrieval. Our approach comprises four components: (1) a Trajectory Intelligence Extractor that performs semantic analysis of agent reasoning patterns, (2) a Decision Attribution Analyzer that identifies which decisions and reasoning steps led to failures, recoveries, or inefficiencies, (3) a Contextual Learning Generator that produces three types of guidance -- strategy tips from successful patterns, recovery tips from failure handling, and optimization tips from inefficient but successful executions, and (4) an Adaptive Memory Retrieval System that injects relevant learnings into agent prompts based on multi-dimensional similarity. Unlike existing memory systems that store generic conversational facts, our framework understands execution patterns, extracts structured learnings with provenance, and retrieves guidance tailored to specific task contexts. Evaluation on the AppWorld benchmark demonstrates consistent improvements, with up to 14.3 percentage point gains in scenario goal completion on held-out tasks and particularly strong benefits on complex tasks (28.5 pp scenario goal improvement, a 149% relative increase).

R4-CGQA: Retrieval-based Vision Language Models for Computer Graphics Image Quality Assessment

2026-03-11T09:28:49Z

Immersive Computer Graphics (CGs) rendering has become ubiquitous in modern daily life. However, comprehensively evaluating CG quality remains challenging for two reasons: first, existing CG datasets lack systematic descriptions of rendering quality; second, existing CG quality assessment methods cannot provide reasonable text-based explanations. To address these issues, we first identify six key perceptual dimensions of CG quality from the user perspective and construct a dataset of 3500 CG images with corresponding quality descriptions. Each description covers CG style, content, and perceived quality along the selected dimensions. Furthermore, we use a subset of the dataset to build several question-answer benchmarks based on the descriptions in order to evaluate the responses of existing Vision Language Models (VLMs). We find that current VLMs are not sufficiently accurate in judging fine-grained CG quality, but that descriptions of visually similar images can significantly improve a VLM's understanding of a given CG image. Motivated by this observation, we adopt retrieval-augmented generation and propose a two-stream retrieval framework that effectively enhances the CG quality assessment capabilities of VLMs. Experiments on several representative VLMs demonstrate that our method substantially improves their performance on CG quality assessment.

MCI-SQL: Text-to-SQL with Metadata-Complete Context and Intermediate Correction

2026-03-11T07:34:47Z

Text-to-SQL aims to translate natural language queries into SQL statements. Existing methods typically follow a pipeline of pre-processing, schema linking, candidate SQL generation, SQL alignment, and target SQL selection. However, these methods face significant challenges. First, they often struggle with column filtering during schema linking due to difficulties in comprehending raw metadata. Also, the candidate SQL generation process often suffers from reasoning errors, which limits accuracy improvements. To address these limitations, we propose a framework, called MCI-SQL, to efficiently and precisely generate SQL queries. Specifically, we assign metadata-complete contexts to each column, which significantly improves the accuracy of column filtering for schema linking. Also, for candidate SQL generation, we propose an intermediate correction mechanism that validates SQL queries and revises errors in a timely way. Moreover, we also propose effective optimizations in subsequent SQL alignment and selection phases, which further enhance the performance. Experiments on the widely-used BIRD benchmark show that MCI-SQL achieves execution accuracy of 74.45% on the development set and 76.41% on the test set, surpassing current published state-of-the-art results. In addition, we manually identify and correct 412 samples in the BIRD dataset, forming a new version named BIRD-clear, which is released together with our code on GitHub. We also evaluate our methods on BIRD-clear and find that MCI-SQL outperforms baselines by 8.47 percentage points in execution accuracy, further demonstrating the effectiveness and reliability of our framework.

Effective Dataset Distillation for Spatio-Temporal Forecasting with Bi-dimensional Compression

2026-03-11T04:48:12Z

Spatio-temporal time series are widely used in real-world applications, including traffic prediction and weather forecasting. They are sequences of observations over extensive periods and multiple locations, naturally represented as multidimensional data. Forecasting is a central task in spatio-temporal analysis, and numerous deep learning methods have been developed to address it. However, as dataset sizes and model complexities continue to grow in practice, training deep learning models has become increasingly time- and resource-intensive. A promising solution to this challenge is dataset distillation, which synthesizes compact datasets that can effectively replace the original data for model training. Although successful in various domains, including time series analysis, existing dataset distillation methods compress only one dimension, making them less suitable for spatio-temporal datasets, where both spatial and temporal dimensions jointly contribute to the large data volume. To address this limitation, we propose STemDist, the first dataset distillation method specialized for spatio-temporal time series forecasting. A key idea of our solution is to compress both temporal and spatial dimensions in a balanced manner, reducing training time and memory. We further reduce the distillation cost by performing distillation at the cluster level rather than the individual location level, and we complement this coarse-grained approach with a subset-based granular distillation technique that enhances forecasting performance. On five real-world datasets, we show empirically that, compared to both general and time-series dataset distillation methods, datasets distilled by our STemDist method make model training (1) faster (up to 6X), (2) more memory-efficient (up to 8X), and (3) more effective (with up to 12% lower prediction error).

Querying Everything Everywhere All at Once: Supervaluationism for the Agentic Lakehouse

2026-03-11T02:05:09Z

Agentic analytics is turning the lakehouse into a multi-version system: swarms of (human or AI) producers materialize competing pipelines in data branches, while (human or AI) consumers need answers without knowing the underlying data life-cycle. We demonstrate a new system that answers questions across branches rather than at a single snapshot. Our prototype focuses on a novel query path that evaluates queries under supervaluationary semantics. In the absence of comparable multi-branch querying capabilities in mainstream OLAP systems, we open source the demo code as a concrete baseline for the OLAP community.
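
As a rough illustration of what evaluating a query under supervaluationary semantics can mean, the sketch below treats a Boolean query as true if it holds on every branch, false if it holds on no branch, and indeterminate if the branches disagree. This is a minimal sketch of the general idea only, not the demonstrated system's query path; the branch names, the sample rows, and the `supervaluate` helper are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

Row = dict
Branch = List[Row]


def supervaluate(branches: Dict[str, Branch],
                 query: Callable[[Branch], bool]) -> Optional[bool]:
    """Return True if the query holds on every branch, False if it holds
    on none, and None (indeterminate) if the branches disagree.
    With no branches at all, the result is also None."""
    verdicts = {query(rows) for rows in branches.values()}
    if verdicts == {True}:
        return True
    if verdicts == {False}:
        return False
    return None


# Hypothetical data branches materialized by competing pipelines.
branches = {
    "main": [{"region": "EU", "revenue": 120},
             {"region": "US", "revenue": 90}],
    "agent-fix": [{"region": "EU", "revenue": 130},
                  {"region": "US", "revenue": 105}],
}


def eu_over_100(rows: Branch) -> bool:
    return any(r["region"] == "EU" and r["revenue"] > 100 for r in rows)


def us_over_100(rows: Branch) -> bool:
    return any(r["region"] == "US" and r["revenue"] > 100 for r in rows)


eu_answer = supervaluate(branches, eu_over_100)  # both branches agree
us_answer = supervaluate(branches, us_over_100)  # branches disagree
```

Here the consumer gets a definite answer for the EU question without choosing a branch, while the US question is reported as indeterminate because the two branches give different verdicts.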

HiFIVE: High-Fidelity Vector-Tile Reduction for Interactive Map Exploration

2026-03-10T23:00:37Z

Interactive visualization is a common tool for exploring large open-data repositories, where users quickly explore datasets across diverse domains. When it comes to large-scale spatial data, many existing tools rely on server-side rendering to produce small images that can be viewed at the client-side. However, most users prefer client-side rendering that allows quick styling of the data for better visualization experience. This paper presents HiFIVE, a data-management framework for scalable, high-fidelity client-side geospatial visualization. We formalize the visualization-aware tile reduction problem, which captures the trade-off between tile-size and visualization distortion, and prove its NP-hardness. HiFIVE introduces a two-stage solution combining triage and sparsification to selectively prune records, attributes, and values based on information-theoretic and spatial criteria. Experiments demonstrate substantial tile-size reductions while preserving visual fidelity and interactive performance at terabyte scale.

Direct Access for Conjunctive Queries with Negations

2026-03-10T21:24:16Z

Given a conjunctive query $Q$ and a database $D$, a direct access to the answers of $Q$ over $D$ is the operation of returning, given an index $k$, the $k$-th answer for some order on its answers. While this problem is $\#\mathcal{P}$-hard in general with respect to combined complexity, many conjunctive queries have an underlying structure that allows for a direct access to their answers for some lexicographical ordering that takes polylogarithmic time in the size of the database after a polynomial time precomputation. Previous work has precisely characterised the tractable classes and given fine-grained lower bounds on the precomputation time needed depending on the structure of the query. In this paper, we generalise these tractability results to the case of signed conjunctive queries, that is, conjunctive queries that may contain negative atoms. Our technique is based on a class of circuits that can represent relational data. We first show that this class supports tractable direct access after a polynomial time preprocessing. We then give bounds on the size of the circuit needed to represent the answer set of signed conjunctive queries depending on their structure. Both results combined together allow us to prove the tractability of direct access for a large class of conjunctive queries. On the one hand, we recover the known tractable classes from the literature in the case of positive conjunctive queries. On the other hand, we generalise and unify known tractability results about negative conjunctive queries -- that is, queries having only negated atoms. In particular, we show that the class of $\beta$-acyclic negative conjunctive queries and the class of bounded nest set width negative conjunctive queries admit tractable direct access.
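
To make the direct access operation concrete, here is a minimal sketch for one very simple positive query, $Q(x, y) \leftarrow R(x), S(x, y)$, under the lexicographic order on $(x, y)$. It is not the circuit-based technique of the paper and does not handle negated atoms; the class name `DirectAccess` and the sample relations are hypothetical. Preprocessing sorts the data and stores prefix counts, after which the $k$-th answer (0-indexed here) is found with one binary search instead of enumerating all answers.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Dict, List, Set, Tuple


class DirectAccess:
    """Direct access to Q(x, y) :- R(x), S(x, y) in lexicographic (x, y) order."""

    def __init__(self, r: List[int], s: List[Tuple[int, int]]) -> None:
        r_values = set(r)
        groups: Dict[int, Set[int]] = {}
        for x, y in s:
            if x in r_values:  # keep only S-tuples that join with R
                groups.setdefault(x, set()).add(y)
        # Sorted x-values and, for each, the sorted y-values it joins with.
        self.xs = sorted(groups)
        self.ys = [sorted(groups[x]) for x in self.xs]
        # prefix[i] = number of answers whose x-value is among xs[0..i].
        self.prefix = list(accumulate(len(ys) for ys in self.ys))

    def __len__(self) -> int:
        return self.prefix[-1] if self.prefix else 0

    def access(self, k: int) -> Tuple[int, int]:
        """Return the k-th answer (0-indexed) using one binary search."""
        if not 0 <= k < len(self):
            raise IndexError(k)
        i = bisect_right(self.prefix, k)  # first group whose prefix count exceeds k
        offset = k - (self.prefix[i - 1] if i > 0 else 0)
        return self.xs[i], self.ys[i][offset]


# Hypothetical input relations.
R = [1, 2, 3]
S = [(1, 20), (1, 10), (3, 5), (3, 7), (3, 6), (4, 1)]
da = DirectAccess(R, S)
fourth = da.access(3)  # answer at index 3 in lexicographic order
```

The answers in order are (1, 10), (1, 20), (3, 5), (3, 6), (3, 7), so index 3 yields (3, 6) without materializing the earlier answers.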

K-Join: Combining Vertex Covers for Parallel Joins

2026-03-10T19:17:11Z

Significant research effort has been devoted to improving the performance of join processing in the massively parallel computation model, where the goal is to evaluate a query with the minimum possible data transfer between machines. However, it is still an open question to determine the best possible parallel algorithm for any join query. In this paper, we present an algorithm that takes a step forward in this endeavour. Our new algorithm is simple and builds on two existing ideas: data partitioning and the HyperCube primitive. The novelty in our approach comes from a careful choice of the HyperCube shares, which is done as a linear combination of multiple vertex covers. The resulting load with input size $n$ and $p$ processors is characterized as $n/p^{1/\kappa}$, where $\kappa$ is a new hypergraph theoretic measure we call the reduced quasi vertex-cover. The new measure matches or improves on all state-of-the-art algorithms and exhibits strong similarities to the edge quasi-packing that describes the worst-case optimal load in one-round algorithms.
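
The HyperCube primitive that the algorithm builds on can be sketched for the triangle query $Q(a, b, c) \leftarrow R(a, b), S(b, c), T(a, c)$. Servers form a grid with one dimension per variable; each tuple is hashed on the variables it contains and replicated along the dimension it lacks, so every potential triangle meets at exactly one server. This simulation is a sketch under stated assumptions: the shares are fixed and hand-picked, hashing is a plain modulo, and the function name and sample data are hypothetical. It does not implement the paper's share selection from vertex covers or its data partitioning step.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]
Server = Tuple[int, int, int]
Triangle = Tuple[int, int, int]


def hypercube_triangles(r: List[Edge], s: List[Edge], t: List[Edge],
                        shares: Tuple[int, int, int]) -> Set[Triangle]:
    """Evaluate Q(a,b,c) :- R(a,b), S(b,c), T(a,c) on a simulated server grid."""
    pa, pb, pc = shares
    servers: Dict[Server, Dict[str, List[Edge]]] = {
        coord: {"R": [], "S": [], "T": []}
        for coord in product(range(pa), range(pb), range(pc))
    }
    # Routing: hash the attributes a tuple has, replicate along the one it lacks.
    for a, b in r:
        for k in range(pc):
            servers[(a % pa, b % pb, k)]["R"].append((a, b))
    for b, c in s:
        for i in range(pa):
            servers[(i, b % pb, c % pc)]["S"].append((b, c))
    for a, c in t:
        for j in range(pb):
            servers[(a % pa, j, c % pc)]["T"].append((a, c))
    # Local computation: each server joins only the fragments it received.
    out: Set[Triangle] = set()
    for frag in servers.values():
        s_set, t_set = set(frag["S"]), set(frag["T"])
        cs = {c for _, c in frag["S"]}
        for a, b in frag["R"]:
            for c in cs:
                if (b, c) in s_set and (a, c) in t_set:
                    out.add((a, b, c))
    return out


# Hypothetical input relations.
R = [(1, 2), (1, 3), (2, 3)]
S = [(2, 3), (3, 4), (3, 1)]
T = [(1, 3), (1, 4), (2, 1)]
triangles = hypercube_triangles(R, S, T, shares=(2, 2, 2))
```

With shares (2, 2, 2) the grid has 8 servers and each tuple is sent to 2 of them. How the shares are chosen governs the per-server load, which is the quantity the abstract bounds.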

Expressive Power of Property Graph Constraint Languages

2026-03-10T15:36:02Z

We present the first principled and systematic study of the expressive power of property graph constraint languages, focused on the recent PG-Keys language, set to inform the upcoming revision of the GQL standard. To this end, we position PG-Keys within the broader landscape of existing formalisms. In particular, we compare PG-Keys with two core property graph constraint languages: Graph Functional Dependencies (GFD) and Graph Generating Dependencies (GGD). One hurdle is that these formalisms allow different kinds of graph pattern languages and data predicates. To make a fair comparison, based on their structural differences only, we first present a unifying framework. Within this framework, we consider conjunctive regular path queries (CRPQ) as graph patterns with equality and inequality predicates. We then identify well-behaved fragments, establish expressiveness inclusion, and prove separation results, yielding a complete and strict hierarchy of expressive power. The results identify precisely when PG-Keys provide strictly greater expressive power, clarifying their place among state-of-the-art property graph constraint formalisms.

Epistemic Closure: Autonomous Mechanism Completion for Physically Consistent Simulation

2026-03-10T14:56:36Z

The integration of Large Language Models (LLMs) into scientific discovery is currently hindered by the Implicit Context problem, where governing equations extracted from literature contain invisible thermodynamic assumptions (e.g., undrained conditions) that standard generative models fail to recognize. This leads to Physical Hallucination: the generation of syntactically correct solvers that faithfully execute physically invalid laws. Here, we introduce a Neuro-Symbolic Generative Agent that functions as a cognitive supervisor atop traditional numerical engines. By encapsulating physical laws into modular Constitutive Skills and leveraging latent intrinsic priors, the Agent employs a Chain-of-Thought reasoning workflow to autonomously validate, prune, and complete physical mechanisms. We demonstrate this capability on the challenge of thermal pressurization in low-permeability sandstone. While a standard literature-retrieval baseline erroneously predicts catastrophic material failure by blindly adopting a rigid "undrained" simplification, our Agent autonomously identifies the system as operating in a drained regime (Deborah number $\mathrm{De} \ll 1$) via dimensionless scaling analysis. Consequently, it inductively completes the missing dissipation mechanism (Darcy flow) required to satisfy boundary constraints, predicting a stable stress path consistent with experimental reality. This work establishes a paradigm where AI agents transcend the role of coding assistants to act as epistemic partners, capable of reasoning about and correcting the theoretical assumptions embedded in scientific data.

Local Stability of Rankings

2026-03-10T14:31:17Z

Rankings play a crucial role in decision-making. However, if minor changes to items significantly alter their rankings, the quality of the decisions being made can be compromised. The stability of ranking is a measure used to assess how modifications to the ranking algorithm or data affect results. While previous work has focused on stability of the ranking under changes to the algorithm, we introduce a novel measure we refer to as local stability. Local stability indicates the effect of minor changes to the values of an item in the ranking on its rank. Our proposed definition furthermore takes into account the presence of multiple items with similar qualities in the ranking, called dense regions, permitting minor modifications to swap the positions of items within the region. We show that computing this measure in general is hard, and in turn propose a relaxation of the definition to admit approximation. We present (i) LStability, a sampling-based algorithm for approximating local stability, on which we make probably-approximately-correct-type guarantees through the use of concentration inequalities, and (ii) Detect-Dense-Region, an algorithm based on this approach to detect the dense region an item lies in, if it exists. We introduce a number of optimizations to our algorithms to improve their scalability and efficiency. We validate our proposed framework through an extensive suite of experiments, including case studies highlighting the utility of our definitions.
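
The sampling idea can be illustrated with a simplified stand-in for the measure. The sketch below assumes items are ranked by a weighted sum of their values, perturbs one item's values uniformly within a small radius, and estimates the fraction of perturbations that leave its rank unchanged. The sample size comes from Hoeffding's inequality, so the estimate is within `eps` of the true fraction with probability at least `1 - delta`. The scoring function, the perturbation model, and all names here are assumptions for illustration; this is not the paper's definition of local stability, it ignores dense regions, and it is not the LStability algorithm itself.

```python
import math
import random
from typing import List, Sequence


def hoeffding_samples(eps: float, delta: float) -> int:
    """Samples needed so the estimate is within eps with probability >= 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))


def rank_of(scores: Sequence[float], i: int) -> int:
    """Rank of item i (0 = best) when items are sorted by descending score."""
    return sum(1 for s in scores if s > scores[i])


def estimate_stability(items: List[List[float]], weights: List[float], i: int,
                       radius: float, eps: float = 0.05, delta: float = 0.05,
                       seed: int = 0) -> float:
    """Estimate the fraction of small perturbations of item i that keep its rank."""
    rng = random.Random(seed)

    def score(values: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(weights, values))

    base_scores = [score(v) for v in items]
    base_rank = rank_of(base_scores, i)
    m = hoeffding_samples(eps, delta)
    unchanged = 0
    for _ in range(m):
        # Perturb only item i; all other items keep their original scores.
        perturbed = [x + rng.uniform(-radius, radius) for x in items[i]]
        scores = list(base_scores)
        scores[i] = score(perturbed)
        if rank_of(scores, i) == base_rank:
            unchanged += 1
    return unchanged / m


# Hypothetical items with two attributes each, scored with equal weights.
items = [[0.9, 0.8], [0.5, 0.5], [0.1, 0.2]]
weights = [0.5, 0.5]
tight = estimate_stability(items, weights, i=1, radius=0.01)
loose = estimate_stability(items, weights, i=1, radius=1.0)
```

In this toy data the middle item is far from its neighbours, so tiny perturbations never move it, while a large radius moves it in a noticeable share of samples.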