http://arxiv.org/abs/2512.20550v1
LLM-Based Authoring of Agent-Based Narratives through Scene Descriptions
2025-12-23T17:46:15Z
This paper presents a system for procedurally generating agent-based narratives using large language models (LLMs). Users can drag and drop multiple agents and objects into a scene, with each entity automatically assigned semantic metadata describing its identity, role, and potential interactions. The scene structure is then serialized into a natural language prompt and sent to an LLM, which returns a structured string describing a sequence of actions and interactions among agents and objects. The returned string encodes who performed which actions, when, and how. A custom parser interprets this string and triggers coordinated agent behaviors, animations, and interaction modules. The system supports agent-based scenes, dynamic object manipulation, and diverse interaction types. Designed for ease of use and rapid iteration, the system enables the generation of virtual agent activity suitable for prototyping agent narratives. The performance of the developed system was evaluated using four popular lightweight LLMs. Each model's process and response time were measured under multiple complexity scenarios. The collected data were analyzed to compare consistency across the examined scenarios and to highlight the relative efficiency and suitability of each model for procedural agent-based narrative generation.
The results demonstrate that LLMs can reliably translate high-level scene descriptions into executable agent-based behaviors.
2025-12-23T17:46:15Z
Vinayak Regmi, Christos Mousas

http://arxiv.org/abs/2407.06966v12
Physics Oriented Mathematical Perspective for Creating Trochoids and Ellipses through the Combination Rolling and Sliding Motions of a Circle Along Another One in Forward and Backward Sliding Modes
2025-12-23T10:29:16Z
The usual mathematical method for creating trochoids is based on a strict rule that requires pure rolling motion of one circle along another. In this view, a trochoid is defined as the path traced by a point attached to a purely rolling circle (a non-conceptive issue). Apart from this restriction to pure rolling and the use of attached points, the authors of this article have not found other conceptive solutions for this issue in the mathematics and physics literature. This article provides a novel conceptive solution for creating trochoids and ellipses based on a combination of rolling and sliding motions of one circle along another. Therefore, we need not define a trochoid as a path swept by a point attached to a purely rolling circle. Instead, a trochoid can be defined as the path swept by a point on the circumference of a circle that rolls and slides uniformly along another one. This article also presents two different methods for implementing a mathematical simulation of a moving circle that performs uniform, simultaneous rolling and sliding motions along another one. With this solution, it is possible to define ellipses and trochoids as closed plane curves that can be generated through the combination of rolling and sliding motions (an ellipse is created through the combination of two co-polarized rotational motions with different commensurable angular frequencies, in two different modes).
This article presents a novel idea, the Virtual Rotating Circles Technique (VRCT), which can be implemented by a Mathematical Simulator Machine.
2024-07-09T15:45:27Z
This paper is in the field of mathematical physics and includes 19 pages and 21 figures. By studying this article, an interested reader will be able to deduce the parametric equations of trochoids and co-centered ellipses on the basis of a combination of rolling and sliding motions in forward and backward sliding modes
H. Arbab, A. Arbab

http://arxiv.org/abs/2512.17781v2
LiteGE: Lightweight Geodesic Embedding for Efficient Geodesics Computation and Non-Isometric Shape Correspondence
2025-12-23T07:59:10Z
Computing geodesic distances on 3D surfaces is fundamental to many tasks in 3D vision and geometry processing, with deep connections to tasks such as shape correspondence. Recent learning-based methods achieve strong performance but rely on large 3D backbones, leading to high memory usage and latency, which limit their use in interactive or resource-constrained settings. We introduce LiteGE, a lightweight approach that constructs compact, category-aware shape descriptors by applying Principal Component Analysis (PCA) to unsigned distance field (UDF) samples at informative voxels. This descriptor is efficient to compute and removes the need for high-capacity networks. LiteGE remains robust on sparse point clouds, supporting inputs with as few as 300 points, where prior methods fail. Extensive experiments show that LiteGE reduces memory usage and inference time by up to 300× compared to existing neural approaches. In addition, by exploiting the intrinsic relationship between geodesic distance and shape correspondence, LiteGE enables fast and accurate shape matching.
Our method achieves up to 1000× speedup over state-of-the-art mesh-based approaches while maintaining comparable accuracy on non-isometric shape pairs, including evaluations on point-cloud inputs.
2025-12-19T16:50:52Z
Proceedings of the 40th AAAI Conference on Artificial Intelligence (AAAI-26), 2026
Yohanes Yudhi Adikusuma, Qixing Huang, Ying He
10.1609/aaai.v40i4.37213

http://arxiv.org/abs/2512.20017v1
Scaling Point-based Differentiable Rendering for Large-scale Reconstruction
2025-12-23T03:17:04Z
Point-based Differentiable Rendering (PBDR) enables high-fidelity 3D scene reconstruction, but scaling PBDR to high-resolution and large scenes requires efficient distributed training systems. Existing systems are tightly coupled to a specific PBDR method, and they suffer from severe communication overhead due to poor data locality. In this paper, we present Gaian, a general distributed training system for PBDR. Gaian provides a unified API expressive enough to support existing PBDR methods, while exposing rich data-access information, which Gaian leverages to optimize locality and reduce communication. We evaluated Gaian by implementing 4 PBDR algorithms. Our implementations achieve high performance and resource efficiency: across six datasets and up to 128 GPUs, they reduce communication by up to 91% and improve training throughput by 1.50x-3.71x.
2025-12-23T03:17:04Z
13 pages main text, plus appendix
Hexu Zhao, Xiaoteng Liu, Xiwen Min, Jianhao Huang, Youming Deng, Yanfei Li, Ang Li, Jinyang Li, Aurojit Panda

http://arxiv.org/abs/2512.19817v1
Generating the Past, Present and Future from a Motion-Blurred Image
2025-12-22T19:12:33Z
We seek to answer the question: what can a motion-blurred image reveal about a scene's past, present, and future? Although motion blur obscures image details and degrades visual quality, it also encodes information about scene and camera motion during an exposure.
Previous techniques leverage this information to estimate a sharp image from an input blurry one, or to predict a sequence of video frames showing what might have occurred at the moment of image capture. However, they rely on handcrafted priors or network architectures to resolve ambiguities in this inverse problem, and do not incorporate image and video priors on large-scale datasets. As such, existing methods struggle to reproduce complex scene dynamics and do not attempt to recover what occurred before or after an image was taken. Here, we introduce a new technique that repurposes a pre-trained video diffusion model trained on internet-scale datasets to recover videos revealing complex scene dynamics during the moment of capture and what might have occurred immediately into the past or future. Our approach is robust and versatile; it outperforms previous methods for this task, generalizes to challenging in-the-wild images, and supports downstream tasks such as recovering camera trajectories, object motion, and dynamic 3D scene structure. Code and data are available at https://blur2vid.github.io
2025-12-22T19:12:33Z
ACM Trans. Graph. (SIGGRAPH Asia 2025), vol. 44, no. 6, pp. 1-15, Dec. 2025
SaiKiran Tedla, Kelly Zhu, Trevor Canham, Felix Taubner, Michael S. Brown, Kiriakos N. Kutulakos, David B. Lindell
10.1145/3763306

http://arxiv.org/abs/2511.00898v2
Empowering LLMs with Structural Role Inference for Zero-Shot Graph Learning
2025-12-22T18:21:46Z
Large Language Models have emerged as a promising approach for graph learning due to their powerful reasoning capabilities. However, existing methods exhibit systematic performance degradation on structurally important nodes such as bridges and hubs. We identify the root cause of these limitations. Current approaches encode graph topology into static features but lack reasoning scaffolds to transform topological patterns into role-based interpretations.
This limitation becomes critical in zero-shot scenarios where no training data establishes structure-semantics mappings. To address this gap, we propose DuoGLM, a training-free dual-perspective framework for structure-aware graph reasoning. The local perspective constructs relation-aware templates capturing semantic interactions between nodes and neighbors. The global perspective performs topology-to-role inference to generate functional descriptions of structural positions. These complementary perspectives provide explicit reasoning mechanisms enabling LLMs to distinguish topologically similar but semantically different nodes. Extensive experiments across eight benchmark datasets demonstrate substantial improvements. DuoGLM achieves 14.3% accuracy gain in zero-shot node classification and 7.6% AUC improvement in cross-domain transfer compared to existing methods. The results validate the effectiveness of explicit role reasoning for graph understanding with LLMs.
2025-11-02T11:33:14Z
This submission has been withdrawn by the authors due to a fundamental error in the methodology that affects the validity of the main results
Heng Zhang, Jing Liu, Jiajun Wu, Haochen You, Lubin Gan, Yuling Shi, Xiaodong Gu, Zijian Zhang, Shuai Chen, Wenjun Huang, Jin Huang

http://arxiv.org/abs/2511.00908v2
GraphGeo: Multi-Agent Debate Framework for Visual Geo-localization with Heterogeneous Graph Neural Networks
2025-12-22T18:21:18Z
Visual geo-localization requires extensive geographic knowledge and sophisticated reasoning to determine image locations without GPS metadata. Traditional retrieval methods are constrained by database coverage and quality. Recent Large Vision-Language Models (LVLMs) enable direct location reasoning from image content, yet individual models struggle with diverse geographic regions and complex scenes. Existing multi-agent systems improve performance through model collaboration but treat all agent interactions uniformly.
They lack mechanisms to handle conflicting predictions effectively. We propose GraphGeo, a multi-agent debate framework using heterogeneous graph neural networks for visual geo-localization. Our approach models diverse debate relationships through typed edges, distinguishing supportive collaboration, competitive argumentation, and knowledge transfer. We introduce a dual-level debate mechanism combining node-level refinement and edge-level argumentation modeling. A cross-level topology refinement strategy enables co-evolution between graph structure and agent representations. Experiments on multiple benchmarks demonstrate GraphGeo significantly outperforms state-of-the-art methods. Our framework transforms cognitive conflicts between agents into enhanced geo-localization accuracy through structured debate.
2025-11-02T11:58:55Z
This submission has been withdrawn by the authors due to a fundamental error in the methodology that affects the validity of the main results
Heng Zheng, Yuling Shi, Xiaodong Gu, Haochen You, Zijian Zhang, Lubin Gan, Hao Zhang, Wenjun Huang, Jin Huang

http://arxiv.org/abs/2510.12085v2
GraphShaper: Geometry-aware Alignment for Improving Transfer Learning in Text-Attributed Graphs
2025-12-22T18:20:12Z
Graph foundation models represent a transformative paradigm for learning transferable representations across diverse graph domains. Recent methods leverage large language models to unify graph and text modalities into a shared representation space using contrastive learning. However, systematic evaluations reveal significant performance degradation at structural boundaries where distinct topological patterns converge, with accuracy losses exceeding 20 percentage points. This issue arises from a key limitation: current methods assume all graph structures can be encoded within a single Euclidean space.
In reality, tree structures require hyperbolic geometry to preserve hierarchical branching, while cyclic patterns depend on spherical geometry for closure properties. At structural boundaries, nodes experience conflicting geometric constraints that uniform encoding spaces cannot resolve. This raises a crucial challenge: Can alignment frameworks be designed to respect the intrinsic geometric diversity of graph structures? We introduce GraphShaper, a geometry-aware framework that enhances graph encoding through multi-geometric specialization. Our approach employs expert networks tailored to different geometric spaces, dynamically computing fusion weights to adaptively integrate geometric properties based on local structural characteristics. This adaptive fusion preserves structural integrity before alignment with text embeddings. Extensive experiments demonstrate that GraphShaper achieves 9.47% accuracy improvements on citation networks and 7.63% on social networks in zero-shot settings.
2025-10-14T02:48:50Z
This submission has been withdrawn by the authors due to a fundamental error in the methodology that affects the validity of the main results
Heng Zhang, Tianyi Zhang, Yuling Shi, Xiaodong Gu, Yaomin Shen, Haochen You, Zijian Zhang, Yilei Yuan, Jin Huang

http://arxiv.org/abs/2510.10581v2
GraphTracer: Graph-Guided Failure Tracing in LLM Agents for Robust Multi-Turn Deep Search
2025-12-22T18:19:37Z
Multi-agent systems powered by Large Language Models excel at complex tasks through coordinated collaboration, yet they face high failure rates in multi-turn deep search scenarios. Existing temporal attribution methods struggle to accurately diagnose root causes, particularly when errors propagate across multiple agents. Attempts to automate failure attribution by analyzing action sequences remain ineffective due to their inability to account for information dependencies that span agents.
This paper identifies two core challenges: (i) distinguishing symptoms from root causes in multi-agent error propagation, and (ii) tracing information dependencies beyond temporal order. To address these issues, we introduce GraphTracer, a framework that redefines failure attribution through information flow analysis. GraphTracer constructs Information Dependency Graphs (IDGs) to explicitly capture how agents reference and build on prior outputs. It localizes root causes by tracing through these dependency structures instead of relying on temporal sequences. GraphTracer also uses graph-aware synthetic data generation to target critical nodes, creating realistic failure scenarios. Evaluations on the Who&When benchmark and integration into production systems demonstrate that GraphTracer-8B achieves up to 18.18% higher attribution accuracy compared to state-of-the-art models and enables 4.8% to 14.2% performance improvements in deployed multi-agent frameworks, establishing a robust solution for multi-agent system debugging.
2025-10-12T12:55:42Z
This submission has been withdrawn by the authors due to a fundamental error in the methodology that affects the validity of the main results
Heng Zhang, Yuling Shi, Xiaodong Gu, Haochen You, Zijian Zhang, Lubin Gan, Yilei Yuan, Jin Huang

http://arxiv.org/abs/2511.00911v2
G2rammar: Bilingual Grammar Modeling for Enhanced Text-attributed Graph Learning
2025-12-22T18:18:58Z
Text-attributed graphs require models to effectively integrate both structural topology and semantic content. Recent approaches apply large language models to graphs by linearizing structures into token sequences through random walks. These methods create concise graph vocabularies to replace verbose natural language descriptions. However, they overlook a critical component that makes language expressive: grammar. In natural language, grammar assigns syntactic roles to words and defines their functions within sentences.
Similarly, nodes in graphs play distinct structural roles as hubs, bridges, or peripheral members. Current graph language methods provide tokens without grammatical annotations to indicate these structural or semantic roles. This absence limits language models' ability to reason about graph topology effectively. We propose G2rammar, a bilingual grammar framework that explicitly encodes both structural and semantic grammar for text-attributed graphs. Structural grammar characterizes topological roles through centrality and neighborhood patterns. Semantic grammar captures content relationships through textual informativity. The framework implements two-stage learning with structural grammar pre-training followed by semantic grammar fine-tuning. Extensive experiments on real-world datasets demonstrate that G2rammar consistently outperforms competitive baselines by providing language models with the grammatical context needed to understand graph structures.
2025-11-02T12:06:56Z
This submission has been withdrawn by the authors due to a fundamental error in the methodology that affects the validity of the main results
Heng Zheng, Haochen You, Zijun Liu, Zijian Zhang, Lubin Gan, Hao Zhang, Wenjun Huang, Jin Huang

http://arxiv.org/abs/2512.19583v1
Learning Generalizable Hand-Object Tracking from Synthetic Demonstrations
2025-12-22T17:08:54Z
We present a system for learning generalizable hand-object tracking controllers purely from synthetic data, without requiring any human demonstrations. Our approach makes two key contributions: (1) HOP, a Hand-Object Planner, which can synthesize diverse hand-object trajectories; and (2) HOT, a Hand-Object Tracker that bridges synthetic-to-physical transfer through reinforcement learning and interaction imitation learning, delivering a generalizable controller conditioned on target hand-object states. Our method extends to diverse object shapes and hand morphologies.
Through extensive evaluations, we show that our approach enables dexterous hands to track challenging, long-horizon sequences including object re-arrangement and agile in-hand reorientation. These results represent a significant step toward scalable foundation controllers for manipulation that can learn entirely from synthetic data, breaking the data bottleneck that has long constrained progress in dexterous manipulation.
2025-12-22T17:08:54Z
Yinhuai Wang, Runyi Yu, Hok Wai Tsui, Xiaoyi Lin, Hui Zhang, Qihan Zhao, Ke Fan, Miao Li, Jie Song, Jingbo Wang, Qifeng Chen, Ping Tan

http://arxiv.org/abs/2512.19390v1
TwinAligner: Visual-Dynamic Alignment Empowers Physics-aware Real2Sim2Real for Robotic Manipulation
2025-12-22T13:38:11Z
The robotics field is evolving towards data-driven, end-to-end learning, inspired by multimodal large models. However, reliance on expensive real-world data limits progress. Simulators offer cost-effective alternatives, but the gap between simulation and reality challenges effective policy transfer. This paper introduces TwinAligner, a novel Real2Sim2Real system that addresses both visual and dynamic gaps. The visual alignment module achieves pixel-level alignment through SDF reconstruction and editable 3DGS rendering, while the dynamic alignment module ensures dynamic consistency by identifying rigid physics from robot-object interaction. TwinAligner improves robot learning by providing scalable data collection and establishing a trustworthy iterative cycle, accelerating algorithm development. Quantitative evaluations highlight TwinAligner's strong capabilities in visual and dynamic real-to-sim alignment. This system enables policies trained in simulation to achieve strong zero-shot generalization to the real world. The high consistency between real-world and simulated policy performance underscores TwinAligner's potential to advance scalable robot learning.
Code and data will be released on https://twin-aligner.github.io
2025-12-22T13:38:11Z
Hongwei Fan, Hang Dai, Jiyao Zhang, Jinzhou Li, Qiyang Yan, Yujie Zhao, Mingju Gao, Jinghang Wu, Hao Tang, Hao Dong

http://arxiv.org/abs/2512.17440v2
Four special Poncelet triangle families about the incircle
2025-12-22T10:05:12Z
We describe four special families of ellipse-inscribed Poncelet triangles about the incircle which maintain certain triangle centers stationary and which also display interesting conservations.
2025-12-19T10:50:17Z
7 pages, 5 figures
Ronaldo A. Garcia, Mark Helman, Dan Reznik

http://arxiv.org/abs/2504.11734v2
Recent Advances in 3D Object and Scene Generation: A Survey
2025-12-22T02:54:33Z
In recent years, the demand for 3D content has grown exponentially with the intelligent upgrade of interactive media, extended reality (XR), and Metaverse industries. In order to overcome the limitations of traditional manual modeling approaches, such as labor-intensive workflows and prolonged production cycles, revolutionary advances have been achieved through the convergence of novel 3D representation paradigms and artificial intelligence generative technologies. In this survey, we conduct a systematic review of the cutting-edge achievements in static 3D object and scene generation, as well as establish a comprehensive technical framework through systematic categorization. We start our analysis with mainstream 3D object representations. Subsequently, we delve into the technical pathways of 3D object generation based on four mainstream deep generative models: Variational Autoencoders, Generative Adversarial Networks, Autoregressive Models, and Diffusion Models. Regarding scene generation, we focus on three dominant paradigms: layout-guided generation, lifting based on 2D priors, and rule-driven modeling. Finally, we critically examine persistent challenges in 3D generation and propose potential research directions for future investigation.
This survey aims to provide readers with a structured understanding of state-of-the-art 3D generation technologies while inspiring researchers to undertake more exploration in this domain.
2025-04-16T03:22:06Z
35 pages, 7 figures, 6 tables, Project page: https://github.com/xdlbw/Awesome-3D-Object-and-Scene-Generation
Xiang Tang, Ruotong Li, Xiaopeng Fan

http://arxiv.org/abs/2512.18930v1
LouvreSAE: Sparse Autoencoders for Interpretable and Controllable Style Transfer
2025-12-22T00:36:22Z
Artistic style transfer in generative models remains a significant challenge, as existing methods often introduce style only via model fine-tuning, additional adapters, or prompt engineering, all of which can be computationally expensive and may still entangle style with subject matter. In this paper, we introduce a training- and inference-light, interpretable method for representing and transferring artistic style. Our approach leverages an art-specific Sparse Autoencoder (SAE) on top of latent embeddings of generative image models. Trained on artistic data, our SAE learns an emergent, largely disentangled set of stylistic and compositional concepts, corresponding to style-related elements pertaining to brushwork, texture, and color palette, as well as semantic and structural concepts. We call it LouvreSAE and use it to construct style profiles: compact, decomposable steering vectors that enable style transfer without any model updates or optimization. Unlike prior concept-based style transfer methods, our method requires no fine-tuning, no LoRA training, and no additional inference passes, enabling direct steering of artistic styles from only a few reference images. We validate our method on ArtBench10, achieving or surpassing existing methods on style evaluations (VGG Style Loss and CLIP Score Style) while being 1.7-20x faster and, critically, interpretable.
2025-12-22T00:36:22Z
Raina Panda, Daniel Fein, Arpita Singhal, Mark Fiore, Maneesh Agrawala, Matyas Bohacek