https://arxiv.org/api/jXvusAmcjhNzg5Ihhg7QM3aMU7Y 2026-06-14T21:14:23Z 9323 555 15 http://arxiv.org/abs/2511.23292v3 FACT-GS: Frequency-Aligned Complexity-Aware Texture Reparameterization for 2D Gaussian Splatting 2026-04-02T18:33:46Z

Realistic scene appearance modeling has advanced rapidly with Gaussian Splatting, which enables real-time, high-quality rendering. Recent advances introduced per-primitive textures that incorporate spatial color variations within each Gaussian, improving their expressiveness. However, texture-based Gaussians parameterize appearance with a uniform per-Gaussian sampling grid, allocating equal sampling density regardless of local visual complexity, which leads to inefficient texture space utilization. We introduce FACT-GS, a Frequency-Aligned Complexity-aware Texture Gaussian Splatting framework that allocates texture sampling density according to local visual frequency. Grounded in adaptive sampling theory, FACT-GS reformulates texture parameterization as a differentiable sampling-density allocation problem, replacing the uniform textures with a learnable frequency-aware allocation strategy implemented via a deformation field whose Jacobian modulates local sampling density. Built on 2D Gaussian Splatting, FACT-GS performs non-uniform sampling on fixed-resolution texture grids, preserving real-time performance while recovering sharper high-frequency details under the same parameter budget.
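The idea of letting a deformation's Jacobian set local sampling density can be illustrated in one dimension. The sketch below is not the FACT-GS method: it replaces the learned deformation field with a hand-built, piecewise-linear warp derived from a made-up per-bin complexity score. The names `build_warp`, `texels_per_bin`, and `complexity` are hypothetical.

```python
def build_warp(complexity):
    """Warp values at bin edges: a monotone, piecewise-linear map of [0, 1] onto [0, 1].

    `complexity` is a hypothetical per-bin visual-complexity score (all positive).
    """
    total = float(sum(complexity))
    edges = [0.0]
    for c in complexity:
        edges.append(edges[-1] + c / total)
    return edges


def texels_per_bin(complexity, n_texels):
    """Number of texels of a fixed-size grid that each surface bin receives.

    The slope of the warp on a bin (its 1D Jacobian) is proportional to
    edges[i + 1] - edges[i], so steeper regions receive more texels.
    """
    edges = build_warp(complexity)
    return [n_texels * (edges[i + 1] - edges[i]) for i in range(len(complexity))]
```

With `complexity = [1, 1, 6, 1, 1]` and 20 texels, the middle bin receives about 12 texels and each of the others about 2, while the total stays at 20. This mirrors the fixed-budget, non-uniform allocation the abstract describes.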

2025-11-28T15:47:29Z 11 pages, 6 figures, CVPR 2026 Findings track. Project page: https://tianhaoxie.github.io/project/FACT-GS/ Tianhao Xie Linlian Jiang Xinxin Zuo Yang Wang Tiberiu Popa http://arxiv.org/abs/2603.29585v2 Learn2Fold: Structured Origami Generation with World Model Planning 2026-04-02T18:28:34Z

The ability to transform a flat sheet into a complex three-dimensional structure is a fundamental test of physical intelligence. Unlike cloth manipulation, origami is governed by strict geometric axioms and hard kinematic constraints, where a single invalid crease or collision can invalidate the entire folding sequence. As a result, origami demands long-horizon constructive reasoning that jointly satisfies precise physical laws and high-level semantic intent. Existing approaches fall into two disjoint paradigms: optimization-based methods enforce physical validity but require dense, precisely specified inputs, making them unsuitable for sparse natural language descriptions, while generative foundation models excel at semantic and perceptual synthesis yet fail to produce long-horizon, physics-consistent folding processes. Consequently, generating valid origami folding sequences directly from text remains an open challenge. To address this gap, we introduce Learn2Fold, a neuro-symbolic framework that formulates origami folding as conditional program induction over a crease-pattern graph. Our key insight is to decouple semantic proposal from physical verification. A large language model generates candidate folding programs from abstract text prompts, while a learned graph-structured world model serves as a differentiable surrogate simulator that predicts physical feasibility and failure modes before execution. Integrated within a lookahead planning loop, Learn2Fold enables robust generation of physically valid folding sequences for complex and out-of-distribution patterns, demonstrating that effective spatial intelligence arises from the synergy between symbolic reasoning and grounded physical simulation.
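The decoupling of semantic proposal from physical verification can be sketched as a greedy, one-step lookahead loop. This is only an illustration under stated assumptions: `propose` stands in for the language model, `feasibility` for the learned world model, and `apply_step` for the state update. All three are hypothetical callables, not the Learn2Fold interfaces.

```python
def plan(propose, feasibility, apply_step, state, horizon, threshold=0.5):
    """Build a folding program step by step, keeping only verified steps.

    propose(state)        -> list of candidate steps (stand-in for the LLM)
    feasibility(state, c) -> score in [0, 1] (stand-in for the world model)
    apply_step(state, c)  -> next state
    """
    program = []
    for _ in range(horizon):
        scored = [(feasibility(state, c), c) for c in propose(state)]
        feasible = [(s, c) for s, c in scored if s >= threshold]
        if not feasible:
            break  # no candidate passes verification: stop planning
        best = max(feasible, key=lambda sc: sc[0])[1]
        program.append(best)
        state = apply_step(state, best)
    return program


# Toy problem: creases 0..3 must be folded in index order.
def toy_propose(state):
    return [c for c in range(4) if c not in state]


def toy_feasibility(state, crease):
    return 1.0 if crease == len(state) else 0.0


def toy_apply(state, crease):
    return state + (crease,)
```

In the toy problem, `plan(toy_propose, toy_feasibility, toy_apply, (), 4)` keeps only the crease that is valid at each step, so invalid proposals are filtered out before they are executed.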

2026-02-02T11:24:19Z 9 pages, 6 figures Yanjia Huang Yunuo Chen Ying Jiang Jinru Han Zhengzhong Tu Yin Yang Chenfanfu Jiang http://arxiv.org/abs/2604.02141v1 Topology-First B-Rep Meshing 2026-04-02T15:13:09Z

Parametric boundary representation models (B-Reps) are the de facto standard in CAD, graphics, and robotics, yet converting them into valid meshes remains fragile. The difficulty originates from the unavoidable approximation of high-order surface and curve intersections to low-order primitives: the resulting geometric realization often fails to respect the exact topology encoded in the B-Rep, producing meshes with incorrect or missing adjacencies. Existing meshing pipelines address these inconsistencies through heuristic feature-merging and repair strategies that offer no topological guarantees and frequently fail on complex models. We propose a fundamentally different approach: the B-Rep topology is treated as an invariant of the meshing process. Our algorithm enforces the exact B-Rep topology while allowing a single user-defined tolerance to control the deviation of the mesh from the underlying parametric surfaces. Consequently, for any admissible tolerance, the output mesh is topologically correct; only its geometric fidelity degrades as the tolerance increases. This decoupling eliminates the need for post-hoc repairs and yields robust meshes even when the underlying geometry is inconsistent or highly approximated. We evaluate our method on thousands of real-world CAD models from the ABC and Fusion 360 repositories, including instances that fail with standard meshing tools. The results demonstrate that topological guarantees at the algorithmic level enable reliable mesh generation suitable for downstream applications.

2026-04-02T15:13:09Z YunFan Zhou Daniel Zint Nafiseh Izadyar Michael Tao Daniele Panozzo Teseo Schneider http://arxiv.org/abs/2603.29272v2 MaskAdapt: Learning Flexible Motion Adaptation via Mask-Invariant Prior for Physics-Based Characters 2026-04-02T10:53:24Z

We present MaskAdapt, a framework for flexible motion adaptation in physics-based humanoid control. The framework follows a two-stage residual learning paradigm. In the first stage, we train a mask-invariant base policy using stochastic body-part masking and a regularization term that enforces consistent action distributions across masking conditions. This yields a robust motion prior that remains stable under missing observations, anticipating later adaptation in those regions. In the second stage, a residual policy is trained atop the frozen base controller to modify only the targeted body parts while preserving the original behaviors elsewhere. We demonstrate the versatility of this design through two applications: (i) motion composition, where varying masks enable multi-part adaptation within a single sequence, and (ii) text-driven partial goal tracking, where designated body parts follow kinematic targets provided by a pre-trained text-conditioned autoregressive motion generator. Through experiments, MaskAdapt demonstrates strong robustness and adaptability, producing diverse behaviors under masked observations and delivering superior targeted motion adaptation compared to prior work.

2026-03-31T05:13:47Z CVPR 2026 Soomin Park Eunseong Lee Kwang Bin Lee Sung-Hee Lee http://arxiv.org/abs/2604.01551v1 ColorGradedGaussians: Palette-Based Color Grading for 3D Gaussian Splatting via View-Space Sparse Decomposition 2026-04-02T02:54:01Z

Professional color editing requires precise control over both color (hue and saturation) and lightness, ideally through separate, independent controls. We present a real-time interactive color editing framework for 3D Gaussian Splatting (3DGS) that enables palette-based recoloring, per-palette tone curves for color-aware lightness adjustment, and accurate pixel-level constraints -- capabilities unavailable in prior palette-based 3DGS methods. Existing approaches decompose colors at the primitive level, optimizing per-Gaussian palette weights before splatting. However, sparse primitive-level weights do not guarantee sparse pixel-level decompositions after alpha-blending, causing palette edits to affect unintended regions and degrading editing quality. We address this through view-space palette decomposition, splatting weights instead of colors to optimize the observable appearance of the scene. We introduce a geometric loss using inverse barycentric coordinates to enforce consistent sparsity patterns, ensuring similar colors share similar decompositions. Our approach achieves superior editing quality compared to primitive-space methods, enabling professional color grading workflows for 3DGS scenes with real-time interaction.
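The claim that sparse primitive-level weights need not stay sparse after alpha-blending can be checked with a small worked example. The sketch assumes standard front-to-back compositing and a three-color palette; `composite_weights` is a hypothetical helper, not code from the paper.

```python
def composite_weights(splats):
    """Alpha-blend per-primitive palette weights front to back.

    splats: list of (alpha, palette_weights), ordered nearest first.
    Returns the blended per-pixel palette weights.
    """
    n = len(splats[0][1])
    pixel = [0.0] * n
    transmittance = 1.0  # fraction of light not yet absorbed
    for alpha, weights in splats:
        for k in range(n):
            pixel[k] += transmittance * alpha * weights[k]
        transmittance *= 1.0 - alpha
    return pixel


# Two primitives, each assigned to exactly one palette color (one-hot weights).
pixel = composite_weights([(0.5, [1.0, 0.0, 0.0]), (0.5, [0.0, 1.0, 0.0])])
```

Each primitive here uses a single palette color, yet the blended pixel has two nonzero weights, so an edit to either palette color changes this pixel. Splatting the weights and optimizing them in view space, as the abstract proposes, targets sparsity at the pixel level.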

2026-04-02T02:54:01Z 9 pages, 2 figure pages Cheng-Kang Ted Chao Yotam Gingold http://arxiv.org/abs/2604.01274v1 Non-Rigid 3D Shape Correspondences: From Foundations to Open Challenges and Opportunities 2026-04-01T18:00:03Z

Estimating correspondences between deformed shape instances is a long-standing problem in computer graphics; numerous applications, from texture transfer to statistical modelling, rely on recovering an accurate correspondence map. Many methods have thus been proposed to tackle this challenging problem from varying perspectives, depending on the downstream application. This state-of-the-art report is geared towards researchers, practitioners, and students seeking to understand recent trends and advances in the field. We categorise developments into three paradigms: spectral methods based on functional maps, combinatorial formulations that impose discrete constraints, and deformation-based methods that directly recover a global alignment. Each school of thought offers different advantages and disadvantages, which we discuss throughout the report. Meanwhile, we highlight the latest developments in each area and suggest new potential research directions. Finally, we provide an overview of emerging challenges and opportunities in this growing field, including the recent use of vision foundation models for zero-shot correspondence and the particularly challenging task of matching partial shapes.

2026-04-01T18:00:03Z 35 pages and 15 figures; Eurographics 2026 STAR; Project page: https://nonrigid-shape-correspondences.github.io Aleksei Zhuravlev Lennart Bastian Dongliang Cao Nafie El Amrani Paul Roetzer Viktoria Ehm Riccardo Marin Hiroki Nishizawa Shigeo Morishima Christian Theobalt Nassir Navab Daniel Cremers Florian Bernard Zorah Lähner Vladislav Golyanik http://arxiv.org/abs/2604.01082v1 ReMoGen: Real-time Human Interaction-to-Reaction Generation via Modular Learning from Diverse Data 2026-04-01T16:12:23Z

Human behaviors in real-world environments are inherently interactive, with an individual's motion shaped by surrounding agents and the scene. Such capabilities are essential for applications in virtual avatars, interactive animation, and human-robot collaboration. We target real-time human interaction-to-reaction generation, which generates the ego's future motion from dynamic multi-source cues, including others' actions, scene geometry, and optional high-level semantic inputs. This task is fundamentally challenging due to (i) limited and fragmented interaction data distributed across heterogeneous single-person, human-human, and human-scene domains, and (ii) the need to produce low-latency yet high-fidelity motion responses during continuous online interaction. To address these challenges, we propose ReMoGen (Reaction Motion Generation), a modular learning framework for real-time interaction-to-reaction generation. ReMoGen leverages a universal motion prior learned from large-scale single-person motion datasets and adapts it to target interaction domains through independently trained Meta-Interaction modules, enabling robust generalization under data-scarce and heterogeneous supervision. To support responsive online interaction, ReMoGen performs segment-level generation together with a lightweight Frame-wise Segment Refinement module that incorporates newly observed cues at the frame level, improving both responsiveness and temporal coherence without expensive full-sequence inference. Extensive experiments across human-human, human-scene, and mixed-modality interaction settings show that ReMoGen produces high-quality, coherent, and responsive reactions, while generalizing effectively across diverse interaction scenarios.

2026-04-01T16:12:23Z accepted by CVPR 2026, project page: https://4dvlab.github.io/project_page/remogen/ Yaoqin Ye Yiteng Xu Qin Sun Xinge Zhu Yujing Sun Yuexin Ma http://arxiv.org/abs/2510.05097v2 Pulp Motion: Framing-aware multimodal camera and human motion generation 2026-04-01T15:04:10Z

Treating human motion and camera trajectory generation separately overlooks a core principle of cinematography: the tight interplay between actor performance and camera work in screen space. In this paper, we are the first to cast this task as text-conditioned joint generation, aiming to maintain consistent on-screen framing while producing two heterogeneous, yet intrinsically linked, modalities: human motion and camera trajectories. We propose a simple, model-agnostic framework that enforces multimodal coherence via an auxiliary modality: the on-screen framing induced by projecting human joints onto the camera. This on-screen framing provides a natural and effective bridge between modalities, promoting consistency and leading to a more precise joint distribution. We first design a joint autoencoder that learns a shared latent space, together with a lightweight linear transform from the human and camera latents to a framing latent. We then introduce auxiliary sampling, which exploits this linear transform to steer generation toward a coherent framing modality. To support this task, we also introduce the PulpMotion dataset, a human-motion and camera-trajectory dataset with rich captions and high-quality human motions. Extensive experiments across DiT- and MAR-based architectures show the generality and effectiveness of our method in generating on-frame coherent human-camera motions, while also achieving gains on textual alignment for both modalities. Our qualitative results yield more cinematographically meaningful framings, setting a new state of the art for this task. Code, models, and data are available on our project page: https://www.lix.polytechnique.fr/vista/projects/2025_pulpmotion_courant/
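The on-screen framing modality is obtained by projecting human joints onto the camera. A minimal sketch of such a projection follows, assuming a pinhole camera that looks along +z and differs from the world frame by a translation only. The function name, the `focal` parameter, and the omission of camera rotation are simplifications made for illustration, not details from the paper.

```python
def project_joints(joints, cam_pos=(0.0, 0.0, 0.0), focal=1.0):
    """Project 3D joints to 2D screen coordinates with a pinhole model.

    joints:  list of (x, y, z) world positions
    cam_pos: camera position (rotation is ignored in this sketch)
    """
    framing = []
    for x, y, z in joints:
        # Express the joint in the camera frame (translation only).
        cx, cy, cz = x - cam_pos[0], y - cam_pos[1], z - cam_pos[2]
        if cz <= 0.0:
            continue  # behind the camera, so not on screen
        framing.append((focal * cx / cz, focal * cy / cz))
    return framing
```

Because the result depends on both the joint positions and the camera position, moving either one changes the framing, which is why it can act as a bridge between the two modalities.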

2025-10-06T17:58:34Z Project page: https://www.lix.polytechnique.fr/vista/projects/2025_pulpmotion_courant/ Robin Courant Xi Wang David Loiseaux Marc Christie Vicky Kalogeiton http://arxiv.org/abs/2603.29590v2 Automatic Method Illustration Generation for AI Scientific Papers via Drawing Middleware Creation, Evolution, and Orchestration 2026-04-01T14:57:16Z

Method illustrations (MIs) play a crucial role in conveying the core ideas of scientific papers, yet their generation remains a labor-intensive process. Here, we take inspiration from human authors' drawing practices and propose FigAgent, a novel multi-agent framework for high-quality automatic MI generation. FigAgent distills drawing experience from similar components across MIs and encapsulates it into reusable drawing middlewares that can be orchestrated for MI generation, while evolving these middlewares to adapt to changing drawing requirements. In addition, a novel Explore-and-Select drawing strategy mimics human trial and error to gradually construct MIs with complex structures. Extensive experiments show the efficacy of our method.

2026-02-06T13:00:06Z Zhuoling Li Jiarui Zhang Ping Hu Jason Kuen Jiuxiang Gu Hossein Rahmani Jun Liu http://arxiv.org/abs/2605.20190v1 Tool-Augmented Agent for Closed-loop Optimization, Simulation, and Modeling Orchestration 2026-04-01T14:14:09Z

Iterative industrial design-simulation optimization is bottlenecked by the CAD-CAE semantic gap: translating simulation feedback into valid geometric edits under diverse, coupled constraints. To fill this gap, we propose COSMO-Agent (Closed-loop Optimization, Simulation, and Modeling Orchestration), a tool-augmented reinforcement learning (RL) framework that teaches LLMs to complete the closed-loop CAD-CAE process. Specifically, we cast CAD generation, CAE solving, result parsing, and geometry revision as an interactive RL environment, where an LLM learns to orchestrate external tools and revise parametric geometries until constraints are satisfied. To make this learning stable and industrially usable, we design a multi-constraint reward that jointly encourages feasibility, toolchain robustness, and structured output validity. In addition, we contribute an industry-aligned dataset that covers 25 component categories with executable CAD-CAE tasks to support realistic training and evaluation. Experiments show that COSMO-Agent training substantially improves small open-source LLMs for constraint-driven design, exceeding large open-source and strong closed-source models in feasibility, efficiency, and stability.

2026-04-01T14:14:09Z 8 pages, 3 figures Liyuan Deng Shujian Deng Yongkang Chen Yongkang Dai Zhihang Zhong Linyang Li Xiao Sun Yilei Shi Huaxi Huang http://arxiv.org/abs/2510.09997v2 CLoD-GS: Continuous Level-of-Detail via 3D Gaussian Splatting 2026-04-01T14:09:11Z

Level of Detail (LoD) is a fundamental technique in real-time computer graphics for managing the rendering costs of complex scenes while preserving visual fidelity. Traditionally, LoD is implemented using discrete levels (DLoD), where multiple, distinct versions of a model are swapped out at different distances. This long-standing paradigm, however, suffers from two major drawbacks: it requires significant storage for multiple model copies and causes jarring visual "popping" artifacts during transitions, degrading the user experience. We argue that the explicit, primitive-based nature of the emerging 3D Gaussian Splatting (3DGS) technique enables a more ideal paradigm: Continuous LoD (CLoD). A CLoD approach facilitates smooth, seamless quality scaling within a single, unified model, thereby circumventing the core problems of DLoD. To this end, we introduce CLoD-GS, a framework that integrates a continuous LoD mechanism directly into a 3DGS representation. Our method introduces a learnable, distance-dependent decay parameter for each Gaussian primitive, which dynamically adjusts its opacity based on viewpoint proximity. This allows for the progressive and smooth filtering of less significant primitives, effectively creating a continuous spectrum of detail within one model. To train this model to be robust across all distances, we introduce a virtual distance scaling mechanism and a novel coarse-to-fine training strategy with rendered point count regularization. Our approach not only eliminates the storage overhead and visual artifacts of discrete methods but also reduces the primitive count and memory footprint of the final model. Extensive experiments demonstrate that CLoD-GS achieves smooth, quality-scalable rendering from a single model, delivering high-fidelity results across a wide range of performance targets.
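A per-primitive, distance-dependent decay can be sketched as follows. The abstract does not give the functional form, so the exponential falloff, the opacity `cutoff`, and the function names here are assumptions chosen for illustration.

```python
import math


def effective_opacity(base_opacity, decay, distance):
    """Opacity after distance-dependent decay (exponential form is an assumption)."""
    return base_opacity * math.exp(-decay * distance)


def count_rendered(primitives, distance, cutoff=0.05):
    """Count primitives whose decayed opacity is still at or above the cutoff.

    primitives: list of (base_opacity, decay) pairs; `decay` plays the role
    of the learnable per-Gaussian parameter.
    """
    return sum(
        1
        for base, decay in primitives
        if effective_opacity(base, decay, distance) >= cutoff
    )


# Three primitives with the same base opacity but different decay rates.
prims = [(0.9, 0.0), (0.9, 0.5), (0.9, 2.0)]
counts = [count_rendered(prims, d) for d in (1.0, 2.0, 10.0)]
```

In this toy setup all three primitives are rendered at distance 1, two at distance 2, and one at distance 10. Primitives with larger decay fade out earlier and smoothly, instead of being swapped in discrete levels.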

2025-10-11T03:48:11Z Accepted by ICLR 2026 poster Zhigang Cheng Mingchao Sun Yu Liu Zengye Ge Luyang Tang Mu Xu Yangyan Li Peng Pan http://arxiv.org/abs/2604.00928v1 Autoregressive Appearance Prediction for 3D Gaussian Avatars 2026-04-01T14:07:14Z

A photorealistic and immersive human avatar experience demands capturing fine, person-specific details such as cloth and hair dynamics, subtle facial expressions, and characteristic motion patterns. Achieving this requires large, high-quality datasets, which often introduce ambiguities and spurious correlations when very similar poses correspond to different appearances. Models that fit these details during training can overfit and produce unstable, abrupt appearance changes for novel poses. We propose a 3D Gaussian Splatting avatar model with a spatial MLP backbone that is conditioned on both pose and an appearance latent. The latent is learned during training by an encoder, yielding a compact representation that improves reconstruction quality and helps disambiguate pose-driven renderings. At driving time, our predictor autoregressively infers the latent, producing temporally smooth appearance evolution and improved stability. Overall, our method delivers a robust and practical path to high-fidelity, stable avatar driving.

2026-04-01T14:07:14Z Project Page: https://steimich96.github.io/AAP-3DGA/ Michael Steiner Zhang Chen Alexander Richard Vasu Agrawal Markus Steinberger Michael Zollhöfer http://arxiv.org/abs/2604.00831v1 Double-Freeform Lens Design for Angular-Spatial Control of Light Fields 2026-04-01T12:41:22Z

Precise simultaneous control of both angular and spatial light-field distributions remains a longstanding challenge in optical design, often requiring complex multi-element configurations. In this work, we propose a compact single-lens solution that achieves unified angular-spatial modulation through the co-optimization of double freeform surfaces. The problem is formulated as an extended caustic design that enforces prescribed irradiance patterns on two distinct receptive planes, where the dual-plane constraint implicitly defines the directional characteristics of the light field while preserving spatial accuracy. This framework eliminates the need for auxiliary optical components while delivering performance comparable to that of conventional multi-lens systems. Comprehensive numerical simulations verify the method's effectiveness, demonstrating accurate and stable control of both angular and spatial light-field properties. The proposed approach establishes a practical foundation for compact, high-performance optical systems and provides a promising route toward integrated angular-spatial light-field engineering.

2026-04-01T12:41:22Z Accepted to Optics Express. Project homepage: https://ustc3dv.github.io/DoubleFreeformLens Yuou Sun Bailin Deng Juyong Zhang http://arxiv.org/abs/2512.02496v2 Attention-guided reference point shifting for Gaussian-mixture-based partial point set registration 2026-04-01T10:35:42Z

This study investigates how the invariance of feature vectors under translation and rotation of the input point sets affects partial-to-partial point set registration, particularly for techniques based on deep learning and Gaussian mixture models (GMMs). We reveal both theoretical and practical problems with such deep-learning-based registration methods using GMMs, focusing on the limitations of DeepGMR, a pioneering study in this line, when applied to partial-to-partial point set registration. Our primary goal is to uncover the causes of these limitations and propose a comprehensible solution. To this end, we introduce an attention-based reference point shifting (ARPS) layer, which robustly identifies a common reference point of two partial point sets, thereby acquiring transformation-invariant features. The ARPS layer employs a well-studied attention module to find a common reference point rather than the overlap region. As a result, it significantly enhances the performance of DeepGMR and its recent variant, UGMMReg. Furthermore, these extended models outperform even prior deep learning methods that use attention blocks and Transformers to extract the overlap region or common reference points. We believe these findings provide deeper insights into registration methods using deep learning and GMMs.
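The effect of shifting coordinates to a common reference point can be illustrated with an attention-weighted mean. This sketch is a simplification and not the ARPS layer itself: the scores are fixed numbers rather than the output of a learned attention module, and the helper names are hypothetical. It only shows that, when the scores do not change under translation, the recentered coordinates do not change either.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reference_point(points, scores):
    """Attention-weighted mean of the points."""
    weights = softmax(scores)
    dim = len(points[0])
    return tuple(
        sum(w * p[d] for w, p in zip(weights, points)) for d in range(dim)
    )


def recenter(points, scores):
    """Express every point relative to the reference point."""
    ref = reference_point(points, scores)
    return [tuple(p[d] - ref[d] for d in range(len(p))) for p in points]
```

Translating every input point by the same vector moves the reference point by that vector too, so `recenter` returns the same coordinates before and after the translation. Rotation is not handled by this sketch.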

2025-12-02T07:38:55Z 16 pages, 9 figures, 7 tables Mizuki Kikkawa Tatsuya Yatagawa Yutaka Ohtake Hiromasa Suzuki 10.26599/CVM.2025.9450429 http://arxiv.org/abs/2605.13854v1 Contrastive Multi-Modal Hypergraph Reasoning for 3D Crowd Mesh Recovery 2026-04-01T09:39:01Z

Multi-person 3D reconstruction is pivotal for real-world interaction analysis, yet remains challenging due to severe occlusions and depth ambiguity. Current approaches typically rely on single-modality inputs, which inherently lack geometric guidance. Furthermore, these methods often reconstruct subjects in isolation, neglecting the collective group context essential for resolving ambiguities in crowded scenes. To address these limitations, we propose Contrastive Multi-modal Hypergraph Reasoning to synergize semantic, geometric, and pose cues for crowd reconstruction. We first initialize robust node representations by combining RGB features, geometric priors, and occlusion-aware incomplete poses. Additionally, we introduce a pelvis depth indicator as a global spatial anchor, aligning visual features with a metric-scale-agnostic depth ordering. Subsequently, we construct a shared-topology hypergraph that moves beyond pairwise constraints to model higher-order crowd dynamics. To improve feature fusion, we design a hypergraph-based contrastive learning scheme that jointly enhances intra-modal discriminability and enforces cross-modal orthogonality. This mechanism enables the network to propagate global context effectively, allowing it to infer missing information even under severe occlusion. Extensive experiments on the Panoptic and GigaCrowd benchmarks confirm that our method achieves new state-of-the-art performance. Code and pre-trained models are available at https://github.com/SunMH-try/CoMHR.

2026-04-01T09:39:01Z ICME 2026 Minghao Sun Chongyang Xu Yitao Xie Buzhen Huang Kun Li