http://arxiv.org/abs/2602.19916v1 Augmented Radiance Field: A General Framework for Enhanced Gaussian Splatting 2026-02-23T14:55:31Z

Owing to its real-time rendering performance, 3D Gaussian Splatting (3DGS) has emerged as the leading method for radiance field reconstruction. However, its reliance on spherical harmonics for color encoding inherently limits its ability to separate diffuse and specular components, making it challenging to accurately represent complex reflections. To address this, we propose a novel enhanced Gaussian kernel that explicitly models specular effects through view-dependent opacity. Meanwhile, we introduce an error-driven compensation strategy to improve rendering quality in existing 3DGS scenes. Our method begins with 2D Gaussian initialization and then adaptively inserts and optimizes enhanced Gaussian kernels, ultimately producing an augmented radiance field. Experiments demonstrate that our method not only surpasses state-of-the-art NeRF methods in rendering performance but also achieves greater parameter efficiency. Project page at: https://xiaoxinyyx.github.io/augs.

2026-02-23T14:55:31Z Accepted to ICLR 2026. Project page: https://xiaoxinyyx.github.io/augs Yixin Yang Bojian Wu Yang Zhou Hui Huang http://arxiv.org/abs/2603.29855v1 PosterReward: Unlocking Accurate Evaluation for High-Quality Graphic Design Generation 2026-02-23T14:35:30Z

Recent advancements in the text-rendering capabilities of image generation models have made the end-to-end creation of graphic design content, such as posters, increasingly feasible. However, existing reward models fall short of accurately assessing design quality, as they primarily focus on global image aesthetics while overlooking the critical dimensions of typography and layout. Furthermore, the scarcity of domain-specific preference data remains a significant bottleneck, which limits the further development of graphic design evaluation and generation. To bridge this gap, we introduce an automated pipeline to construct a high-quality dataset of 70k poster preferences by leveraging the consensus of multiple Multi-modal Large Language Models (MLLMs) to simulate human-like judgment. Utilizing this dataset, we develop PosterReward, a reward model specifically designed for high-precision poster assessment through a cascaded, multi-stage training strategy. We also provide multiple variants of the model to cater to different application scenarios. Finally, we introduce PosterRewardBench and PosterBench to evaluate the performance of existing reward models in poster assessment and the generation capabilities of current text-to-image models in poster creation, respectively.

2026-02-23T14:35:30Z Accepted by CVPR'26 Jianyu Lai Sixiang Chen Jialin Gao Hengyu Shi Zhongying Liu Fuxiang Zhai Junfeng Luo Xiaoming Wei Lujia Wang Lei Zhu http://arxiv.org/abs/2602.19753v1 RAP: Fast Feedforward Rendering-Free Attribute-Guided Primitive Importance Score Prediction for Efficient 3D Gaussian Splatting Processing 2026-02-23T12:02:03Z

3D Gaussian Splatting (3DGS) has emerged as a leading technology for high-quality 3D scene reconstruction. However, the iterative refinement and densification process leads to the generation of a large number of primitives, each contributing to the reconstruction to a substantially different extent. Estimating primitive importance is thus crucial, both for removing redundancy during reconstruction and for enabling efficient compression and transmission. Existing methods typically rely on rendering-based analyses, where each primitive is evaluated through its contribution across multiple camera viewpoints. However, such methods are sensitive to the number and selection of views, rely on specialized differentiable rasterizers, and have long calculation times that grow linearly with view count, making them difficult to integrate as plug-and-play modules and limiting scalability and generalization. To address these issues, we propose RAP, a fast feedforward rendering-free attribute-guided method for efficient importance score prediction in 3DGS. RAP infers primitive significance directly from intrinsic Gaussian attributes and local neighborhood statistics, avoiding rendering-based or visibility-dependent computations. A compact MLP predicts per-primitive importance scores using rendering loss, pruning-aware loss, and significance distribution regularization. After training on a small set of scenes, RAP generalizes effectively to unseen data and can be seamlessly integrated into reconstruction, compression, and transmission pipelines. Our code is publicly available at https://github.com/yyyykf/RAP.

2026-02-23T12:02:03Z Accepted by CVPR 2026 Kaifa Yang Qi Yang Yiling Xu Zhu Li http://arxiv.org/abs/2602.17690v2 DesignAsCode: Bridging Structural Editability and Visual Fidelity in Graphic Design Generation 2026-02-23T11:36:42Z

Graphic design generation demands a delicate balance between high visual fidelity and fine-grained structural editability. However, existing approaches typically bifurcate into either non-editable raster image synthesis or abstract layout generation devoid of visual content. Recent combinations of these two approaches attempt to bridge this gap but often suffer from rigid composition schemas and unresolvable visual dissonances (e.g., text-background conflicts) due to their inexpressive representation and open-loop nature. To address these challenges, we propose DesignAsCode, a novel framework that reimagines graphic design as a programmatic synthesis task using HTML/CSS. Specifically, we introduce a Plan-Implement-Reflect pipeline, incorporating a Semantic Planner to construct dynamic, variable-depth element hierarchies and a Visual-Aware Reflection mechanism that iteratively optimizes the code to rectify rendering artifacts. Extensive experiments demonstrate that DesignAsCode significantly outperforms state-of-the-art baselines in both structural validity and aesthetic quality. Furthermore, our code-native representation unlocks advanced capabilities, including automatic layout retargeting, complex document generation (e.g., resumes), and CSS-based animation. Our project page is available at https://liuziyuan1109.github.io/design-as-code/.

2026-02-06T05:10:19Z Ziyuan Liu Shizhao Sun Danqing Huang Yingdong Shi Meisheng Zhang Ji Li Jingsong Yu Jiang Bian http://arxiv.org/abs/2602.19697v1 BayesFusion-SDF: Probabilistic Signed Distance Fusion with View Planning on CPU 2026-02-23T10:44:15Z

Dense 3D reconstruction from depth observations is a key part of robotics, augmented reality, and digital inspection. Traditional volumetric fusion techniques, including truncated signed distance functions (TSDF), enable efficient and deterministic geometry reconstruction; however, they depend on heuristic weighting and do not convey uncertainty in a systematic way. Recent neural implicit methods, on the other hand, achieve very high fidelity but usually require substantial GPU resources for optimization and are difficult to interpret for downstream decision-making. This work presents BayesFusion-SDF, a CPU-centric probabilistic signed distance fusion framework that conceptualizes geometry as a sparse Gaussian random field with a defined posterior distribution over voxel distances. First, a rough TSDF reconstruction is used to create an adaptive narrow-band domain. Then, depth observations are combined using a heteroscedastic Bayesian formulation that is solved using sparse linear algebra and preconditioned conjugate gradients. Randomized diagonal estimators provide a fast approximation of posterior uncertainty, which makes it possible to extract surfaces and plan the next best view while taking uncertainty into account. Tests on a controlled ablation scene and a CO3D object sequence show that the new method is more accurate geometrically than TSDF baselines and gives useful estimates of uncertainty for active sensing. The proposed formulation provides a clear and easy-to-use alternative to GPU-heavy neural reconstruction methods while remaining probabilistically interpretable and behaving predictably. GitHub: https://mazumdarsoumya.github.io/BayesFusionSDF
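As a rough illustration of the heteroscedastic fusion idea in the abstract above, the sketch below fuses several noisy signed-distance observations of a single voxel by precision weighting. It assumes independent Gaussian noise per observation and treats each voxel in isolation, which is a deliberate simplification: the abstract describes a sparse Gaussian random field solved with preconditioned conjugate gradients, and none of that coupling is modeled here. The function name `fuse_voxel` and its parameters are hypothetical and not taken from the paper.

```python
def fuse_voxel(distances, variances, prior_mean=0.0, prior_variance=float("inf")):
    """Fuse signed-distance observations of one voxel under Gaussian noise.

    Each observation has its own noise variance (heteroscedastic), so it is
    weighted by its precision, 1 / variance.
    Returns (posterior_mean, posterior_variance).
    """
    if len(distances) != len(variances):
        raise ValueError("distances and variances must have the same length")
    if any(v <= 0.0 for v in variances):
        raise ValueError("variances must be positive")

    # Start from the prior; an infinite prior variance contributes zero precision.
    precision = 0.0 if prior_variance == float("inf") else 1.0 / prior_variance
    weighted_sum = precision * prior_mean

    # Accumulate precision and precision-weighted distance for each observation.
    for d, v in zip(distances, variances):
        precision += 1.0 / v
        weighted_sum += d / v

    if precision == 0.0:
        raise ValueError("need a finite prior or at least one observation")
    return weighted_sum / precision, 1.0 / precision
```

With two equally noisy observations the result is their plain average with half the variance; a more confident observation (smaller variance) pulls the posterior mean toward itself.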

2026-02-23T10:44:15Z Soumya Mazumdar Vineet Kumar Rakesh Tapas Samanta http://arxiv.org/abs/2603.29852v1 VectorGym: A Multitask Benchmark for SVG Code Generation, Sketching, and Editing 2026-02-22T10:39:14Z

We introduce VectorGym, a comprehensive benchmark suite for Scalable Vector Graphics (SVG) that spans generation from text and sketches, complex editing, and visual understanding. VectorGym addresses the lack of realistic, challenging benchmarks aligned with professional design workflows. Our benchmark comprises four tasks with expert human-authored annotations: the novel Sketch2SVG task (VG-Sketch); a new SVG editing dataset (VG-Edit) featuring complex, multi-step edits with higher-order primitives; Text2SVG generation (VG-Text); and SVG captioning (VG-Cap). Unlike prior benchmarks that rely on synthetic edits, VectorGym provides gold-standard human annotations that require semantic understanding and design intent. We also propose a multi-task reinforcement learning approach that jointly optimizes across all four tasks using rendering-based rewards. Our method, built on GRPO with curriculum learning, trains a Qwen3-VL 8B model that achieves state-of-the-art performance among open-source models, surpassing much larger models including Qwen3-VL 235B and matching GPT-4o. We also introduce a VLM-as-a-Judge metric for SVG generation, validated through human correlation studies. Our evaluation of frontier VLMs reveals significant performance gaps, positioning VectorGym as a rigorous framework for advancing visual code generation. VectorGym is publicly available on huggingface.co/datasets/ServiceNow/VectorGym.
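The abstract above mentions training with GRPO on rendering-based rewards. As a minimal sketch of the group-relative step that GRPO is generally known for, the code below normalizes the rewards of several sampled outputs for one prompt by the group mean and standard deviation. The reward values are placeholders, the use of the population standard deviation and the `eps` stabilizer are assumptions, and nothing here reflects the authors' actual reward design or curriculum.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group.

    Each advantage is (reward - group mean) / (group std + eps), so outputs
    that beat their siblings get a positive advantage and the rest a negative one.
    """
    mean = statistics.fmean(rewards)
    # Population standard deviation; implementations differ on this choice.
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

When all rewards in a group are equal, every advantage is zero, so that group contributes no learning signal.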

2026-02-22T10:39:14Z Juan Rodriguez Haotian Zhang Abhay Puri Tianyang Zhang Rishav Pramanik Meng Lin Xiaoqing Xie Marco Terral Darsh Kaushik Aly Shariff Perouz Taslakian Spandana Gella Sai Rajeswar David Vazquez Christopher Pal Marco Pedersoli http://arxiv.org/abs/2602.19089v1 Ani3DHuman: Photorealistic 3D Human Animation with Self-guided Stochastic Sampling 2026-02-22T08:07:28Z

Current 3D human animation methods struggle to achieve photorealism: kinematics-based approaches lack non-rigid dynamics (e.g., clothing dynamics), while methods that leverage video diffusion priors can synthesize non-rigid motion but suffer from quality artifacts and identity loss. To overcome these limitations, we present Ani3DHuman, a framework that marries kinematics-based animation with video diffusion priors. We first introduce a layered motion representation that disentangles rigid motion from residual non-rigid motion. Rigid motion is generated by a kinematic method, which then produces a coarse rendering to guide the video diffusion model in generating video sequences that restore the residual non-rigid motion. However, this restoration task, based on diffusion sampling, is highly challenging, as the initial renderings are out-of-distribution, causing standard deterministic ODE samplers to fail. Therefore, we propose a novel self-guided stochastic sampling method, which effectively addresses the out-of-distribution problem by combining stochastic sampling (for photorealistic quality) with self-guidance (for identity fidelity). These restored videos provide high-quality supervision, enabling the optimization of the residual non-rigid motion field. Extensive experiments demonstrate that Ani3DHuman can generate photorealistic 3D human animation, outperforming existing methods. Code is available at https://github.com/qiisun/ani3dhuman.

2026-02-22T08:07:28Z CVPR 2026 Qi Sun Can Wang Jiaxiang Shang Yingchun Liu Jing Liao http://arxiv.org/abs/2411.16076v2 Geometry Distributions 2026-02-22T03:38:58Z

Neural representations of 3D data have been widely adopted across various applications, particularly in recent work leveraging coordinate-based networks to model scalar or vector fields. However, these approaches face inherent challenges, such as handling thin structures and non-watertight geometries, which limit their flexibility and accuracy. In contrast, we propose a novel geometric data representation that models geometry as distributions, a powerful representation that makes no assumptions about surface genus, connectivity, or boundary conditions. Our approach uses diffusion models with a novel network architecture to learn surface point distributions, capturing fine-grained geometric details. We evaluate our representation qualitatively and quantitatively across various object types, demonstrating its effectiveness in achieving high geometric fidelity. Additionally, we explore applications using our representation, such as textured mesh representation, neural surface compression, dynamic object modeling, and rendering, highlighting its potential to advance 3D geometric learning.

2024-11-25T04:06:48Z Accepted to ICCV 2025. For the project site, see https://1zb.github.io/GeomDist/ Biao Zhang Jing Ren Peter Wonka http://arxiv.org/abs/2602.18886v1 PhysConvex: Physics-Informed 3D Dynamic Convex Radiance Fields for Reconstruction and Simulation 2026-02-21T16:16:33Z

Reconstructing and simulating dynamic 3D scenes with both visual realism and physical consistency remains a fundamental challenge. Existing neural representations, such as NeRFs and 3DGS, excel in appearance reconstruction but struggle to capture complex material deformation and dynamics. We propose PhysConvex, a Physics-informed 3D Dynamic Convex Radiance Field that unifies visual rendering and physical simulation. PhysConvex represents deformable radiance fields using physically grounded convex primitives governed by continuum mechanics. We introduce a boundary-driven dynamic convex representation that models deformation through vertex and surface dynamics, capturing spatially adaptive, non-uniform deformation, and evolving boundaries. To efficiently simulate complex geometries and heterogeneous materials, we further develop a reduced-order convex simulation that advects dynamic convex fields using neural skinning eigenmodes as shape- and material-aware deformation bases with time-varying reduced DOFs under Newtonian dynamics. Convex dynamics also offers compact, gap-free volumetric coverage, enhancing both geometric efficiency and simulation fidelity. Experiments demonstrate that PhysConvex achieves high-fidelity reconstruction of geometry, appearance, and physical properties from videos, outperforming existing methods.

2026-02-21T16:16:33Z Dan Wang Xinrui Cui Serge Belongie Ravi Ramamoorthi http://arxiv.org/abs/2602.18752v1 Optimizing ID Consistency in Multimodal Large Models: Facial Restoration via Alignment, Entanglement, and Disentanglement 2026-02-21T08:24:42Z

Multimodal editing large models have demonstrated powerful editing capabilities across diverse tasks. However, a persistent and long-standing limitation is the decline in facial identity (ID) consistency during realistic portrait editing. Due to the human eye's high sensitivity to facial features, such inconsistency significantly hinders the practical deployment of these models. Current facial ID preservation methods struggle to achieve consistent restoration of both facial identity and edited element IP due to Cross-source Distribution Bias and Cross-source Feature Contamination. To address these issues, we propose EditedID, an Alignment-Disentanglement-Entanglement framework for robust identity-specific facial restoration. By systematically analyzing diffusion trajectories, sampler behaviors, and attention properties, we introduce three key components: 1) Adaptive mixing strategy that aligns cross-source latent representations throughout the diffusion process. 2) Hybrid solver that disentangles source-specific identity attributes and details. 3) Attentional gating mechanism that selectively entangles visual elements. Extensive experiments show that EditedID achieves state-of-the-art performance in preserving original facial ID and edited element IP consistency. As a training-free and plug-and-play solution, it establishes a new benchmark for practical and reliable single/multi-person facial identity restoration in open-world settings, paving the way for the deployment of multimodal editing large models in real-person editing scenarios. The code is available at https://github.com/NDYBSNDY/EditedID.

2026-02-21T08:24:42Z ICLR 26 Yuran Dong Hang Dai Mang Ye http://arxiv.org/abs/2602.18319v1 Robo-Saber: Generating and Simulating Virtual Reality Players 2026-02-20T16:19:19Z

We present the first motion generation system for playtesting virtual reality (VR) games. Our player model generates VR headset and handheld controller movements from in-game object arrangements, guided by style exemplars and aligned to maximize simulated gameplay score. We train on the large BOXRR-23 dataset and apply our framework to the popular VR game Beat Saber. The resulting model, Robo-Saber, produces skilled gameplay and captures diverse player behaviors, mirroring the skill levels and movement patterns specified by input style exemplars. Robo-Saber demonstrates promise in synthesizing rich gameplay data for predictive applications and enabling a physics-based whole-body VR playtesting agent.

2026-02-20T16:19:19Z 13 pages, 15 figures. Accepted to Eurographics 2026. Project page: https://robo-saber.github.io/ Nam Hee Kim Jingjing May Liu Jaakko Lehtinen Perttu Hämäläinen James F. O'Brien Xue Bin Peng http://arxiv.org/abs/2602.18314v1 Diff2DGS: Reliable Reconstruction of Occluded Surgical Scenes via 2D Gaussian Splatting 2026-02-20T16:14:21Z

Real-time reconstruction of deformable surgical scenes is vital for advancing robotic surgery, improving surgeon guidance, and enabling automation. Recent methods achieve dense reconstructions from da Vinci robotic surgery videos, with Gaussian Splatting (GS) offering real-time performance via graphics acceleration. However, reconstruction quality in occluded regions remains limited, and depth accuracy has not been fully assessed, as benchmarks like EndoNeRF and StereoMIS lack 3D ground truth. We propose Diff2DGS, a novel two-stage framework for reliable 3D reconstruction of occluded surgical scenes. In the first stage, a diffusion-based video module with temporal priors inpaints tissue occluded by instruments with high spatial-temporal consistency. In the second stage, we adapt 2D Gaussian Splatting (2DGS) with a Learnable Deformation Model (LDM) to capture dynamic tissue deformation and anatomical geometry. We also extend evaluation beyond prior image-quality metrics by performing quantitative depth accuracy analysis on the SCARED dataset. Diff2DGS outperforms state-of-the-art approaches in both appearance and geometry, reaching 38.02 dB PSNR on EndoNeRF and 34.40 dB on StereoMIS. Furthermore, our experiments demonstrate that optimizing for image quality alone does not necessarily translate into optimal 3D reconstruction accuracy. To address this, we further optimize the depth quality of the reconstructed 3D results, ensuring more faithful geometry in addition to high-fidelity appearance.
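For readers unfamiliar with the PSNR figures quoted above, the following sketch computes PSNR in decibels from the standard definition, 10 * log10(MAX^2 / MSE). It assumes images are given as flat sequences of intensities in [0, `max_value`]; the function name `psnr` and the flat-list layout are illustrative choices, not the evaluation code used in the paper.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are flat sequences of pixel intensities in [0, max_value].
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("images must be non-empty and the same size")

    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

On a [0, 1] intensity scale, a uniform per-pixel error of 0.1 corresponds to about 20 dB, which gives a sense of how small the residual error must be to reach values in the high 30s.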

2026-02-20T16:14:21Z This work has been submitted to the IEEE for possible publication Tianyi Song Danail Stoyanov Evangelos Mazomenos Francisco Vasconcelos http://arxiv.org/abs/2602.18312v1 Learning Smooth Time-Varying Linear Policies with an Action Jacobian Penalty 2026-02-20T16:11:19Z

Reinforcement learning provides a framework for learning control policies that can reproduce diverse motions for simulated characters. However, such policies often exploit unnatural high-frequency signals that are unachievable by humans or physical robots, making them poor representations of real-world behaviors. Existing work addresses this issue by adding a reward term that penalizes a large change in actions over time. This term often requires substantial tuning efforts. We propose to use the action Jacobian penalty, which penalizes changes in action with respect to the changes in simulated state directly through automatic differentiation. This effectively eliminates unrealistic high-frequency control signals without task-specific tuning. While effective, the action Jacobian penalty introduces significant computational overhead when used with traditional fully connected neural network architectures. To mitigate this, we introduce a new architecture called a Linear Policy Net (LPN) that significantly reduces the computational burden for calculating the action Jacobian penalty during training. In addition, an LPN requires no parameter tuning, exhibits faster learning convergence compared to baseline methods, and can be more efficiently queried during inference time compared to a fully connected neural network. We demonstrate that a Linear Policy Net, combined with the action Jacobian penalty, is able to learn policies that generate smooth signals while solving a number of motion imitation tasks with different characteristics, including dynamic motions such as a backflip and various challenging parkour skills. Finally, we apply this approach to create policies for dynamic motions on a physical quadrupedal robot equipped with an arm.
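To make the notion of an action Jacobian penalty concrete, the sketch below estimates the Jacobian of a policy's action with respect to its input state and returns its squared Frobenius norm. Two assumptions are worth flagging: the abstract states that the penalty is computed through automatic differentiation, whereas this sketch substitutes central finite differences to stay dependency-free, and the choice of the squared Frobenius norm as the penalty is a guess rather than the paper's stated definition. The names `action_jacobian` and `jacobian_penalty` are hypothetical.

```python
def action_jacobian(policy, state, eps=1e-5):
    """Central finite-difference estimate of d(action) / d(state).

    `policy` maps a list of floats (state) to a list of floats (action).
    Returns a matrix with one row per action dimension and one column per
    state dimension.
    """
    num_actions = len(policy(state))
    jac = [[0.0] * len(state) for _ in range(num_actions)]
    for j in range(len(state)):
        plus, minus = list(state), list(state)
        plus[j] += eps
        minus[j] -= eps
        a_plus, a_minus = policy(plus), policy(minus)
        for i in range(num_actions):
            jac[i][j] = (a_plus[i] - a_minus[i]) / (2.0 * eps)
    return jac


def jacobian_penalty(policy, state, eps=1e-5):
    """Squared Frobenius norm of the action Jacobian at `state`."""
    jac = action_jacobian(policy, state, eps)
    return sum(x * x for row in jac for x in row)
```

For a policy that is linear in the state, a = K s + k, the Jacobian is simply K at every state, so the penalty reduces to the sum of squared entries of K. That observation is only a property of linear maps and is not a description of the LPN architecture itself.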

2026-02-20T16:11:19Z Zhaoming Xie Kevin Karol Jessica Hodgins http://arxiv.org/abs/2411.19322v2 SAMa: Material-aware 3D Selection and Segmentation 2026-02-20T14:37:26Z

Decomposing 3D assets into material parts is a common task for artists, yet remains a highly manual process. In this work, we introduce Select Any Material (SAMa), a material selection approach for in-the-wild objects in arbitrary 3D representations. Building on SAM2's video prior, we construct a material-centric video dataset that extends it to the material domain. We propose an efficient way to lift the model's 2D predictions to 3D by projecting each view into an intermediary 3D point cloud using depth. Nearest-neighbor lookups between any 3D representation and this similarity point cloud allow us to efficiently reconstruct accurate selection masks over objects' surfaces that can be inspected from any view. Our method is multiview-consistent by design, alleviating the need for costly per-asset optimization, and performs optimization-free selection in seconds. SAMa outperforms several strong baselines in selection accuracy and multiview consistency and enables various compelling applications, such as replacing the diffuse-textured materials on a text-to-3D output with PBR materials or selecting and editing materials on NeRFs and 3DGS captures.
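The nearest-neighbor lookup described above can be sketched as follows: every query point on the target 3D representation copies the similarity value of its closest point in the intermediary point cloud, and a threshold turns the result into a selection mask. This is a brute-force illustration under assumed data layouts (points as coordinate tuples, one similarity score per cloud point); the function names and the 0.5 threshold are hypothetical, and a practical implementation would likely use a spatial index rather than a linear scan.

```python
def transfer_similarity(query_points, cloud_points, cloud_similarity):
    """Copy each query point's similarity from its nearest cloud point."""
    if len(cloud_points) != len(cloud_similarity) or not cloud_points:
        raise ValueError("cloud must be non-empty with one similarity per point")

    def squared_distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    result = []
    for query in query_points:
        # Linear scan over the cloud; fine for a sketch, slow for large clouds.
        nearest = min(
            range(len(cloud_points)),
            key=lambda i: squared_distance(query, cloud_points[i]),
        )
        result.append(cloud_similarity[nearest])
    return result


def selection_mask(similarities, threshold=0.5):
    """Mark points whose transferred similarity reaches the threshold."""
    return [s >= threshold for s in similarities]
```

Because the lookup depends only on 3D positions, the same cloud can serve any representation that exposes surface points, for example mesh vertices or Gaussian centers.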

2024-11-28T18:59:02Z Project Page: https://mfischer-ucl.github.io/sama Michael Fischer Iliyan Georgiev Thibault Groueix Vladimir G. Kim Tobias Ritschel Valentin Deschaintre http://arxiv.org/abs/2602.07853v3 MPM Lite: Linear Kernels and Integration without Particles 2026-02-20T01:05:45Z

In this paper, we introduce MPM Lite, a new hybrid Lagrangian/Eulerian method that eliminates the need for particle-based quadrature at solve time. Standard MPM practices suffer from a performance bottleneck where expensive implicit solves are proportional to particle-per-cell (PPC) counts due to the choices of particle-based quadrature and wide-stencil kernels. In contrast, MPM Lite treats particles primarily as carriers of kinematic state and material history. By conceptualizing the background Cartesian grid as a voxel hexahedral mesh, we resample particle states onto fixed-location quadrature points using efficient, compact linear kernels. This architectural shift allows force assembly and the entire time-integration process to proceed without accessing particles, making the solver complexity independent of the particle count. At the core of our method is a novel stress transfer and stretch reconstruction strategy. To avoid non-physical averaging of deformation gradients, we resample the extensive Kirchhoff stress and derive a rotation-free deformation reference solution, which naturally supports an optimization-based incremental potential formulation. Consequently, MPM Lite can be implemented as modular resampling units coupled with an FEM-style integration module, enabling the direct use of off-the-shelf nonlinear solvers, preconditioners, and unambiguous boundary conditions. We demonstrate through extensive experiments that MPM Lite preserves the robustness and versatility of traditional MPM across diverse materials while delivering significant speedups in implicit settings and improving explicit settings at the same time. Check our project page at https://mpmlite.github.io.
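As a small illustration of what a compact linear kernel looks like, the sketch below computes 1D hat-function weights for a particle and uses them to resample particle values onto fixed grid locations as a weighted average. It is a toy under stated assumptions: it works in one dimension, it targets grid nodes rather than the quadrature points of a hexahedral mesh, and it averages a generic scalar instead of the Kirchhoff stress transfer described in the abstract. The names `linear_weights` and `resample_to_nodes` are hypothetical.

```python
import math


def linear_weights(x, h=1.0):
    """Hat-kernel weights of a particle at position x on a grid of spacing h.

    Returns [(left_node, w_left), (right_node, w_right)]; the two weights are
    non-negative and sum to one, so the stencil touches only two nodes.
    """
    if h <= 0.0:
        raise ValueError("grid spacing must be positive")
    left = math.floor(x / h)
    t = x / h - left  # fractional offset inside the cell, in [0, 1)
    return [(left, 1.0 - t), (left + 1, t)]


def resample_to_nodes(positions, values, num_nodes, h=1.0):
    """Weighted average of particle values at each of num_nodes grid nodes."""
    numerator = [0.0] * num_nodes
    denominator = [0.0] * num_nodes
    for x, value in zip(positions, values):
        for node, weight in linear_weights(x, h):
            if 0 <= node < num_nodes:  # ignore contributions outside the grid
                numerator[node] += weight * value
                denominator[node] += weight
    # Nodes that received no particle weight are left at zero.
    return [n / d if d > 0.0 else 0.0 for n, d in zip(numerator, denominator)]
```

Once values live at fixed locations, later stages can iterate over the grid alone, which is the property the abstract relies on when it says the solve no longer needs to access particles.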

2026-02-08T07:54:01Z 19 pages Xiang Feng Yunuo Chen Chang Yu Hao Su Demetri Terzopoulos Yin Yang Joe Masterjohn Alejandro Castro Chenfanfu Jiang