PhysSkin: Real-Time and Generalizable Physics-Based Animation via Self-Supervised Neural Skinning

2026-05-18T15:59:31Z

Achieving real-time physics-based animation that generalizes across diverse 3D shapes and discretizations remains a fundamental challenge. We introduce PhysSkin, a physics-informed framework that addresses this challenge. In the spirit of Linear Blend Skinning, we learn continuous skinning fields as basis functions that lift motion subspace coordinates to full-space deformation, with the subspace defined by handle transformations. To generate mesh-free, discretization-agnostic, and physically consistent skinning fields that generalize well across diverse 3D shapes, PhysSkin employs a new neural skinning fields autoencoder consisting of a transformer-based encoder and a cross-attention decoder. We also develop a novel physics-informed self-supervised learning strategy that incorporates on-the-fly skinning-field normalization and conflict-aware gradient correction, enabling effective balancing of energy minimization, spatial smoothness, and orthogonality constraints. PhysSkin shows outstanding performance on generalizable neural skinning and enables real-time physics-based animation.
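As a point of reference for the Linear Blend Skinning idea mentioned above, the following is a minimal NumPy sketch of how skinning weights lift a handful of handle transformations to a full deformation. It is not PhysSkin's implementation: the function name `linear_blend_skinning` and the plain weight array (standing in for a learned skinning field evaluated at the points) are assumptions made for illustration.

```python
import numpy as np

def linear_blend_skinning(points, weights, transforms):
    """Deform rest-pose points with Linear Blend Skinning.

    points:     (N, 3) rest positions
    weights:    (N, H) skinning weights, one column per handle
                (a stand-in for a skinning field sampled at the points)
    transforms: (H, 3, 4) affine handle transforms [A | t]
    """
    # Homogeneous coordinates so that each handle applies A @ x + t.
    homo = np.concatenate([points, np.ones((points.shape[0], 1))], axis=1)  # (N, 4)
    # Position of every point under every handle: (H, N, 3).
    per_handle = np.einsum("hij,nj->hni", transforms, homo)
    # Blend the per-handle positions with the skinning weights: (N, 3).
    return np.einsum("nh,hni->ni", weights, per_handle)

# Two points, two handles; handle 1 translates by (0, 2, 0).
identity = np.hstack([np.eye(3), np.zeros((3, 1))])
shifted = np.hstack([np.eye(3), np.array([[0.0], [2.0], [0.0]])])
points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
weights = np.array([[1.0, 0.0], [0.5, 0.5]])
deformed = linear_blend_skinning(points, weights, np.stack([identity, shifted]))
```

The first point follows only the identity handle and stays put, while the second is influenced equally by both handles and moves half of the translation. The subspace coordinates here are the 12 entries of each handle transform, so the number of degrees of freedom depends on the number of handles rather than on the number of points.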

Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis

2026-05-18T14:18:36Z

Designing realistic and functional 3D indoor rooms is essential for a wide range of applications, including interior design, virtual reality, gaming, and embodied AI. While recent MLLM-based approaches have shown great potential for 3D room synthesis from textual descriptions or reference images, text-based methods struggle to capture precise spatial information, and existing image-conditioned agents suffer from instability and infinite looping when tasked with holistic room generation from top-down views. To address these limitations, we propose Code-as-Room, an MLLM-based agentic framework equipped with a structured execution harness, which represents 3D rooms as Blender code. Given a top-down room image, the framework parses the reference image to extract scene elements and their spatial relationships, and synthesizes executable Blender code for geometry, materials, and lighting in a principled, multi-stage pipeline. A cross-stage memory module is maintained throughout to mitigate the context forgetting inherent to existing agent-based frameworks. We further introduce a dedicated benchmark for code-based 3D room synthesis, encompassing various evaluation protocols. Based on our benchmark, comprehensive comparisons against existing agent-based methods are conducted to validate the effectiveness of our proposed execution harness.

SURF: Signature-Retained Fast Video Generation

2026-05-18T13:08:31Z

The demand for high-resolution video generation is growing rapidly. However, the generation resolution is severely constrained by slow inference speeds. For instance, Wan2.1 requires over 50 minutes to generate a single 720p video. While previous works explore accelerating video generation from various aspects, most of them compromise the distinctive signatures (e.g., layout, semantics, motion) of the original model. In this work, we propose SURF, an efficient framework for generating high-resolution videos while maximally preserving these signatures. Specifically, SURF divides video generation into two stages: first, we leverage the pretrained model to infer at its optimal resolution and downsample the latent to quickly generate low-resolution previews; then we design a Refiner to upscale the preview. In the preview stage, we identify that directly running inference with a model (trained at a higher resolution) at a lower resolution causes severe losses in signatures. We therefore introduce noise reshifting, a training-free technique that mitigates this issue by conducting the initial denoising steps at the original resolution and switching to low resolution in later steps. In the refine stage, we establish a mapping relationship between the preview and the high-resolution target, which significantly reduces the number of denoising steps. We further integrate shifting windows and carefully design the training paradigm to obtain a powerful and efficient Refiner. In this way, SURF enables generating high-resolution videos efficiently while staying maximally close to the signatures of the given pretrained model. SURF is conceptually simple and could serve as a plug-in that is compatible with various base models and acceleration methods. For example, it achieves a 12.5x speedup for generating 5-second, 16fps, 720p Wan2.1 videos and an 8.7x speedup for generating 5-second, 24fps, 720p HunyuanVideo videos.

3D Skew Gaussian Splatting with Any Camera Trajectory Visualization Engine

2026-05-18T12:49:53Z

While 3D Gaussian Splatting (3DGS) has revolutionized real-time photorealistic view synthesis, its fundamental reliance on symmetric Gaussian distributions introduces visual artifacts that hinder accurate spatial data exploration. Specifically, symmetric kernels struggle to capture shape and color discontinuities, which causes blurriness and primitive redundancy that mislead human perception during visual analysis. To address these visualization barriers, we introduce 3D Skew Gaussian Splatting (3DSGS), a novel framework that significantly enhances the structural fidelity and compactness of explicit scene representations. Our key insight lies in extending the standard primitive to a general Skew Gaussian counterpart. This generalized primitive inherits the highly efficient rasterization properties of standard Gaussians while gaining intrinsic asymmetric modeling capabilities. We couple this with an enhanced opacity representation to better handle complex transparency, alongside a depth-aware densification strategy that intelligently manages primitive allocation. Furthermore, to make these advancements actionable for real-world visual analytics, we re-derive the CUDA rasterization pipeline to universally support both symmetric and skew Gaussians, integrating it into a decoupled, free-camera interactive visualization engine. Extensive experiments demonstrate that 3DSGS achieves superior rendering quality and structural compactness, particularly in regions with intricate details, while maintaining the real-time frame rates necessary for fluid interactive exploration. Supplementary derivations and visual results are available at https://3d-skew-gs.github.io/.

Improved Baselines with Representation Autoencoders

2026-05-18T12:42:34Z

Representation Autoencoders (RAE) replace the traditional VAE with pretrained vision encoders. In this paper, we systematically investigate several design choices and find three insights which simplify and improve RAE. First, we study a generalized formulation where the representation is defined as the sum of the last k encoder layers rather than solely the final layer. This simple change greatly improves reconstruction without encoder finetuning or specialized data (e.g., text, faces). Second, we study the prevalent assumption that RAE (using a pretrained representation as the encoder) replaces representation alignment (REPA), which instead distills the same representation into intermediate layers. Through large-scale empirical analysis, we uncover a surprising finding: RAE and REPA exhibit complementary working mechanisms, allowing the same representation to be used as both encoder and target for intermediate diffusion layers. Finally, the original RAE struggles with classifier-free guidance (CFG) and requires training a second, weaker diffusion model for AutoGuidance (AG). We show that REPA itself can be viewed as x-prediction in RAE latent space. By simply re-parameterizing the output of the DiT model, it can provide guidance for "free". Overall, RAEv2 leads to more than 10x faster convergence over the original RAE, achieving a state-of-the-art gFID of 1.06 in just 80 epochs on ImageNet-256. On FDr^k, RAEv2 achieves a state-of-the-art 2.17 at just 80 epochs compared to the previous best 3.26 (800 epochs) without any post-training. This motivates EP_FID@k (epochs to reach unguided gFID <= k) as a measure of training efficiency. RAEv2 attains an EP_FID@2 of 35 epochs, versus 177 for the original RAE. We also validate our approach across diverse settings for text-to-image generation and navigation world models, showing consistent improvements. Code is available at https://raev2.github.io.
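The EP_FID@k definition above (epochs to reach unguided gFID <= k) can be read directly as a lookup over a training curve. The sketch below assumes gFID is logged at a set of evaluation epochs; the function name `ep_fid_at_k` and the numbers in `curve` are made up for illustration and are not results from the paper.

```python
def ep_fid_at_k(gfid_by_epoch, k):
    """Return the first evaluated epoch whose unguided gFID is <= k, or None if never reached."""
    for epoch in sorted(gfid_by_epoch):
        if gfid_by_epoch[epoch] <= k:
            return epoch
    return None

# Hypothetical evaluation log: epoch -> unguided gFID (made-up values).
curve = {10: 6.1, 20: 3.4, 30: 2.2, 40: 1.9, 50: 1.7}
print(ep_fid_at_k(curve, 2.0))  # prints 40
```

Note that the value is only as fine-grained as the evaluation schedule: with checkpoints every 10 epochs, the reported number is the first checkpoint at or below the threshold, not the exact crossing point.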

Dynamic Elliptical Graph Factor Models via Riemannian Optimization with Geodesic Temporal Regularization

2026-05-18T12:34:44Z

Inferring time-varying graph structures from high-dimensional nodal observations is a fundamental problem arising in neuroscience, finance, climatology, and beyond. Two intrinsic challenges govern this problem: maintaining the temporal coherence of the latent graph across successive observation windows, and respecting the intrinsic Riemannian geometry of the symmetric positive definite manifold on which precision matrices naturally reside, a curved space whose geodesic structure departs fundamentally from that of the ambient Euclidean space. In this paper we propose dynamic estimation on the Grassmann manifold with a factor model (Degfm), a novel algorithm that jointly addresses both challenges. We model the time-varying precision matrix sequence as a low-rank-plus-diagonal (LRaD) structure governed by a latent elliptical graph factor model, which drastically reduces the effective parameter count and enables reliable estimation in the challenging small-sample regime. Temporal coherence is enforced through a Riemannian geodesic penalty defined on the Grassmann manifold, ensuring that the estimated graph trajectory is smooth with respect to the intrinsic geometry rather than the ambient Euclidean space. To solve the resulting non-convex optimization problem over Grassmann-manifold-valued sequences subject to the LRaD constraint, we derive an efficient Riemannian gradient descent algorithm that respects the manifold structure at every iterate and rigorously establish its convergence to a stationary point. Extensive experiments on both synthetic benchmarks and real-world datasets demonstrate that Degfm consistently outperforms state-of-the-art baselines across all evaluation metrics, confirming the practical effectiveness of the proposed framework.
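To make the idea of a geodesic temporal penalty on the Grassmann manifold concrete, here is a minimal NumPy sketch. It uses the standard arc-length distance computed from principal angles; the abstract does not give the exact form of the penalty, so summing squared distances between consecutive subspaces, as well as the function names, are assumptions for illustration only.

```python
import numpy as np

def grassmann_geodesic_distance(U, V):
    """Geodesic distance between span(U) and span(V).

    U, V: (n, p) matrices with orthonormal columns.
    """
    # Singular values of U^T V are the cosines of the principal angles.
    cosines = np.linalg.svd(U.T @ V, compute_uv=False)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return float(np.linalg.norm(angles))

def temporal_penalty(subspaces):
    """Sum of squared geodesic distances between consecutive subspaces (assumed form)."""
    return sum(
        grassmann_geodesic_distance(a, b) ** 2
        for a, b in zip(subspaces[:-1], subspaces[1:])
    )

# A line in R^3 rotating by 0.1 rad per step within the x-y plane.
frames = [np.array([[np.cos(t)], [np.sin(t)], [0.0]]) for t in (0.0, 0.1, 0.2)]
penalty = temporal_penalty(frames)
```

Because the distance depends only on the spanned subspaces, replacing any frame by another orthonormal basis of the same subspace leaves the penalty unchanged, which is the property that distinguishes it from a Euclidean difference of the basis matrices.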

Best Segmentation Buddies for Image-Shape Correspondence

2026-05-18T10:32:48Z

Finding correspondences is a fundamental and extensively researched problem in computer vision and graphics. In this work, we examine the underexplored task of estimating segmentation-to-segmentation correspondence between images in the wild and untextured 3D shapes. This task is highly challenging due to substantial differences in appearance, geometry, and viewpoint. Our approach bridges the cross-modality gap by linking pixels in the image segment to vertices in the corresponding semantic part of the 3D shape. To achieve this, we first distill deep visual features from a 2D vision model onto the 3D shape surface, allowing for the computation of feature similarity between image pixels and shape vertices. Then, we identify Best Segmentation Buddies, vertices whose most similar image pixel lies within the image segmentation region, enabling the reliable discovery of vertices in semantically corresponding shape parts. Finally, we leverage distilled 3D features from the 2D image segmentation model to segment the shape directly in 3D, bootstrapping the correspondence process. We demonstrate the generality and robustness of our approach across a wide range of image-shape pairs, showcasing accurate and semantically meaningful correspondences. Our project page is at https://threedle.github.io/bsb/.
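The selection rule described above, keeping vertices whose most similar image pixel falls inside the segmentation region, can be sketched in a few lines of NumPy. This is an illustration under assumptions rather than the authors' code: cosine similarity as the feature similarity, the function name `best_segmentation_buddies`, and the toy features are all hypothetical.

```python
import numpy as np

def best_segmentation_buddies(vertex_feats, pixel_feats, pixel_in_segment):
    """Return indices of vertices whose most similar pixel lies inside the segment.

    vertex_feats:     (V, D) features distilled onto the shape vertices
    pixel_feats:      (P, D) image features, one row per pixel
    pixel_in_segment: (P,) boolean mask of the image segmentation region
    """
    # Cosine similarity between every vertex and every pixel (assumed metric).
    v = vertex_feats / np.linalg.norm(vertex_feats, axis=1, keepdims=True)
    p = pixel_feats / np.linalg.norm(pixel_feats, axis=1, keepdims=True)
    similarity = v @ p.T                    # (V, P)
    best_pixel = similarity.argmax(axis=1)  # most similar pixel per vertex
    # Keep the vertices whose best pixel is inside the segment.
    return np.flatnonzero(pixel_in_segment[best_pixel])

# Toy example: pixel 0 is inside the segment, pixel 1 is outside.
vertex_feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]])
pixel_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
mask = np.array([True, False])
buddies = best_segmentation_buddies(vertex_feats, pixel_feats, mask)
```

In this toy setup, vertices 0 and 2 are selected because their nearest pixel in feature space is the in-segment pixel, while vertex 1 matches the out-of-segment pixel and is dropped.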

Functionalization via Structure Completion and Motion Rectification

2026-05-18T08:05:07Z

Acquisition and creation of 3D assets have been largely view- or appearance-driven. As a result, existing digital 3D models often lack the requisite structural components to function as intended, such as joints, supports, interiors, or interaction elements. At the same time, even human-annotated motions are frequently error-prone, leading to physically implausible behavior. We introduce object functionalization, a novel task aimed at transforming visually plausible but non-functional 3D models into functional and physically operable ones. We formulate functionalization as a graph completion problem over a new functional graph representation, where labeled nodes represent object parts, labeled edges encode functional and contact relations, and movable nodes carry motion attributes, so that structural functional deficiencies manifest as missing nodes or incorrect edges. We develop a neural Graph Functionalizer (GraFu) to complete an incomplete graph representing a non-functional 3D object. The completed graph then drives a geometry realization stage that instantiates predicted connectors and structural elements in 3D, with the compelling side effect of rectifying erroneous human-annotated and predicted motions. To support training and evaluation, focusing on furniture as a rich and challenging target category, we introduce FurFun-233, a dataset of 233 paired non-functional and functionalized furniture models. On PartNet-Mobility ("zero-shot") and HSSD test sets, our method matches state-of-the-art methods in motion prediction accuracy while substantially improving functionality in terms of collision and connectivity.

CelloCut: Constructive Watertight Remeshing via Tetrahedral Cell Cuts

2026-05-18T04:49:40Z

Watertight remeshing aims to recover a surface that induces a globally consistent interior--exterior partition of 3D space. However, for meshes with complex topology, single-layer structures, or large missing regions, inferring such a partition from local surface geometry is inherently ambiguous. As a result, existing methods often produce surface-accurate yet volumetrically inconsistent reconstructions, e.g., closely spaced double shells. The key insight of this work is that watertight remeshing should be treated as a volumetric partitioning problem rather than a surface-level repair task. To this end, we propose CelloCut, a constructive framework that formulates watertight conversion as a binary labeling problem over a Delaunay tetrahedral partition of space. We solve this via graph-cut energy minimization with one-sided constraints that preserve proxy-supported interior evidence and weighted interface penalties that discourage unsupported newly introduced boundaries. By computing a globally consistent volumetric partition, CelloCut guarantees a strictly watertight output by construction and strongly suppresses pseudo-watertight artifacts such as double shells, even under severe topological defects. Experimental results on two newly introduced challenging benchmarks, CelloScan and CelloFill, as well as the standard ModelNet10 dataset, demonstrate that CelloCut significantly outperforms state-of-the-art methods, particularly in handling complex topologies and single-layer structures, producing compact and volumetrically consistent solid reconstructions. The project page is available at https://rangeryx-66.github.io/CelloCut/.

Macrofacet Theory for Gaussian Process Statistical Surfaces

2026-05-17T22:36:57Z

We present macrofacet theory to extend microfacet theory from the micro-space to the macro-space. This is achieved by transforming surfaces into volumetric representations that preserve microfacet characteristics. Therefore, we formulate a macroscopic microfacet model using a classic exponential participating medium. Meanwhile, we observe that traditional microfacet models are equivalent to Gaussian processes by definition but ignore the correlation along the geometric normal of the macro-surface. We extend microfacet theory to address this limitation. Our formulation represents Gaussian process implicit surfaces in a statistical manner, which we refer to as Gaussian process statistical surfaces. As a result, our approach converts Gaussian process statistical surfaces into classic exponential media to render surfaces, volumes and in-betweens without realizations. This enables efficient rendering and improves performance compared to realization-based approaches, while theoretically bridging microfacet models and Gaussian processes. Moreover, our approach is easy to implement.

A real time lighting technique for procedurally generated 2d isometric game terrains

2026-05-17T21:51:51Z

This work proposes an automatic real time lighting technique for procedurally generated isometric maps. The scenario is generated from a string seed and the proposed lighting system estimates the geometrical shape of the 2D objects as if they were 3D for further light interaction, therefore producing a 2.5D effect. We employ opacity maps to overcome an issue generated by the geometrical shape estimation. The solution is a coupled approach between the CPU and GPU. The produced visuals, gameplay and performance were evaluated by gamers, programmers and designers. Furthermore, the performance, in terms of frames per second, was evaluated over distinct graphics cards and processors and was satisfactory.

Fast and Reliable Gradients for Deformables Across Frictional Contact Regimes

2026-05-17T16:42:31Z

Differentiable simulation establishes the mathematical foundation for solving challenging inverse problems in computer graphics and robotics, such as physical system identification and inverse dynamics control. However, rigor in frictional contact remains the "elephant in the room." Current frameworks often avoid contact singularities via non-Markovian position approximations or heuristic gradients. This lack of mathematical consistency distorts gradients, causing optimization stagnation or failure in complex frictional contact and large-deformation scenarios. We introduce a unified, fully GPU-accelerated differentiable simulator, which establishes a rigorous theoretical paradigm through three components: (1) Long-Horizon Consistency, enforcing strict Markovian dynamics on a coupled position-velocity manifold to prevent gradient collapse; (2) Unified Contact Stability, employing a mass-aligned preconditioner and a soft Fischer--Burmeister operator for smooth frictional optimization; and (3) Robust Material Identification, resolving FEM singularities via a derived "Within-block Commutation" condition. Our experiments demonstrate our solver's efficacy in bridging the Sim-to-Real gap, delivering precise, low-noise gradients in contact-rich tasks like dexterous manipulation and cloth folding. By mitigating the gradient instability issues common in conventional approaches, our framework significantly enhances the fidelity of physical system identification and control.
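For readers unfamiliar with the Fischer--Burmeister operator named above, the sketch below shows the classical function and one common way to smooth it. The abstract does not specify which "soft" variant the simulator uses, so the `eps` term inside the square root is an assumption made for illustration, as is the function name.

```python
import math

def fischer_burmeister(a, b, eps=0.0):
    """Fischer-Burmeister function phi(a, b) = a + b - sqrt(a^2 + b^2 + eps).

    With eps = 0, phi(a, b) = 0 exactly when a >= 0, b >= 0 and a * b = 0,
    i.e. when the complementarity condition holds.
    With eps > 0 (an assumed smoothing), phi is differentiable everywhere and
    its roots satisfy a > 0, b > 0 and a * b = eps / 2.
    """
    return a + b - math.sqrt(a * a + b * b + eps)

# Complementarity holds (gap open, no force): residual is zero.
open_gap = fischer_burmeister(3.0, 0.0)
# Both quantities positive: complementarity is violated, residual is positive.
violated = fischer_burmeister(1.0, 1.0)
# Smoothed root: a * b = 0.5 = eps / 2 with eps = 1.
smoothed = fischer_burmeister(1.0, 0.5, eps=1.0)
```

Rewriting the complementarity condition as a single equation is what lets a contact solver treat it with smooth root-finding, and the smoothing term removes the non-differentiable point at a = b = 0 at the cost of a small, controlled violation.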

QQJ: Quantifying Qualitative Judgment for Scalable and Human-Aligned Evaluation of Generative AI

2026-05-17T10:53:43Z

The rapid progress of generative artificial intelligence has exposed fundamental limitations in existing evaluation methodologies, particularly for open-ended, creative, and human-facing tasks. Traditional automatic metrics rely on surface-level statistical similarity and often fail to reflect human perceptions of quality, while purely human evaluation, although reliable, is costly, subjective, and difficult to scale. Recent approaches using large language models as evaluators offer improved scalability but frequently lack explicit grounding in human-defined evaluation principles, leading to bias and inconsistency. In this paper, we introduce Quantifying Qualitative Judgment (QQJ), a scalable and human-centric evaluation framework that explicitly bridges the gap between human judgment and automated assessment. QQJ separates the definition of quality from its execution by anchoring evaluation in expert-designed, multi-dimensional rubrics and calibrating large language model evaluators to align with expert reasoning using a small, high-quality annotation set. This design enables consistent, interpretable, and scalable evaluation across diverse generative tasks and modalities. Extensive experiments on text and image generation demonstrate that QQJ achieves substantially stronger alignment with human judgment than traditional automatic metrics and unconstrained LLM-based evaluators. Moreover, QQJ exhibits improved stability across repeated evaluations and superior diagnostic capability in identifying critical failure modes such as hallucination and intent mismatch. These results indicate that structured qualitative judgment can be operationalized at scale without sacrificing interpretability or human alignment, positioning QQJ as a practical foundation for reliable evaluation of modern generative AI systems.

Generative and isoparametric geometric modeling of large-scale and multiscale microstructures

2026-05-17T08:39:42Z

As additive manufacturing advances toward higher printing resolution and larger build volumes, microstructures can be designed with finer geometric features over larger physical domains. This trend poses a fundamental challenge for geometric modeling: massive geometric details must be represented compactly, while their associations across scales must be maintained consistently. Existing methods do not scale well to this requirement. Explicit representations suffer from prohibitive memory cost, and implicit representations remain compact only when microstructures admit analytic, periodic, or otherwise concise procedural descriptions. This paper proposes a new geometric modeling method that treats microstructure modeling as an on-demand generative process, rather than requiring the full instantiation of all geometric details. We first develop ExVCC, an extended volumetric Catmull-Clark spline representation that enables local spline refinement to go beyond tensor-product topology. Built on ExVCC, we introduce new shape-coding schemes and refinement rules that compactly encode large-scale geometric details and enable their localized evaluation through on-demand hierarchical refinement. To model geometric details across scales, we further propose an isoparametric representation in which details across scales are defined over a shared parametric domain using the same family of ExVCC spline bases. This formulation turns ExVCC's spline refinement hierarchy into a common framework for geometry encoding, on-demand generation, and cross-scale association, allowing geometric modifications to propagate automatically across scales. The effectiveness of the proposed method is demonstrated through a series of examples and comparisons.

Spherical Geometrical Bases of Spherical Origami

2026-05-17T08:06:18Z

This paper establishes a rigorous geometrical framework for spherical origami, origami using spherical sheets based on spherical geometry. Two settings are treated: origami restricted to the unit sphere ($\mathbb{S}^2$), and three-dimensional folding of spherical sheets in space. For origami on $\mathbb{S}^2$, the definitions of Euclidean origami are systematically extended to the spherical setting, and all seven Huzita--Justin axioms are shown to admit explicit equations in spherical geometry. For three-dimensional folding, equidistant curves are introduced as fold curves, replacing geodesics and enabling a richer family of folds. The framework is validated by successfully constructing computer graphics of spherical origami birds, demonstrating both the theoretical completeness and practical utility of the proposed approach.