http://arxiv.org/abs/2606.08469v1
OctaOctree Neural Radiosity for Real-time Glossy Material Rendering
Updated: 2026-06-07T06:25:23Z
Modeling high-frequency outgoing radiance distributions remains a fundamental challenge in global illumination, especially for glossy and specular materials. Existing neural-based radiance caching methods commonly rely on positional feature encodings or spatially organized caches, which makes it difficult to represent sharp directional radiance variations without increasing the model complexity or sampling cost. To address this challenge, we propose OctaOctree, an efficient spatial-angular radiance representation for global illumination. OctaOctree organizes outgoing radiance with an adaptive octree in 3D space, and associates each spatial node with an octahedral directional map. By coupling the spatial hierarchy with direction-dependent storage, our representation allocates fine spatial resolution to local illumination and visibility changes, while using coarser spatial levels with richer angular resolution to capture glossy and specular radiance distributions. This design embeds a reflectance-aware spatial-angular prior directly into the radiance representation, reducing the burden on neural networks or reconstruction modules to recover high-frequency view-dependent effects from positional features alone. As a result, OctaOctree provides a compact and expressive neural encoding for a wide range of indirect illumination effects, from diffuse interreflection to sharp glossy reflections.
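For readers unfamiliar with octahedral directional maps, the sketch below shows the standard octahedral encoding, which folds the unit sphere onto a square so that every direction lands in one texel of a 2D map. It is a minimal Python illustration assuming the common sign-preserving fold of the lower hemisphere; the abstract does not say which variant OctaOctree uses, and `texel_index` (with its `res` parameter) is a hypothetical helper, not part of the paper.

```python
import math


def _sign(v):
    # Octahedral mapping treats 0 as positive so the fold is well defined.
    return 1.0 if v >= 0.0 else -1.0


def oct_encode(d):
    """Map a nonzero direction (x, y, z) to a point (u, v) in [-1, 1]^2."""
    x, y, z = d
    s = abs(x) + abs(y) + abs(z)
    x, y, z = x / s, y / s, z / s  # project onto the octahedron |x| + |y| + |z| = 1
    if z < 0.0:
        # Fold the lower hemisphere outward over the square's diagonals.
        x, y = (1.0 - abs(y)) * _sign(x), (1.0 - abs(x)) * _sign(y)
    return x, y


def oct_decode(u, v):
    """Inverse of oct_encode: recover the unit direction stored at (u, v)."""
    x, y, z = u, v, 1.0 - abs(u) - abs(v)
    if z < 0.0:
        # Undo the fold for points that came from the lower hemisphere.
        x, y = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    n = math.sqrt(x * x + y * y + z * z)
    return x / n, y / n, z / n


def texel_index(d, res):
    """Hypothetical helper: texel (i, j) of a res x res octahedral map holding direction d."""
    u, v = oct_encode(d)
    i = min(int((u * 0.5 + 0.5) * res), res - 1)
    j = min(int((v * 0.5 + 0.5) * res), res - 1)
    return i, j
```

In a spatial-angular structure of this kind, a node could store one `res` x `res` array of radiance values and read it with `texel_index(direction, res)`; that storage layout is likewise an assumption made for illustration.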
Experiments demonstrate that our method produces high-quality, direction-aware global illumination with a single network query at primary intersections, achieving improved fidelity and real-time performance compared with baseline neural radiosity and radiance caching approaches.
Published: 2026-06-07T06:25:23Z
Comment: 11 pages, 9 figures
Authors: Jierui Ren (Peking University), Haojie Jin (Peking University), Bo Pang (Peking University), Meng Gai (Peking University), Fei Zhu (Peking University), Yisong Chen (Peking University), Sheng Li (Peking University)

http://arxiv.org/abs/2601.18585v2
GimmBO: Interactive Generative Image Model Merging via Bayesian Optimization
Updated: 2026-06-06T22:11:28Z
Fine-tuning-based adaptation is widely used to customize diffusion-based image generation, leading to large collections of community-created adapters that capture diverse subjects and styles. Adapters derived from the same base model can be merged with weights, enabling the synthesis of new visual results within a vast and continuous design space. To explore this space, current workflows rely on manual slider-based tuning, an approach that scales poorly and makes weight selection difficult, even when the candidate set is limited to 20-30 adapters. We propose GimmBO to support interactive exploration of adapter merging for image generation through Preferential Bayesian Optimization (PBO). Motivated by observations from real-world usage, including sparsity and constrained weight ranges, we introduce a two-stage BO backend that improves sampling efficiency and convergence in high-dimensional spaces.
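Weighted adapter merging can be pictured as adding scaled parameter deltas to the base weights. The sketch below assumes additive adapters (as with LoRA-style deltas) flattened into plain lists, and models the "sparsity and constrained weight ranges" mentioned above with clipping plus a hypothetical top-k `max_active` limit; GimmBO's actual parametrization and its two-stage BO backend are not reproduced here.

```python
def merge_adapters(base, deltas, weights, w_min=0.0, w_max=1.0, max_active=None):
    """Return base + sum_i w_i * delta_i over flat parameter lists.

    Each weight is clipped to [w_min, w_max]; if max_active is given, only the
    max_active largest weights are kept and the rest are zeroed (sparsity).
    """
    if len(deltas) != len(weights):
        raise ValueError("expected one weight per adapter")
    w = [min(max(wi, w_min), w_max) for wi in weights]
    if max_active is not None:
        keep = set(sorted(range(len(w)), key=lambda i: w[i], reverse=True)[:max_active])
        w = [wi if i in keep else 0.0 for i, wi in enumerate(w)]
    merged = list(base)  # copy so the base model's parameters are not modified
    for wi, delta in zip(w, deltas):
        if wi == 0.0:
            continue  # inactive adapters contribute nothing
        for k, dk in enumerate(delta):
            merged[k] += wi * dk
    return merged
```

Under this view, the optimizer's search space is the weight vector itself, and the clipping and sparsity limits shrink that space before any preference queries are made.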
We evaluate our approach with simulated users and a user study, demonstrating improved convergence, high success rates, and consistent gains over BO and line-search baselines, and further show the flexibility of the framework through several extensions.
Published: 2026-01-26T15:32:16Z
Comment: Accepted at SIGGRAPH NA 2026
Authors: Chenxi Liu, Selena Ling, Alec Jacobson

http://arxiv.org/abs/2508.08572v2
From Hanging to Standing: Fabric-Formed Catenary Arches as Scalable Concrete Building Components
Updated: 2026-06-06T20:45:36Z
Concrete is the most widely used construction material globally. Despite its versatility, it is typically poured into stiff, rectilinear formwork that restricts formal exploration and leads to considerable material waste and higher carbon output. Fabric formwork offers an alternative in which flexible textiles shape fresh concrete into structurally efficient geometries such as thin shells and catenary arches. However, a persistent challenge remains: forms optimized in tension under gravity often crack when rotated into their final compression orientation. Previous research has focused on form-finding and fabrication workflows, with little attention to damage-free reorientation. This paper addresses this gap through two contributions: a CNC-milled repositionable frame with soft-to-rigid connection details enabling controlled tilt-up reorientation without damage, and a scalar reframing that embeds small repeating catenary units within larger building components such as walls and slabs. The research pursues three objectives: (1) to design and refine compatible textile-concrete combinations, with particular focus on non-woven geotextiles; (2) to develop a CNC-cut, repositionable frame system that redistributes stresses during reorientation; and (3) to devise robust soft-to-rigid connection details that permit safe demolding and handling. Through material testing and iterative prototyping, the study identifies concrete paste-geotextile pairings that produce high-quality surface finishes.
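The catenary is the classical hanging-chain curve y = a cosh(x/a) - a; turned upside down, it carries its own weight in pure compression, which is why a form found in tension can stand as an arch. The sketch below is an illustration of that geometry, not the paper's fabric form-finding: it assumes a 2D profile described by a span and a rise, solves rise = a (cosh(span / (2a)) - 1) for the parameter a by bisection, and samples the inverted arch. Function names and the span/rise parametrization are our own.

```python
import math


def catenary_parameter(span, rise, tol=1e-12):
    """Solve rise = a * (cosh(span / (2a)) - 1) for the catenary parameter a > 0."""
    half = span / 2.0

    def sag_error(a):
        # Sag of the hanging curve minus the target rise; strictly decreasing in a.
        return a * (math.cosh(half / a) - 1.0) - rise

    lo, hi = half / 50.0, half
    if sag_error(lo) <= 0.0:
        raise ValueError("rise is too large for this bracket")
    while sag_error(hi) > 0.0:  # widen the bracket until the sag drops below `rise`
        hi *= 2.0
    for _ in range(200):  # plain bisection; far more halvings than needed
        mid = 0.5 * (lo + hi)
        if sag_error(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * hi:
            break
    return 0.5 * (lo + hi)


def arch_profile(span, rise, n=5):
    """(x, height) samples of the inverted catenary: feet at height 0, crown at `rise`."""
    a = catenary_parameter(span, rise)
    half = span / 2.0
    xs = [-half + span * i / (n - 1) for i in range(n)]
    # The hanging curve a*cosh(x/a) - a is 0 at midspan and `rise` at the supports;
    # subtracting it from `rise` flips it into a standing arch.
    return [(x, rise - (a * math.cosh(x / a) - a)) for x in xs]
```

For example, `arch_profile(4.0, 1.0)` samples a 4-unit span with a 1-unit crown at five evenly spaced stations.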
A tilt-up method was developed in which the frame rotates with the arch, minimizing tensile stress. Results demonstrate that catenary arches can be cast, released, and reoriented without cracking or damage. These findings advance fabric-formed concrete toward low-tech, materially efficient structures with reduced environmental impact.
Published: 2025-08-12T02:16:22Z
Authors: Aysan Mokhtarimousavi, Vivian Nguyen, Farzad Saeidi Samet, Lavender Tessmer

http://arxiv.org/abs/2606.08258v1
MS-COOT: Comparing Morse-Smale Complexes with Co-Optimal Transport
Updated: 2026-06-06T17:07:13Z
Understanding and comparing structures in scalar fields is a central challenge in scientific visualization, with applications ranging from feature analysis to temporal and structural comparison. The Morse-Smale (MS) complex provides a natural representation by decomposing a scalar field into regions induced by gradient flow. However, existing approaches typically rely on graph-based representations, capturing relationships between critical points while discarding region-level structure. In this work, we represent the MS complex as a hypergraph, where critical points form nodes and regions define hyperedges. We introduce MS-COOT, a co-optimal transport distance that jointly computes correspondences between critical points and regions. This formulation enables explicit region-to-region matching within a distance-based framework, allowing identification of region-level events such as splitting and merging. We instantiate this framework with domain-specific components, including a hypernetwork function encoding critical point-region relationships, persistence-based probability measures that emphasize topologically significant features, and a sample cost term that incorporates critical point attributes. We evaluate MS-COOT on five datasets spanning 2D simulations, 3D surface meshes, and volumetric data.
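One simple way to build a persistence-based probability measure is to give each critical point (or region) mass proportional to a power of its persistence and then normalize. The Python sketch below assumes that form; the `power` and `floor` parameters are hypothetical knobs, and neither MS-COOT's actual measures nor its co-optimal transport solver are shown.

```python
def persistence_measure(persistence, power=1.0, floor=0.0):
    """Probability vector over features, with mass increasing with persistence.

    power > 1 concentrates mass on the most persistent features; floor > 0 keeps
    a little mass on every feature so that none is ignored entirely.
    """
    if not persistence:
        raise ValueError("need at least one feature")
    raw = [max(p, 0.0) ** power + floor for p in persistence]
    total = sum(raw)
    if total == 0.0:
        return [1.0 / len(raw)] * len(raw)  # all-zero persistence: fall back to uniform
    return [r / total for r in raw]
```

Low-persistence features, which are often noise, then receive little mass, so a transport plan built on these measures has little incentive to match them.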
Our results show that MS-COOT captures region-level structural changes that are not reflected by graph-based distances, while achieving strong performance in downstream tasks such as classification and resolution discrimination.
Published: 2026-06-06T17:07:13Z
Authors: Guangyu Meng, Mingzhe Li, Erin Wolf Chambers

http://arxiv.org/abs/2312.15946v3
EnchantDance: Unveiling the Potential of Music-Driven Dance Movement
Updated: 2026-06-06T10:48:53Z
The task of music-driven dance generation involves creating coherent dance movements that correspond to the given music. While existing methods can produce physically plausible dances, they often struggle to generalize to out-of-set data. The challenge arises from three aspects: (1) the high diversity of dance movements and significant differences in the distribution of music modalities, which make it difficult to generate music-aligned dance movements; (2) the lack of a large-scale music-dance dataset, which hinders the generation of generalized dance movements from music; and (3) the protracted nature of dance movements, which makes it difficult to maintain a consistent dance style. In this work, we introduce the EnchantDance framework, a state-of-the-art method for dance generation. Because the original dance sequence is redundant along the time axis, EnchantDance first constructs a strong dance latent space and then trains a dance diffusion model on that latent space. To address the data gap, we construct a large-scale music-dance dataset, the ChoreoSpectrum3D Dataset, which includes four dance genres and has a total duration of 70.32 hours, making it the largest reported music-dance dataset to date. To enhance consistency between music genre and dance style, we pre-train a music genre prediction network using transfer learning and incorporate music genre as extra conditional information in the training of the dance diffusion model.
Extensive experiments demonstrate that our proposed framework achieves state-of-the-art performance on dance quality, diversity, and consistency.
Published: 2023-12-26T08:19:10Z
Comment: Project Page: https://fluide1022.github.io/EnchantDance/
Authors: Bo Han, Teng Zhang, Zeyu Ling, Feilin Han

http://arxiv.org/abs/2508.07011v5
HiMat: DiT-based Ultra-High Resolution SVBRDF Generation
Updated: 2026-06-06T08:53:22Z
Creating ultra-high-resolution spatially varying bidirectional reflectance distribution functions (SVBRDFs) is critical for photorealistic 3D content creation, because they faithfully represent the fine-scale surface details required for close-up rendering. However, achieving 4K generation faces two key challenges: (1) the need to synthesize multiple reflectance maps at full resolution, which multiplies the pixel budget and imposes prohibitive memory and computational cost; and (2) the requirement to maintain strong pixel-level alignment across maps at 4K, which is particularly difficult when adapting pretrained models designed for the RGB image domain. We introduce HiMat, a diffusion-based framework tailored for efficient and diverse 4K SVBRDF generation. To address the first challenge, HiMat performs generation in a high-compression latent space via DC-AE, and employs a pretrained diffusion transformer with linear attention to improve per-map efficiency. To address the second challenge, we propose CrossStitch, a lightweight convolutional module that enforces cross-map consistency without incurring the cost of global attention. Our experiments show that HiMat achieves high-fidelity 4K SVBRDF generation with superior efficiency, structural consistency, and diversity compared to prior methods.
Beyond materials, our framework also generalizes to related applications such as intrinsic decomposition.
Published: 2025-08-09T15:16:58Z
Authors: Zixiong Wang, Jian Yang, Yiwei Hu, Milos Hasan, Beibei Wang

http://arxiv.org/abs/2606.08043v1
OmniFaceRig: Fully Automatic Inner-Mouth-Aware Face Rigging Across Diverse 3D Character Topologies
Updated: 2026-06-06T08:06:01Z
Facial rigging, that is, creating FACS-based blendshapes together with inner-mouth geometry (teeth, gums, and tongue), remains a major bottleneck in 3D character production. Existing pipelines still require substantial designer effort, especially for manual landmark annotation, per-character template adjustment, and inner-mouth placement. We present OmniFaceRig, a fully automatic end-to-end pipeline that converts a static surface-only 3D character mesh, with no pre-modeled oral cavity, into an inner-mouth-aware FACS rig with up to 155 blendshapes, procedurally fitted teeth, gums, and tongue, and re-packed UV/texture. OmniFaceRig supports diverse topologies, including humans, humanoids, long-muzzled animals (e.g., dogs, wolves, foxes), and short-muzzled animals (e.g., cats, bears, rabbits, tigers), with no manual landmarks, no user-provided templates, and no per-asset setup. The pipeline combines hybrid VLM+CV riggability checking, multi-model face parsing, dense keypoint-driven template registration, procedural inner-mouth construction, and collision-aware blendshape transfer. For non-human characters, OmniFaceRig selects topology-specific face and inner-mouth templates and uses collision-aware inner-mouth fitting to reduce teeth-face intersections without exposing users to category-specific tuning. We also publicly release Omni-Bench, a freely available benchmark dataset of 1,000 biped 3D characters with FACS facial blendshapes and inner-mouth geometry, spanning humans, humanoids, cats, dogs, and other animals.
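FACS rigs are commonly evaluated with the delta blendshape model, in which each target shape contributes its offset from the neutral mesh scaled by a weight. The sketch below assumes that standard linear model over plain vertex lists; shape names such as `jawOpen` are hypothetical, and OmniFaceRig's blendshape transfer and collision handling are not reproduced.

```python
def evaluate_blendshapes(neutral, targets, weights):
    """Delta blendshapes: v = neutral + sum_k w_k * (target_k - neutral).

    neutral: list of (x, y, z) vertices.
    targets: dict mapping shape name -> vertex list with the same topology.
    weights: dict mapping shape name -> weight (typically in [0, 1]).
    """
    out = [list(v) for v in neutral]
    for name, w in weights.items():
        if w == 0.0:
            continue  # inactive shapes contribute nothing
        target = targets[name]
        if len(target) != len(neutral):
            raise ValueError(f"blendshape {name!r} has a different vertex count")
        for i, (t, n) in enumerate(zip(target, neutral)):
            for c in range(3):
                out[i][c] += w * (t[c] - n[c])
    return [tuple(v) for v in out]
```

For example, `evaluate_blendshapes(neutral, {"jawOpen": open_mesh}, {"jawOpen": 0.5})` moves every vertex halfway toward a hypothetical open-jaw target. Because inner-mouth geometry is deformed by the same weights, teeth and tongue must share the face's topology for each target, which is one reason collision-aware fitting matters.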
Experiments show a high final rigging success rate on screened Omni-Bench inputs, nearly complete face detection recall from the segmentation ensemble, and reliable inner-mouth placement with low penetration. Together, OmniFaceRig provides an automatic path from static generated characters to animation-ready facial rigs across both human and non-human topologies.
Published: 2026-06-06T08:06:01Z
Authors: Chao Wang, Guangyao Ma, John Doublestein, Junming Chen, Yiming Lin, Zhaoen Su, Xiaomin Luo, Shiyang Cheng, Jie Shen, Doug Roble, Dilin Wang, Yilei Li, Rakesh Ranjan

http://arxiv.org/abs/2606.08041v1
Wispy to Voluminous: Prior-free Multi-view Capture of Strand-level Facial Hair
Updated: 2026-06-06T08:04:07Z
Facial hair is a defining trait of personal identity, yet it remains a critical bottleneck for digital avatars. Recent volumetric methods achieve photorealism but bake hair into the underlying face geometry, preventing editability and failing to resolve sparse, strand-like structures. Meanwhile, scalp-hair reconstruction methods target dense hair volumes and do not transfer to the sparse, spatially varying nature of facial hair. We present a pipeline that automatically reconstructs facial hair (beard, mustache, lashes, and brows) from multi-view images, converting an unstructured 3D Gaussian representation into an explicit curve-based strand representation. We resolve geometric ambiguities in four stages: (i) optimizing 3D Gaussians constrained by tracked head geometry to enforce early ray termination and suppress sub-surface noise; (ii) tracing continuous strands robust to frequent crossings and extreme curvature; (iii) grounding strands to the surface and resolving root-tip ambiguity via a physically motivated prior; and (iv) refining the reconstruction through opacity-driven density control under photometric optimization. To our knowledge, this is the first method to reconstruct high-fidelity facial hair strands from a 3D Gaussian representation.
The recovered strands faithfully preserve the orientation and sparsity patterns characteristic of facial hair, and yield assets immediately suitable for downstream production tasks, including facial animation and physical simulation, geometric grooming and transfer, appearance editing, and physics-based rendering.
Published: 2026-06-06T08:04:07Z
Comment: 27 pages, 16 figures, supplementary included
Authors: Jaeseong Lee, Giljoo Nam, Adrian Jarabo, Carlos Aliaga

http://arxiv.org/abs/2606.07932v1
LEGS: Laplacian-Enhanced Gaussian Splatting with a Nonlinear Weighted Loss
Updated: 2026-06-06T01:50:14Z
3D Gaussian Splatting (3DGS) has become an efficient explicit representation for radiance field reconstruction and real-time novel view synthesis. However, its standard photometric loss treats flat and structure-rich regions alike, which may limit the recovery of sharp contours and fine details. Edge-Guided Gaussian Splatting (EGGS) improves structure awareness through edge-guided weighting, but mainly relies on first-order gradient responses and linear weighting. In this paper, we propose LEGS, a Laplacian-Enhanced Gaussian Splatting method with a nonlinearly weighted loss. LEGS replaces first-order gradient guidance with second-order Laplacian structural guidance and maps the normalized Laplacian response into pixel-wise weights through nonlinear response-to-weight functions. The proposed loss improves structure-aware Gaussian optimization while keeping the original 3DGS rendering pipeline unchanged. Experiments on the full Tanks and Temples and Mip-NeRF360 datasets show that LEGS improves peak signal-to-noise ratio (PSNR) by up to 1.68 dB over 3DGS and up to 0.52 dB over EGGS.
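The idea of turning a normalized Laplacian response into per-pixel loss weights can be sketched as follows. This minimal Python version assumes a 5-point discrete Laplacian of the grayscale ground truth with replicated borders, normalization by the image-wide maximum, and the mapping w = 1 + alpha * r**gamma as one possible nonlinear response-to-weight function; the abstract does not specify LEGS's actual functions or constants, so `alpha` and `gamma` are hypothetical.

```python
def laplacian_abs(img):
    """|5-point discrete Laplacian| of a 2D grayscale image, with replicated borders."""
    h, w = len(img), len(img[0])

    def px(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[abs(px(y - 1, x) + px(y + 1, x) + px(y, x - 1) + px(y, x + 1) - 4 * px(y, x))
             for x in range(w)] for y in range(h)]


def laplacian_weights(gt, alpha=1.0, gamma=0.5):
    """Map the normalized response r in [0, 1] to weights 1 + alpha * r**gamma."""
    lap = laplacian_abs(gt)
    peak = max(max(row) for row in lap)
    if peak == 0.0:
        return [[1.0] * len(row) for row in lap]  # flat image: plain unweighted loss
    return [[1.0 + alpha * (v / peak) ** gamma for v in row] for row in lap]


def weighted_l1(pred, gt, alpha=1.0, gamma=0.5):
    """Per-pixel weighted L1 loss, normalized by the total weight."""
    wts = laplacian_weights(gt, alpha, gamma)
    num = sum(wv * abs(p - g)
              for wr, pr, gr in zip(wts, pred, gt)
              for wv, p, g in zip(wr, pr, gr))
    den = sum(sum(row) for row in wts)
    return num / den
```

With gamma < 1 the mapping is concave, so moderate second-order responses are boosted more than a linear weight would boost them, while flat regions keep weight 1 and still contribute to the loss.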
Incorporating the proposed second-order nonlinear weighting strategy into FastGS and FasterGS further improves PSNR by up to 1.69 dB, demonstrating its effectiveness as a general loss-level extension for Gaussian Splatting pipelines with potential applications in AR/VR, immersive visualization, and real-time 3D content generation.
Published: 2026-06-06T01:50:14Z
Authors: Yongfei Guo, Qizhou Huo, Xuan Sun, Yuanhao Gong

http://arxiv.org/abs/2503.08434v5
Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models
Updated: 2026-06-05T22:11:03Z
Recent advances in large-scale text-to-image models have revolutionized creative fields by generating visually captivating outputs from textual prompts. However, while traditional photography offers precise control over camera settings to shape visual aesthetics, such as depth of field via aperture, current diffusion models typically rely on prompt engineering to mimic such effects. This approach often results in crude approximations and inadvertently alters the scene content. In this work, we propose Bokeh Diffusion, a scene-consistent bokeh control framework that explicitly conditions a diffusion model on a physical defocus blur parameter. To overcome the scarcity of paired real-world images captured under different camera settings, we introduce a hybrid training pipeline that aligns in-the-wild images with synthetic blur augmentations, providing diverse scenes and subjects as well as supervision to learn the separation of image content from lens blur. Central to our framework is our grounded self-attention mechanism, trained on image pairs with different bokeh levels of the same scene, which enables blur strength to be adjusted in both directions while preserving the underlying scene.
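The abstract does not define its physical defocus blur parameter. As background on how a physical blur quantity follows from camera settings, the sketch below computes the standard thin-lens circle-of-confusion diameter; treating this value as the paper's conditioning signal would be an assumption, and the function name and millimetre units are our own choices.

```python
def circle_of_confusion(focal_mm, f_number, focus_mm, subject_mm):
    """Thin-lens blur-circle diameter on the sensor (mm) for a point at subject_mm.

    All distances are measured from the lens in millimetres; the lens is focused
    at focus_mm. A point exactly at the focus distance gives 0 (perfectly sharp).
    """
    if focus_mm <= focal_mm:
        raise ValueError("focus distance must exceed the focal length")
    aperture = focal_mm / f_number  # entrance-pupil diameter
    return (aperture * abs(subject_mm - focus_mm) / subject_mm
            * focal_mm / (focus_mm - focal_mm))
```

Stopping down from f/2 to f/4 halves the blur circle, matching the photographic intuition that a smaller aperture deepens the depth of field.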
Extensive experiments demonstrate that our approach enables flexible, lens-like blur control, supports downstream applications such as real image editing via inversion, and generalizes effectively across both Stable Diffusion and FLUX architectures.
Published: 2025-03-11T13:49:12Z
Comment: SIGGRAPH Asia 2025. Project page: https://atfortes.github.io/projects/bokeh-diffusion/
Authors: Armando Fortes, Tianyi Wei, Shangchen Zhou, Xingang Pan
DOI: 10.1145/3757377.3763906

http://arxiv.org/abs/2601.10621v2
Phong-Rodrigues Extrinsic Vector-Field Processing
Updated: 2026-06-05T19:54:49Z
We introduce a new extrinsic discretization of tangent vector fields on triangle meshes that is continuous, with bounded derivatives that are continuous almost everywhere, supporting pointwise evaluation and integration of differential operators. We achieve this by building a continuous normal field over the mesh via Phong interpolation and using minimal Rodrigues rotations to transport vertex-based tangent vectors into triangle interiors. Unlike most existing discretizations, which typically sacrifice either continuity or the ability to evaluate derivatives pointwise, our approach supports both. Because the fields can be evaluated pointwise, and because the covariant derivative decomposes into its symmetric, antisymmetric, and scalar components, our discretization supports the construction of standard vector-field processing operators, including the connection and Hodge Laplacians, Killing energy, divergence, curl, and the Lie bracket. This framework provides a simple and practical finite-element formulation for vector-field processing on meshes, supporting both integration-based operators and pointwise queries.
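The two ingredients named above can be sketched in a few lines: a Phong-style normal obtained by blending vertex normals with barycentric weights and renormalizing, and the minimal rotation taking one unit normal onto another, written in the compact Rodrigues form R v = c v + w × v + w (w · v) / (1 + c) with c = a · b and w = a × b. Blending the transported vertex tangents with the same barycentric weights is an assumption made for illustration; the paper's finite-element basis may differ.

```python
import math


def _add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def _scale(a, s):
    return tuple(x * s for x in a)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def phong_normal(normals, bary):
    """Phong-style normal: barycentric blend of unit vertex normals, renormalized."""
    n = (0.0, 0.0, 0.0)
    for ni, bi in zip(normals, bary):
        n = _add(n, _scale(ni, bi))
    return _scale(n, 1.0 / math.sqrt(_dot(n, n)))


def minimal_rotation(v, a, b):
    """Apply to v the minimal rotation taking unit vector a onto unit vector b.

    Rodrigues' formula in the form R v = c v + w x v + w (w . v) / (1 + c),
    with c = a . b and w = a x b; the rotation is not unique when a = -b.
    """
    c = _dot(a, b)
    if c <= -1.0 + 1e-12:
        raise ValueError("antipodal normals: minimal rotation is not unique")
    w = _cross(a, b)
    return _add(_add(_scale(v, c), _cross(w, v)), _scale(w, _dot(w, v) / (1.0 + c)))


def tangent_at(normals, tangents, bary):
    """Rotate each vertex tangent into the interpolated frame, then blend barycentrically."""
    n = phong_normal(normals, bary)
    t = (0.0, 0.0, 0.0)
    for ni, ti, bi in zip(normals, tangents, bary):
        t = _add(t, _scale(minimal_rotation(ti, ni, n), bi))
    return n, t
```

Each rotated tangent stays orthogonal to the interpolated normal, because the rotation maps the vertex normal onto it and preserves dot products; the blended vector is therefore tangent as well.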
To our knowledge, ours is the first discretization that jointly enables extrinsic continuous vector fields, bounded derivatives, and pointwise evaluation of this collection of operators.
Published: 2026-01-15T17:40:30Z
Authors: Hongyi Liu, Oded Stein, Amir Vaxman, Mirela Ben-Chen, Misha Kazhdan

http://arxiv.org/abs/2606.07791v1
Frequency-Scale Saliency for Spectral Descriptor Analysis in 3D Shape Retrieval
Updated: 2026-06-05T19:09:15Z
Classical spectral descriptors such as the Heat Kernel Signature (HKS) and Wave Kernel Signature (WKS) are widely used for non-rigid 3D shape retrieval, yet their failure modes remain poorly understood. We present a frequency-scale saliency framework that audits these descriptors by quantifying, through ablation, the retrieval-level contribution of each descriptor scale interval. We introduce class spectral fingerprints to characterize category-level scale dependence, and show that descriptor similarity between class pairs is substantially correlated with retrieval failure, with a Spearman correlation of 0.479. Experiments on SHREC'11 demonstrate that short scales dominate retrieval performance while long scales are harmful, that HKS and WKS exhibit distinct scale dependence patterns, and that saliency-weighted retrieval improves mAP on hard categories by 0.156, with cross-fold and random-weight controls confirming that the gain is stable and not due to arbitrary reweighting.
Published: 2026-06-05T19:09:15Z
Comment: Accepted at Computer Graphics International (CGI) 2026
Authors: Jianru Shen

http://arxiv.org/abs/2606.07337v1
Skeletal-Anchored Dual Harmonics for Structured 3D Modeling
Updated: 2026-06-05T14:49:51Z
We present Skeletal-Anchored Dual Harmonics (SADH), a novel 3D shape representation that tightly couples local surface geometry with internal meso-skeletal organization. SADH represents a shape as a collection of compact surface patches rooted on internal anchors optimized directly inside the object volume.
Each patch is parameterized using a dual-channel spherical harmonic (SH) formulation, where one channel models local radial geometry while the other defines adaptive patch support through a generalized viewing cone. Unlike isotropic primitives such as medial spheres or Gaussian kernels, SH patches directly encode anisotropic local surface geometry together with adaptive spatial support, enabling compact representation of detailed and directionally varying surface regions. Starting from unorganized point clouds, SADH jointly optimizes surface geometry, anchor locations, patch orientations, and structural connectivity through a staged optimization process that progressively forms a coherent meso-skeletal structure. A geodesic anchor graph further preserves structural relationships between neighboring patches. Experiments on complex 3D shapes demonstrate that SADH achieves accurate surface reconstruction together with compact and coherent skeletal organization across a wide range of geometries.
Published: 2026-06-05T14:49:51Z
Comment: 11 pages
Authors: Zhentao Huang, Changhao Li, Ruizhen Hu, Hui Huang, Minglun Gong

http://arxiv.org/abs/2503.09630v5
CASteer: Cross-Attention Steering for Controllable Concept Erasure
Updated: 2026-06-05T14:06:55Z
Diffusion models have transformed image generation, yet controlling their outputs to reliably erase undesired concepts remains challenging. Existing approaches usually require task-specific training and struggle to generalize across both concrete (e.g., objects) and abstract (e.g., styles) concepts. We propose CASteer (Cross-Attention Steering), a training-free framework for concept erasure in diffusion models that uses steering vectors to influence hidden representations dynamically. CASteer precomputes concept-specific steering vectors by averaging neural activations from images generated for each target concept. During inference, it applies these vectors to suppress undesired concepts only when they appear, ensuring that unrelated regions remain unaffected.
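A steering vector of this kind can be sketched with plain lists. The version below assumes a contrastive recipe (mean activation with the concept minus mean activation without it), which is common in steering work but is not spelled out in the abstract, and it removes the component along that direction only when an activation projects positively onto it, one simple reading of "only when they appear". CASteer operates on cross-attention outputs inside the diffusion model; here the vectors are plain lists and `strength` is a hypothetical parameter.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equal-length activation vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def steering_vector(concept_acts, neutral_acts):
    """Unit direction: mean activation with the concept minus mean activation without it."""
    diff = [p - q for p, q in zip(mean_vector(concept_acts), mean_vector(neutral_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("concept and neutral activations have the same mean")
    return [x / norm for x in diff]


def erase(h, s, strength=1.0):
    """Subtract the component of h along unit vector s, but only if h points toward the concept."""
    proj = sum(a * b for a, b in zip(h, s))
    if proj <= 0.0:
        return list(h)  # concept not detected in this activation: leave it untouched
    return [a - strength * proj * b for a, b in zip(h, s)]
```

Gating on the sign of the projection is what keeps unrelated activations unchanged; with `strength=1.0` the concept component is removed entirely, and larger values push the activation past zero toward the opposite direction.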
This selective activation enables precise, context-aware erasure without degrading overall image quality. The approach effectively removes harmful or unwanted content across a wide range of visual concepts, all without model retraining. CASteer outperforms state-of-the-art concept erasure techniques while preserving unrelated content and minimizing unintended effects.
Published: 2025-03-11T18:20:20Z
Authors: Tatiana Gaintseva, Andreea-Maria Oncescu, Chengcheng Ma, Ziquan Liu, Martin Benning, Gregory Slabaugh, Jiankang Deng, Ismail Elezi

http://arxiv.org/abs/2606.07288v1
ExMesh: EXplicit Mesh Reconstruction with Topology Adaptation
Updated: 2026-06-05T13:59:58Z
Reconstructing surface meshes from multi-view images has remained a core challenge in recent years. Most existing methods, whether implicit or explicit, depend on intermediate representations and post-processing steps such as Marching Cubes or TSDF fusion, often resulting in artifacts and fragmented geometry. Directly optimizing explicit meshes is a promising approach. However, it presents two critical challenges. The first is how to adaptively refine mesh topology to capture detail without introducing degenerate faces. The second is how to maintain consistent UV coordinates for high-fidelity texturing as the mesh structure evolves. To overcome these challenges, we propose ExMesh, a novel framework that directly optimizes explicit meshes by integrating differentiable optimization with discrete topology updates. Specifically, we introduce an adaptive vertex splitting and merging strategy, along with real-time UV maintenance, to enable coarse-to-fine optimization while preserving geometric integrity. To our knowledge, ExMesh is the first framework to seamlessly integrate discrete topology operations into a continuous differentiable optimization pipeline.
Extensive experiments demonstrate that ExMesh achieves a balance among accuracy, computational efficiency, and mesh conciseness.
Published: 2026-06-05T13:59:58Z
Comment: Accepted at the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026 (CVPR 2026)
Authors: Chuanjin Fan, Lifan Wu, Wenjie Chang, Hanzhi Chang, Wenfei Yang, Tianzhu Zhang