https://arxiv.org/api/wGWKMOSTLpwjtp9ZsdtM+6KboMM
2026-06-27T20:56:57Z
9390165015

http://arxiv.org/abs/2509.03430v1
EclipseTouch: Touch Segmentation on Ad Hoc Surfaces using Worn Infrared Shadow Casting
Updated: 2025-09-03T15:59:28Z
The ability to detect touch events on uninstrumented, everyday surfaces has been a long-standing goal for mixed reality systems. Prior work has shown that virtual interfaces bound to physical surfaces offer performance and ergonomic benefits over tapping at interfaces floating in the air. A wide variety of approaches have been previously developed, to which we contribute a new headset-integrated technique called EclipseTouch. We use a combination of a computer-triggered camera and one or more infrared emitters to create structured shadows, from which we can accurately estimate hover distance (mean error of 6.9 mm) and touch contact (98.0% accuracy). We discuss how our technique works across a range of conditions, including surface material, interaction orientation, and environmental lighting.
Published: 2025-09-03T15:59:28Z
Comment: Accepted to UIST 2025
Authors: Vimal Mollyn, Nathan DeVrio, Chris Harrison

http://arxiv.org/abs/2509.04513v1
Fidelity-preserving enhancement of ptychography with foundational text-to-image models
Updated: 2025-09-02T21:00:44Z
Ptychographic phase retrieval enables high-resolution imaging of complex samples but often suffers from artifacts such as grid pathology and multislice crosstalk, which degrade reconstructed images. We propose a plug-and-play (PnP) framework that integrates physics model-based phase retrieval with text-guided image editing using foundational diffusion models. By employing the alternating direction method of multipliers (ADMM), our approach ensures consensus between data fidelity and artifact removal subproblems, maintaining physics consistency while enhancing image quality.
Artifact removal is achieved using a text-guided diffusion image editing method (LEDITS++) with a pre-trained foundational diffusion model, allowing users to specify artifacts for removal in natural language. Demonstrations on simulated and experimental datasets show significant improvements in artifact suppression and structural fidelity, validated by metrics such as peak signal-to-noise ratio (PSNR) and diffraction pattern consistency. This work highlights the combination of text-guided generative models and model-based phase retrieval algorithms as a transferable and fidelity-preserving method for high-quality diffraction imaging.
Published: 2025-09-02T21:00:44Z
Authors: Ming Du, Volker Rose, Junjing Deng, Dileep Singh, Si Chen, Mathew J. Cherukara
DOI: 10.1364/OPTICA.579456

http://arxiv.org/abs/2509.02474v1
Unifi3D: A Study on 3D Representations for Generation and Reconstruction in a Common Framework
Updated: 2025-09-02T16:25:12Z
Following rapid advancements in text and image generation, research has increasingly shifted towards 3D generation. Unlike the well-established pixel-based representation in images, 3D representations remain diverse and fragmented, encompassing a wide variety of approaches such as voxel grids, neural radiance fields, signed distance functions, point clouds, or octrees, each offering distinct advantages and limitations. In this work, we present a unified evaluation framework designed to assess the performance of 3D representations in reconstruction and generation. We compare these representations based on multiple criteria: quality, computational efficiency, and generalization performance. Beyond standard model benchmarking, our experiments aim to derive best practices over all steps involved in the 3D generation pipeline, including preprocessing, mesh reconstruction, compression with autoencoders, and generation. Our findings highlight that reconstruction errors significantly impact overall performance, underscoring the need to evaluate generation and reconstruction jointly.
We provide insights that can inform the selection of suitable 3D models for various applications, facilitating the development of more robust and application-specific solutions in 3D generation. The code for our framework is available at https://github.com/isl-org/unifi3d.
Published: 2025-09-02T16:25:12Z
Authors: Nina Wiedemann, Sainan Liu, Quentin Leboutet, Katelyn Gao, Benjamin Ummenhofer, Michael Paulitsch, Kai Yuan

http://arxiv.org/abs/2509.02278v1
Think2Sing: Orchestrating Structured Motion Subtitles for Singing-Driven 3D Head Animation
Updated: 2025-09-02T12:59:27Z
Singing-driven 3D head animation is a challenging yet promising task with applications in virtual avatars, entertainment, and education. Unlike speech, singing involves richer emotional nuance, dynamic prosody, and lyric-based semantics, requiring the synthesis of fine-grained, temporally coherent facial motion. Existing speech-driven approaches often produce oversimplified, emotionally flat, and semantically inconsistent results, which are insufficient for singing animation. To address this, we propose Think2Sing, a diffusion-based framework that leverages pretrained large language models to generate semantically coherent and temporally consistent 3D head animations, conditioned on both lyrics and acoustics. A key innovation is the introduction of motion subtitles, an auxiliary semantic representation derived through a novel Singing Chain-of-Thought reasoning process combined with acoustic-guided retrieval. These subtitles contain precise timestamps and region-specific motion descriptions, serving as interpretable motion priors. We frame the task as a motion intensity prediction problem, enabling finer control over facial regions and improving the modeling of expressive motion. To support this, we create a multimodal singing dataset with synchronized video, acoustic descriptors, and motion subtitles, enabling diverse and expressive motion learning.
Extensive experiments show that Think2Sing outperforms state-of-the-art methods in realism, expressiveness, and emotional fidelity, while also offering flexible, user-controllable animation editing.
Published: 2025-09-02T12:59:27Z
Authors: Zikai Huang, Yihan Zhou, Xuemiao Xu, Cheng Xu, Xiaofen Xing, Jing Qin, Shengfeng He

http://arxiv.org/abs/2504.05740v2
Micro-splatting: Multistage Isotropy-informed Covariance Regularization Optimization for High-Fidelity 3D Gaussian Splatting
Updated: 2025-09-02T10:05:44Z
High-fidelity 3D Gaussian Splatting methods excel at capturing fine textures but often overlook model compactness, resulting in massive splat counts, bloated memory, long training, and complex post-processing. We present Micro-Splatting: Two-Stage Adaptive Growth and Refinement, a unified, in-training pipeline that preserves visual detail while drastically reducing model complexity without any post-processing or auxiliary neural modules. In Stage I (Growth), we introduce a trace-based covariance regularization to maintain near-isotropic Gaussians, mitigating low-pass filtering in high-frequency regions and improving spherical-harmonic color fitting. We then apply gradient-guided adaptive densification that subdivides splats only in visually complex regions, leaving smooth areas sparse. In Stage II (Refinement), we prune low-impact splats using a simple opacity-scale importance score and merge redundant neighbors via lightweight spatial and feature thresholds, producing a lean yet detail-rich model. On four object-centric benchmarks, Micro-Splatting reduces splat count and model size by up to 60% and shortens training by 20%, while matching or surpassing state-of-the-art PSNR, SSIM, and LPIPS in real-time rendering.
These results demonstrate that Micro-Splatting delivers both compactness and high fidelity in a single, efficient, end-to-end framework.
Published: 2025-04-08T07:15:58Z
Comment: This work has been submitted to journal for potential publication
Authors: Jee Won Lee, Hansol Lim, Sooyeun Yang, Jongseong Brad Choi

http://arxiv.org/abs/2509.02141v1
GRMM: Real-Time High-Fidelity Gaussian Morphable Head Model with Learned Residuals
Updated: 2025-09-02T09:43:47Z
3D Morphable Models (3DMMs) enable controllable facial geometry and expression editing for reconstruction, animation, and AR/VR, but traditional PCA-based mesh models are limited in resolution, detail, and photorealism. Neural volumetric methods improve realism but remain too slow for interactive use. Recent Gaussian Splatting (3DGS) based facial models achieve fast, high-quality rendering but still depend solely on a mesh-based 3DMM prior for expression control, limiting their ability to capture fine-grained geometry, expressions, and full-head coverage. We introduce GRMM, the first full-head Gaussian 3D morphable model that augments a base 3DMM with residual geometry and appearance components, additive refinements that recover high-frequency details such as wrinkles, fine skin texture, and hairline variations. GRMM provides disentangled control through low-dimensional, interpretable parameters (e.g., identity shape, facial expressions) while separately modelling residuals that capture subject- and expression-specific detail beyond the base model's capacity. Coarse decoders produce vertex-level mesh deformations, fine decoders represent per-Gaussian appearance, and a lightweight CNN refines rasterised images for enhanced realism, all while maintaining 75 FPS real-time rendering. To learn consistent, high-fidelity residuals, we present EXPRESS-50, the first dataset with 60 aligned expressions across 50 identities, enabling robust disentanglement of identity and expression in Gaussian-based 3DMMs.
Across monocular 3D face reconstruction, novel-view synthesis, and expression transfer, GRMM surpasses state-of-the-art methods in fidelity and expression accuracy while delivering interactive real-time performance.
Published: 2025-09-02T09:43:47Z
Comment: Project page: https://mohitm1994.github.io/GRMM/
Authors: Mohit Mendiratta, Mayur Deshmukh, Kartik Teotia, Vladislav Golyanik, Adam Kortylewski, Christian Theobalt

http://arxiv.org/abs/2407.01074v2
Multimodal Conditional 3D Face Geometry Generation
Updated: 2025-09-02T07:17:39Z
We present a new method for multimodal conditional 3D face geometry generation that allows user-friendly control over the output identity and expression via a number of different conditioning signals. Within a single model, we demonstrate 3D faces generated from artistic sketches, portrait photos, Canny edges, FLAME face model parameters, 2D face landmarks, or text prompts. Our approach is based on a diffusion process that generates 3D geometry in a 2D parameterized UV domain. Geometry generation passes each conditioning signal through a set of cross-attention layers (IP-Adapter), one set for each user-defined conditioning signal. The result is an easy-to-use 3D face generation tool that produces topology-consistent, high-quality geometry with fine-grain user control.
Published: 2024-07-01T08:25:59Z
Comment: Added more evaluation since the first version. Accepted to SMI 2025. Computers & Graphics
Authors: Christopher Otto, Prashanth Chandran, Sebastian Weiss, Markus Gross, Gaspard Zoss, Derek Bradley
DOI: 10.1016/j.cag.2025.104325

http://arxiv.org/abs/2308.05333v3
A New Geometric Representation for 3D Bijective Mappings and Applications
Updated: 2025-09-01T17:19:52Z
Three-dimensional (3D) mappings are fundamental in various scientific and engineering applications, including computer-aided engineering (CAE), computer graphics, and medical imaging. They are typically represented and stored as three-dimensional coordinates to which each vertex is mapped.
With this representation, manipulating 3D mappings while preserving desired properties becomes challenging. In this work, we present a novel geometric representation for 3D bijective mappings, termed 3D quasiconformality (3DQC), which generalizes the concept of Beltrami coefficients from 2D to 3D spaces. This geometric representation facilitates the scientific computation of 3D mapping problems by capturing local geometric properties in 3D mappings. We derive a partial differential equation (PDE) that links the 3DQC to its corresponding mapping. This PDE is discretized into a symmetric positive-definite linear system, which can be efficiently solved using the conjugate gradient method. 3DQC offers a powerful tool for manipulating 3D mappings while maintaining their desired geometric properties. Leveraging 3DQC, we develop numerical algorithms for sparse modeling and numerical interpolation of bijective 3D mappings, facilitating the efficient processing, storage, and manipulation of complex 3D mappings while ensuring bijectivity. Extensive numerical experiments validate the effectiveness and robustness of our proposed methods.
Published: 2023-08-10T04:41:40Z
Authors: Qiguang Chen, Lok Ming Lui

http://arxiv.org/abs/2412.12765v2
Monocular Facial Appearance Capture in the Wild
Updated: 2025-09-01T14:52:34Z
We present a new method for reconstructing the appearance properties of human faces from a lightweight capture procedure in an unconstrained environment. Our method recovers the surface geometry, diffuse albedo, specular intensity and specular roughness from a monocular video containing a simple head rotation in-the-wild. Notably, we make no simplifying assumptions on the environment lighting, and we explicitly take visibility and occlusions into account.
As a result, our method can produce facial appearance maps that approach the fidelity of studio-based multi-view captures, but with a far easier and cheaper procedure.
Published: 2024-12-17T10:30:56Z
Authors: Yingyan Xu, Kate Gadola, Prashanth Chandran, Sebastian Weiss, Markus Gross, Gaspard Zoss, Derek Bradley

http://arxiv.org/abs/2509.01442v1
Quantum Brush: A quantum computing-based tool for digital painting
Updated: 2025-09-01T12:56:57Z
We present Quantum Brush, an open-source digital painting tool that harnesses quantum computing to generate novel artistic expressions. The tool includes four different brushes that translate strokes into unique quantum algorithms, each highlighting a different way in which quantum effects can produce novel aesthetics. Each brush is designed to be compatible with the current noisy intermediate-scale quantum (NISQ) devices, as demonstrated by executing them on IQM's Sirius device.
Published: 2025-09-01T12:56:57Z
Authors: João S. Ferreira, Arianna Crippa, Astryd Park, Daniel Bultrini, Pierre Fromholz, Roman Lipski, Karl Jansen, James R. Wootton

http://arxiv.org/abs/2509.01134v1
RealMat: Realistic Materials with Diffusion and Reinforcement Learning
Updated: 2025-09-01T05:04:51Z
Generative models for high-quality materials are particularly desirable to make 3D content authoring more accessible. However, the majority of material generation methods are trained on synthetic data. Synthetic data provides precise supervision for material maps, which is convenient but also tends to create a significant visual gap with real-world materials. Alternatively, recent work used a small dataset of real flash photographs to guarantee realism, however such data is limited in scale and diversity. To address these limitations, we propose RealMat, a diffusion-based material generator that leverages realistic priors, including a text-to-image model and a dataset of realistic material photos under natural lighting. In RealMat, we first finetune a pretrained Stable Diffusion XL (SDXL) with synthetic material maps arranged in $2 \times 2$ grids.
This way, our model inherits some realism of SDXL while learning the data distribution of the synthetic material grids. Still, this creates a realism gap, with some generated materials appearing synthetic. We propose to further finetune our model through reinforcement learning (RL), encouraging the generation of realistic materials. We develop a realism reward function for any material image under natural lighting, by collecting a large-scale dataset of realistic material images. We show that this approach increases generated materials' realism compared to our base model and related work.
Published: 2025-09-01T05:04:51Z
Comment: 11 pages, 11 figures
Authors: Xilong Zhou, Pedro Figueiredo, Miloš Hašan, Valentin Deschaintre, Paul Guerrero, Yiwei Hu, Nima Khademi Kalantari

http://arxiv.org/abs/2509.03542v1
The Chaotic Art: Quantum Representation and Manipulation of Color
Updated: 2025-09-01T03:50:10Z
Due to its unique computing principles, quantum computing technology will profoundly change the spectacle of color art. Focusing on experimental exploration of color qubit representation, color channel processing, and color image generation via quantum computing, this article proposes a new technical path for color computing in quantum computing environment, by which digital color is represented, operated, and measured in quantum bits, and then restored for classical computers as computing results. This method has been proved practicable as an artistic technique of color qubit representation and quantum computing via programming experiments in Qiskit and IBM Q. By building a bridge between classical chromatics and quantum graphics, quantum computers can be used for information visualization, image processing, and more color computing tasks.
Furthermore, quantum computing can be expected to facilitate new color theories and artistic concepts.
Published: 2025-09-01T03:50:10Z
Comment: 9 pages, 8 figures
Authors: Guosheng Hu

http://arxiv.org/abs/2501.10462v2
BloomScene: Lightweight Structured 3D Gaussian Splatting for Crossmodal Scene Generation
Updated: 2025-09-01T03:15:49Z
With the widespread use of virtual reality applications, 3D scene generation has become a new challenging research frontier. 3D scenes have highly complex structures and need to ensure that the output is dense, coherent, and contains all necessary structures. Many current 3D scene generation methods rely on pre-trained text-to-image diffusion models and monocular depth estimators. However, the generated scenes occupy large amounts of storage space and often lack effective regularisation methods, leading to geometric distortions. To this end, we propose BloomScene, a lightweight structured 3D Gaussian splatting for crossmodal scene generation, which creates diverse and high-quality 3D scenes from text or image inputs. Specifically, a crossmodal progressive scene generation framework is proposed to generate coherent scenes utilizing incremental point cloud reconstruction and 3D Gaussian splatting. Additionally, we propose a hierarchical depth prior-based regularization mechanism that utilizes multi-level constraints on depth accuracy and smoothness to enhance the realism and continuity of the generated scenes. Ultimately, we propose a structured context-guided compression mechanism that exploits structured hash grids to model the context of unorganized anchor attributes, which significantly eliminates structural redundancy and reduces storage overhead. Comprehensive experiments across multiple scenes demonstrate the significant potential and advantages of our framework compared with several baselines.
Published: 2025-01-15T11:33:34Z
Comment: Accepted by AAAI 2025.
Code: https://github.com/SparklingH/BloomScene
Authors: Xiaolu Hou, Mingcheng Li, Dingkang Yang, Jiawei Chen, Ziyun Qian, Xiao Zhao, Yue Jiang, Jinjie Wei, Qingyao Xu, Lihua Zhang

http://arxiv.org/abs/2509.00777v1
IntrinsicReal: Adapting IntrinsicAnything from Synthetic to Real Objects
Updated: 2025-08-31T10:15:31Z
Estimating albedo (a.k.a., intrinsic image decomposition) from single RGB images captured in real-world environments (e.g., the MVImgNet dataset) presents a significant challenge due to the absence of paired images and their ground truth albedos. Therefore, while recent methods (e.g., IntrinsicAnything) have achieved breakthroughs by harnessing powerful diffusion priors, they remain predominantly trained on large-scale synthetic datasets (e.g., Objaverse) and applied directly to real-world RGB images, which ignores the large domain gap between synthetic and real-world data and leads to suboptimal generalization performance. In this work, we address this gap by proposing IntrinsicReal, a novel domain adaptation framework that bridges the above-mentioned domain gap for real-world intrinsic image decomposition. Specifically, our IntrinsicReal adapts IntrinsicAnything to the real domain by fine-tuning it using its high-quality output albedos selected by a novel dual pseudo-labeling strategy: i) pseudo-labeling with an absolute confidence threshold on classifier predictions, and ii) pseudo-labeling using the relative preference ranking of classifier predictions for individual input objects. This strategy is inspired by human evaluation, where identifying the highest-quality outputs is straightforward, but absolute scores become less reliable for sub-optimal cases. In these situations, relative comparisons of outputs become more accurate. To implement this, we propose a novel two-phase pipeline that sequentially applies these pseudo-labeling techniques to effectively adapt IntrinsicAnything to the real domain.
Experimental results show that our IntrinsicReal significantly outperforms existing methods, achieving state-of-the-art results for albedo estimation on both synthetic and real-world datasets.
Published: 2025-08-31T10:15:31Z
Authors: Xiaokang Wei, Zizheng Yan, Zhangyang Xiong, Yiming Hao, Yipeng Qin, Xiaoguang Han

http://arxiv.org/abs/2509.00674v1
Triangle Counting in Hypergraph Streams: A Complete and Practical Approach
Updated: 2025-08-31T03:02:34Z
Triangle counting in hypergraph streams, including both hyper-vertex and hyper-edge triangles, is a fundamental problem in hypergraph analytics, with broad applications. However, existing methods face two key limitations: (i) an incomplete classification of hyper-vertex triangle structures, typically considering only inner or outer triangles; and (ii) inflexible sampling schemes that predefine the number of sampled hyperedges, which is impractical under strict memory constraints due to highly variable hyperedge sizes. To address these challenges, we first introduce a complete classification of hyper-vertex triangles, including inner, hybrid, and outer triangles. Based on this, we develop HTCount, a reservoir-based algorithm that dynamically adjusts the sample size based on the available memory M. To further improve memory utilization and reduce estimation error, we develop HTCount-P, a partition-based variant that adaptively partitions unused memory into independent sample subsets. We provide theoretical analysis of the unbiasedness and variance bounds of the proposed algorithms. Case studies demonstrate the expressiveness of our triangle structures in revealing meaningful interaction patterns. Extensive experiments on real-world hypergraphs show that both our algorithms achieve highly accurate triangle count estimates under strict memory constraints, with relative errors that are 1 to 2 orders of magnitude lower than those of existing methods and consistently high throughput.
Published: 2025-08-31T03:02:34Z
Authors: Lingkai Meng, Long Yuan, Xuemin Lin, Wenjie Zhang, Ying Zhang