https://arxiv.org/api/klQIlQugERMe3oHG8pketmqiNCg2026-07-01T08:56:06Z9421210015http://arxiv.org/abs/2506.07657v1PIG: Physically-based Multi-Material Interaction with 3D Gaussians2025-06-09T11:25:21Z3D Gaussian Splatting has achieved remarkable success in reconstructing both static and dynamic 3D scenes. However, in a scene represented by 3D Gaussian primitives, interactions between objects suffer from inaccurate 3D segmentation, imprecise deformation among different materials, and severe rendering artifacts. To address these challenges, we introduce PIG: Physically-Based Multi-Material Interaction with 3D Gaussians, a novel approach that combines 3D object segmentation with the simulation of interacting objects in high precision. Firstly, our method facilitates fast and accurate mapping from 2D pixels to 3D Gaussians, enabling precise 3D object-level segmentation. Secondly, we assign unique physical properties to correspondingly segmented objects within the scene for multi-material coupled interactions. Finally, we have successfully embedded constraint scales into deformation gradients, specifically clamping the scaling and rotation properties of the Gaussian primitives to eliminate artifacts and achieve geometric fidelity and visual consistency. Experimental results demonstrate that our method not only outperforms the state-of-the-art (SOTA) in terms of visual quality, but also opens up new directions and pipelines for the field of physically realistic scene generation.2025-06-09T11:25:21ZZeyu XiaoZhenyi WuMingyang SunQipeng YanYufan GuoZhuoer LiangLihua Zhanghttp://arxiv.org/abs/2506.08064v1A Real-time 3D Desktop Display2025-06-09T10:55:15ZA new extended version of the altiro3D C++ Library -- initially developed to get glass-free holographic displays starting from 2D images -- is here introduced aiming to deal with 3D video streams from either 2D webcam images or flat video files. These streams are processed in real-time to synthesize light-fields (in Native format) and feed realistic 3D experiences. The core function needed to recreate multiviews consists on the use of MiDaS Convolutional Neural Network (CNN), which allows to extract a depth map from a single 2D image. Artificial Intelligence (AI) computing techniques are applied to improve the overall performance of the extended altiro3D Library. Thus, altiro3D can now treat standard images, video streams or screen portions of a Desktop where other apps may be also running (like web browsers, video chats, etc) and render them into 3D. To achieve the latter, a screen region need to be selected in order to feed the output directly into a light-field 3D device such as Looking Glass (LG) Portrait. In order to simplify the acquisition of a Desktop screen area by the user, a multi-platform Graphical User Interface has been also implemented. Sources available at: https://github.com/canessae/altiro3D/releases/tag/2.0.02025-06-09T10:55:15Z10 pages, 5 figuresLivio TenzeEnrique Canessahttp://arxiv.org/abs/2505.21041v3CityGo: Lightweight Urban Modeling and Rendering with Proxy Buildings and Residual Gaussians2025-06-09T08:56:18ZAccurate and efficient modeling of large-scale urban scenes is critical for applications such as AR navigation, UAV based inspection, and smart city digital twins. While aerial imagery offers broad coverage and complements limitations of ground-based data, reconstructing city-scale environments from such views remains challenging due to occlusions, incomplete geometry, and high memory demands. Recent advances like 3D Gaussian Splatting (3DGS) improve scalability and visual quality but remain limited by dense primitive usage, long training times, and poor suit ability for edge devices. We propose CityGo, a hybrid framework that combines textured proxy geometry with residual and surrounding 3D Gaussians for lightweight, photorealistic rendering of urban scenes from aerial perspectives. Our approach first extracts compact building proxy meshes from MVS point clouds, then uses zero order SH Gaussians to generate occlusion-free textures via image-based rendering and back-projection. To capture high-frequency details, we introduce residual Gaussians placed based on proxy-photo discrepancies and guided by depth priors. Broader urban context is represented by surrounding Gaussians, with importance-aware downsampling applied to non-critical regions to reduce redundancy. A tailored optimization strategy jointly refines proxy textures and Gaussian parameters, enabling real-time rendering of complex urban scenes on mobile GPUs with significantly reduced training and memory requirements. Extensive experiments on real-world aerial datasets demonstrate that our hybrid representation significantly reduces training time, achieving on average 1.4x speedup, while delivering comparable visual fidelity to pure 3D Gaussian Splatting approaches. Furthermore, CityGo enables real-time rendering of large-scale urban scenes on mobile consumer GPUs, with substantially reduced memory usage and energy consumption.2025-05-27T11:24:08ZWeihang LiuYuhui ZhongYuke LiXi ChenJiadi CuiHonglong ZhangLan XuXin LouYujiao ShiJingyi YuYingliang Zhanghttp://arxiv.org/abs/2506.07558v1Immersive Visualization of Flat Surfaces Using Ray Marching2025-06-09T08:53:19ZWe present an effective method for visualizing flat surfaces using ray marching. Our approach provides an intuitive way to explore translation surfaces, mirror rooms, unfolded polyhedra, and translation prisms while maintaining computational efficiency. We demonstrate the utility of the method through various examples and provide implementation insights for programmers. Finally, we discuss the use of our visualizations in outreach. We make our simulations and code available online.2025-06-09T08:53:19ZPresented at Bridges Math and Art Conference, Eindhoven 2025. Online demo and code available at https://fabianlander.github.io/apps/raymarchingflatsurfacesapp/ and https://github.com/FabianLander/RayMarchingFlatSurfacesFabian LanderDiaaeldin Tahahttp://arxiv.org/abs/2506.09070v1STREAMINGGS: Voxel-Based Streaming 3D Gaussian Splatting with Memory Optimization and Architectural Support2025-06-09T07:51:34Z3D Gaussian Splatting (3DGS) has gained popularity for its efficiency and sparse Gaussian-based representation. However, 3DGS struggles to meet the real-time requirement of 90 frames per second (FPS) on resource-constrained mobile devices, achieving only 2 to 9 FPS.Existing accelerators focus on compute efficiency but overlook memory efficiency, leading to redundant DRAM traffic. We introduce STREAMINGGS, a fully streaming 3DGS algorithm-architecture co-design that achieves fine-grained pipelining and reduces DRAM traffic by transforming from a tile-centric rendering to a memory-centric rendering. Results show that our design achieves up to 45.7 $\times$ speedup and 62.9 $\times$ energy savings over mobile Ampere GPUs.2025-06-09T07:51:34ZChenqi ZhangYu FengJieru ZhaoGuangda LiuWenchao DingChentao WuMinyi Guohttp://arxiv.org/abs/2408.07116v3Generative Photomontage2025-06-09T03:36:08ZText-to-image models are powerful tools for image creation. However, the generation process is akin to a dice roll and makes it difficult to achieve a single image that captures everything a user wants. In this paper, we propose a framework for creating the desired image by compositing it from various parts of generated images, in essence forming a Generative Photomontage. Given a stack of images generated by ControlNet using the same input condition and different seeds, we let users select desired parts from the generated results using a brush stroke interface. We introduce a novel technique that takes in the user's brush strokes, segments the generated images using a graph-based optimization in diffusion feature space, and then composites the segmented regions via a new feature-space blending method. Our method faithfully preserves the user-selected regions while compositing them harmoniously. We demonstrate that our flexible framework can be used for many applications, including generating new appearance combinations, fixing incorrect shapes and artifacts, and improving prompt alignment. We show compelling results for each application and demonstrate that our method outperforms existing image blending methods and various baselines.2024-08-13T17:59:51ZCVPR 2025. Project webpage: https://lseancs.github.io/generativephotomontage/In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025Sean J. LiuNupur KumariAriel ShamirJun-Yan Zhuhttp://arxiv.org/abs/2408.14732v2OctFusion: Octree-based Diffusion Models for 3D Shape Generation2025-06-08T16:51:04ZDiffusion models have emerged as a popular method for 3D generation. However, it is still challenging for diffusion models to efficiently generate diverse and high-quality 3D shapes. In this paper, we introduce OctFusion, which can generate 3D shapes with arbitrary resolutions in 2.5 seconds on a single Nvidia 4090 GPU, and the extracted meshes are guaranteed to be continuous and manifold. The key components of OctFusion are the octree-based latent representation and the accompanying diffusion models. The representation combines the benefits of both implicit neural representations and explicit spatial octrees and is learned with an octree-based variational autoencoder. The proposed diffusion model is a unified multi-scale U-Net that enables weights and computation sharing across different octree levels and avoids the complexity of widely used cascaded diffusion schemes. We verify the effectiveness of OctFusion on the ShapeNet and Objaverse datasets and achieve state-of-the-art performances on shape generation tasks. We demonstrate that OctFusion is extendable and flexible by generating high-quality color fields for textured mesh generation and high-quality 3D shapes conditioned on text prompts, sketches, or category labels. Our code and pre-trained models are available at https://github.com/octree-nn/octfusion.2024-08-27T01:55:40ZAccepted to Computer Graphics Forum (SGP), 2025Bojun XiongSi-Tong WeiXin-Yang ZhengYan-Pei CaoZhouhui LianPeng-Shuai Wanghttp://arxiv.org/abs/2412.11596v2MeshArt: Generating Articulated Meshes with Structure-Guided Transformers2025-06-08T13:01:34ZArticulated 3D object generation is fundamental for creating realistic, functional, and interactable virtual assets which are not simply static. We introduce MeshArt, a hierarchical transformer-based approach to generate articulated 3D meshes with clean, compact geometry, reminiscent of human-crafted 3D models. We approach articulated mesh generation in a part-by-part fashion across two stages. First, we generate a high-level articulation-aware object structure; then, based on this structural information, we synthesize each part's mesh faces. Key to our approach is modeling both articulation structures and part meshes as sequences of quantized triangle embeddings, leading to a unified hierarchical framework with transformers for autoregressive generation. Object part structures are first generated as their bounding primitives and articulation modes; a second transformer, guided by these articulation structures, then generates each part's mesh triangles. To ensure coherency among generated parts, we introduce structure-guided conditioning that also incorporates local part mesh connectivity. MeshArt shows significant improvements over state of the art, with 57.1% improvement in structure coverage and a 209-point improvement in mesh generation FID.2024-12-16T09:35:08ZProject Page: https://daoyig.github.io/Mesh_Art/, Video: https://www.youtube.com/watch?v=0XaHFbmb_FQDaoyi GaoYawar SiddiquiLei LiAngela Daihttp://arxiv.org/abs/2506.05164v2Towards the target and not beyond: 2D vs 3D visual aids in MR-based neurosurgical simulation2025-06-08T09:18:15ZNeurosurgery increasingly uses Mixed Reality (MR) technologies for intraoperative assistance. The greatest challenge in this area is mentally reconstructing complex 3D anatomical structures from 2D slices with millimetric precision, which is required in procedures like External Ventricular Drain (EVD) placement. MR technologies have shown great potential in improving surgical performance, however, their limited availability in clinical settings underscores the need for training systems that foster skill retention in unaided conditions. In this paper, we introduce NeuroMix, an MR-based simulator for EVD placement. We conduct a study with 48 participants to assess the impact of 2D and 3D visual aids on usability, cognitive load, technology acceptance, and procedure precision and execution time. Three training modalities are compared: one without visual aids, one with 2D aids only, and one combining both 2D and 3D aids. The training phase takes place entirely on digital objects, followed by a freehand EVD placement testing phase performed with a physical catherer and a physical phantom without MR aids. We then compare the participants performance with that of a control group that does not undergo training. Our findings show that participants trained with both 2D and 3D aids achieve a 44\% improvement in precision during unaided testing compared to the control group, substantially higher than the improvement observed in the other groups. All three training modalities receive high usability and technology acceptance ratings, with significant equivalence across groups. The combination of 2D and 3D visual aids does not significantly increase cognitive workload, though it leads to longer operation times during freehand testing compared to the control group.2025-06-05T15:41:04Z15 pages, 7 figures, 3 tables, journalPasquale CascaranoAndrea LorettiMatteo MartinoniLuca ZanuttiniAlessio Di PasqualeGustavo Marfiahttp://arxiv.org/abs/2403.07931v2Formalizing Feint Actions, and Example Studies in Two-Player Games2025-06-07T16:23:45ZFeint actions refer to a set of deceptive actions, which enable players to obtain temporal advantages from their opponents. Such actions are regarded as widely-used tactic in most non-deterministic Two-player Games (e.g. boxing and fencing). However, existing literature does not provide comprehensive and concrete formalization on Feint actions, and their implications on Two-Player Games. We argue that a full exploration on Feint actions is of great importance towards more realistic Two-player Games. In this paper, we provide the first comprehensive and concrete formalization of Feint actions. The key idea of our work is to (1) allow automatic generation of Feint actions, via our proposed Palindrome-directed Generation of Feint actions; and (2) provide concrete principles to properly combine Feint and attack actions. Based on our formalization of Feint actions, we also explore the implications on the game strategy model, and provide optimizations to better incorporate Feint actions. Our experimental results shows that accounting for Feint actions in Non-Deterministic Games (1) brings overall benefits to the game design; and (2) has great benefits on on either game animations or strategy designs, which also introduces a great extent of randomness into randomness-demanded Game models.2024-03-03T18:23:18ZThe pre-print has been combined into another paper (arXiv:2403.07932)Junyu LiuWangkai JinXiangjun Penghttp://arxiv.org/abs/2405.19507v2Data-Efficient Discovery of Hyperelastic TPMS Metamaterials with Extreme Energy Dissipation2025-06-07T03:21:36ZTriply periodic minimal surfaces (TPMS) are a class of metamaterials with a variety of applications and well-known primitive morphologies. We present a new method for discovering novel microscale TPMS structures with exceptional energy-dissipation capabilities, achieving double the energy absorption of the best existing TPMS primitive structure. Our approach employs a parametric representation, allowing seamless interpolation between structures and representing a rich TPMS design space. As simulations are intractable for efficiently optimizing microscale hyperelastic structures, we propose a sample-efficient computational strategy for rapid discovery with limited empirical data from 3D-printed and tested samples that ensures high-fidelity results. We achieve this by leveraging a predictive uncertainty-aware Deep Ensembles model to identify which structures to fabricate and test next. We iteratively refine our model through batch Bayesian optimization, selecting structures for fabrication that maximize exploration of the performance space and exploitation of our energy-dissipation objective. Using our method, we produce the first open-source dataset of hyperelastic microscale TPMS structures, including a set of novel structures that demonstrate extreme energy dissipation capabilities, and show several potential applications of these structures.2024-05-29T20:43:25ZIn SIGGRAPH Conference Papers 2025, August 10--14, 2025, Vancouver, BC, Canada. ACM, New York, NY, USA, 12 pagesMaxine Perroni-ScharfZachary FergusonThomas ButrilleCarlos PortelaMina Konaković Luković10.1145/3721238.3730759http://arxiv.org/abs/2112.06300v5Time of Impact Dataset for Continuous Collision Detection and a Scalable Conservative Algorithm2025-06-06T23:56:03ZWe introduce a large-scale benchmark for broad- and narrow-phase continuous collision detection (CCD) over linearized trajectories with exact time of impacts and use it to evaluate the accuracy, correctness, and efficiency of 13 state-of-the-art CCD algorithms. Our analysis shows that several methods exhibit problems either in efficiency or accuracy.
To overcome these limitations, we introduce an algorithm for CCD designed to be scalable on modern parallel architectures and provably correct when implemented using floating point arithmetic. We integrate our algorithm within the Incremental Potential Contact solver [24] and evaluate its impact on various simulation scenarios. Our approach includes a broad-phase CCD to quickly filter out primitives having disjoint bounding boxes and a narrow-phase CCD that establishes whether the remaining primitive pairs indeed collide. Our broad-phase algorithm is efficient and scalable thanks to the experimental observation that sweeping along a coordinate axis performs surprisingly well on modern parallel architectures. For narrow-phase CCD, we re-design the recently proposed interval-based algorithm of Wang et al. [45] to work on massively parallel hardware.
To foster the adoption and development of future linear CCD algorithms, and to evaluate their correctness, scalability, and overall performance, we release the dataset with analytic ground truth, the implementation of all the algorithms tested, and our testing framework.2021-12-12T18:47:55ZDavid BelgrodBolun WangZachary FergusonXin ZhaoMarco AtteneDaniele PanozzoTeseo Schneiderhttp://arxiv.org/abs/2506.06494v1JGS2: Near Second-order Converging Jacobi/Gauss-Seidel for GPU Elastodynamics2025-06-06T19:42:16ZIn parallel simulation, convergence and parallelism are often seen as inherently conflicting objectives. Improved parallelism typically entails lighter local computation and weaker coupling, which unavoidably slow the global convergence. This paper presents a novel GPU algorithm that achieves convergence rates comparable to fullspace Newton's method while maintaining good parallelizability just like the Jacobi method. Our approach is built on a key insight into the phenomenon of overshoot. Overshoot occurs when a local solver aggressively minimizes its local energy without accounting for the global context, resulting in a local update that undermines global convergence. To address this, we derive a theoretically second-order optimal solution to mitigate overshoot. Furthermore, we adapt this solution into a pre-computable form. Leveraging Cubature sampling, our runtime cost is only marginally higher than the Jacobi method, yet our algorithm converges nearly quadratically as Newton's method. We also introduce a novel full-coordinate formulation for more efficient pre-computation. Our method integrates seamlessly with the incremental potential contact method and achieves second-order convergence for both stiff and soft materials. Experimental results demonstrate that our approach delivers high-quality simulations and outperforms state-of-the-art GPU methods with 50 to 100 times better convergence.2025-06-06T19:42:16ZLei LanZixuan LuChun YuanWeiwei XuHao SuHuamin WangChenfanfu JiangYin Yang10.1145/3731182http://arxiv.org/abs/2506.06483v1Noise Consistency Regularization for Improved Subject-Driven Image Synthesis2025-06-06T19:17:37ZFine-tuning Stable Diffusion enables subject-driven image synthesis by adapting the model to generate images containing specific subjects. However, existing fine-tuning methods suffer from two key issues: underfitting, where the model fails to reliably capture subject identity, and overfitting, where it memorizes the subject image and reduces background diversity. To address these challenges, we propose two auxiliary consistency losses for diffusion fine-tuning. First, a prior consistency regularization loss ensures that the predicted diffusion noise for prior (non-subject) images remains consistent with that of the pretrained model, improving fidelity. Second, a subject consistency regularization loss enhances the fine-tuned model's robustness to multiplicative noise modulated latent code, helping to preserve subject identity while improving diversity. Our experimental results demonstrate that incorporating these losses into fine-tuning not only preserves subject identity but also enhances image diversity, outperforming DreamBooth in terms of CLIP scores, background variation, and overall visual quality.2025-06-06T19:17:37ZYao NiSong WenPiotr KoniuszAnoop Cherianhttp://arxiv.org/abs/2506.06462v1Splat and Replace: 3D Reconstruction with Repetitive Elements2025-06-06T18:34:22ZWe leverage repetitive elements in 3D scenes to improve novel view synthesis. Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have greatly improved novel view synthesis but renderings of unseen and occluded parts remain low-quality if the training views are not exhaustive enough. Our key observation is that our environment is often full of repetitive elements. We propose to leverage those repetitions to improve the reconstruction of low-quality parts of the scene due to poor coverage and occlusions. We propose a method that segments each repeated instance in a 3DGS reconstruction, registers them together, and allows information to be shared among instances. Our method improves the geometry while also accounting for appearance variations across instances. We demonstrate our method on a variety of synthetic and real scenes with typical repetitive elements, leading to a substantial improvement in the quality of novel view synthesis.2025-06-06T18:34:22ZSIGGRAPH Conference Papers 2025. Project site: https://repo-sam.inria.fr/nerphys/splat-and-replace/Nicolás ViolanteAndreas MeulemanAlban GauthierFrédo DurandThibault GroueixGeorge Drettakis10.1145/3721238.3730727