Title: Eliminating Rasterization: Direct Vector Floor Plan Generation with DiffPlanner
ID: http://arxiv.org/abs/2508.13738v1
Updated: 2025-08-19T11:20:35Z
Abstract: The boundary-constrained floor plan generation problem aims to generate the topological and geometric properties of a set of rooms within a given boundary. Recently, learning-based methods have made significant progress in generating realistic floor plans. However, these methods involve a workflow of converting vector data into raster images, using image-based generative models, and then converting the results back into vector data. This process is complex and redundant, often resulting in information loss. Raster images, unlike vector data, cannot scale without losing detail and precision. To address these issues, we propose a novel deep learning framework called DiffPlanner for boundary-constrained floor plan generation, which operates entirely in vector space. Our framework is a Transformer-based conditional diffusion model that integrates an alignment mechanism in training, aligning the optimization trajectory of the model with the iterative design processes of designers. This enables our model to handle complex vector data, better fit the distribution of the predicted targets, accomplish the challenging task of floor plan layout design, and achieve user-controllable generation. We conduct quantitative comparisons, qualitative evaluations, ablation experiments, and perceptual studies to evaluate our method.
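DiffPlanner is described as a Transformer-based conditional diffusion model that works directly on vector data. The sketch below makes "diffusion in vector space" concrete. It applies the standard DDPM forward (noising) process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, to the corner coordinates of a single room. It is an illustration, not the authors' code. The linear beta schedule, the normalization of coordinates to [-1, 1], the toy room, and the helper names are all assumptions.

```python
import math
import random


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of (1 - beta_t) for an assumed linear schedule."""
    alpha_bars = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) for a flat list of normalized coordinates."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


# Hypothetical rectangular room: four (x, y) corners in [-1, 1], flattened.
room = [-0.5, -0.5, 0.5, -0.5, 0.5, 0.5, -0.5, 0.5]
alpha_bars = linear_alpha_bars()
xt, eps = q_sample(room, t=10, alpha_bars=alpha_bars, rng=random.Random(0))
```

In a model like DiffPlanner, a denoiser (a boundary-conditioned Transformer) would be trained to recover `eps` from `xt` and `t`. That network and the paper's alignment mechanism are not shown.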
Extensive experiments demonstrate that DiffPlanner surpasses existing state-of-the-art methods in generating floor plans and bubble diagrams in the creative stages, offering more controllability to users and producing higher-quality results that closely match the ground truths.
Published: 2025-08-19T11:20:35Z
Comment: accepted to IEEE Transactions on Visualization and Computer Graphics
Authors: Shidong Wang, Renato Pajarola
DOI: 10.1109/TVCG.2025.3559682

Title: A Study of the Framework and Real-World Applications of Language Embedding for 3D Scene Understanding
ID: http://arxiv.org/abs/2508.05064v2
Updated: 2025-08-19T00:47:37Z
Abstract: Gaussian Splatting has rapidly emerged as a transformative technique for real-time 3D scene representation, offering a highly efficient and expressive alternative to Neural Radiance Fields (NeRF). Its ability to render complex scenes with high fidelity has enabled progress across domains such as scene reconstruction, robotics, and interactive content creation. More recently, the integration of Large Language Models (LLMs) and language embeddings into Gaussian Splatting pipelines has opened new possibilities for text-conditioned generation, editing, and semantic scene understanding. Despite these advances, a comprehensive overview of this emerging intersection has been lacking. This survey presents a structured review of current research efforts that combine language guidance with 3D Gaussian Splatting, detailing theoretical foundations, integration strategies, and real-world use cases.
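Several of the surveyed pipelines attach a language feature to each Gaussian. They answer open-vocabulary queries by comparing those features with a text embedding. A minimal sketch of that query step follows. It assumes the per-Gaussian features and the query embedding already share one embedding space; the encoder that produces them is not shown. It uses plain cosine similarity with a hypothetical threshold, whereas individual methods often use more elaborate relevancy scores.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def select_gaussians(features, query, threshold=0.5):
    """Indices of Gaussians whose language feature is similar enough to the query."""
    return [i for i, f in enumerate(features) if cosine(f, query) >= threshold]


# Toy 3-D features for four Gaussians and a query embedding (hypothetical values).
features = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [1.0, 0.0, 0.0]
print(select_gaussians(features, query))  # [0, 1]
```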
We highlight key limitations such as computational bottlenecks, generalizability, and the scarcity of semantically annotated 3D Gaussian data, and outline open challenges and future directions for advancing language-guided 3D scene understanding using Gaussian Splatting.
Published: 2025-08-07T06:33:08Z
Authors: Mahmoud Chick Zaouali, Todd Charter, Yehor Karpichev, Brandon Haworth, Homayoun Najjaran

Title: Sparse, Geometry- and Material-Aware Bases for Multilevel Elastodynamic Simulation
ID: http://arxiv.org/abs/2508.13386v1
Updated: 2025-08-18T22:11:45Z
Abstract: We present a multi-level elastodynamics timestep solver for accelerating incremental potential contact (IPC) simulations. Our method retains the robustness of gold-standard IPC in the face of intricate geometry, complex heterogeneous material distributions, and high-resolution input data without sacrificing visual fidelity (per-timestep relative displacement error of $\approx1\%$). The success of our method is enabled by a novel, sparse, geometry- and material-aware basis construction method, which allows for the use of fast preconditioned conjugate gradient solvers (in place of a sparse direct solver) without suffering convergence issues due to stiff or heterogeneous materials. The end result is a solver that produces results visually indistinguishable from, and quantitatively very close to, those of gold-standard IPC methods, but up to $13\times$ faster on identical hardware.
Published: 2025-08-18T22:11:45Z
Comment: 15 pages, 22 figures
Authors: Ty Trusty, David I. W. Levin, Danny M. Kaufman

Title: WIR3D: Visually-Informed and Geometry-Aware 3D Shape Abstraction
ID: http://arxiv.org/abs/2505.04813v2
Updated: 2025-08-18T17:00:31Z
Abstract: In this work we present WIR3D, a technique for abstracting 3D shapes through a sparse set of visually meaningful curves in 3D. We optimize the parameters of Bezier curves such that they faithfully represent both the geometry and salient visual features (e.g., texture) of the shape from arbitrary viewpoints.
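WIR3D optimizes the parameters of Bezier curves in 3D. For reference, the sketch below evaluates a Bezier curve of any degree with de Casteljau's algorithm. A cubic curve with made-up control points serves as the example; the abstract does not say which degree WIR3D uses. The CLIP-guided, differentiable optimization that the method runs on top of such curves is not shown.

```python
def lerp(p, q, t):
    """Linear interpolation between two points given as tuples."""
    return tuple((1.0 - t) * a + t * b for a, b in zip(p, q))


def de_casteljau(control_points, t):
    """Evaluate a Bezier curve of any degree at parameter t in [0, 1]."""
    pts = list(control_points)
    while len(pts) > 1:
        pts = [lerp(pts[i], pts[i + 1], t) for i in range(len(pts) - 1)]
    return pts[0]


def sample_curve(control_points, n=8):
    """n evenly spaced parameter samples along the curve, endpoints included."""
    return [de_casteljau(control_points, i / (n - 1)) for i in range(n)]


# Hypothetical cubic curve in 3D (four control points).
ctrl = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0), (3.0, 2.0, 1.0), (4.0, 0.0, 1.0)]
samples = sample_curve(ctrl)
```

The curve interpolates its first and last control points, and the inner control points only pull it toward them. This is what makes the control points convenient handles to optimize.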
We leverage the intermediate activations of a pre-trained foundation model (CLIP) to guide our optimization process. We divide our optimization into two phases: one for capturing the coarse geometry of the shape, and the other for representing fine-grained features. Our second-phase supervision is spatially guided by a novel localized keypoint loss. This spatial guidance enables user control over abstracted features. We ensure fidelity to the original surface through a neural SDF loss, which allows the curves to be used as intuitive deformation handles. We successfully apply our method for shape abstraction over a broad dataset of shapes with varying complexity, geometric structure, and texture, and demonstrate downstream applications for feature control and shape deformation.
Published: 2025-05-07T21:28:05Z
Comment: ICCV 2025 Oral. Project page: https://threedle.github.io/wir3d/
Authors: Richard Liu, Daniel Fu, Noah Tan, Itai Lang, Rana Hanocka

Title: Casual3DHDR: Deblurring High Dynamic Range 3D Gaussian Splatting from Casually Captured Videos
ID: http://arxiv.org/abs/2504.17728v3
Updated: 2025-08-18T09:37:31Z
Abstract: Photo-realistic novel view synthesis from multi-view images, such as neural radiance field (NeRF) and 3D Gaussian Splatting (3DGS), has gained significant attention for its superior performance. However, most existing methods rely on low dynamic range (LDR) images, limiting their ability to capture detailed scenes in high-contrast environments. While some prior works address high dynamic range (HDR) scene reconstruction, they typically require multi-view sharp images with varying exposure times captured at fixed camera positions, which is time-consuming and impractical. To make data acquisition more flexible, we propose Casual3DHDR, a robust one-stage method that reconstructs 3D HDR scenes from casually captured auto-exposure (AE) videos, even under severe motion blur and unknown, varying exposure times.
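A common way to model a motion-blurred, auto-exposed LDR frame is to average the HDR radiance seen along the camera path during the exposure window. That average is scaled by the exposure time and passed through a camera response function (CRF). The single-pixel sketch below shows this forward model under assumptions that do not come from the paper. A clipped gamma curve stands in for a learned CRF. `radiance_at` is a hypothetical function standing in for rendering the HDR scene from the camera pose at time t.

```python
def gamma_crf(irradiance, gamma=2.2):
    """Stand-in CRF (assumption): clip exposure-scaled irradiance to [0, 1], then gamma-encode."""
    return min(max(irradiance, 0.0), 1.0) ** (1.0 / gamma)


def blurred_ldr_pixel(radiance_at, t_start, exposure, n_samples=16, crf=gamma_crf):
    """Average radiance over the exposure window, scale by exposure time, apply the CRF."""
    total = 0.0
    for i in range(n_samples):
        t = t_start + exposure * (i + 0.5) / n_samples  # midpoint of the i-th sub-interval
        total += radiance_at(t)
    return crf(exposure * total / n_samples)


def radiance_at(t):
    """Hypothetical HDR radiance seen by one pixel while the camera moves."""
    return 10.0 + 40.0 * t


short_exp = blurred_ldr_pixel(radiance_at, t_start=0.0, exposure=0.01)
long_exp = blurred_ldr_pixel(radiance_at, t_start=0.0, exposure=0.02)
```

Exposure time, camera path, and CRF all enter this one expression. That is why a method can, in principle, optimize them jointly against the observed blurry frames.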
Our approach integrates a continuous-time camera trajectory into a unified physical imaging model, jointly optimizing exposure times, camera trajectory, and the camera response function (CRF). Extensive experiments on synthetic and real-world datasets demonstrate that Casual3DHDR outperforms existing methods in robustness and rendering quality. Our source code and dataset will be available at https://lingzhezhao.github.io/CasualHDRSplat/
Published: 2025-04-24T16:42:37Z
Comment: Accepted to ACM Multimedia 2025. Project page: https://lingzhezhao.github.io/CasualHDRSplat/
Authors: Shucheng Gong, Lingzhe Zhao, Wenpu Li, Hong Xie, Yin Zhang, Shiyu Zhao, Peidong Liu

Title: Express4D: Expressive, Friendly, and Extensible 4D Facial Motion Generation Benchmark
ID: http://arxiv.org/abs/2508.12438v1
Updated: 2025-08-17T17:10:13Z
Abstract: Dynamic facial expression generation from natural language is a crucial task in Computer Graphics, with applications in Animation, Virtual Avatars, and Human-Computer Interaction. However, current generative models are limited by datasets that are either speech-driven or restricted to coarse emotion labels, that lack the nuanced, expressive descriptions needed for fine-grained control, and that were captured using elaborate and expensive equipment. We hence present a new dataset of facial motion sequences featuring nuanced performances and semantic annotation. The data is easily collected using commodity equipment and LLM-generated natural language instructions, in the popular ARKit blendshape format. This provides riggable motion, rich with expressive performances and labels. We accordingly train two baseline models and evaluate their performance for future benchmarking. Using our Express4D dataset, the trained models can learn meaningful text-to-expression motion generation and capture the many-to-many mapping of the two modalities.
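In the ARKit blendshape format, each frame of a motion sequence is a set of per-blendshape coefficients in [0, 1]. This sketch assumes the usual linear (delta) blendshape model. Under that assumption, a rig turns one such frame into vertex positions as shown below. The two-vertex "face" and its offsets are toy values. The keys are named after ARKit blendshape locations, but nothing here reflects Express4D's actual files.

```python
def apply_blendshapes(neutral, deltas, weights):
    """Linear blendshape model: v = neutral + sum_k w_k * delta_k.

    neutral: list of (x, y, z) vertices; deltas: name -> per-vertex offsets;
    weights: name -> coefficient for one frame of a motion sequence.
    """
    out = [list(v) for v in neutral]
    for name, w in weights.items():
        w = min(max(w, 0.0), 1.0)  # ARKit-style coefficients live in [0, 1]
        for vi, d in enumerate(deltas[name]):
            for axis in range(3):
                out[vi][axis] += w * d[axis]
    return [tuple(v) for v in out]


# Toy two-vertex "face" with two hypothetical blendshape offsets.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
deltas = {
    "jawOpen": [(0.0, -1.0, 0.0), (0.0, -0.5, 0.0)],
    "mouthSmileLeft": [(0.2, 0.1, 0.0), (0.0, 0.0, 0.0)],
}
frame = {"jawOpen": 0.5, "mouthSmileLeft": 1.0}
posed = apply_blendshapes(neutral, deltas, frame)
```

Because a frame is only a small coefficient vector, a sequence of such frames can retarget to any rig that exposes the same blendshapes. This is the sense in which the motion is "riggable".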
The dataset, code, and video examples are available on our webpage: https://jaron1990.github.io/Express4D/
Published: 2025-08-17T17:10:13Z
Authors: Yaron Aloni, Rotem Shalev-Arkushin, Yonatan Shafir, Guy Tevet, Ohad Fried, Amit Haim Bermano

Title: PreSem-Surf: RGB-D Surface Reconstruction with Progressive Semantic Modeling and SG-MLP Pre-Rendering Mechanism
ID: http://arxiv.org/abs/2508.13228v1
Updated: 2025-08-17T17:00:18Z
Abstract: This paper proposes PreSem-Surf, an optimized method based on the Neural Radiance Field (NeRF) framework, capable of reconstructing high-quality scene surfaces from RGB-D sequences in a short time. The method integrates RGB, depth, and semantic information to improve reconstruction performance. Specifically, a novel SG-MLP sampling structure combined with PR-MLP (Preconditioning Multilayer Perceptron) is introduced for voxel pre-rendering, allowing the model to capture scene-related information earlier and better distinguish noise from local details. Furthermore, progressive semantic modeling is adopted to extract semantic information at increasing levels of precision, reducing training time while enhancing scene understanding. Experiments on seven synthetic scenes with six evaluation metrics show that PreSem-Surf achieves the best performance in C-L1, F-score, and IoU, while maintaining competitive results in NC, Accuracy, and Completeness, demonstrating its effectiveness and practical applicability.
Published: 2025-08-17T17:00:18Z
Comment: 2025 International Joint Conference on Neural Networks (IJCNN 2025)
Authors: Yuyan Ye, Hang Xu, Yanghang Huang, Jiali Huang, Qian Weng

Title: Mesh Processing Non-Meshes via Neural Displacement Fields
ID: http://arxiv.org/abs/2508.12179v1
Updated: 2025-08-16T23:31:13Z
Abstract: Mesh processing pipelines are mature, but adapting them to newer non-mesh surface representations, which enable fast rendering with compact file size, requires costly meshing or transmitting bulky meshes, negating their core benefits for streaming applications.
We present a compact neural field that enables common geometry processing tasks across diverse surface representations. Given an input surface, our method learns a neural map from its coarse mesh approximation to the surface. The full representation totals only a few hundred kilobytes, making it ideal for lightweight transmission. Our method enables fast extraction of manifold and Delaunay meshes for intrinsic shape analysis, and compresses scalar fields for efficient delivery of costly precomputed results. Experiments and applications show that our fast, compact, and accurate approach opens up new possibilities for interactive geometry processing.
Published: 2025-08-16T23:31:13Z
Comment: 14 pages
Authors: Yuta Noma, Zhecheng Wang, Chenxi Liu, Karan Singh, Alec Jacobson

Title: Shape from Semantics: 3D Shape Generation from Multi-View Semantics
ID: http://arxiv.org/abs/2502.00360v2
Updated: 2025-08-16T04:30:43Z
Abstract: Existing 3D reconstruction methods utilize guidance such as 2D images, 3D point clouds, shape contours, and single semantics to recover the 3D surface, which limits the creative exploration of 3D modeling. In this paper, we propose a novel 3D modeling task called "Shape from Semantics", which aims to create 3D models whose geometry and appearance are consistent with the given text semantics when viewed from different viewpoints. The reconstructed 3D models incorporate more than one semantic element and are easy for observers to distinguish. We adopt generative models as priors and disentangle the connection between geometry and appearance to solve this challenging problem. Specifically, we propose Local Geometry-Aware Distillation (LGAD), a strategy that employs multi-view normal-depth diffusion priors to complete partial geometries, ensuring realistic shape generation. We also integrate view-adaptive guidance scales to enable smooth semantic transitions across views.
For appearance modeling, we adopt physically based rendering to generate high-quality material properties, which are subsequently baked into fabricable meshes. Extensive experimental results demonstrate that our method can generate meshes with well-structured, intricately detailed geometries, coherent textures, and smooth transitions, resulting in visually appealing 3D shape designs. Project page: https://shapefromsemantics.github.io
Published: 2025-02-01T07:51:59Z
Comment: Project page: https://shapefromsemantics.github.io
Authors: Liangchen Li, Caoliwen Wang, Yuqi Zhou, Bailin Deng, Juyong Zhang

Title: Verification Method for Graph Isomorphism Criteria
ID: http://arxiv.org/abs/2508.07615v2
Updated: 2025-08-16T00:47:33Z
Abstract: Criteria for determining graph isomorphism are crucial for solving graph isomorphism problems. Necessary conditions take the form of invariants shared by isomorphic graphs, but they can only be used to filter and subdivide the candidate space. Sufficient conditions are used to reconstruct isomorphisms for special classes of graphs, but their drawback is that an isomorphism between subgraphs may not be part of any isomorphism between the parent graphs. Using sufficient or necessary conditions alone generally requires backtracking to ensure the correctness of the decision algorithm. Necessary and sufficient conditions can make backtracking unnecessary, but the correctness of their proofs is difficult to guarantee. This article proposes a verification method that can correctly determine whether the criteria proposed by previous researchers are in fact necessary and sufficient conditions.
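The division of labor between necessary conditions and backtracking can be made concrete with a small sketch. Vertex degree serves as the invariant. It rejects graphs whose degree sequences differ and restricts which vertices may map to each other. A backtracking search then confirms that some bijection preserves all edges. This is a generic textbook approach shown for illustration. It is not the verification or subdivision method proposed in the article.

```python
def are_isomorphic(adj_a, adj_b):
    """Decide isomorphism of two undirected simple graphs given as lists of adjacency sets."""
    n = len(adj_a)
    if n != len(adj_b):
        return False
    # Necessary condition: equal degree sequences. It can only reject, never confirm.
    if sorted(len(s) for s in adj_a) != sorted(len(s) for s in adj_b):
        return False
    # Subdivide the candidate space: u may only map to vertices of the same degree.
    candidates = [[v for v in range(n) if len(adj_b[v]) == len(adj_a[u])] for u in range(n)]
    mapping, used = {}, set()

    def extend(u):
        if u == n:
            return True
        for v in candidates[u]:
            if v in used:
                continue
            # Adjacency to every already-mapped vertex must be preserved both ways.
            if all((w in adj_a[u]) == (mapping[w] in adj_b[v]) for w in range(u)):
                mapping[u] = v
                used.add(v)
                if extend(u + 1):
                    return True
                used.discard(v)  # backtrack
                del mapping[u]
        return False

    return extend(0)


path_a = [{1}, {0, 2}, {1}]              # 0-1-2
path_b = [{2}, {2}, {0, 1}]              # 0-2-1
triangle = [{1, 2}, {0, 2}, {0, 1}]
cycle6 = [{1, 5}, {0, 2}, {1, 3}, {2, 4}, {3, 5}, {4, 0}]
two_triangles = [{1, 2}, {0, 2}, {0, 1}, {4, 5}, {3, 5}, {3, 4}]
```

`cycle6` and `two_triangles` are both 2-regular on six vertices, so the degree invariant cannot tell them apart. Only the exhaustive search shows they are not isomorphic. This is exactly the gap between necessary conditions and a full decision.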
The article also proposes a subdivision method that obtains more subdivisions from necessary conditions and effectively reduces the size of the backtracking space.
Published: 2025-08-11T04:45:25Z
Comment: 17 pages, 5 figures, 2 tables
Authors: Chuanfu Hu, Aimin Hou

Title: SPG: Style-Prompting Guidance for Style-Specific Content Creation
ID: http://arxiv.org/abs/2508.11476v1
Updated: 2025-08-15T13:44:56Z
Abstract: Although recent text-to-image (T2I) diffusion models excel at aligning generated images with textual prompts, controlling the visual style of the output remains a challenging task. In this work, we propose Style-Prompting Guidance (SPG), a novel sampling strategy for style-specific image generation. SPG constructs a style noise vector and leverages its directional deviation from unconditional noise to guide the diffusion process toward the target style distribution. By integrating SPG with Classifier-Free Guidance (CFG), our method achieves both semantic fidelity and style consistency. SPG is simple, robust, and compatible with controllable frameworks like ControlNet and IPAdapter, making it practical and widely applicable. Extensive experiments demonstrate the effectiveness and generality of our approach compared to state-of-the-art methods. Code is available at https://github.com/Rumbling281441/SPG.
Published: 2025-08-15T13:44:56Z
Comment: Accepted to the Journal track of Pacific Graphics 2025
Authors: Qian Liang, Zichong Chen, Yang Zhou, Hui Huang

Title: StyleMM: Stylized 3D Morphable Face Model via Text-Driven Aligned Image Translation
ID: http://arxiv.org/abs/2508.11203v1
Updated: 2025-08-15T04:29:46Z
Abstract: We introduce StyleMM, a novel framework that can construct a stylized 3D Morphable Model (3DMM) based on user-defined text descriptions specifying a target style.
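A classic 3D Morphable Model represents a face mesh as a mean shape plus linear combinations of identity and expression bases. The vertex order, and hence the mesh connectivity, stays fixed for every coefficient choice. The sketch below evaluates such a linear model with tiny hypothetical bases. Per its abstract, StyleMM fine-tunes a pre-trained mesh deformation network built around a 3DMM. It does not edit the bases directly as this toy does.

```python
def morphable_shape(mean, id_basis, exp_basis, alpha, beta):
    """Linear 3DMM: S = mean + sum_i alpha_i * B_id[i] + sum_j beta_j * B_exp[j].

    Shapes are flat coordinate lists [x0, y0, z0, x1, ...]; every output shares
    the vertex order (and thus the connectivity) of the mean shape.
    """
    if len(alpha) != len(id_basis) or len(beta) != len(exp_basis):
        raise ValueError("one coefficient per basis vector is required")
    shape = list(mean)
    for coeff, basis in list(zip(alpha, id_basis)) + list(zip(beta, exp_basis)):
        for k, b in enumerate(basis):
            shape[k] += coeff * b
    return shape


# Hypothetical two-vertex model with one identity and one expression basis vector.
mean = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
id_basis = [[0.0, 0.0, 0.0, 0.5, 0.0, 0.0]]     # widens the "face"
exp_basis = [[0.0, -0.2, 0.0, 0.0, -0.2, 0.0]]  # lowers both vertices
shape = morphable_shape(mean, id_basis, exp_basis, alpha=[2.0], beta=[0.5])
```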
Building upon a pre-trained mesh deformation network and a texture generator for original 3DMM-based realistic human faces, our approach fine-tunes these models using stylized facial images generated via text-guided image-to-image (i2i) translation with a diffusion model, which serve as stylization targets for the rendered mesh. To prevent undesired changes in identity, facial alignment, or expressions during i2i translation, we introduce a stylization method that explicitly preserves the facial attributes of the source image. By maintaining these critical attributes during image stylization, the proposed approach ensures consistent 3D style transfer across the 3DMM parameter space through image-based training. Once trained, StyleMM enables feed-forward generation of stylized face meshes with explicit control over shape, expression, and texture parameters, producing meshes with consistent vertex connectivity and animatability. Quantitative and qualitative evaluations demonstrate that our approach outperforms state-of-the-art methods in terms of identity-level facial diversity and stylization capability. The code and videos are available at kwanyun.github.io/stylemm_page.
Published: 2025-08-15T04:29:46Z
Comment: Pacific Graphics 2025, CGF, 15 pages
Authors: Seungmi Lee, Kwan Yun, Junyong Noh

Title: Substepping the Material Point Method
ID: http://arxiv.org/abs/2508.11722v1
Updated: 2025-08-15T03:06:52Z
Abstract: Many Material Point Method (MPM) implementations favor explicit time integration. However, large time steps are often desirable for other reasons: for example, for partitioned coupling with another large-step solver, or for imposing constraints, projections, or multiphysics solves.
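Substepping covers one large outer step Δt with n explicit steps of size Δt/n, where n is chosen so that each substep stays within the explicit integrator's stability limit. The generic sketch below illustrates the idea. A damped mass-spring particle advanced with symplectic Euler stands in for an explicit MPM step, and `max_substep` is a hypothetical CFL-style bound. The sketch is not the algorithm from the note itself.

```python
import math


def explicit_step(x, v, dt, k=100.0, c=1.0, m=1.0):
    """One symplectic-Euler step of a damped spring (stand-in for an explicit MPM step)."""
    a = (-k * x - c * v) / m
    v = v + dt * a
    x = x + dt * v
    return x, v


def advance_large_step(x, v, big_dt, max_substep):
    """Advance by big_dt using the fewest equal explicit substeps no larger than max_substep."""
    n = max(1, math.ceil(big_dt / max_substep))
    dt = big_dt / n
    for _ in range(n):
        x, v = explicit_step(x, v, dt)
    return x, v, n


# For this spring, explicit stability needs roughly dt < 2 / sqrt(k / m) = 0.2, so an
# outer step of 0.5 is split into 50 substeps of 0.01 rather than taken in one step.
x, v, n = advance_large_step(1.0, 0.0, big_dt=0.5, max_substep=0.01)
```

Wrapped this way, the caller only sees a step function that accepts a large Δt, which is the interface a coupled large-step solver expects.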
We present a simple, plug-and-play algorithm that advances MPM with a large time step using substeps, effectively wrapping an explicit MPM integrator into a pseudo-implicit one.
Published: 2025-08-15T03:06:52Z
Comment: 1 page
Authors: Chenfanfu Jiang

Title: Puppeteer: Rig and Animate Your 3D Models
ID: http://arxiv.org/abs/2508.10898v1
Updated: 2025-08-14T17:59:31Z
Abstract: Modern interactive applications increasingly demand dynamic 3D content, yet the transformation of static 3D models into animated assets constitutes a significant bottleneck in content creation pipelines. While recent advances in generative AI have revolutionized static 3D model creation, rigging and animation continue to depend heavily on expert intervention. We present Puppeteer, a comprehensive framework that addresses both automatic rigging and animation for diverse 3D objects. Our system first predicts plausible skeletal structures via an auto-regressive transformer that introduces a joint-based tokenization strategy for compact representation and a hierarchical ordering methodology with stochastic perturbation that enhances bidirectional learning capabilities. It then infers skinning weights via an attention-based architecture incorporating topology-aware joint attention that explicitly encodes inter-joint relationships based on skeletal graph distances. Finally, we complement these rigging advances with a differentiable optimization-based animation pipeline that generates stable, high-fidelity animations while being computationally more efficient than existing approaches. Extensive evaluations across multiple benchmarks demonstrate that our method significantly outperforms state-of-the-art techniques in both skeletal prediction accuracy and skinning quality.
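Puppeteer's skinning network encodes inter-joint relationships through skeletal graph distances. This sketch assumes that "graph distance" means the number of bones between two joints. Under that assumption, the full distance matrix for a skeleton given as a parent array takes one breadth-first search per joint. The five-joint skeleton is hypothetical, and the way such distances enter the attention is not shown.

```python
from collections import deque


def skeletal_graph_distances(parents):
    """Hop distances between all joints of a skeleton tree.

    parents[j] is the index of joint j's parent, or -1 for the root.
    """
    n = len(parents)
    neighbors = [[] for _ in range(n)]
    for j, p in enumerate(parents):
        if p >= 0:
            neighbors[j].append(p)
            neighbors[p].append(j)
    dist = [[-1] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in neighbors[u]:
                if dist[src][w] < 0:  # not visited yet from this source
                    dist[src][w] = dist[src][u] + 1
                    queue.append(w)
    return dist


# Hypothetical skeleton: 0 = hips, 1 = spine, 2 = head, 3 = left leg, 4 = right leg.
parents = [-1, 0, 1, 0, 0]
D = skeletal_graph_distances(parents)
```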
The system robustly processes diverse 3D content, ranging from professionally designed game assets to AI-generated shapes, producing temporally coherent animations that eliminate the jittering issues common in existing methods.
Published: 2025-08-14T17:59:31Z
Comment: Project page: https://chaoyuesong.github.io/Puppeteer/
Authors: Chaoyue Song, Xiu Li, Fan Yang, Zhongcong Xu, Jiacheng Wei, Fayao Liu, Jiashi Feng, Guosheng Lin, Jianfeng Zhang

Title: Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives
ID: http://arxiv.org/abs/2412.00578v3
Updated: 2025-08-14T04:31:15Z
Abstract: 3D Gaussian Splatting (3D-GS) is a recent 3D scene reconstruction technique that enables real-time rendering of novel views by modeling scenes as parametric point clouds of differentiable 3D Gaussians. However, its rendering speed and model size still present bottlenecks, especially in resource-constrained settings. In this paper, we identify and address two key inefficiencies in 3D-GS to substantially improve rendering speed. These improvements also yield the ancillary benefits of reduced model size and training time. First, we optimize the rendering pipeline to precisely localize Gaussians in the scene, boosting rendering speed without altering visual fidelity. Second, we introduce a novel pruning technique and integrate it into the training pipeline, significantly reducing model size and training time while further raising rendering speed. Our Speedy-Splat approach combines these techniques to accelerate average rendering speed by a drastic $\mathit{6.71\times}$ across scenes from the Mip-NeRF 360, Tanks & Temples, and Deep Blending datasets.
Published: 2024-11-30T20:25:56Z
Comment: CVPR 2025. Project page: https://speedysplat.github.io/
Journal reference: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 21537-21546
Authors: Alex Hanson, Allen Tu, Geng Lin, Vasu Singla, Matthias Zwicker, Tom Goldstein
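As we understand the reference 3D-GS rasterizer, it bounds each projected Gaussian by a square of radius three times the square root of the larger eigenvalue of its 2D covariance. It then skips contributions whose alpha falls below 1/255. For faint or elongated splats, that square can cover many pixels where the Gaussian contributes nothing.

One way to localize a splat more tightly is to solve opacity * exp(-0.5 * q) = 1/255 for the squared Mahalanobis distance q. The axis-aligned box of the resulting ellipse, with half-widths sqrt(q_max * Σxx) and sqrt(q_max * Σyy), then bounds every pixel the splat can affect. The sketch below shows that computation as an illustration of "precise localization", not as Speedy-Splat's exact procedure. The covariance and opacity values are made up.

```python
import math


def three_sigma_radius(cov):
    """Square-footprint radius: 3 * sqrt(largest eigenvalue of the 2x2 covariance)."""
    (a, b), (_, d) = cov
    mid = 0.5 * (a + d)
    lam_max = mid + math.sqrt(max(mid * mid - (a * d - b * b), 0.0))
    return 3.0 * math.sqrt(lam_max)


def opacity_aware_extent(cov, opacity, cutoff=1.0 / 255.0):
    """Half-widths (hx, hy) of the box where opacity * exp(-0.5 * q) >= cutoff.

    q is the squared Mahalanobis distance d^T cov^-1 d. Returns None when the
    splat never reaches the cutoff anywhere.
    """
    if opacity <= cutoff:
        return None
    q_max = 2.0 * math.log(opacity / cutoff)
    (a, _), (_, d) = cov
    return math.sqrt(q_max * a), math.sqrt(q_max * d)


# Hypothetical screen-space covariance (pixels^2) of a faint, elongated splat.
cov = ((16.0, 0.0), (0.0, 1.0))
print(three_sigma_radius(cov))  # 12.0
print(tuple(round(h, 2) for h in opacity_aware_extent(cov, 0.05)))  # (9.03, 2.26)
```

Here the opacity-aware box is about 18 x 4.5 pixels, against the 24 x 24 pixel square from the 3-sigma rule. Fewer tiles and pixels would then need to process this Gaussian.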