https://arxiv.org/api/wpjF2gPpYjexvWimd6HBc7hlgMQ
2026-06-28T04:30:16Z

http://arxiv.org/abs/2508.08561v1
Revisiting the City Tower Project: Geometric Principles and Structural Morphology in the Works of Louis I. Kahn and Anne Tyng
2025-08-12T01:56:24Z
This paper presents a study of the computation and morphology of Louis Kahn's City Tower project. The City Tower is an unbuilt design by Louis I. Kahn and Anne Tyng that integrates form and structure using triangular geometries in 3D space. Although the tower was never built, its geometrical framework anticipated later developments in the design of space-frame structures. Initially envisioned in the 1950s, the City Tower project is a skyscraper structure based on a tetrahedral and octahedral space frame called the Octet-Truss. The aim of this study is to analyze the geometry of the City Tower structure and how it can be used to develop modular and adaptable architectural forms. The study is based on an analytical shape grammar that is used to recreate the original structure, and later to generate new structural configurations based on the City Tower's morphology. This study also investigates potential applications of these findings in architecture, reveals the possibilities of using tetrahedrons and octahedrons as fundamental geometries for creating scalable and modular designs, and presents initial findings.
2025-08-12T01:56:24Z
8 pages, ARCC Conference
Aysan Mokhtarimousavi, Michael Kleiss, Mostafa Alani, Sida Dai

http://arxiv.org/abs/2508.08542v1
Hybrid Long and Short Range Flows for Point Cloud Filtering
2025-08-12T01:11:22Z
Point cloud capture processes are error-prone and introduce noisy artifacts that necessitate filtering/denoising. Recent filtering methods often suffer from point clustering or noise retention issues. In this paper, we propose Hybrid Point Cloud Filtering ($\textbf{HybridPF}$), which considers both short-range and long-range filtering trajectories when removing noise.
It is well established that short-range scores, given by $\nabla_{x}\log p(x_t)$, may provide the necessary displacements to move noisy points to the underlying clean surface. By contrast, long-range velocity flows approximate constant displacements directed from a high-noise variant patch $x_0$ towards the corresponding clean surface $x_1$. Here, noisy patches $x_t$ are viewed as intermediate states between the high-noise variant and the clean patches. Our intuition is that long-range information from velocity flow models can guide the short-range scores to align more closely with the clean points. In turn, score models generally provide quicker convergence to the clean surface. Specifically, we devise two parallel modules, the ShortModule and LongModule, each consisting of an Encoder-Decoder pair, to account for short-range scores and long-range flows respectively. We find that short-range scores, guided by long-range features, yield filtered point clouds with good point distributions and convergence near the clean surface. We design a joint loss function to simultaneously train the ShortModule and LongModule in an end-to-end manner. Finally, we identify a key weakness in current displacement-based methods, limitations on the decoder architecture, and propose a dynamic graph convolutional decoder to improve the inference process. Comprehensive experiments demonstrate that our HybridPF achieves state-of-the-art results while enabling faster inference.
2025-08-12T01:11:22Z
Dasith de Silva Edirimuni, Xuequan Lu, Ajmal Saeed Mian, Lei Wei, Gang Li, Scott Schaefer, Ying He

http://arxiv.org/abs/2508.08467v1
Empowering Children to Create AI-Enabled Augmented Reality Experiences
2025-08-11T20:57:39Z
Despite their potential to enhance children's learning experiences, AI-enabled AR technologies are predominantly used in ways that position children as consumers rather than creators.
We introduce Capybara, an AR-based and AI-powered visual programming environment that empowers children to create, customize, and program 3D characters overlaid onto the physical world. Capybara enables children to create virtual characters and accessories using text-to-3D generative AI models, and to animate these characters through auto-rigging and body tracking. In addition, our system employs vision-based AI models to recognize physical objects, allowing children to program interactive behaviors between virtual characters and their physical surroundings. We demonstrate the expressiveness of Capybara through a set of novel AR experiences. We conducted user studies with 20 children in the United States and Argentina. Our findings suggest that Capybara can empower children to harness AI in authoring personalized and engaging AR experiences that seamlessly bridge the virtual and physical worlds.
2025-08-11T20:57:39Z
Accepted to ACM UIST 2025
Lei Zhang, Shuyao Zhou, Amna Liaqat, Tinney Mak, Brian Berengard, Emily Qian, Andrés Monroy-Hernández

http://arxiv.org/abs/2508.08429v1
Improving Facial Rig Semantics for Tracking and Retargeting
2025-08-11T19:39:04Z
In this paper, we consider retargeting a tracked facial performance to either another person or to a virtual character in a game or virtual reality (VR) environment. We remove the difficulties associated with identifying and retargeting the semantics of one rig framework to another by utilizing the same framework (3DMM, FLAME, MetaHuman, etc.) for both subjects. Although this does not constrain the choice of framework when retargeting from one person to another, it does force the tracker to use the game/VR character rig when retargeting to a game/VR character. We utilize volumetric morphing in order to fit facial rigs to both performers and targets; in addition, a carefully chosen set of Simon-Says expressions is used to calibrate each rig to the motion signatures of the relevant performer or target.
Although a uniform set of Simon-Says expressions can likely be used for all person-to-person retargeting, we argue that person-to-game/VR-character retargeting benefits from Simon-Says expressions that capture the distinct motion signature of the game/VR character rig. The Simon-Says calibrated rigs tend to produce the desired expressions when exercising animation controls (as expected). Unfortunately, these well-calibrated rigs still lead to undesirable controls when tracking a performance (a well-behaved function can have an arbitrarily ill-conditioned inverse), even though they typically produce acceptable geometry reconstructions. Thus, we propose a fine-tuning approach that modifies the rig used by the tracker in order to promote the output of more semantically meaningful animation controls, facilitating high-efficacy retargeting. In order to better address real-world scenarios, the fine-tuning relies on implicit differentiation so that the tracker can be treated as a (potentially non-differentiable) black box.
2025-08-11T19:39:04Z
Dalton Omens, Allise Thurman, Jihun Yu, Ronald Fedkiw

http://arxiv.org/abs/2508.08384v1
Spatiotemporally Consistent Indoor Lighting Estimation with Diffusion Priors
2025-08-11T18:11:42Z
Indoor lighting estimation from a single image or video remains a challenge due to its highly ill-posed nature, especially when the lighting condition of the scene varies spatially and temporally. We propose a method that estimates, from an input video, a continuous light field describing the spatiotemporally varying lighting of the scene. We leverage 2D diffusion priors to optimize this light field, which is represented as an MLP. To enable zero-shot generalization to in-the-wild scenes, we fine-tune a pre-trained image diffusion model to predict lighting at multiple locations by jointly inpainting multiple chrome balls as light probes. We evaluate our method on indoor lighting estimation from a single image or video and show superior performance over the compared baselines.
Most importantly, we highlight results on spatiotemporally consistent lighting estimation from in-the-wild videos, which has rarely been demonstrated in previous work.
2025-08-11T18:11:42Z
11 pages. Accepted by SIGGRAPH 2025 as Conference Paper
SIGGRAPH '25: ACM SIGGRAPH 2025 Conference Conference Papers, Article 107, pages 1-11, July 2025
Mutian Tong, Rundi Wu, Changxi Zheng
10.1145/3721238.3730749

http://arxiv.org/abs/2508.08228v1
LL3M: Large Language 3D Modelers
2025-08-11T17:48:02Z
We present LL3M, a multi-agent system that leverages pretrained large language models (LLMs) to generate 3D assets by writing interpretable Python code in Blender. We break away from the typical generative approach that learns from a collection of 3D data. Instead, we reformulate shape generation as a code-writing task, enabling greater modularity, editability, and integration with artist workflows. Given a text prompt, LL3M coordinates a team of specialized LLM agents to plan, retrieve, write, debug, and refine Blender scripts that generate and edit geometry and appearance. The generated code works as a high-level, interpretable, human-readable, well-documented representation of scenes and objects, making full use of sophisticated Blender constructs (e.g. B-meshes, geometry modifiers, shader nodes) for diverse, unconstrained shapes, materials, and scenes. This code presents many avenues for further agent and human editing and experimentation via code tweaks or procedural parameters. This medium naturally enables a co-creative loop in our system: agents can automatically self-critique using code and visuals, while iterative user instructions provide an intuitive way to refine assets. A shared code context across agents enables awareness of previous attempts, and a retrieval-augmented generation knowledge base built from Blender API documentation, BlenderRAG, equips agents with examples, types, and functions empowering advanced modeling operations and code correctness.
We demonstrate the effectiveness of LL3M across diverse shape categories, style and material edits, and user-driven refinements. Our experiments showcase the power of code as a generative and interpretable medium for 3D asset creation. Our project page is at https://threedle.github.io/ll3m.
2025-08-11T17:48:02Z
Our project page is at https://threedle.github.io/ll3m
Sining Lu, Guan Chen, Nam Anh Dinh, Itai Lang, Ari Holtzman, Rana Hanocka

http://arxiv.org/abs/2508.08086v1
Matrix-3D: Omnidirectional Explorable 3D World Generation
2025-08-11T15:29:57Z
Explorable 3D world generation from a single image or text prompt forms a cornerstone of spatial intelligence. Recent works utilize video models to achieve wide-scope and generalizable 3D world generation. However, existing approaches often suffer from a limited scope in the generated scenes. In this work, we propose Matrix-3D, a framework that utilizes a panoramic representation for wide-coverage omnidirectional explorable 3D world generation and combines conditional video generation with panoramic 3D reconstruction. We first train a trajectory-guided panoramic video diffusion model that employs scene mesh renders as the condition, to enable high-quality and geometrically consistent scene video generation. To lift the panoramic scene video to a 3D world, we propose two separate methods: (1) a feed-forward large panorama reconstruction model for rapid 3D scene reconstruction and (2) an optimization-based pipeline for accurate and detailed 3D scene reconstruction. To facilitate effective training, we also introduce the Matrix-Pano dataset, the first large-scale synthetic collection comprising 116K high-quality static panoramic video sequences with depth and trajectory annotations. Extensive experiments demonstrate that our proposed framework achieves state-of-the-art performance in panoramic video generation and 3D world generation.
See more at https://matrix-3d.github.io.
2025-08-11T15:29:57Z
Technical Report
Zhongqi Yang, Wenhang Ge, Yuqi Li, Jiaqi Chen, Haoyuan Li, Mengyin An, Fei Kang, Hua Xue, Baixin Xu, Yuyang Yin, Eric Li, Yang Liu, Yikai Wang, Hao-Xiang Guo, Yahui Zhou

http://arxiv.org/abs/2504.12684v3
SOPHY: Learning to Generate Simulation-Ready Objects with Physical Materials
2025-08-11T14:43:49Z
We present SOPHY, a generative model for 3D physics-aware shape synthesis. Unlike existing 3D generative models that focus solely on static geometry or 4D models that produce physics-agnostic animations, our method jointly synthesizes shape, texture, and material properties related to physics-grounded dynamics, making the generated objects ready for simulations and interactive, dynamic environments. To train our model, we introduce a dataset of 3D objects annotated with detailed physical material attributes, along with an efficient pipeline for material annotation. Our method enables applications such as text-driven generation of interactive, physics-aware 3D objects and single-image reconstruction of physically plausible shapes. Furthermore, our experiments show that jointly modeling shape and material properties enhances the realism and fidelity of the generated shapes, improving performance on both generative geometry and physical plausibility.
2025-04-17T06:17:24Z
Project page: https://xjay18.github.io/SOPHY_page
Junyi Cao, Evangelos Kalogerakis

http://arxiv.org/abs/2501.18152v2
StructuredField: Unifying Structured Geometry and Radiance Field
2025-08-11T11:31:06Z
Recent point-based differentiable rendering techniques have achieved significant success in high-fidelity reconstruction and fast rendering. However, due to the unstructured nature of point-based representations, they are difficult to apply to modern graphics pipelines designed for structured meshes, as well as to a variety of simulation and editing algorithms that work well with structured mesh representations.
To this end, we propose StructuredField, a novel representation that achieves both a structured geometric representation of the reconstructed object and high-fidelity rendering reconstruction. We employ structured tetrahedral meshes to represent the reconstructed object. We reparameterize the geometric attributes of these tetrahedra into the parameters of 3D Gaussian primitives, thereby enabling differentiable, high-fidelity rendering directly from the mesh. Furthermore, a hierarchical implicit subdivision strategy is utilized to ensure a conformal mesh structure while empowering the representation to capture multi-scale details. To maintain geometric integrity during optimization, we propose a novel inversion-free homeomorphism that constrains the tetrahedral mesh, guaranteeing it remains both inversion-free and self-intersection-free during the optimization process and in the final result. Based on our proposed StructuredField, we achieve high-quality structured meshes that are completely inversion-free and conformal, while also attaining reconstruction results comparable to those of 3DGS. We also demonstrate the applicability of our representation to various applications such as physical simulation, deformation, and level-of-detail.
2025-01-30T05:37:45Z
Project page: https://structuredfield.github.io
Kaiwen Song, Jinkai Cui, Zherui Qiu, Juyong Zhang

http://arxiv.org/abs/2508.07760v1
Sea-Undistort: A Dataset for Through-Water Image Restoration in High Resolution Airborne Bathymetric Mapping
2025-08-11T08:43:29Z
Accurate image-based bathymetric mapping in shallow waters remains challenging due to the complex optical distortions such as wave induced patterns, scattering and sunglint, introduced by the dynamic water surface, the water column properties, and solar illumination. In this work, we introduce Sea-Undistort, a comprehensive synthetic dataset of 1200 paired 512x512 through-water scenes rendered in Blender.
Each pair comprises a distortion-free and a distorted view, featuring realistic water effects such as sun glint, waves, and scattering over diverse seabeds. Accompanied by per-image metadata such as camera parameters, sun position, and average depth, Sea-Undistort enables supervised training that is otherwise infeasible in real environments. We use Sea-Undistort to benchmark two state-of-the-art image restoration methods alongside an enhanced lightweight diffusion-based framework with an early-fusion sun-glint mask. When applied to real aerial data, the enhanced diffusion model delivers more complete Digital Surface Models (DSMs) of the seabed, especially in deeper areas, reduces bathymetric errors, suppresses glint and scattering, and crisply restores fine seabed details. Dataset, weights, and code are publicly available at https://www.magicbathy.eu/Sea-Undistort.html.
2025-08-11T08:43:29Z
Under review in IEEE Geoscience and Remote Sensing Letters
Maximilian Kromer, Panagiotis Agrafiotis, Begüm Demir
10.1109/LGRS.2025.3601239

http://arxiv.org/abs/2508.07726v1
Symplectification of Circular Arcs and Arc Splines
2025-08-11T07:57:15Z
In this article, circular arcs are considered both individually and as elements of a piecewise circular curve. The endpoint parameterization proves to be quite advantageous here. The perspective of symplectic geometry provides new vectorial relationships for the circular arc. Curves are considered whose neighboring circular elements each have a common end point or, in addition, a common tangent. These arc splines prove to be a one-parameter curve family, whereby this parameter can be optimized with regard to various criteria.
2025-08-11T07:57:15Z
14 pages, 10 figures, 1 program listing
Stefan Gössner

http://arxiv.org/abs/2503.24009v2
Learning 3D-Gaussian Simulators from RGB Videos
2025-08-10T15:15:08Z
Realistic simulation is critical for applications ranging from robotics to animation.
Learned simulators have emerged as a possibility to capture real world physics directly from video data, but very often require privileged information such as depth information, particle tracks and hand-engineered features to maintain spatial and temporal consistency. These strong inductive biases or ground truth 3D information help in domains where data is sparse but limit scalability and generalization in data rich regimes. To overcome the key limitations, we propose 3DGSim, a learned 3D simulator that directly learns physical interactions from multi-view RGB videos. 3DGSim unifies 3D scene reconstruction, particle dynamics prediction and video synthesis into an end-to-end trained framework. It adopts MVSplat to learn a latent particle-based representation of 3D scenes, a Point Transformer for particle dynamics, a Temporal Merging module for consistent temporal aggregation and Gaussian Splatting to produce novel view renderings. By jointly training inverse rendering and dynamics forecasting, 3DGSim embeds the physical properties into point-wise latent features. This enables the model to capture diverse physical behaviors, from rigid to elastic, cloth-like dynamics, and boundary conditions (e.g. fixed cloth corner), along with realistic lighting effects that also generalize to unseen multibody interactions and novel scene edits.
2025-03-31T12:33:59Z
Mikel Zhobro, Andreas René Geist, Georg Martius

http://arxiv.org/abs/2501.15981v2
MatCLIP: Light- and Shape-Insensitive Assignment of PBR Material Models
2025-08-09T06:47:56Z
Assigning realistic materials to 3D models remains a significant challenge in computer graphics. We propose MatCLIP, a novel method that extracts shape- and lighting-insensitive descriptors of Physically Based Rendering (PBR) materials to assign plausible textures to 3D objects based on images, such as the output of Latent Diffusion Models (LDMs) or photographs.
Matching PBR materials to static images is challenging because the PBR representation captures the dynamic appearance of materials under varying viewing angles, shapes, and lighting conditions. By extending an Alpha-CLIP-based model on material renderings across diverse shapes and lighting, and encoding multiple viewing conditions for PBR materials, our approach generates descriptors that bridge the domains of PBR representations with photographs or renderings, including LDM outputs. This enables consistent material assignments without requiring explicit knowledge of material relationships between different parts of an object. MatCLIP achieves a top-1 classification accuracy of 76.6%, outperforming state-of-the-art methods such as PhotoShape and MatAtlas by over 15 percentage points on publicly available datasets. Our method can be used to construct material assignments for 3D shape datasets such as ShapeNet, 3DCoMPaT++, and Objaverse. All code and data will be released.
2025-01-27T12:08:52Z
Accepted at SIGGRAPH 2025 (Conference Track). Project page: https://birsakm.github.io/matclip
SIGGRAPH 2025 Conference Proceedings
Michael Birsak, John Femiani, Biao Zhang, Peter Wonka
10.1145/3721238.3730740

http://arxiv.org/abs/2508.06786v1
Quantifying Visualization Vibes: Measuring Socio-Indexicality at Scale
2025-08-09T02:27:56Z
What impressions might readers form with visualizations that go beyond the data they encode? In this paper, we build on recent work that demonstrates the socio-indexical function of visualization, showing that visualizations communicate more than the data they explicitly encode. Bridging this with prior work examining public discourse about visualizations, we contribute an analytic framework for describing inferences about an artifact's social provenance.
Via a series of attribution-elicitation surveys, we offer descriptive evidence that these social inferences: (1) can be studied asynchronously, (2) are not unique to a particular sociocultural group or a function of limited data literacy, and (3) may influence assessments of trust. Further, we demonstrate (4) how design features act in concert with the topic and underlying messages of an artifact's data to give rise to such 'beyond-data' readings. We conclude by discussing the design and research implications of inferences about social provenance, and why we believe broadening the scope of research on human factors in visualization to include sociocultural phenomena can yield actionable design recommendations to address urgent challenges in public data communication.
2025-08-09T02:27:56Z
Amy Rae Fox, Michelle Morgenstern, Graham M. Jones, Arvind Satyanarayan

http://arxiv.org/abs/2508.06775v1
Visualization Vibes: The Socio-Indexical Function of Visualization Design
2025-08-09T02:03:13Z
In contemporary information ecologies saturated with misinformation, disinformation, and a distrust of science itself, public data communication faces significant hurdles. Although visualization research has broadened criteria for effective design, governing paradigms privilege the accurate and efficient transmission of data. Drawing on theory from linguistic anthropology, we argue that such approaches, focused on encoding and decoding propositional content, cannot fully account for how people engage with visualizations and why particular visualizations might invite adversarial or receptive responses. In this paper, we present evidence that data visualizations communicate not only semantic, propositional meaning (meaning about data) but also social, indexical meaning (meaning beyond data).
From a series of ethnographically-informed interviews, we document how readers make rich and varied assessments of a visualization's "vibes": inferences about the social provenance of a visualization based on its design features. Furthermore, these social attributions have the power to influence reception, as readers' decisions about how to engage with a visualization concern not only content, or even aesthetic appeal, but also their sense of alignment or disalignment with the entities they imagine to be involved in its production and circulation. We argue these inferences hinge on a function of human sign systems that has thus far been little studied in data visualization: socio-indexicality, whereby the formal features (rather than the content) of communication evoke social contexts, identities, and characteristics. Demonstrating the presence and significance of this socio-indexical function in visualization, this paper offers both a conceptual foundation and practical intervention for troubleshooting breakdowns in public data communication.
2025-08-09T02:03:13Z
Michelle Morgenstern, Amy Rae Fox, Graham M. Jones, Arvind Satyanarayan