https://arxiv.org/api/WqKZLxR8h8yEfI//pZZixi8x4uc
2026-06-10T13:34:36Z
1065224015

http://arxiv.org/abs/2605.18092v1
Epidemics in a Synthetic Urban Population with Multiple Levels of Mixing
2026-05-18T09:09:07Z
Network--based epidemic models that account for heterogeneous contact patterns are extensively used to predict and control the diffusion of infectious diseases. We use census and survey data to reconstruct a geo--referenced and age--stratified synthetic urban population connected by stable social relations. We consider two kinds of interactions, distinguishing daily (household) contacts from other frequent contacts. Moreover, we allow any couple of individuals to have rare fortuitous interactions. We simulate the epidemic diffusion on a synthetic urban network for a typical medium-size Italian city and characterize the outbreak speed, pervasiveness, and predictability in terms of the socio--demographic and geographic features of the host population. Introducing age--structured contact patterns results in faster and more pervasive outbreaks, while assuming that the interaction frequency decays with distance has only negligible effects. Preliminary evidence shows the existence of patterns of hierarchical spatial diffusion in urban areas, with two regimes for epidemic spread in low- and high-density regions.
2026-05-18T09:09:07Z
Alessandro Celestini, Francesca Colaiori, Stefano Guarino, Enrico Mastrostefano, Lena Rebecca Zastrow
10.1007/978-3-030-93413-2_27

http://arxiv.org/abs/2604.09060v2
Taming the Black Swan: A Momentum-Gated Hierarchical Optimisation Framework for Asymmetric Alpha Generation
2026-05-18T08:52:03Z
Conventional momentum strategies, despite their proven efficacy in generating alpha, frequently suffer from the "Winner's Curse", a structural vulnerability in which high performing assets exhibit clustered volatility and severe drawdowns during market reversals.
To counteract this propensity for momentum crashes, this study presents the Adaptive Equity Generation and Immunisation System (AEGIS), a novel framework that fundamentally reengineers the trade-off between growth and stability. By leveraging a volatility-adjusted momentum filter to identify trend strength and employing a minimax correlation algorithm to enforce structural diversification, the model utilises sequential least squares programming (SLSQP) to optimise capital allocation for the Sortino ratio. This architecture allows the portfolio to dynamically adapt to distinct market regimes: explicitly lowering the intensity of crashes during bear markets by decoupling correlated risks, while retaining asymmetric upside participation during bull runs. Empirical validation via a comprehensive 20-year walk-forward backtest (2006-2025), which covers significant stress events like the 2008 Global Financial Crisis, confirms that the framework produces substantial excess alpha relative to the standard S&P 500 benchmark. Notably, the strategy successfully matched the capital appreciation of the high-beta NASDAQ-100 index while achieving significantly reduced downside volatility and improved structural resilience. These results suggest that synthetic beta can be effectively engineered through mathematical regularisation, enabling investors to capture the high-growth characteristics of concentrated portfolios while preserving the defensive stability typically associated with broad-market diversification.
2026-04-10T07:39:27Z
18 pages, 17 figures, 6 tables, 3 algorithms
Arya Chakraborty, Randhir Singh

http://arxiv.org/abs/2605.17962v1
FinDocMRE: A Benchmark for Document-Level Financial Multimodal Reasoning Evaluation
2026-05-18T07:18:01Z
While Large Multimodal Models (LMMs) excel in general visual tasks, their deployment in specialized financial contexts remains insufficient.
Existing benchmarks prioritize isolated charts, often overlooking the need to integrate data from text, tables, and images within comprehensive financial documents. To address this limitation, we introduce FINDOCMRE, a multi-image document-level benchmark designed for financial multimodal reasoning. We construct the dataset via a semi-automated pipeline that combines Visual-Centric Generation with Expert Verification, thereby minimizing text bias and ensuring high annotation quality. Spanning twelve domains, the benchmark comprises 12,207 samples derived from 2,878 financial reports, designed to evaluate multi-image processing and document-level understanding across five distinct task types. Extensive experiments with eleven representative LMMs reveal that no model surpasses an overall score of 65, highlighting challenges in integrating visual grounding with logical reasoning within complex document environments. Specifically, we observe a significant performance divergence across tasks, where models exhibit proficiency in semantic narrative construction but struggle with numerical estimation and cross-page visual grounding. FINDOCMRE serves as a rigorous benchmark to guide the evolution of financial LMMs towards expert-level document analysis and reasoning.
2026-05-18T07:18:01Z
25 pages, 9 figures
Jiayong Zhu, Jiangtong Li, Jinru Ding, Dawei Cheng, Jie Xu, Feng Yu

http://arxiv.org/abs/2605.17744v1
Numerical methods for optimal decumulation of a defined contribution pension plan
2026-05-18T01:52:15Z
The decumulation of a defined contribution (DC) pension plan is well known to be one of the hardest problems in finance. We model this decumulation challenge as an optimal stochastic control problem. The control problem is solved, at each rebalancing date, by alternatively solving a linear partial-integro differential equation (PIDE) followed by an optimization step. We solve the PIDE by using a $δ$-monotone Fourier method, which ensures that monotonicity holds to $O(δ)$.
We allow for the use of leverage (i.e. borrowing to invest in stocks), as well as minimum constraints on bond holdings. We pay particular attention to minimizing wrap-around error, an issue which is endemic for Fourier methods and central to the effective use of these methods for optimal control problems. Rather unexpectedly, we find that restricting the portfolio equity fraction to a maximum of 50\% does not reduce portfolio efficiency noticeably. This may be a useful strategy for risk-averse retirees.
2026-05-18T01:52:15Z
Peter A. Forsyth, George Labahn

http://arxiv.org/abs/2605.17608v1
Bayesian-Monte Carlo Schedule Updating for Construction Digital Twins: A Probabilistic Framework for Dynamic Project Forecasting
2026-05-17T19:09:37Z
Construction projects frequently experience schedule delays and forecasting uncertainty due to variability in labor productivity, material availability, weather conditions, and project coordination. Conventional deterministic scheduling methods such as the Critical Path Method (CPM) assume fixed activity durations and therefore cannot adequately represent dynamic project uncertainty.
This study presents a Bayesian-Monte Carlo probabilistic schedule updating framework for construction digital twin environments. The proposed methodology integrates stochastic activity-duration modeling, Bayesian recursive updating, Monte Carlo simulation, and uncertainty propagation within a unified computational framework for adaptive schedule forecasting.
Activity durations are modeled using lognormal probability distributions and continuously updated through Bayesian inference as new project observations become available. Monte Carlo simulation is then used to propagate updated uncertainty throughout project networks and generate probabilistic completion-time forecasts, delay-risk estimates, and activity criticality measures.
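The updating-and-propagation loop described in this paragraph can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: it assumes a conjugate normal prior on the mean of log-duration with a known log-scale standard deviation, and the names `LogDurationBelief`, `simulate_completion`, the four-activity network, and all numbers are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class LogDurationBelief:
    """Normal belief over mu, the mean of log-duration (sigma assumed known)."""
    mean: float   # current mean of the belief over mu
    var: float    # current variance of the belief over mu
    sigma: float  # assumed-known std. dev. of log-duration

    def update(self, observed_duration: float) -> "LogDurationBelief":
        """One recursive Bayesian step using normal-normal conjugacy on the log scale."""
        x = math.log(observed_duration)
        precision = 1.0 / self.var + 1.0 / self.sigma ** 2
        mean = (self.mean / self.var + x / self.sigma ** 2) / precision
        return LogDurationBelief(mean, 1.0 / precision, self.sigma)

    def sample_duration(self, rng: random.Random) -> float:
        """Draw mu from the current belief, then a lognormal duration given mu."""
        mu = rng.gauss(self.mean, math.sqrt(self.var))
        return math.exp(rng.gauss(mu, self.sigma))


def simulate_completion(beliefs, predecessors, order, n_runs=5000, seed=0):
    """Monte Carlo forward pass over a precedence network.

    `order` must be a topological order of the activities.
    Returns one sampled project completion time per run.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_runs):
        finish = {}
        for act in order:
            start = max((finish[p] for p in predecessors[act]), default=0.0)
            finish[act] = start + beliefs[act].sample_duration(rng)
        totals.append(max(finish.values()))
    return totals


# Hypothetical four-activity network: A precedes B and C, which both precede D.
predecessors = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
order = ["A", "B", "C", "D"]
beliefs = {name: LogDurationBelief(math.log(10.0), 0.25, 0.3) for name in order}

# A new site observation (activity A took 14 days) updates only A's belief.
beliefs["A"] = beliefs["A"].update(14.0)

totals = simulate_completion(beliefs, predecessors, order)
deadline = 40.0  # hypothetical target completion time in days
delay_risk = sum(t > deadline for t in totals) / len(totals)
print(f"Estimated P(completion > {deadline} days): {delay_risk:.2f}")
```

Keeping the belief on the log scale makes each update a closed-form precision-weighted average, so new observations can be folded in one at a time before rerunning the simulation.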
Simulation experiments using PSPLIB benchmark project networks demonstrate that the proposed framework improves forecasting accuracy and uncertainty representation compared with deterministic CPM and static probabilistic scheduling approaches. The framework further supports adaptive project forecasting through integration of BIM reports, drone observations, IoT telemetry, productivity logs, and site monitoring data.
2026-05-17T19:09:37Z
22 pages, 3 figures, 5 tables
Atena Khoshkonesh, Mohsen Mohammadagha, Vinayak Kaushal, Navid Ebrahimi

http://arxiv.org/abs/2605.17582v1
Scale-Equivariant Generative Forecasting: Weight-Tied Dilated Convolutions, Wavelet Scattering Inputs, and Spectral-Consistency Training for Self-Similar Time Series
2026-05-17T18:21:30Z
Many natural and engineered time series -- equity returns, climate anomalies, turbulent velocities, neural recordings, packet-level network traffic -- are approximately self-similar: their horizon-$T$ distribution is tied to the horizon-$1$ distribution by one scaling exponent $H$. Standard deep generative sequence models (transformers, dilated TCNs, the WaveNet family) ignore this. Their receptive fields are wide, but kernel parameters live independently at every dilation level, yielding a multi-scale architecture, not a scale-equivariant one. We make three contributions. First, we give a precise definition of discrete scale equivariance for 1D causal networks and prove that dyadic dilation commutes (up to boundary effects) with any dilated-convolution stack whose kernel weights are shared across levels. Tying the kernel shrinks the convolutional parameter budget by an $L$-fold factor (where $L$ is depth) and hard-wires self-similarity in as an inductive bias.
Second, we wrap this Scale-Equivariant WaveNet (SE-WaveNet) backbone in three components that carry the same prior: a one-level Daubechies-4 wavelet input, a Hurst-FiLM block exposing the local scaling exponent, and a spectral-consistency training term targeting the $|f|^{-(2H+1)}$ power-law spectrum. The head is a conditional normalising flow, chosen to preserve equivariance. Third, on 30 years of S&P 500 daily log-returns, SE-WaveNet samples reproduce the empirical scaling-collapse diagnostic on the Allan-Variance top-25 universe (median $\mathcal{C}^\star = 0.020$), while a vanilla WaveNet at matched capacity does not ($\geq 0.06$). NLL, KS-calibration, and tail energy distance tie or beat the baseline, with $L\times$ fewer convolutional parameters.
2026-05-17T18:21:30Z
Andrea Morandi

http://arxiv.org/abs/2505.20650v5
FinTagging: Benchmarking LLMs for Extracting and Structuring Financial Information
2026-05-17T16:54:39Z
Accurate interpretation of numerical data in financial reports is critical for markets and regulators. Although XBRL (eXtensible Business Reporting Language) provides a standard for tagging financial figures, mapping thousands of facts to over 10k US GAAP concepts remains costly and error prone. Existing benchmarks oversimplify this task as flat, single step classification over small subsets of concepts, ignoring the hierarchical semantics of the taxonomy and the structured nature of financial documents. Consequently, these benchmarks fail to evaluate Large Language Models (LLMs) under realistic reporting conditions. To bridge this gap, we introduce FinTagging, the first comprehensive benchmark for structure aware and full scope XBRL tagging. We decompose the complex tagging process into two subtasks: (1) FinNI (Financial Numeric Identification), which extracts entities and types from heterogeneous contexts including text and tables; and (2) FinCL (Financial Concept Linking), which maps extracted entities to the full US GAAP taxonomy.
This two stage formulation enables a fair assessment of LLMs' capabilities in numerical reasoning and taxonomy alignment. Evaluating diverse LLMs in zero shot settings reveals that while models generalize well in extraction, they struggle significantly with fine grained concept linking, highlighting critical limitations in domain specific structure aware reasoning.
2025-05-27T02:55:53Z
Yan Wang, Lingfei Qian, Xueqing Peng, Yang Ren, Keyi Wang, Yi Han, Dongji Feng, Fengran Mo, Shengyuan Lin, Qinchuan Zhang, Kaiwen He, Chenri Luo, Jianxing Chen, Junwei Wu, Chen Xu, Ziyang Xu, Jimin Huang, Guojun Xiong, Xiao-Yang Liu, Qianqian Xie, Jian-Yun Nie

http://arxiv.org/abs/2602.16990v2
Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation
2026-05-17T16:40:11Z
Most recommendation benchmarks evaluate how well a model imitates user behavior. In financial advisory, however, observed actions can be noisy or short-sighted under market volatility and may conflict with a user's long-term goals. Treating what users chose as the sole ground truth, therefore, conflates behavioral imitation with decision quality. We introduce Conv-FinRe, a conversational and longitudinal benchmark for stock recommendation that evaluates LLMs beyond behavior matching. Given an onboarding interview, step-wise market context, and advisory dialogues, models must generate rankings over a fixed investment horizon. Crucially, Conv-FinRe provides multi-view references that distinguish descriptive behavior from normative utility grounded in investor-specific risk preferences, enabling diagnosis of whether an LLM follows rational analysis, mimics user noise, or is driven by market momentum. We build the benchmark from real market data and human decision trajectories, instantiate controlled advisory conversations, and evaluate a suite of state-of-the-art LLMs.
Results reveal a persistent tension between rational decision quality and behavioral alignment: models that perform well on utility-based ranking often fail to match user choices, whereas behaviorally aligned models can overfit short-term noise. The dataset is publicly released on Hugging Face, and the codebase is available on GitHub.
2026-02-19T01:29:50Z
Accepted by SIGIR 2026 Resource Track. Pre-camera-ready version
Yan Wang, Yi Han, Lingfei Qian, Yueru He, Xueqing Peng, Dongji Feng, Zhuohan Xie, Vincent Jim Zhang, Rosie Guo, Fengran Mo, Jimin Huang, Yankai Chen, Xue Liu, Jian-Yun Nie

http://arxiv.org/abs/2510.08886v3
FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs
2026-05-17T16:26:33Z
Going beyond simple text processing, financial auditing requires detecting semantic, structural, and numerical inconsistencies across large-scale disclosures. As financial reports are filed in XBRL, a structured XML format governed by accounting standards, auditing becomes a structured information extraction and reasoning problem involving concept alignment, taxonomy-defined relations, and cross-document consistency. Although large language models (LLMs) show promise on isolated financial tasks, their capability in professional-grade auditing remains unclear. We introduce FinAuditing, a taxonomy-aligned, structure-aware benchmark built from real XBRL filings. It contains 1,102 annotated instances averaging over 33k tokens and defines three tasks: Financial Semantic Matching (FinSM), Financial Relationship Extraction (FinRE), and Financial Mathematical Reasoning (FinMR). Evaluations of 13 state-of-the-art LLMs reveal substantial gaps in concept retrieval, taxonomy-aware relation modeling, and consistent cross-document reasoning. These findings highlight the need for realistic, structure-aware benchmarks. We release the evaluation code at https://github.com/The-FinAI/FinAuditing and the dataset at https://huggingface.co/collections/TheFinAI/finauditing.
The task currently serves as the official benchmark of an ongoing public evaluation contest at https://open-finance-lab.github.io/SecureFinAI_Contest_2026/.
2025-10-10T00:41:55Z
Accepted by SIGIR 2026 Resource Track. Pre-camera-ready version
Yan Wang, Keyi Wang, Shanshan Yang, Jaisal Patel, Jeff Zhao, Fengran Mo, Xueqing Peng, Lingfei Qian, Yankai Chen, Víctor Gutiérrez-Basulto, Jimin Huang, Guojun Xiong, Xiao-Yang Liu, Xue Liu, Jian-Yun Nie

http://arxiv.org/abs/2507.04996v10
Agentic Vehicles for Human-Centered Mobility: Definition, Prospects, and Synergistic Co-Development with Vehicle Autonomy
2026-05-17T15:38:50Z
Autonomy, from the Greek autos (self) and nomos (law), refers to the capacity to operate according to internal rules without external control. Autonomous vehicles (AuVs) are therefore understood as vehicular systems that perceive their environment and execute tasks with minimal human intervention, consistent with the direction indicated by the SAE levels of automated driving. However, recent research and deployments increasingly showcase vehicular capabilities that, while not contradicting autonomy, are not entailed by it, including ambiguous goal handling, purposeful social engagement, external tool use, proactive problem solving, continuous learning, and context-sensitive reasoning in unseen and ethically salient situations, enabled in part by multimodal language models. These developments reveal a gap between technical autonomy and the broader social cognitive functions required for human-centered mobility, which are more precisely captured by the notion of agency. Therefore, rather than adding increasingly elaborate modifiers to "autonomous," we introduce agentic vehicles (AgVs) and suggest that autonomy and agency are intertwined but conceptually distinct: if autonomy concerns what to do and how to do it (task executions under internal rules), agency pertains to why to do it and what else can be done (goal-directed, adaptive actions).
We present autonomy and agency as orthogonal yet synergistic dimensions with co-development implications. Vehicle agency marks a novel dimension of mobility service intelligence, heralding vehicles as purposeful actors in society.
2025-07-07T13:34:49Z
Jiangbo Yu, Raphael Frank, Luis Miranda-Moreno, Sasan Jafarnejad, Jonatas Augusto Manzolli, Fuqiang Liu, Jiyao Wang, Ali Eslami

http://arxiv.org/abs/2605.17424v1
A Hybrid Optimization Framework for Spatial Packaging of Interconnected Systems
2026-05-17T12:39:05Z
This paper presents an optimization framework for Spatial Packaging of Interconnected Systems with Physical Interactions (SPI2) that addresses the geometric challenges of three-dimensional component placement and routing. While SPI2 generally includes physical interactions, this study isolates the spatial optimization aspect to evaluate placement and routing performance independently. The framework integrates the Maximal Disjoint Ball Decomposition (MDBD) for geometric abstraction with a hybrid optimization strategy that combines stochastic initialization and gradient-based refinement with interior point optimization. It is formulated to handle the nonlinear, non-convex, and continuous characteristics of spatially coupled design problems. The proposed framework is evaluated against a use case from prior SPI2 research and tested with a newly introduced benchmark that enables verifiable assessment of optimization performance. Results indicate that the presented method achieves more than a 10% improvement over existing SPI2 implementations and converges to spatially analytical optima across various benchmark scenarios. Benchmark experiments show solution accuracy of 0.6-2% relative to the ground truth.
2026-05-17T12:39:05Z
S. Westerhof, T. Hofman

http://arxiv.org/abs/2605.17387v1
Spatial Optimization of Interconnected Systems in Non-Convex Design Spaces
2026-05-17T11:11:23Z
This paper presents a spatial optimization methodology that extends the Spatial Packaging of Interconnected Systems with Physical Interaction (SPI2) framework to support arbitrary, non-convex design boundaries. We introduce a smooth, differentiable inside-outside evaluation for components represented using the Maximal Disjoint Ball Decomposition (MDBD) method. The framework also incorporates center-of-gravity and moment-of-inertia calculations directly into the optimization, and provides an end-to-end computer-aided design (CAD) workflow for importing components and reconstructing the optimized assembly. The method is demonstrated on a fictional aircraft auxiliary unit. Results show that the optimizer can place multiple interconnected components within a custom geometry while simultaneously handling routing and physics-based objectives. The approach maintains geometric feasibility within numerical tolerance and illustrates the potential of MDBD-based SPI2 methods for practical engineering design applications.
2026-05-17T11:11:23Z
n/a
S. Westerhof, T. Hofman

http://arxiv.org/abs/2605.17146v1
Weighted Flow Matching and Physics-Informed Nonlinear Filtering for Parameter Estimation in Digital Twins
2026-05-16T20:31:29Z
Digital twins (DTs) rely on continuous synchronization between physical systems and their virtual counterparts through online parameter estimation under uncertainty. In many practical settings, however, this task is challenged by low observability, weak excitation, nonlinear dynamics, and noisy or biased measurements. In this work, we develop a new mathematical framework that integrates Weighted Flow Matching (WFM) generative modeling with physics-informed nonlinear filtering to enhance parameter estimation in DTs.
WFM relies on dynamic reweighting of training samples, which guides the generative model toward parameter regimes most informative of the evolving system state. This generative component is tightly coupled with a physics-informed filtering architecture based on the Unscented Kalman Filter (UKF), yielding a unified DT framework that combines data-driven probability transport with physically consistent state and parameter estimation. The effectiveness of the new integrated framework is demonstrated within a spacecraft DT architecture, where stable moment of inertia estimation is achieved under uncertain and noisy sensing, with significant performance improvements over established approaches such as Extended Kalman Filtering (EKF) and Ensemble Kalman Filtering (EnKF). These results highlight the potential of weighted generative modeling as a core mechanism for real-time DT synchronization in operational and mission-critical systems.
2026-05-16T20:31:29Z
14 pages, 5 figures
Yasar Yanik, Himadri Basu, Ricardo G. Sanfelice, Daniele Venturi

http://arxiv.org/abs/2605.17039v1
Privacy-Preserving Generation Fraud Detection for Distributed Photovoltaic Systems: A Solar Irradiance-Fused Federated Learning Framework
2026-05-16T15:19:14Z
The wide adoption of residential photovoltaic (PV) systems introduces new challenges for generation fraud detection (FD). Unlike traditional electricity theft detection, which focuses on electricity consumption-side behavior, PV generation fraud detection (PVG-FD) is complicated by the inherent intermittency and uncertainty of PV generation. The distributed nature of PV systems poses further challenges for centralized PVG-FD approaches due to scalability and privacy concerns. This paper develops a privacy-preserving distributed PVG-FD framework based on federated learning (FL). In this framework, a utility company manages multiple household communities, each of which is equipped with a local detector.
The framework integrates a novel detection model architecture with privacy-preserving global collaboration. Each community's local model fuses PV generation and weather data via a co-attention mechanism to detect discrepancies critical for PVG-FD. The FL framework enables cross-community collaboration by aggregating model parameters and prototypes, leveraging global knowledge sharing with local refinement while preserving privacy. It also uses prototype alignment to address class imbalance by enhancing fraud sample representation. Extensive experiments on a real-world residential PV dataset validate the effectiveness of the developed method and demonstrate that it outperforms state-of-the-art FL methods across various scenarios. The results also show its scalability across varying community sizes and strong robustness to class imbalance.
2026-05-16T15:19:14Z
15 pages
IEEE Transactions on Smart Grid, 2026
Xiaolu Chen, Chenghao Huang, Yanru Zhang, Hao Wang
10.1109/TSG.2026.3692585

http://arxiv.org/abs/2605.16895v1
The Alpha Illusion: Reported Alpha from LLM Trading Agents Should Not Be Treated as Deployment Evidence
2026-05-16T09:14:35Z
End-to-end LLM trading agents have moved quickly from research curiosity to a small ecosystem of named systems, including FinCon, FinMem, TradingAgents, FinAgent, QuantAgent, and FLAG-Trader. Several of these report headline Sharpe ratios that would be material if read at face value on a deployment desk, and associated benchmarks such as FinBen report trading-task Sharpe statistics in the same range. The gap between architecture research and deployment claim has been crossed too freely on both sides of the academia--industry divide. We take a position on that gap: reported alpha from end-to-end LLM trading agents should not be treated as deployment evidence.
Before such returns can support claims of deployable trading capability, they must survive structural validity tests for temporal integrity, real-world frictions, counterfactual robustness, predictive calibration, numerical execution, and multi-agent disaggregation. Current public evidence cannot yet distinguish robust predictive ability from temporal contamination, unmodeled frictions, short-window Sharpe uncertainty, narrative fitting, and parametric priors. The problem is not only evaluative but structural. Language confidence is not tradable probability, narrative reasoning is not numerical execution, and model priors may become undisclosed implicit factor exposures. We contribute a minimum reporting protocol suite, P1--P6, with tiered applicability by claim strength, and a conservative modular alternative that uses LLMs as auditable information interfaces upstream of independent calibration, risk, and execution modules. Code and reproduction harness: \url{https://github.com/hj1650782738/Trading}.
2026-05-16T09:14:35Z
Yuxuan Ye, Jun Han, Ao Hu, Juncheng Bu, Yiyi Chen, Liangjian Wen, Danilo Mandic, Danny Dongning Sun, Xu Yinghui, Zenglin Xu