http://arxiv.org/abs/2605.22252v2 LineageFlow: Flow Matching for High-Fidelity Family-Aware Protein Sequence Generation 2026-05-22T03:16:33Z

Protein sequence generation for engineering requires samples that are biophysically plausible and, when targeting a family/domain, remain recognizable members while exploring within-family diversity. Current discrete generative models typically start from uniform or masked-token noise, which discards strong position-specific constraints induced by evolution and forces the model to reconstruct conserved residues from scratch, leading to weak family control and low plausibility. We propose \emph{LineageFlow}, a Dirichlet flow-matching model that initializes generation from lineage priors derived from ancestral sequence reconstruction, turning generation into structured mutation from an evolved scaffold. Across diverse protein families, LineageFlow achieves family validity close to held-out natural sequences and improves predicted structural confidence over uniform-/mask-initialized baselines while maintaining substantial novelty and diversity. Finally, we introduce \emph{rerouting}, a single intermediate-time mutate--select--amplify intervention that enables objective-guided sampling without per-step predictor guidance and yields further gains in plausibility, including a zero-shot enzyme generation case study. Code is available at https://github.com/Jinx-byebye/LineageFlow.

2026-05-21T09:58:08Z Accepted at ICML 2026. 23 pages, 5 figures. Code: https://github.com/Jinx-byebye/LineageFlow Langzhang Liang Ming Yang Yi Feng Junfan Li Shirui Pan Yinghui Xu Tianlei Ying Yizhen Zheng Zenglin Xu http://arxiv.org/abs/2605.22962v1 GazeBehavior Annotation Toolkit (GBAT): AI-powered toolkit for automatic annotation of egocentric eye-tracking and video data of child-caregiver interaction 2026-05-21T18:47:56Z

Video recordings of child-caregiver interactions enable investigation of attentional dynamics during naturalistic behavior. Such multimodal recordings also allow researchers to examine how attention interacts with action and language use in real time. However, manual annotation of such data is time-consuming. Here, we introduce the GazeBehavior Annotation Toolkit (GBAT), a deep-learning-based toolkit designed to facilitate three key processes in data preprocessing and feature extraction: post-hoc synchronization across multiple videos, semi-automatic annotation of gaze target categories, and categorization of participants' poses and hand actions. This toolkit improves the efficiency and scalability of feature extraction from human egocentric eye-tracking and video data. Such improvement is critical in supporting large-scale and longitudinal investigations of attentional dynamics and naturalistic behavior in human early development.

2026-05-21T18:47:56Z submitted to IEEE International Conference on Development and Learning (ICDL), 2026 Iba Baig Kevin Li Yanbin Xu Seiji Cattelain Marie Hallo Hayato Ono Sho Tsuji Ming Bo Cai http://arxiv.org/abs/2605.22540v1 Dynamic Hypergraph Representation Learning for Multivariate Time Series without Prior Knowledge 2026-05-21T14:25:51Z

Hypergraphs have the capacity to capture higher-dimensional relationships among entities across various domains, making them a subject of growing interest within the research community for understanding the structure and dynamics of complex systems. However, a key challenge is the derivation of hypergraph representations from time series data in situations where the structure of the hypergraph is limited or absent. In this study, we propose a model that constructs a dynamic hypergraph representation for multivariate time series without relying on prior knowledge of the data. This is achieved by applying community detection to the time series and transforming the resulting communities, obtained through an attention mechanism, into a hypergraph using a clique-based technique. Hypergraph representations are derived from different time series datasets, and the resulting hypergraphs are then used by a Dynamic Hypergraph Attention Convolution Network (DHACN) for multivariate time series predictions. This research advances the field of hypergraph representation by introducing a novel approach that is better suited to uncover high-order relationships without prior knowledge.

2026-05-21T14:25:51Z Marco Gregnanin Johannes De Smedt Giorgio Gnecco Maurizio Parton http://arxiv.org/abs/2605.22387v1 Hybrid Kolmogorov-Arnold Network and XGBoost Framework for Week-Ahead Price Forecasting in Australia's National Electricity Market 2026-05-21T12:19:58Z

Accurate electricity price forecasting (EPF) is essential for market participants to support operational planning and risk management, yet remains challenging due to strong volatility, nonlinear dynamics, and frequent extreme price spikes. These challenges are particularly pronounced in the Australian National Electricity Market (NEM), where high renewable penetration further increases uncertainty. This paper investigates week-ahead electricity price forecasting and proposes a hybrid KAN+XGBoost framework that integrates Kolmogorov-Arnold Networks (KAN) with tree-based learning. The proposed approach combines the global nonlinear representation capability of KAN with the local robustness of XGBoost to capture both long-term dependencies and short-term price fluctuations. Experiments are conducted on real-world NEM data using an expanding window evaluation strategy. The results demonstrate that the proposed model outperforms benchmark methods, including SARIMAX, Long Short-Term Memory (LSTM), standalone KAN, and XGBoost, reducing MAE by approximately 12% compared to XGBoost and by over 50% compared to a naive baseline. The results suggest that hybrid learning strategies provide an effective and robust solution for electricity price forecasting in highly dynamic electricity markets.

2026-05-21T12:19:58Z The 24th IEEE International Conference on Industrial Informatics, 2026 Houxuan Zhou Sriram Prasad Chenghao Huang Jiajie Feng Hao Wang http://arxiv.org/abs/2605.22215v1 A Generative Adversarial Graph Neural Network for Synthetic Time Series Data 2026-05-21T09:19:21Z

Generating synthetic data for financial time series poses challenges, especially considering their non-stationary nature. Traditional statistical time series models normally assume weak stationarity. However, this assumption can constrain their effectiveness. Deep learning models, particularly Generative Adversarial Networks (GANs), have exhibited considerable potential in emulating complex probability distributions. GANs employ a generator-discriminator framework, where the generator creates data samples, while the discriminator distinguishes real from generated data. In this research, we introduce the Sig-Graph GAN model, which integrates the time-series signature, offering a structured summary of its temporal evolution; the Long Short-Term Memory network, capturing its inherent autoregressive structure; and Graph Neural Networks (GNNs), leveraging geometric patterns within the time-series data. To employ GNNs optimally, we use the visibility graph algorithm to derive a graph-based representation of the underlying time series. Numerical evaluations demonstrate that the Sig-Graph GAN model outperforms baseline methods in replicating the distribution of logarithmic returns across different stock exchanges. The integration of the graph structure with the autoregressive component effectively captures both geometric and temporal patterns embedded in time-series data. This research advances the field of GAN models for time series by introducing a model capable of leveraging both autoregressive properties and geometric structures for synthetic data generation.
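The visibility graph step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the natural visibility graph variant (two samples are linked when the straight line between them passes above every sample in between); the abstract does not say which variant Sig-Graph GAN uses, and the function name `natural_visibility_graph` and the toy series are hypothetical.

```python
from typing import List, Set, Tuple


def natural_visibility_graph(series: List[float]) -> Set[Tuple[int, int]]:
    """Return the undirected edge set (i, j), i < j, of the natural visibility graph."""
    edges: Set[Tuple[int, int]] = set()
    n = len(series)
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Samples i and j "see" each other if every sample between them
            # lies strictly below the straight line joining them.
            visible = all(
                series[k] < series[i] + (series[j] - series[i]) * (k - i) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                edges.add((i, j))
    return edges


if __name__ == "__main__":
    toy_series = [1.0, 3.0, 2.0, 4.0]  # hypothetical values, not data from the paper
    print(sorted(natural_visibility_graph(toy_series)))
    # prints [(0, 1), (1, 2), (1, 3), (2, 3)]
```

This brute-force version checks every pair and is cubic in the series length in the worst case, which is adequate for illustration but not for long series.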

2026-05-21T09:19:21Z Marco Gregnanin Johannes De Smedt Giorgio Gnecco Maurizio Parton http://arxiv.org/abs/2605.22111v1 Aerodynamic force reconstruction using physics-informed Gaussian processes 2026-05-21T07:45:19Z

Accurate modeling of aerodynamic loads is essential for understanding and predicting the responses of complex structural systems. However, these models often rely on simplifications of the true physical forces, introducing assumptions that can limit their accuracy. Validating such models becomes particularly challenging in the presence of noisy or incomplete data. To address this, we introduce a probabilistic physics-informed machine learning approach designed to reconstruct the underlying aerodynamic loads from noisy measurements of structural dynamic responses. The model avoids overfitting, eliminates the need for regularization schemes, and allows for the use of heterogeneous and multi-fidelity data during the training process. The efficacy of the approach is demonstrated through the reconstruction of aerodynamic loads on the Great Belt East Bridge, simulated under a linear unsteady assumption. Results show strong agreement between true and predicted loads in terms of root mean squared error, magnitude, phase angle, and peak values of the signals. The load reconstruction method is broadly applicable, for example to model validation, future load estimation, and structural damage prognosis.

2026-05-21T07:45:19Z Gledson Rodrigo Tondo Igor Kavrakov Guido Morgenthal 10.1007/978-3-032-15130-8_20 http://arxiv.org/abs/2604.23132v2 UAV Trajectory and Bandwidth Allocation for Efficient Data Collection in Low-Altitude Intelligent IoT: A Hierarchical DRL Approach 2026-05-21T06:45:35Z

The low-altitude Internet of Things (IoT), supported by unmanned aerial vehicles (UAVs), provides ground sensing networks with advanced real-time monitoring and data collection. To maximize data collection volume from distributed IoT nodes, AI-powered data collection technology plays a critical role in enabling intelligent decision-making. Among such technologies, deep reinforcement learning (DRL) has gained particular attention. However, existing DRL-based work on UAV-assisted IoT data collection rarely addresses challenges such as interference and dynamic data volume, while also suffering from high computational demands and slow convergence. To address these challenges, a hierarchical DRL (HDRL) approach is designed to optimize UAV trajectories and bandwidth allocation to maximize data collection volume. First, the proposed scenario incorporates interference, dynamic data volume of IoT nodes, and multiple types of obstacles. The entire task is hierarchically structured: the upper level makes flight trajectory decisions at a coarse temporal granularity, while the lower level makes bandwidth allocation decisions at a finer temporal granularity. Second, a trajectory and bandwidth allocation optimization algorithm based on hierarchical deep deterministic policy gradients (TBH-DDPG) is proposed to solve the problem. Finally, simulation results demonstrate that the proposed algorithm improves convergence speed by 44.44% and reduces computational cost by 58.05% compared to a non-hierarchical algorithm.

2026-04-25T04:09:46Z Zhenjia Xu Xiaoling Zhang Nan Qi Guangxu Zhu Xiaojie Li Luliang Jia http://arxiv.org/abs/2605.22009v1 SDFStent: Real-time interactive virtual stenting via SDF deformation fields 2026-05-21T05:12:07Z

Stenting is among the most common transcatheter interventions for congenital heart disease (CHD). Patient-specific computational fluid dynamics (CFD) simulations can predict hemodynamic outcomes of intervention scenarios but require post-operative vascular geometries that reflect stent-induced shape changes, which existing tools either model inadequately or require extensive time or manual effort to generate. We present SDFStent, a signed distance function (SDF) based mesh deformation method for virtual stenting that operates in real time, maintains mesh integrity, and preserves junction geometry. The stent is modeled as a pipe surface composed of piecewise-capsule SDFs joined by a smooth-minimum operator. Mesh vertices near the expanding SDF surface are displaced along the SDF gradient with a compactly supported fall-off function and an alpha blending mask. SDFStent was benchmarked against three existing approaches and validated on three tetralogy of Fallot (ToF) patients and three coarctation of the aorta (CoA) patients using rigid-wall steady-state CFD simulations against clinical catheterization measurements. Against a prescribed diameter of 6.0 mm, the method produced a mean stented diameter of 5.92 $\pm$ 0.08 mm in 1.5 s, over 100$\times$ faster than the best stenting-specific comparator. All output meshes were watertight and self-intersection-free. CFD-simulated post-operative pressure drops agreed with clinical measurements within 4 mmHg (mean error 2 mmHg). SDFStent produces simulation-ready post-stent models that match prescribed stent dimensions at interactive speeds, from pre-operative anatomy and catheterization data alone. The implementation is open-source and available in 3D Slicer. Its scriptable architecture enables automated generation of large synthetic cohorts for data-driven surrogate modeling.
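The stent representation described in this abstract (piecewise-capsule SDFs joined by a smooth-minimum operator) can be sketched as follows. This is an illustration under stated assumptions, not the SDFStent implementation: the abstract does not say which smooth-minimum formula is used, so a common polynomial smooth minimum is assumed, and the names `capsule_sdf`, `smooth_min`, `pipe_sdf` and the blend width `k` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def capsule_sdf(p: Vec3, a: Vec3, b: Vec3, radius: float) -> float:
    """Signed distance from point p to a capsule whose axis is the segment a-b."""
    ab = tuple(bi - ai for ai, bi in zip(a, b))
    ap = tuple(pi - ai for ai, pi in zip(a, p))
    ab_len_sq = sum(c * c for c in ab)
    # Parameter of the closest point on the axis segment, clamped to [0, 1].
    t = 0.0 if ab_len_sq == 0.0 else sum(x * y for x, y in zip(ap, ab)) / ab_len_sq
    t = max(0.0, min(1.0, t))
    closest = tuple(ai + t * c for ai, c in zip(a, ab))
    return math.dist(p, closest) - radius


def smooth_min(d1: float, d2: float, k: float) -> float:
    """Polynomial smooth minimum (an assumed choice); k > 0 sets the blend width."""
    h = max(k - abs(d1 - d2), 0.0) / k
    return min(d1, d2) - h * h * k * 0.25


def pipe_sdf(p: Vec3, centerline: Sequence[Vec3], radius: float, k: float) -> float:
    """Blend the capsule SDFs of consecutive centerline segments into one surface."""
    if len(centerline) < 2:
        raise ValueError("centerline needs at least two points")
    result = capsule_sdf(p, centerline[0], centerline[1], radius)
    for a, b in zip(centerline[1:], centerline[2:]):
        result = smooth_min(result, capsule_sdf(p, a, b, radius), k)
    return result


if __name__ == "__main__":
    # Hypothetical centerline in mm; radius 3.0 matches a 6.0 mm stent diameter.
    centerline = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 5.0, 0.0)]
    print(pipe_sdf((5.0, 4.0, 0.0), centerline, radius=3.0, k=0.5))
```

A negative value means the query point lies inside the pipe surface, a positive value outside. Displacing mesh vertices along the SDF gradient, as the abstract describes, would build on a field like this one.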

2026-05-21T05:12:07Z 39 pages, 12 figures, 4 tables. Under review at Computer Methods and Programs in Biomedicine Bohan J. Li Nicholas C. Dorn Andras Lasso Matthew A. Jolley Jeffrey A. Feinstein Doug L. James Alison L. Marsden http://arxiv.org/abs/2605.21707v1 Zero-shot adaptation to order book dynamics 2026-05-20T20:11:37Z

We describe an adaptive market-making architecture that preserves the analytical structure of the Avellaneda--Stoikov framework while introducing a successor-measure-style adaptation mechanism. We keep the fast Hamilton--Jacobi--Bellman (HJB) structure of Avellaneda--Stoikov and make it adaptive to changing market regimes and trading objectives. The central idea is to separate market dynamics from the trading objective. The market state determines a low-dimensional set of Avellaneda--Stoikov parameters, while recent realized rewards determine a low-dimensional objective vector. The HJB forward map then converts this objective into optimal bid and ask quotes through a scalarization of future reward features.
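For context, the classical Avellaneda--Stoikov closed-form quotes that such an architecture builds on can be sketched as below. This shows only the standard approximation (reservation price plus a symmetric spread), not the adaptive mechanism described in the abstract; the function name and the parameter values in the demo are hypothetical.

```python
import math
from typing import Tuple


def avellaneda_stoikov_quotes(
    mid: float,
    inventory: float,
    gamma: float,
    sigma: float,
    k: float,
    time_left: float,
) -> Tuple[float, float]:
    """Return (bid, ask) from the standard Avellaneda--Stoikov approximation.

    gamma: risk aversion, sigma: mid-price volatility,
    k: decay rate of order arrival intensity, time_left: T - t.
    """
    if gamma <= 0.0 or k <= 0.0:
        raise ValueError("gamma and k must be positive")
    # Reservation price: the mid price shifted against the current inventory.
    reservation = mid - inventory * gamma * sigma ** 2 * time_left
    # Total bid-ask spread placed symmetrically around the reservation price.
    spread = gamma * sigma ** 2 * time_left + (2.0 / gamma) * math.log(1.0 + gamma / k)
    return reservation - spread / 2.0, reservation + spread / 2.0


if __name__ == "__main__":
    # Hypothetical parameters; a long inventory pushes both quotes downward.
    print(avellaneda_stoikov_quotes(100.0, 5.0, gamma=0.1, sigma=2.0, k=1.5, time_left=1.0))
```

In the setting the abstract describes, the market state would supply parameters such as `sigma` and `k`, while the objective vector would replace the fixed risk-aversion trade-off assumed here.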

2026-05-20T20:11:37Z Arip Asadulaev http://arxiv.org/abs/2606.00071v1 Bitcoin Price Prediction: Peer-Reviewed Evidence and Social Media Discourse 2026-05-20T18:06:33Z

Bitcoin price prediction has attracted hundreds of academic papers and continuous social media debate, yet the field lacks consensus on even basic questions: can any model beat a naive "today's price" baseline at horizons of one to six months? We survey the peer-reviewed landscape, categorize papers by evaluation methodology, and contrast academic findings with informal but substantive discourse on X/Twitter. The picture that emerges is sobering. At short-to-medium horizons, no peer-reviewed study has shown robust superiority over the naive baseline across multiple market regimes. Daily predictability is real but does not extend to hourly or monthly horizons, and may not survive transaction costs. The stock-to-flow model has failed formal out-of-sample testing, and Metcalfe's Law valuations have been challenged as spurious. The Bitcoin price power law, while empirically compelling, has not been subjected to formal distributional tests. Meanwhile, social media practitioners raise valid statistical critiques -- ordinary least squares (OLS) violations, backtest overfitting, spurious regressions -- that the academic literature has not formalized. We identify open research directions and propose concrete methodological standards for future work -- walk-forward evaluation, multi-regime holdout windows, naive baseline comparison, inclusion of zero in hyperparameter grids, and Diebold-Mariano significance testing -- arguing that the field's primary need is not more models but better evaluation.
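The walk-forward evaluation against a naive baseline proposed in this abstract can be sketched in a few lines. This is a minimal illustration, not the survey's protocol: the helper names, the `mean_forecast` stand-in model, and the toy prices are hypothetical, and a full comparison would add multi-regime holdout windows and a Diebold-Mariano test.

```python
from typing import Callable, List, Sequence


def naive_forecast(history: Sequence[float]) -> float:
    """Naive baseline: predict that the future price equals the last observed price."""
    return history[-1]


def mean_forecast(history: Sequence[float]) -> float:
    """Stand-in 'model' for the demo: predict the mean of all past prices."""
    return sum(history) / len(history)


def walk_forward_mae(
    prices: Sequence[float],
    forecaster: Callable[[Sequence[float]], float],
    min_train: int,
    horizon: int = 1,
) -> float:
    """Mean absolute error when forecasting `horizon` steps ahead, refitting at every step."""
    errors: List[float] = []
    for t in range(min_train, len(prices) - horizon + 1):
        # Only prices[:t] are visible to the forecaster, so there is no look-ahead.
        prediction = forecaster(prices[:t])
        actual = prices[t + horizon - 1]
        errors.append(abs(prediction - actual))
    if not errors:
        raise ValueError("not enough data for the requested min_train and horizon")
    return sum(errors) / len(errors)


if __name__ == "__main__":
    toy_prices = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]  # hypothetical values
    print(walk_forward_mae(toy_prices, naive_forecast, min_train=2))  # prints 2.0
    print(walk_forward_mae(toy_prices, mean_forecast, min_train=2))   # prints 3.0
```

On this toy series the stand-in model loses to the naive baseline, which is the kind of outcome the abstract argues every proposed model should be checked against.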

2026-05-20T18:06:33Z Carlos Baquero http://arxiv.org/abs/2605.21352v1 Classification of Single and Mixed Partial Discharges under Switching Voltage Using an AWA-CNN Framework 2026-05-20T16:16:51Z

The growing use of fast-switching power electronics has made partial discharge (PD) analysis under switching-voltage excitation increasingly important, yet more challenging than under sinusoidal conditions due to activity concentrated at voltage transitions. This work presents an Amplitude-Width-Area (AWA) pattern representation for source-oriented PD analysis under switching-voltage excitation. In the proposed method, time-domain PD pulses are characterized using pulse amplitude, width, and area, and mapped into a visual pattern where amplitude and area define the coordinate axes and width is encoded by color. The generated AWA patterns are used to distinguish six single and mixed PD source conditions: corona, internal, surface, corona+internal, corona+surface, and internal+surface. To evaluate the classification capability of the proposed representation, a Random Forest baseline and two Convolutional Neural Network (CNN) models, InceptionV3 and ResNet-18, are compared. The AWA patterns show distinguishable source-dependent distributions, and CNN-based classification achieves testing accuracy above 96%, compared with 73.33% for Random Forest. The results indicate that AWA patterns provide a visual representation of PD pulses suitable for multi-class PD source classification under switching-voltage excitation.
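The three pulse descriptors named in this abstract can be illustrated with a short sketch. The abstract does not give the exact definitions, so the following are assumptions: amplitude is the peak absolute value, width is the time span over which the pulse stays at or above half of the peak, and area is the trapezoidal integral of the absolute signal. The function name `awa_features` and the sample values are hypothetical.

```python
from typing import Sequence, Tuple


def awa_features(
    samples: Sequence[float], dt: float, width_fraction: float = 0.5
) -> Tuple[float, float, float]:
    """Return (amplitude, width, area) of one sampled pulse under assumed definitions."""
    if not samples:
        raise ValueError("pulse must contain at least one sample")
    magnitudes = [abs(s) for s in samples]
    # Amplitude: peak absolute value of the pulse.
    amplitude = max(magnitudes)
    # Width: time between the first and last sample at or above the threshold.
    threshold = width_fraction * amplitude
    above = [i for i, m in enumerate(magnitudes) if m >= threshold]
    width = (above[-1] - above[0]) * dt
    # Area: trapezoidal integral of the absolute signal.
    area = sum(0.5 * (m0 + m1) * dt for m0, m1 in zip(magnitudes, magnitudes[1:]))
    return amplitude, width, area


if __name__ == "__main__":
    pulse = [0.0, 1.0, 4.0, 2.0, 0.0]  # hypothetical samples in arbitrary units
    print(awa_features(pulse, dt=0.5))  # prints (4.0, 0.5, 3.5)
```

Each pulse then becomes one point in the pattern, with amplitude and area as coordinates and width mapped to color, as the abstract describes.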

2026-05-20T16:16:51Z Md Rafid Kaysar Shagor Zannatul Ferdousy Mouri Farhina Haque Anindya Bijoy Das http://arxiv.org/abs/2605.21334v1 RSE of a Quantum Transport Code and its Effects 2026-05-20T16:00:36Z

This paper presents our research software engineering (RSE) experiences over two years with libNEGF, a quantum transport code. We describe practical approaches to code quality assurance--including continuous integration, automated testing, and compiler warning correction--and performance engineering through continuous benchmarking. Our systematic application of these practices revealed critical defects: uninitialized memory reads, out-of-bounds writes, and notably, a misunderstood mathematical model in our boundary condition handling. We also document how continuous benchmarking exposed performance regressions caused by HPC system configuration changes. Our findings provide data points suggesting that a dangerous class of defects--equivalent to undefined behavior in C/C++ and processor-dependent behavior in Fortran--is as prevalent in Fortran scientific codes as elsewhere. While libNEGF is implemented in Fortran, most recommendations are applicable to scientific software regardless of implementation language, and they can be implemented selectively or in their entirety for both new and existing projects.

2026-05-20T16:00:36Z 25 pages Christoph Conrads Edoardo Di Napoli http://arxiv.org/abs/2605.21192v1 The Statistical Significance of the Inclusion of Graph Neural Networks in the Financial Time Series Forecasting Problem 2026-05-20T13:55:54Z

Forecasting univariate time series in the financial market is a challenging endeavor. While numerous statistical and machine learning models have been introduced to address this challenge, they typically concentrate solely on analyzing temporal patterns within the time series data. In this research, we study the statistical significance of the inclusion of geometric patterns in enhancing forecasting accuracy within the context of time series analysis. We introduce the Time-Geometric model, a combination of models designed to exploit both geometric and temporal patterns. The contribution of this research lies in advancing the domain of univariate time series prediction, as demonstrated through extensive empirical evaluations. Our findings underscore that leveraging geometric patterns, captured through Graph Neural Networks, yields statistically significant improvements in forecasting accuracy.

2026-05-20T13:55:54Z Marco Gregnanin Johannes De Smedt Giorgio Gnecco Maurizio Parton http://arxiv.org/abs/2601.02172v3 A stable and accurate X-FFT solver for linear elastic homogenization problems in 3D 2026-05-20T13:55:23Z

Although FFT-based methods are renowned for their numerical efficiency and stability, traditional discretizations fail to capture material interfaces that are not aligned with the grid, resulting in suboptimal accuracy. To address this issue, the work at hand introduces a novel FFT-based solver that achieves interface-conforming accuracy for three-dimensional mechanical problems. More precisely, we integrate the extended finite element (X-FEM) discretization into the FFT-based framework, leveraging its ability to resolve discontinuities via additional shape functions. We employ the modified abs(olute) enrichment and develop a preconditioner based on the concept of strongly stable GFEM, which mitigates the conditioning issues observed in traditional X-FEM implementations. Our computational studies demonstrate that the developed X-FFT solver achieves interface-conforming accuracy, numerical efficiency, and stability when solving three-dimensional linear elastic homogenization problems with smooth material interfaces.

2026-01-05T14:49:18Z 41 pages, 27 figures International Journal for Numerical Methods in Engineering 127, no. 10 (2026): e70342 Flavia Gehrig Matti Schneider 10.1002/nme.70342 http://arxiv.org/abs/2605.21179v1 KSOS-BO: Improving Sampling in Bayesian Optimization via Kernel Sum of Squares 2026-05-20T13:46:50Z

Bayesian Optimization (BO) is an effective framework for globally optimizing functions whose evaluations are expensive. It is particularly effective for optimizing functions defined over continuous domains and explicitly handles stochastic noise in evaluations. As a result, it is widely applied in areas such as hyperparameter tuning, robotics policy search, and scientific experiment design, where sample efficiency is essential. Its two-step procedure consists of model fitting followed by optimization of the acquisition function, which is often treated as a generic black-box problem despite its structured nature. In this work, we introduce KSOS-BO, a kernel-based derivative-free framework for BO acquisition optimization. KSOS-BO formulates the optimization of the acquisition function as a semidefinite program with kernel-induced representations, enabling a structured global search. Across a diverse set of benchmark functions with varying landscape properties, KSOS-BO consistently outperforms derivative-free baselines using Sobol Search, Differential Evolution, or CMA-ES to optimize the acquisition function, achieving an average regret improvement of 81.16% on 10/15 benchmarks. In particular, KSOS-BO demonstrates strong performance in highly multimodal and unimodal but ill-conditioned functions, indicating its applicability to diverse landscape structures. Despite a higher per-iteration computational cost, it converges faster in wall-clock time with an average improvement of 93.55% on 10/15 benchmarks, as it reaches high-quality solutions with fewer evaluations. Limitations include reduced effectiveness on functions with steep drops or plate-shaped regions.

2026-05-20T13:46:50Z Buqing Ou Frederike Dümbgen