https://arxiv.org/api/VUEYeLoMZjJ+JiUzqEvYhpOtuzQ
2026-06-21T11:32:53Z
132587515

http://arxiv.org/abs/2606.11264v1
OmniBioTwin: A System-of-Twinned-Systems Framework for Health Digital Twins
2026-06-09T03:54:49Z
Health digital twins (HDTs) promise patient-specific modeling and decision support but current approaches remain structurally fragmented: monolithic models that address a single organ or task lack cross-scale fidelity, while system-level twins lack generalizable architectural frameworks. We propose OmniBioTwin, a System-of-Twinned-Systems (SoTS) framework that organizes HDTs as modular computational entities coupled through explicit interaction operators within a multi-layer network architecture. The framework comprises seven coordinated layers - spanning data integration, autonomous twin modeling, cross-scale coupling, temporal synchronization, and human-in-the-loop decision support. We demonstrate OmniBioTwin by instantiating a multiscale twin for glucagon-like peptide-1 (GLP-1) signaling pathways in Alzheimer's disease, illustrating how molecular, cellular, and organ-level twins can be composed and coupled within a unified system.
2026-06-09T03:54:49Z
Zhaohui Wang, Yu Huang, Jiang Bian

http://arxiv.org/abs/2606.10222v1
Multifractal Signatures of Ageing and Dementia Development: A Multifractal Space-Filling Curve Analysis
2026-06-08T22:22:46Z
Multifractality is an effective formalism for quantifying the nonlinear, scale-free properties of complex data. In this study, we propose a novel and efficient methodology, termed Multifractal Space-filling Curve Analysis (MFSCA), for quantifying the correlation structure of multidimensional data. Within this framework, the original multidimensional data - while preserving both local and long-range organisational properties - are projected onto a one-dimensional representation using a fractal space-filling curve. The resulting one-dimensional signal is then analysed using multifractal algorithms.
We demonstrate the utility of the method using both artificially generated multifractal structures and real data. In particular, we apply MFSCA to analyse magnetic resonance imaging (MRI) data from Alzheimer patients at different stages of dementia. Based on the results, we estimate the multifractal profiles of the brain for healthy subjects of different ages as well as for dementia patients. The analysis reveals that the spatial organization of brain structures, as measured by the degree of multifractality, progressively weakens with age and the development of dementia. A transition from multifractality to monofractality is observed both in control groups, when comparing the Young Control and Elderly Control groups, and among dementia subjects of similar age but at different stages of the disease, namely early dementia and mild cognitive impairment. Thus, from the perspective of multiscaling properties, the heterogeneous characteristics of spatial brain organization deteriorate under worsening conditions, leading to a homogeneous and weakly correlated structure. These findings not only effectively capture key aspects of brain organisation, but also demonstrate that the multifractality of MRI data can serve as a marker of structural brain changes.
2026-06-08T22:22:46Z
Marta Lotka, Jacek Grela, Zbigniew Drogosz, Jeremi K. Ochab, Paweł Oświęcimka

http://arxiv.org/abs/2602.04119v2
Synthesizable Molecular Generation via Soft-constrained GFlowNets with Rich Chemical Priors
2026-06-08T20:03:30Z
The application of generative models for experimental drug discovery campaigns is severely limited by the difficulty of designing molecules de novo that can be synthesized in practice. Previous works have leveraged Generative Flow Networks (GFlowNets) to impose hard synthesizability constraints through the design of state and action spaces based on predefined reaction templates and building blocks. Despite the promising prospects of this approach, it currently lacks flexibility and scalability.
As an alternative, we propose S3-GFN, which generates synthesizable SMILES molecules via simple soft regularization of a sequence-based GFlowNet. Our approach leverages rich molecular priors learned from large-scale SMILES corpora to steer molecular generation towards high-reward, synthesizable chemical spaces. The model induces constraints through off-policy replay training with a contrastive learning signal based on separate buffers of synthesizable and unsynthesizable samples. Our experiments show that S3-GFN learns to generate synthesizable molecules ($\geq 95\%$) with higher rewards in diverse tasks.
2026-02-04T01:27:42Z
Hyeonah Kim, Minsu Kim, Celine Roget, Dionessa Biton, Louis Vaillancourt, Yves V. Brun, Yoshua Bengio, Alex Hernandez-Garcia

http://arxiv.org/abs/2606.10107v1
Maximum Matching Accuracy: An Instance Segmentation Evaluation Metric Utilizing Globally Optimal Matching
2026-06-08T19:36:28Z
Reliable evaluation of instance segmentation models requires metrics that accurately and consistently reflect segmentation quality. However, the metrics most widely used in biological imaging carry fundamental mathematical weaknesses: hard Intersection-over-Union (IoU) thresholds that produce discontinuous, low sensitivity scoring; per-object normalization that distorts scores under object size variation; and greedy or one-to-many matching procedures that yield non-optimal, order-dependent correspondences. Together, these properties produce unintuitive and unreliable model rankings under common failure modes such as split cells, merged cells, and cell boundary imprecision. We propose Maximum Matching Accuracy (MMA), a threshold-free continuous metric that finds a globally optimal one-to-one matching between predicted and ground truth objects and aggregates total overlap using per-pixel normalization. We evaluate MMA against AP@50, PQ, SEG, and AJI across three experiments: synthetic failure cases, progressive corruption tests, and a model ranking comparison.
MMA produces scores that are more stable, more sensitive, and more interpretable than existing alternatives, providing a principled foundation for fair instance segmentation benchmarking in biological cell imaging.
2026-06-08T19:36:28Z
Kaden Stillwagon, Alexandra D. VandeLoo, Craig R. Forest

http://arxiv.org/abs/2606.10080v1
VFUSE: Virulent Feature Understanding with Sparse autoEncoders
2026-06-08T18:54:31Z
Generative models have shown remarkable progress in a variety of domains such as protein design, but such power enables the opaque generation of hazardous proteins. In this work, we introduce VFUSE (Virulent Feature Understanding with Sparse autoEncoders), a mechanistic interpretability approach that trains SAEs on diffusion-transformer activations to audit protein models for hazard-aware features. We apply VFUSE to RoseTTAFold3 and RFDiffusion3, popular open-weight models for protein folding and synthesis. We find that for certain blocks, linear probes detect hazardous designs significantly better when fit in the SAE latent space over the original model's representations: improving interpretability without sacrificing model performance. Furthermore, we identify monosemantic features from the SAE that fire only on hazardous designs at up to AUROC $0.84$ ($q < 10^{-13}$). To our knowledge this is the first SAE trained on an all-atom diffusion model and the first feature-level virulence audit of a protein design model, paving the way towards safe and interpretable protein design.
2026-06-08T18:54:31Z
Michael Yu, Matthew L. Olson

http://arxiv.org/abs/2606.09672v1
Correlation Is Not Enough: Embedding Human Metadata for Individual Causal Discovery
2026-06-08T15:54:28Z
Ask a pretrained biomedical language model whether "cortisol 28 ug/dL" and "stock-market volatility" are related, and it returns a cosine similarity of 0.83 on a scale where 1.0 means identical. The two share no mechanism.
This is not a corner case: every off-the-shelf biomedical encoder we tested (BioBERT, PubMedBERT, BioM-ELECTRA) scores unrelated cross-domain pairs between 0.76 and 0.92 when the answer should be near zero. Accuracy on cross-domain discrimination is 0%.
Retrieval systems survive this, because a language model downstream filters the noise. A Large Behavioural Model (LBM), a foundation model whose subject is a person rather than a sentence, does not: it reasons over a graph of a user's life and treats embedding proximity as evidence that two events are causally linked. False proximity writes a false causal edge, and everything downstream inherits the error. Here, embedding geometry is not a tuning knob; it is correctness.
We report the fix. A contrastive pass over 72,034 pairs raises PubMedBERT BIOSSES correlation from 0.633 to 0.828 and within-vs-across-domain separation from 1.05x to 1.63x. A second pass, BODHI, mines hard negatives from edges absent in a biomedical knowledge graph and lifts separation to 2.30x and the discrimination gap to +0.392, at a 4.5% BIOSSES cost. On an Intel Xeon 6737P with AMX, OpenVINO cuts single-query latency from 1367 ms to 10 ms (133x) and reaches 555 sentences/sec. One finding contradicts standard advice: FP16 beats INT8 on this silicon at every serving batch size, and we explain why. The same model on a no-AMX Ice Lake instance runs 13-27x slower. We release the benchmark suite, training corpora, the BODHI generator, and the OpenVINO scripts.
2026-06-08T15:54:28Z
20 pages, 18 figures, 9 tables
Suraj Biswas, Saurabh Gupta, Pritam Mukherjee

http://arxiv.org/abs/2208.00778v2
SFILES 2.0: An extended text-based flowsheet representation
2026-06-08T14:45:34Z
SFILES are a text-based notation for chemical process flowsheets. They were originally proposed by d'Anterroches (Process flow sheet generation & design through a group contribution approach) who was inspired by the text-based SMILES notation for molecules. The text-based format has several advantages compared to flowsheet images regarding the storage format, computational accessibility, and eventually for data analysis and processing. However, the original SFILES version cannot describe essential flowsheet configurations unambiguously, such as the distinction between top and bottom products. Neither is it capable of describing the control structure required for the safe and reliable operation of chemical processes. Also, there is no publicly available software for decoding or encoding chemical process topologies to SFILES. We propose the SFILES 2.0 with a complete description of the extended notation and naming conventions.
Additionally, we provide open-source software for the automated conversion between flowsheet graphs and SFILES 2.0 strings. This way, we hope to encourage researchers and engineers to publish their flowsheet topologies as SFILES 2.0 strings. The ultimate goal is to set the standards for creating a FAIR database of chemical process flowsheets, which would be of great value for future data analysis and processing.
2022-07-25T16:14:43Z
Optimization and Engineering, Volume 24, pages 2911-2933, (2023)
Gabriel Vogel, Edwin Hirtreiter, Lukas Schulze Balhorn, Artur M. Schweidtmann
10.1007/s11081-023-09798-9

http://arxiv.org/abs/2604.01475v3
Interpretable Electrophysiological Features of Resting-State EEG Capture Cortical Network Dynamics in Parkinsons Disease
2026-06-08T13:23:37Z
Parkinson's disease (PD) alters cortical neural dynamics, yet reliable non-invasive electrophysiological biomarkers remain elusive. This study examined whether interpretable EEG features capturing complementary aspects of neural dynamics can discriminate Parkinsonian neural states. A comprehensive set of interpretable features was extracted and grouped into Standard descriptors (spectral power, phase synchronization, time-domain statistics) and Dynamical descriptors (aperiodic activity, cross-frequency coupling, scale-free dynamics, neuronal avalanche statistics, and instantaneous frequency measures). A multi-head attention transformer classifier was trained using strict LOSO validation. Group-level comparisons were performed to identify electrophysiological differences associated with disease and medication state. Standard feature sets achieved the strongest performance in discriminating medication states (PDoff vs PDon), whereas Dynamical descriptors performed competitively in contrasts between PD patients and healthy controls.
Random feature ablation analyses indicated that Dynamical descriptors provide complementary information distributed across features while correlation analysis revealed low redundancy within both feature sets. Group-level comparisons revealed medication-sensitive reductions in delta power and voltage variance, modulation of neuronal avalanche statistics, persistent increases in theta phase synchronization in PD patients, and disease-related alterations in cross-frequency interactions. Traditional spectral and synchronization features primarily reflect medication-related neural modulation, whereas dynamical descriptors reveal broader alterations in cortical network organization associated with disease but also with medication. These findings support multivariate EEG representations as a promising framework for developing non-invasive biomarkers of PD.
2026-04-01T23:31:38Z
28 pages; 6 Figures, 5 tables; 3 Supplementary Figures, 1 Supplementary Table; Original Research Report
Antonios G. Dougalis

http://arxiv.org/abs/2606.09952v1
Adjusted trajectory of medication exposure taking into account the periodicity of dispensations and the number of dispensed packs and comparative analysis on EFEMERIS database
2026-06-08T10:38:03Z
We presented an adjustment method for the calculation of medication exposure trajectories based on the number of dispensed packs and the type of dispensations (occasional or regular). A comparative study based on the EFEMERIS data was carried out using three different scenarios of trajectory calculation depending on whether or not the number of packs and the periodicity of medication dispensations were taken into account. The impact of the scenario was highlighted using global indicators on the number of Defined Daily Doses (DDD) across all exposed women; the study of changes in individual trajectories from one scenario to another was carried out; we also compared the results of a clustering into four groups.
Although 65% of the trajectories remained unchanged, the rest showed significant changes in the number of DDD and/or in the individual exposure profile. We observed that 4% of trajectories were attributed to a different cluster, and the clustering was of better quality with the adjustment method. Depending on the study context, an impact on cluster distribution could be observed for some maternal characteristics and neonatal outcomes. This was the case for a higher occurrence of neonatal pathology for neonates from mothers belonging to the cluster with high doses of psychotropics, thus reinforcing the conclusions of previous studies of a link between high exposure to psychotropic medications and presence of pathology for the newborn.
2026-06-08T10:38:03Z
10 pages, 2 figures, 3 tables
only, 2025, vol. 20, no 2, p. e0308767
Cécile Chouquet, Anna-Belle Beau, Christine Damase-Michel, David Jeauneau, Isabelle Lacroix, Sabine Mercier
10.1371/journal.pone.0308767

http://arxiv.org/abs/2604.26498v3
Do Larger Models Really Win in Drug Discovery? A Benchmark Assessment of Model Scaling in AI-Driven Molecular Property and Activity Prediction
2026-06-08T10:23:05Z
The rapid growth of molecular foundation models and large language models (LLMs) has encouraged a scale centred view of AI in drug discovery, in which larger pretrained models are expected to supersede compact cheminformatics models. We test this assumption across 26 ADME, toxicity and bioactivity endpoints, covering 165,541 endpoint level compound label records. The benchmark contains 78 endpoint and split entries evaluated under random, Murcko scaffold and structure separated 5-fold cross validation protocols, representing increasing chemical generalization difficulty. Across 156 task and metric comparisons, classical machine learning (ML) provides the largest share of best performing entries (47.4%), followed by pretrained molecular sequence models (28.8%), graph neural networks (21.8%) and LLM based SAR baselines (1.9%).
Classical ML dominates random split interpolation and remains the largest winner family overall. GNN and sequence models are competitive in selected harder splits, but their strict winner shares decrease under a fixed final-window readout, indicating sensitivity to training settings and model selection. Paired bootstrap analyses show that small numerical differences between individual models should not be read as decisive victories. SAR knowledge from training folds improves GPT5.5-SAR and Opus4.7-SAR metrics but does not make rule based reasoning a universal substitute for supervised predictors. Compact specialized models remain highly effective, and predictive performance depends on the fit among model, task and validation scenario, not on scale alone.
2026-04-29T10:01:16Z
Improved benchmark design and reproducibility, replaced restricted datasets with public benchmarks in primary analyses, and added sensitivity analyses supporting the interpretation of model scaling and evaluation protocol effects in molecular prediction
Jinjiang Guo, Sheng Ding

http://arxiv.org/abs/2605.19579v2
TACK: A Statistical Evaluation of Degradation Activity on a Novel TArgeting Chimeras Knowledge Dataset
2026-06-08T07:51:28Z
Proteolysis-targeting chimeras (PROTACs) represent a promising therapeutic modality that induces targeted protein degradation by hijacking the ubiquitin-proteasome system. However, rational PROTAC design remains challenging due to the complex interplay between molecular structure, target proteins, E3 ligases, and the cellular context. We present TACK, a statistical evaluation of degradation activity on a novel TArgeting Chimeras Knowledge dataset of 3,514 PROTACs and 6,561 degradation endpoints aggregated from three major repositories with standardized molecular representations, protein annotations, and experimental conditions.
Using scaffold-based 5x5 cross-validation, we perform a rigorous statistical comparison of three machine learning methods to predict PROTAC degradation activity across three tasks: $DC_{50}$ and Dmax regression, and binary activity classification. Feature ablation demonstrates that cellular context features and simple protein representations rival complex ESM protein embeddings, highlighting the importance of feature engineering over architectural sophistication. Models trained on the best performing features show that potency ($pDC_{50}$, $R^2=0.66$) is substantially more predictable than maximum degradation (Dmax, $R^2=0.36$). In activity prediction, statistical tests support that classical methods (XGBoost and MLP) significantly outperform PROTAC-STAN, a domain-specific graph neural network model (ROC-AUC: 0.85 vs. 0.75, p<0.001). Finally, we propose an ensemble-based uncertainty quantification approach showing that prediction variance correlates with prediction error ($pDC_{50}$: Spearman $ρ= 0.36$, p<0.001; Dmax: $ρ=0.69$, p<0.001), enabling confidence-aware experimental prioritization. Our findings challenge assumptions about specialized architectures for degradation prediction and provide evidence-based guidance for ML-driven PROTAC assessment.
2026-05-19T09:22:30Z
12 pages, 6 figures, accepted to Knowledge Discovery and Data Mining - KDD '26
Stefano Ribes, Nils Dunlop, Rocío Mercado
10.1145/3770855.3819052

http://arxiv.org/abs/2606.08973v1
A systematic investigation of molecular encoding methods for drug property predictions across neural network and Transformer encoder-based model
2026-06-08T03:16:44Z
Fundamental investigations into how different molecular encoding methods affect molecular property prediction remain relatively limited.
In this study, we extensively examined the optimal molecular encoding methods for molecular property prediction using two prevalent structure designs: a classical neural network model (MLP) and a Transformer encoder-based model (MLP+TL). For molecular encoding methods, we investigated several types of fingerprints, including traditional topological fingerprints, substructure-based fingerprints, and string-based representations. These two models were trained on seven well-known molecular datasets to evaluate different input molecular encoding methods based on evaluation metrics. On several biologically relevant classification tasks, including toxicity, mutagenicity, and side-effect prediction, our models consistently achieved average AUC values above 0.9. Rather than relying on external post-hoc explanation methods such as the local interpretable model-agnostic explanation (LIME) or the Deep SHapley Additive exPlanations (SHAP), we leveraged the model's intrinsic attention weights as an internal interpretability signal for identifying potentially important features. The MLP+TL model using MACCS and PubChem as input can capture chemically interpretable groups that determined the major blood-brain barrier (BBB) permeability and mutagenicity in Salmonella typhimurium. In particular, a comparison between Morphine and Heroin highlighted the role of hydroxyl-related substructures in BBB permeability prediction, which was consistently reflected in the attention weights. Overall, our findings provide practical guidance for selecting effective molecular encoding methods and contribute to the development of interpretable molecular informatics approaches for drug discovery.
2026-06-08T03:16:44Z
Sheng-Ya Chen, Shan-Ju Yeh

http://arxiv.org/abs/2606.08897v1
A multi-agent system for spine MRI report generation from multi-sequence imaging
2026-06-08T00:50:07Z
Spinal pathology is a leading cause of pain and disability worldwide.
Spine MRI is central to clinical evaluation, yet its interpretation remains complex and time-consuming, requiring integration of information across multiple imaging sequences and anatomical regions. Despite recent advances in automated MRI analysis, effectively combining multi-sequence data while preserving sequence-specific diagnostic information remains an open challenge. Here we present SpineAgent, a multi-agent framework for spine MRI report generation built upon a multi-sequence foundation model trained on routine clinical data from 32,047 patients and 453,683 MRI series, comprising a total of 13,441,191 MRI slices. To accommodate diverse modalities of sequences, we first pre-train two DINOv3-based encoders separately on T1- and T2-weighted sequences. We then introduce a continual training strategy that learns a synthesizer to embed images of other sequences using the T1 and T2 encoders, producing a patient-level embedding that integrates various signals across MRI sequences. Using these embeddings, SpineAgent achieves state-of-the-art performance, and demonstrates strong generalizability under cross-manufacturer and cross-cohort evaluation. Beyond classification, SpineAgent enables pathology localization by identifying findings-relevant slices and segmenting pathological regions. It also supports multimodal image-report retrieval, providing a solid foundation for scalable and explainable MRI report generation. We further integrate these validated capabilities of SpineAgent into 37 specialized agents. Finally, we incorporate their outputs as structured tokens within a Medical Report Agent trained end-to-end for report generation. Through both automated metrics and expert evaluation by five radiologists, SpineAgent achieves leading performance in spine MRI report generation.
2026-06-08T00:50:07Z
Zhiping Xiao, Junwei Yang, Gongbo Sun, Han Zhang, Hanwen Xu, Yi Yao, Zachary D. Miller, William E. King, Mohammed M. Kanani, Jalal B. Andre, Sammy Chu, Ming Zhang, Paul E. Kinahan, Nathan M.
Cross, Sheng Wang

http://arxiv.org/abs/2605.23169v2
PRAXIS: Case-distilled and code-verified AI agents for biological research
2026-06-08T00:48:39Z
Large language models are moving scientific research from text assistance toward agentic workflows, yet biological research requires strong object validation, methodological suitability, reproducibility, and auditability. Prompt engineering, general RAG, or tool use alone cannot reliably produce domain-specific scientific judgment. Here, we present PRAXIS, a verifiable biological research agent framework driven by literature learning and case distillation. PRAXIS converts research experience, failure boundaries, domain rules, and executable procedures into structured long-term memory. By coordinating successful cases, negative cases, rules, and skills, PRAXIS supports problem definition, object validation, method selection, workflow execution, result interpretation, and review feedback across diverse biocomputational tasks. We instantiated PRAXIS as an agent suite for biomedical computing and evaluated it through object validation, case retrieval, memory ablation, public benchmarks, and cross-agent workflows. The results show that case-based learning improves method selection, error suppression, and workflow organization in complex biological research tasks. Rather than replacing scientists, PRAXIS provides a general pathway for transforming research experience into executable, auditable, and transferable agent capabilities.
2026-05-22T02:41:41Z
Zhenyu Ma, Yuyang Song, Chunyi Yang, Jingyi Zhu, Limei Xu, Min Xiao, Xukai Jiang

http://arxiv.org/abs/2511.22519v2
FoldSAE: Learning to Steer Protein Folding Through Sparse Representations
2026-06-07T16:22:01Z
RFdiffusion is a popular and well-established model for generation of protein structures. However, this generative process offers limited insight into its internal representations and how they contribute to the final protein structure.
Concurrently, recent work in mechanistic interpretability has successfully used Sparse Autoencoders (SAEs) to discover interpretable features within neural networks. We combine these concepts by applying SAE to the internal representations of RFdiffusion to uncover secondary structure-specific features and establish a relationship between them and generated protein structures. Building on these insights, we introduce a novel steering mechanism that enables precise control of secondary structure formation through a tunable hyperparameter, while simultaneously revealing interpretable block and neuron-level representations within RFdiffusion. Our work pioneers a new framework for making RFdiffusion more interpretable, demonstrating how understanding internal features can be directly translated into precise control over the protein design process.
2025-11-27T14:54:00Z
15 pages, 10 figures, submitted to RECOMB 2026
Wojciech Zarzecki, Paulina Szymczak, Ewa Szczurek, Kamil Deja