http://arxiv.org/abs/2406.12108v2 Computing in the Life Sciences: From Early Algorithms to Modern AI 2024-06-19T03:54:28Z

Computing in the life sciences has undergone a transformative evolution, from early computational models in the 1950s to the applications of artificial intelligence (AI) and machine learning (ML) seen today. This paper highlights key milestones and technological advancements in the historical development of computing in the life sciences. The discussion includes the inception of computational models for biological processes, the advent of bioinformatics tools, and the integration of AI/ML in modern life sciences research. Attention is given to AI-enabled tools used in the life sciences, such as scientific large language models and bio-AI tools, examining their capabilities, limitations, and impact on biological risk. This paper seeks to clarify and establish essential terminology and concepts to ensure informed decision-making and effective communication across disciplines.

2024-06-17T21:36:52Z 53 pages, 4 figures, 10 tables Samuel A. Donkor Matthew E. Walsh Alexander J. Titus http://arxiv.org/abs/2405.05301v2 A design specification for Critical Illness Digital Twins to cure sepsis: responding to the National Academies of Sciences, Engineering and Medicine Report: Foundational Research Gaps and Future Directions for Digital Twins 2024-06-16T22:26:50Z

On December 15, 2023, The National Academies of Sciences, Engineering and Medicine (NASEM) released a report entitled: Foundational Research Gaps and Future Directions for Digital Twins. The ostensible purpose of this report was to bring some structure to the burgeoning field of digital twins by providing a working definition and a series of research challenges that need to be addressed to allow this technology to fulfill its full potential. In the work presented herein we focus on five specific findings from the NASEM Report: 1) definition of a Digital Twin, 2) using fit-for-purpose guidance, 3) developing novel approaches to Verification, Validation and Uncertainty Quantification (VVUQ) of Digital Twins, 4) incorporating control as an explicit purpose for a Digital Twin and 5) using a Digital Twin to guide data collection and sensor development, and describe how these findings are addressed through the design specifications for a Critical Illness Digital Twin (CIDT) aimed at curing sepsis.

2024-05-08T17:17:58Z 31 pages, 13 Figures, 1 Table Gary An Chase Cockrell http://arxiv.org/abs/2401.17411v2 Identification of spatial dynamic patterns of behavior using weighted Voronoi diagrams 2024-06-16T21:20:53Z

This study proposes an innovative approach to analyze spatial patterns of behavior by integrating information in weighted Voronoi diagrams. The objective of the research is to analyze the temporal distribution of an experimental subject in different regions of a given space, with the aim of identifying significant areas of interest. The methodology employed involves dividing the experimental space, determining representative points, and assigning weights based on the cumulative time the subject spends in each region. This process results in a set of generator points along with their respective weights, thus defining the Voronoi diagram. The study also presents a detailed and advanced perspective for understanding spatial behavioral patterns in experimental contexts.
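
The construction described above (divide the space, pick representative points, weight them by cumulative time) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes a square grid partition, cell centers as representative points, a fixed sampling interval `dt`, and a multiplicatively weighted distance (Euclidean distance divided by weight); the abstract does not state which weighting scheme the study uses. The names `build_generators` and `weighted_region` are hypothetical.

```python
import math
from collections import defaultdict


def build_generators(track, cell_size, dt=1.0):
    """Turn a sampled trajectory into weighted generator points.

    track: list of (x, y) positions, one sample every dt seconds (assumed).
    Returns a list of ((cx, cy), weight), where the weight is the cumulative
    time spent in the grid cell whose center is (cx, cy).
    """
    time_in_cell = defaultdict(float)
    for x, y in track:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        time_in_cell[cell] += dt
    generators = []
    for (i, j), seconds in sorted(time_in_cell.items()):
        center = ((i + 0.5) * cell_size, (j + 0.5) * cell_size)
        generators.append((center, seconds))
    return generators


def weighted_region(point, generators):
    """Index of the generator whose weighted Voronoi region contains point.

    Assumes a multiplicatively weighted diagram: the point belongs to the
    generator that minimizes distance / weight, so generators with more
    cumulative time claim larger regions.
    """
    px, py = point

    def weighted_distance(k):
        (gx, gy), weight = generators[k]
        return math.hypot(px - gx, py - gy) / weight

    return min(range(len(generators)), key=weighted_distance)
```

With a trajectory that spends three samples in one cell and one sample in its neighbor, a query point that is geometrically closer to the neighbor can still fall in the region of the heavily visited cell, which is the effect the weighting is meant to capture.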

2024-01-30T20:00:49Z 10 pages, 4 figures, 2 tables, Submitted to 16th Mexican Conference on Pattern Recognition 2024 Martha Lorena Avendaño-Garrido Carlos Alberto Hernández-Linares Brenda Zarahí Medina-Pérez Varsovia Hernández Porfirio Toledo Alejandro León 10.1007/978-3-031-62836-8_1 http://arxiv.org/abs/2406.10696v1 Mining comorbidities: a brief survey 2024-06-15T17:31:43Z

This manuscript presents a brief overview of the concept of comorbidity. We start by laying out its foundations and definitions, and then describe the role that machine learning can play in mining and defining it. The purpose of this short survey is to summarize how comorbidity is defined as a concept and to show some of the latest applications and potential uses of natural language processing and text mining techniques.

2024-06-15T17:31:43Z Giovanna Maria Dimitri http://arxiv.org/abs/2403.15274v2 Bioinformatics and Biomedical Informatics with ChatGPT: Year One Review 2024-06-12T15:50:31Z

The year 2023 marked a significant surge in the exploration of applying large language model (LLM) chatbots, notably ChatGPT, across various disciplines. We surveyed the applications of ChatGPT in bioinformatics and biomedical informatics throughout the year, covering omics, genetics, biomedical text mining, drug discovery, biomedical image understanding, bioinformatics programming, and bioinformatics education. Our survey delineates the current strengths and limitations of this chatbot in bioinformatics and offers insights into potential avenues for future developments.

2024-03-22T15:16:23Z Peer-reviewed and accepted by Quantitative Biology Jinge Wang Zien Cheng Qiuming Yao Li Liu Dong Xu Gangqing Hu http://arxiv.org/abs/2406.06826v1 Experimental Measurement of Assembly Indices are Required to Determine The Threshold for Life 2024-06-10T22:19:20Z

Assembly Theory (AT) was developed to help distinguish living from non-living systems. The theory is simple, as it posits that the amount of selection or Assembly is a function of the number of complex objects, where their complexity can be objectively determined using assembly indices. The assembly index of a given object relates to the number of recursive joining operations required to build that object, and it can not only be rigorously defined mathematically but also be measured experimentally. In previous work we outlined not only the theoretical basis but also extensive experimental measurements that demonstrated the predictive power of AT. These measurements showed that there is a threshold in assembly indices for organic molecules, whereby abiotic chemical systems could not randomly produce molecules with an assembly index greater than or equal to 15. In a recent paper by Hazen et al. [1], the authors not only confused the concept of AT with the algorithms used to calculate assembly indices, but also attempted to falsify AT by calculating theoretical assembly indices for objects made from inorganic building blocks. A fundamental misunderstanding made by the authors is that the threshold is a requirement of the theory, rather than an experimental observation. This means that exploration of inorganic assembly indices similarly requires an experimental observation, correlated with the theoretical calculations. Then and only then can the exploration of complex inorganic molecules be done using AT and the threshold for living systems, as expressed with such building blocks, be determined. Since Hazen et al. [1] present no experimental measurements of assembly theory, their analysis is not falsifiable.

2024-06-10T22:19:20Z 6 pages, 11 references Sara I. Walker Cole Mathis Stuart Marshall Leroy Cronin http://arxiv.org/abs/2406.05258v1 Advances in Machine Learning, Statistical Methods, and AI for Single-Cell RNA Annotation Using Raw Count Matrices in scRNA-seq Data 2024-06-07T21:05:56Z

Single-cell RNA sequencing (scRNA-seq) has revolutionized our ability to analyze gene expression at the resolution of individual cells, providing unprecedented insights into cellular heterogeneity and complex biological systems. This paper reviews various advanced computational and machine learning techniques tailored for the analysis of scRNA-seq data, emphasizing their roles in different stages of the data processing pipeline.

2024-06-07T21:05:56Z A survey of best practices for using machine learning, statistical methods, and AI for Single-Cell RNA annotation using raw count matrices in scRNA-seq data Megha Patel Nimish Magre Himanshi Motwani Nik Bear Brown http://arxiv.org/abs/2406.05170v1 Research on Tumors Segmentation based on Image Enhancement Method 2024-06-07T12:25:04Z

One of the most effective ways to treat liver cancer is precise liver resection surgery, a key step of which is precise digital image segmentation of the liver and its tumor. However, traditional liver parenchymal segmentation techniques often face several challenges: lack of precision, slow processing speed, and computational burden. These shortcomings limit the efficiency of surgical planning and execution. This work first describes in detail a new image enhancement algorithm that enhances the key features of an image by adaptively adjusting its contrast and brightness. Then, a deep learning-based segmentation network is introduced, which was specially trained on the enhanced images to optimize the detection accuracy of tumor regions. In addition, multi-scale analysis techniques are incorporated into the study, allowing the model to analyze images at different resolutions to capture more nuanced tumor features. The study used the 3Dircadb dataset to test the effectiveness of the proposed method. The experimental results show that, compared with the traditional image segmentation method, the new method using image enhancement technology significantly improves the accuracy and recall rate of tumor identification.

2024-06-07T12:25:04Z Danyi Huang Ziang Liu Yizhou Li http://arxiv.org/abs/2406.00993v1 Detection of Acetone as a Gas Biomarker for Diabetes Based on Gas Sensor Technology 2024-06-03T05:10:37Z

With the continuous development and improvement of medical services, there is a growing demand for improving diabetes diagnosis. Exhaled breath analysis, characterized by its speed, convenience, and non-invasive nature, is leading the trend in diagnostic development. Studies have shown that the acetone levels in the breath of diabetes patients are higher than normal, making acetone a basis for diabetes breath analysis. This provides a more readily accepted method for early diabetes prevention and monitoring. Addressing issues such as the invasive nature, disease transmission risks, and complexity of diabetes testing, this study aims to design a diabetes gas biomarker acetone detection system centered around a sensor array using gas sensors and pattern recognition algorithms. The research covers sensor selection, sensor preparation, circuit design, data acquisition and processing, and detection model establishment to accurately identify acetone. Titanium dioxide was chosen as the nano gas-sensitive material to prepare the acetone gas sensor, with data collection conducted using STM32. Filtering was applied to process the raw sensor data, followed by feature extraction using principal component analysis. A recognition model based on support vector machine algorithm was used for qualitative identification of gas samples, while a recognition model based on backpropagation neural network was employed for quantitative detection of gas sample concentrations. Experimental results demonstrated recognition accuracies of 96% and 97.5% for acetone-ethanol and acetone-methanol mixed gases, and 90% for ternary acetone, ethanol, and methanol mixed gases.

2024-06-03T05:10:37Z 9 pages, 14 figures Jiaming Wei Tong Liu Jipeng Huang Xiaowei Li Yurui Qi Gangyin Luo http://arxiv.org/abs/2405.19936v1 Free-ranging dogs quickly learn to recognize a rewarding person 2024-05-30T10:56:32Z

Individual human recognition is important for species that live in close proximity to humans. Numerous studies on domesticated species and urban-adapted birds have highlighted this ability. One such species which is heavily reliant on humans is the free-ranging dog. Very little knowledge exists on the amount of time taken by free-ranging dogs to learn and remember individual humans. Due to their territorial nature, they have a high probability of encountering the same people multiple times on the streets. Being able to distinguish individual humans might be helpful in making decisions regarding people from whom to beg for food or social reward. We investigated if free-ranging dogs are capable of identifying the person rewarding them and the amount of time required for them to learn it. We conducted field trials on randomly selected adult free-ranging dogs in West Bengal, India. On Day 1, a choice test was conducted. The experimenter chosen did not provide reward while the other experimenter provided a piece of boiled chicken followed by petting. The person giving reward on Day 1 served as the correct choice on four subsequent days of training. Day 6 was the test day when none of the experimenters had a reward. We analyzed the choice made by the dogs, the time taken to approach during the choice tests, and the socialization index, which was calculated based on the intensity of affiliative behaviour shown towards the experimenters. The dogs made correct choices at a significantly higher rate on the fifth and sixth days, as compared to Day 2, suggesting learning. This is the first study aiming to understand the time taken for individual human recognition in free-ranging dogs and can serve as the scaffold for future studies to understand the dog-human relationship in open environments, like urban ecosystems.

2024-05-30T10:56:32Z Srijaya Nandi Mousumi Chakraborty Aesha Lahiri Hindolii Gope Sujata Khan Bhaduri Anindita Bhadra http://arxiv.org/abs/2405.19857v1 Biodiversity data standards for the organization and dissemination of complex research projects and digital twins: a guide 2024-05-30T09:04:40Z

Biodiversity data are substantially increasing, spurred by technological advances and community (citizen) science initiatives. Data integration is likewise becoming more commonplace. Open science promotes open sharing and data usage. Data standardization is an instrument for the organization and integration of biodiversity data, which is required for complex research projects and digital twins. However, just like with an actual instrument, there is a learning curve to understanding the data standards field. Here we provide a guide, for data providers and data users, on the logistics of compiling and utilizing biodiversity data. We emphasize data standards, because they are integral to data integration. Three primary avenues for compiling biodiversity data are compared, explaining the importance of research infrastructures for coordinated long-term data aggregation. We use the Biodiversity Digital Twin (BioDT) as a case study. Four approaches to data standardization are presented in terms of the balance between practical constraints and the advancement of the data standards field. We aim for this paper to guide and raise awareness of the existing issues related to data standardization, and especially how data standards are key to data interoperability, i.e., machine accessibility. The future is promising for computational biodiversity advancements, such as with the BioDT project, but it rests upon the shoulders of machine actionability and readability, and that requires data standards for computational communication.

2024-05-30T09:04:40Z 42 pages, 2 figures, 1 box, 1 table Carrie Andrew Sharif Islam Claus Weiland Dag Endresen http://arxiv.org/abs/2405.19180v1 Observation of Significant Photosynthesis in Garden Cress and Cyanobacteria under Simulated Illumination from a K Dwarf Star 2024-05-29T15:21:45Z

Stars with about 45 to 80% of the mass of the Sun, so-called K dwarf stars, have previously been proposed as optimal host stars in the search for habitable extrasolar worlds. These stars are abundant, have stable luminosities over billions of years longer than Sun-like stars, and offer favorable space environmental conditions. So far, the theoretical and experimental focus on exoplanet habitability has been on even less massive, though potentially less hospitable, red dwarf stars. Here we present the first experimental data on the responses of photosynthetic organisms to a simulated K dwarf spectrum. We find that garden cress Lepidium sativum under K dwarf radiation exhibits growth and photosynthetic efficiency comparable to those under solar illumination on Earth. The cyanobacterium Chroococcidiopsis sp. CCMEE 029 exhibits significantly higher photosynthetic efficiency and culture growth under K dwarf radiation compared to solar conditions. Our findings of the affirmative responses of these two photosynthetic organisms to K dwarf radiation suggest that exoplanets in the habitable zones around such stars deserve high priority in the search for extrasolar life.

2024-05-29T15:21:45Z International Journal of Astrobiology 23 (2024) e18 Iva Vilović Dirk Schulze-Makuch René Heller 10.1017/S1473550424000132 http://arxiv.org/abs/2405.14904v1 Large deviation principles and evolutionary multiple structure alignment of non-coding RNA 2024-05-22T23:08:40Z

Non-coding RNAs are functional molecules that are not translated into proteins. They act as important regulators of biological function. Because they are not translated, they need not be as stable as other types of RNA. The TKF91 Structure Tree of Holmes (2004) is a probability model that effectively describes correlated substitution, insertion, and deletion of base pairs, and it has been found to have some value in understanding dynamic folding patterns. In this paper, we provide a new probabilistic analysis of the TKF91 Structure Tree. Large deviation principles on stem lengths, helix lengths, and tree size are proved. Additionally, we give a new alignment procedure that constructs accurate sequence and structural alignments for sequences with low identity, given a dense enough phylogeny.

2024-05-22T23:08:40Z 25 pages main document, 31 pages total with references and appendix, 1 figure Brandon Legried http://arxiv.org/abs/2405.11009v1 Petri nets in modelling glucose regulating processes in the liver 2024-05-17T13:15:01Z

Diabetes is a chronic condition, considered one of the civilization diseases, that is characterized by sustained high blood sugar levels. There is no doubt that more and more people are going to suffer from diabetes, hence it is crucial to better understand its biological foundations. The essential processes related to the control of glucose levels in the blood are glycolysis (the process of breaking down glucose) and glucose synthesis, both taking place in the liver. Glycolysis occurs during feeding and is stimulated by insulin. Glucose synthesis, on the other hand, occurs during fasting and is stimulated by glucagon. In the paper we present a Petri net model of glycolysis and glucose synthesis in the liver. The model is created based on medical literature. Standard Petri net techniques are used to analyse the properties of the model: traps, reachability graphs, token dynamics, and deadlock analysis. The results are described in the paper. Our analysis shows that the model captures the interactions between different enzymes and substances, which is consistent with the biological processes occurring during fasting and feeding. The model constitutes the first element of our long-term goal to create a whole-body model of glucose regulation in a healthy human and in a person with diabetes.
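
To make the Petri net vocabulary above concrete (tokens, firing, reachability, deadlocks), here is a minimal Python sketch. It is a toy two-transition net, not the model from the paper: the places, the arc weights, and the treatment of insulin and glucagon as tokens that are consumed and immediately returned are all assumptions made for illustration. The names `TOY_NET`, `enabled`, `fire`, and `reachable_markings` are hypothetical.

```python
from collections import deque

# Toy net (illustrative assumption): each transition lists the tokens it
# consumes ("in") and produces ("out"). The hormone token is consumed and
# returned, so it acts as a precondition rather than a reactant.
TOY_NET = {
    "glycolysis": {
        "in": {"glucose": 1, "insulin": 1},
        "out": {"pyruvate": 2, "insulin": 1},
    },
    "glucose_synthesis": {
        "in": {"pyruvate": 2, "glucagon": 1},
        "out": {"glucose": 1, "glucagon": 1},
    },
}


def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(p, 0) >= n for p, n in transition["in"].items())


def fire(marking, transition):
    """Return the marking obtained by firing an enabled transition."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled in this marking")
    new = dict(marking)
    for place, n in transition["in"].items():
        new[place] -= n
    for place, n in transition["out"].items():
        new[place] = new.get(place, 0) + n
    return new


def reachable_markings(initial, net, limit=1000):
    """Breadth-first exploration of the reachability set, capped at limit."""
    def key(m):
        # Ignore empty places so equal markings compare equal.
        return tuple(sorted((p, n) for p, n in m.items() if n))

    seen = {key(initial)}
    order = [initial]
    queue = deque([initial])
    while queue and len(order) < limit:
        marking = queue.popleft()
        for transition in net.values():
            if enabled(marking, transition):
                nxt = fire(marking, transition)
                if key(nxt) not in seen:
                    seen.add(key(nxt))
                    order.append(nxt)
                    queue.append(nxt)
    return order
```

Starting from one glucose token and one insulin token with no glucagon, only the glycolysis transition can fire; the resulting marking enables nothing, so it is a deadlock in the sense used by the analysis above. Adding a glucagon token to the initial marking makes the net cycle between the two states instead.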

2024-05-17T13:15:01Z submitted to International Workshop on Petri Nets and Software Engineering (PNSE 2024) Kamila Barylska Anna Gogolińska http://arxiv.org/abs/2405.09595v1 Simplicity within biological complexity 2024-05-15T13:32:45Z

Heterogeneous, interconnected, systems-level, molecular data have become increasingly available and key in precision medicine. We need to utilize them to better stratify patients into risk groups, discover new biomarkers and targets, and repurpose known drugs and discover new ones to personalize medical treatment. Existing methodologies are limited and a paradigm shift is needed to achieve quantitative and qualitative breakthroughs. In this perspective paper, we survey the literature and argue for the development of a comprehensive, general framework for embedding multi-scale molecular network data that would enable their explainable exploitation in precision medicine in linear time. Network embedding methods map nodes to points in low-dimensional space, so that proximity in the learned space reflects the network's topology-function relationships. They have recently achieved unprecedented performance on hard problems of utilizing few omic data in various biomedical applications. However, research thus far has been limited to special variants of the problems and data, with the performance depending on the underlying topology-function network biology hypotheses, the biomedical applications and evaluation metrics. The availability of multi-omic data, modern graph embedding paradigms and compute power calls for the creation and training of efficient, explainable and controllable models that have no potentially dangerous, unexpected behaviour and that make a qualitative breakthrough. We propose to develop a general, comprehensive embedding framework for multi-omic network data, from models to efficient and scalable software implementation, and to apply it to biomedical informatics. It will lead to a paradigm shift in computational and biomedical understanding of data and diseases that will open up ways to solve some of the major bottlenecks in precision medicine and other domains.

2024-05-15T13:32:45Z 29 pages, 4 figures Natasa Przulj Noel Malod-Dognin