http://arxiv.org/abs/2406.12108v2
Computing in the Life Sciences: From Early Algorithms to Modern AI
Updated: 2024-06-19T03:54:28Z
Computing in the life sciences has undergone a transformative evolution, from early computational models in the 1950s to the applications of artificial intelligence (AI) and machine learning (ML) seen today. This paper highlights key milestones and technological advancements through the historical development of computing in the life sciences. The discussion includes the inception of computational models for biological processes, the advent of bioinformatics tools, and the integration of AI/ML in modern life sciences research. Attention is given to AI-enabled tools used in the life sciences, such as scientific large language models and bio-AI tools, examining their capabilities, limitations, and impact to biological risk. This paper seeks to clarify and establish essential terminology and concepts to ensure informed decision-making and effective communication across disciplines.
Published: 2024-06-17T21:36:52Z
53 pages, 4 figures, 10 tables
Authors: Samuel A. Donkor, Matthew E. Walsh, Alexander J. Titus

http://arxiv.org/abs/2405.05301v2
A design specification for Critical Illness Digital Twins to cure sepsis: responding to the National Academies of Sciences, Engineering and Medicine Report: Foundational Research Gaps and Future Directions for Digital Twins
Updated: 2024-06-16T22:26:50Z
On December 15, 2023, The National Academies of Sciences, Engineering and Medicine (NASEM) released a report entitled: Foundational Research Gaps and Future Directions for Digital Twins. The ostensible purpose of this report was to bring some structure to the burgeoning field of digital twins by providing a working definition and a series of research challenges that need to be addressed to allow this technology to fulfill its full potential.
In the work presented herein we focus on five specific findings from the NASEM Report: 1) definition of a Digital Twin, 2) using fit-for-purpose guidance, 3) developing novel approaches to Verification, Validation and Uncertainty Quantification (VVUQ) of Digital Twins, 4) incorporating control as an explicit purpose for a Digital Twin and 5) using a Digital Twin to guide data collection and sensor development, and describe how these findings are addressed through the design specifications for a Critical Illness Digital Twin (CIDT) aimed at curing sepsis.
Published: 2024-05-08T17:17:58Z
31 pages, 13 Figures, 1 Table
Authors: Gary An, Chase Cockrell

http://arxiv.org/abs/2401.17411v2
Identification of spatial dynamic patterns of behavior using weighted Voronoi diagrams
Updated: 2024-06-16T21:20:53Z
This study proposes an innovative approach to analyze spatial patterns of behavior by integrating information in weighted Voronoi diagrams. The objective of the research is to analyze the temporal distribution of an experimental subject in different regions of a given space, with the aim of identifying significant areas of interest. The methodology employed involves dividing the experimental space, determining representative points, and assigning weights based on the cumulative time the subject spends in each region. This process results in a set of generator points along with their respective weights, thus defining the Voronoi diagram. The study also presents a detailed and advanced perspective for understanding spatial behavioral patterns in experimental contexts.
Published: 2024-01-30T20:00:49Z
10 pages, 4 figures, 2 tables, Submitted to 16th Mexican Conference on Pattern Recognition 2024
Authors: Martha Lorena Avendaño-Garrido, Carlos Alberto Hernández-Linares, Brenda Zarahí Medina-Pérez, Varsovia Hernández, Porfirio Toledo, Alejandro León
DOI: 10.1007/978-3-031-62836-8_1

http://arxiv.org/abs/2406.10696v1
Mining comorbidities: a brief survey
Updated: 2024-06-15T17:31:43Z
In this manuscript we will present a brief overview of the comorbidity concept.
We will start by laying its foundations and its definitions and then describe the role that machine learning can hold in mining and defining it. The purpose of this short survey is to present a brief overview of the definition of comorbidity as a concept, and to show some of the latest applications and potentialities for the application of natural language processing and text mining techniques.
Published: 2024-06-15T17:31:43Z
Authors: Giovanna Maria Dimitri

http://arxiv.org/abs/2403.15274v2
Bioinformatics and Biomedical Informatics with ChatGPT: Year One Review
Updated: 2024-06-12T15:50:31Z
The year 2023 marked a significant surge in the exploration of applying large language model (LLM) chatbots, notably ChatGPT, across various disciplines. We surveyed the applications of ChatGPT in bioinformatics and biomedical informatics throughout the year, covering omics, genetics, biomedical text mining, drug discovery, biomedical image understanding, bioinformatics programming, and bioinformatics education. Our survey delineates the current strengths and limitations of this chatbot in bioinformatics and offers insights into potential avenues for future developments.
Published: 2024-03-22T15:16:23Z
Peer-reviewed and accepted by Quantitative Biology
Authors: Jinge Wang, Zien Cheng, Qiuming Yao, Li Liu, Dong Xu, Gangqing Hu

http://arxiv.org/abs/2406.06826v1
Experimental Measurement of Assembly Indices are Required to Determine The Threshold for Life
Updated: 2024-06-10T22:19:20Z
Assembly Theory (AT) was developed to help distinguish living from non-living systems. The theory is simple as it posits that the amount of selection or Assembly is a function of the number of complex objects where their complexity can be objectively determined using assembly indices. The assembly index of a given object relates to the number of recursive joining operations required to build that object and can be not only rigorously defined mathematically but can be experimentally measured.
In previous work we outlined not only the theoretical basis but also extensive experimental measurements that demonstrated the predictive power of AT. These measurements showed that there is a threshold in assembly indices for organic molecules whereby abiotic chemical systems could not randomly produce molecules with an assembly index greater than or equal to 15. In a recent paper by Hazen et al. [1] the authors not only confused the concept of AT with the algorithms used to calculate assembly indices, but also attempted to falsify AT by calculating theoretical assembly indices for objects made from inorganic building blocks. A fundamental misunderstanding made by the authors is that the threshold is a requirement of the theory, rather than an experimental observation. This means that exploration of inorganic assembly indices similarly requires an experimental observation, correlated with the theoretical calculations. Then and only then can the exploration of complex inorganic molecules be done using AT and the threshold for living systems, as expressed with such building blocks, be determined. Since Hazen et al. [1] present no experimental measurements of assembly theory, their analysis is not falsifiable.
Published: 2024-06-10T22:19:20Z
6 pages, 11 references
Authors: Sara I. Walker, Cole Mathis, Stuart Marshall, Leroy Cronin

http://arxiv.org/abs/2406.05258v1
Advances in Machine Learning, Statistical Methods, and AI for Single-Cell RNA Annotation Using Raw Count Matrices in scRNA-seq Data
Updated: 2024-06-07T21:05:56Z
Single-cell RNA sequencing (scRNA-seq) has revolutionized our ability to analyze gene expression at the resolution of individual cells, providing unprecedented insights into cellular heterogeneity and complex biological systems.
This paper reviews various advanced computational and machine learning techniques tailored for the analysis of scRNA-seq data, emphasizing their roles in different stages of the data processing pipeline.
Published: 2024-06-07T21:05:56Z
A survey of best practices for using machine learning, statistical methods, and AI for Single-Cell RNA annotation using raw count matrices in scRNA-seq data
Authors: Megha Patel, Nimish Magre, Himanshi Motwani, Nik Bear Brown

http://arxiv.org/abs/2406.05170v1
Research on Tumors Segmentation based on Image Enhancement Method
Updated: 2024-06-07T12:25:04Z
One of the most effective ways to treat liver cancer is to perform precise liver resection surgery, the key step of which includes precise digital image segmentation of the liver and its tumor. However, traditional liver parenchymal segmentation techniques often face several challenges in performing liver segmentation: lack of precision, slow processing speed, and computational burden. These shortcomings limit the efficiency of surgical planning and execution. In this work, the model initially describes in detail a new image enhancement algorithm that enhances the key features of an image by adaptively adjusting the contrast and brightness of the image. Then, a deep learning-based segmentation network was introduced, which was specially trained on the enhanced images to optimize the detection accuracy of tumor regions. In addition, multi-scale analysis techniques have been incorporated into the study, allowing the model to analyze images at different resolutions to capture more nuanced tumor features. In the presentation of the experimental results, the study used the 3Dircadb dataset to test the effectiveness of the proposed method.
The experimental results show that compared with the traditional image segmentation method, the new method using image enhancement technology has significantly improved the accuracy and recall rate of tumor identification.
Published: 2024-06-07T12:25:04Z
Authors: Danyi Huang, Ziang Liu, Yizhou Li

http://arxiv.org/abs/2406.00993v1
Detection of Acetone as a Gas Biomarker for Diabetes Based on Gas Sensor Technology
Updated: 2024-06-03T05:10:37Z
With the continuous development and improvement of medical services, there is a growing demand for improving diabetes diagnosis. Exhaled breath analysis, characterized by its speed, convenience, and non-invasive nature, is leading the trend in diagnostic development. Studies have shown that the acetone levels in the breath of diabetes patients are higher than normal, making acetone a basis for diabetes breath analysis. This provides a more readily accepted method for early diabetes prevention and monitoring. Addressing issues such as the invasive nature, disease transmission risks, and complexity of diabetes testing, this study aims to design a diabetes gas biomarker acetone detection system centered around a sensor array using gas sensors and pattern recognition algorithms. The research covers sensor selection, sensor preparation, circuit design, data acquisition and processing, and detection model establishment to accurately identify acetone. Titanium dioxide was chosen as the nano gas-sensitive material to prepare the acetone gas sensor, with data collection conducted using STM32. Filtering was applied to process the raw sensor data, followed by feature extraction using principal component analysis. A recognition model based on support vector machine algorithm was used for qualitative identification of gas samples, while a recognition model based on backpropagation neural network was employed for quantitative detection of gas sample concentrations.
Experimental results demonstrated recognition accuracies of 96% and 97.5% for acetone-ethanol and acetone-methanol mixed gases, and 90% for ternary acetone, ethanol, and methanol mixed gases.
Published: 2024-06-03T05:10:37Z
9 pages, 14 figures
Authors: Jiaming Wei, Tong Liu, Jipeng Huang, Xiaowei Li, Yurui Qi, Gangyin Luo

http://arxiv.org/abs/2405.19936v1
Free-ranging dogs quickly learn to recognize a rewarding person
Updated: 2024-05-30T10:56:32Z
Individual human recognition is important for species that live in close proximity to humans. Numerous studies on domesticated species and urban-adapted birds have highlighted this ability. One such species which is heavily reliant on humans is the free-ranging dog. Very little knowledge exists on the amount of time taken by free-ranging dogs to learn and remember individual humans. Due to their territorial nature, they have a high probability of encountering the same people multiple times on the streets. Being able to distinguish individual humans might be helpful in making decisions regarding people from whom to beg for food or social reward. We investigated if free-ranging dogs are capable of identifying the person rewarding them and the amount of time required for them to learn it. We conducted field trials on randomly selected adult free-ranging dogs in West Bengal, India. On Day 1, a choice test was conducted. The experimenter chosen did not provide reward while the other experimenter provided a piece of boiled chicken followed by petting. The person giving reward on Day 1 served as the correct choice on four subsequent days of training. Day 6 was the test day when none of the experimenters had a reward. We analyzed the choice made by the dogs, the time taken to approach during the choice tests, and the socialization index, which was calculated based on the intensity of affiliative behaviour shown towards the experimenters. The dogs made correct choices at a significantly higher rate on the fifth and sixth days, as compared to Day 2, suggesting learning.
This is the first study aiming to understand the time taken for individual human recognition in free-ranging dogs and can serve as the scaffold for future studies to understand the dog-human relationship in open environments, like urban ecosystems.
Published: 2024-05-30T10:56:32Z
Authors: Srijaya Nandi, Mousumi Chakraborty, Aesha Lahiri, Hindolii Gope, Sujata Khan Bhaduri, Anindita Bhadra

http://arxiv.org/abs/2405.19857v1
Biodiversity data standards for the organization and dissemination of complex research projects and digital twins: a guide
Updated: 2024-05-30T09:04:40Z
Biodiversity data are substantially increasing, spurred by technological advances and community (citizen) science initiatives. Data integration is, likewise, becoming more commonplace. Open science promotes open sharing and data usage. Data standardization is an instrument for the organization and integration of biodiversity data, which is required for complex research projects and digital twins. However, just like with an actual instrument, there is a learning curve to understanding the data standards field. Here we provide a guide, for data providers and data users, on the logistics of compiling and utilizing biodiversity data. We emphasize data standards, because they are integral to data integration. Three primary avenues for compiling biodiversity data are compared, explaining the importance of research infrastructures for coordinated long-term data aggregation. We exemplify the Biodiversity Digital Twin (BioDT) as a case study. Four approaches to data standardization are presented in terms of the balance between practical constraints and the advancement of the data standards field. We aim for this paper to guide and raise awareness of the existing issues related to data standardization, and especially how data standards are key to data interoperability, i.e., machine accessibility.
The future is promising for computational biodiversity advancements, such as with the BioDT project, but it rests upon the shoulders of machine actionability and readability, and that requires data standards for computational communication.
Published: 2024-05-30T09:04:40Z
42 pages, 2 figures, 1 box, 1 table
Authors: Carrie Andrew, Sharif Islam, Claus Weiland, Dag Endresen

http://arxiv.org/abs/2405.19180v1
Observation of Significant Photosynthesis in Garden Cress and Cyanobacteria under Simulated Illumination from a K Dwarf Star
Updated: 2024-05-29T15:21:45Z
Stars with about 45 to 80% the mass of the Sun, so-called K dwarf stars, have previously been proposed as optimal host stars in the search for habitable extrasolar worlds. These stars are abundant, have stable luminosities over billions of years longer than Sun-like stars, and offer favorable space environmental conditions. So far, the theoretical and experimental focus on exoplanet habitability has been on even less massive, though potentially less hospitable red dwarf stars. Here we present the first experimental data on the responses of photosynthetic organisms to a simulated K dwarf spectrum. We find that garden cress Lepidium sativum under K-dwarf radiation exhibits comparable growth and photosynthetic efficiency as under solar illumination on Earth. The cyanobacterium Chroococcidiopsis sp. CCMEE 029 exhibits significantly higher photosynthetic efficiency and culture growth under K dwarf radiation compared to solar conditions.
Our findings of the affirmative responses of these two photosynthetic organisms to K dwarf radiation suggest that exoplanets in the habitable zones around such stars deserve high priority in the search for extrasolar life.
Published: 2024-05-29T15:21:45Z
International Journal of Astrobiology 23 (2024) e18
Authors: Iva Vilović, Dirk Schulze-Makuch, René Heller
DOI: 10.1017/S1473550424000132

http://arxiv.org/abs/2405.14904v1
Large deviation principles and evolutionary multiple structure alignment of non-coding RNA
Updated: 2024-05-22T23:08:40Z
Non-coding RNA are functional molecules that are not translated into proteins. Their function comes as important regulators of biological function. Because they are not translated, they need not be as stable as other types of RNA. The TKF91 Structure Tree from Holmes 2004 is a probability model that effectively describes correlated substitution, insertion, and deletion of base pairs, and has been found to have some worth in understanding dynamic folding patterns. In this paper, we provide a new probabilistic analysis of the TKF91 Structure Tree. Large deviation principles on stem lengths, helix lengths, and tree size are proved. Additionally, we give a new alignment procedure that constructs accurate sequence and structural alignments for sequences with low identity for a dense enough phylogeny.
Published: 2024-05-22T23:08:40Z
25 pages main document, 31 pages total with references and appendix, 1 figure
Authors: Brandon Legried

http://arxiv.org/abs/2405.11009v1
Petri nets in modelling glucose regulating processes in the liver
Updated: 2024-05-17T13:15:01Z
Diabetes is a chronic condition, considered one of the civilization diseases, that is characterized by sustained high blood sugar levels. There is no doubt that more and more people are going to suffer from diabetes, hence it is crucial to better understand its biological foundations.
The essential processes related to the control of glucose levels in the blood are: glycolysis (process of breaking down of glucose) and glucose synthesis, both taking place in the liver. The glycolysis occurs during feeding and it is stimulated by insulin. On the other hand, the glucose synthesis arises during fasting and it is stimulated by glucagon. In the paper we present a Petri net model of glycolysis and glucose synthesis in the liver. The model is created based on medical literature. Standard Petri net techniques are used to analyse the properties of the model: traps, reachability graphs, tokens dynamics, deadlocks analysis. The results are described in the paper. Our analysis shows that the model captures the interactions between different enzymes and substances, which is consistent with the biological processes occurring during fasting and feeding. The model constitutes the first element of our long-term goal to create the whole body model of the glucose regulation in a healthy human and a person with diabetes.
Published: 2024-05-17T13:15:01Z
submitted to International Workshop on Petri Nets and Software Engineering (PNSE 2024)
Authors: Kamila Barylska, Anna Gogolińska

http://arxiv.org/abs/2405.09595v1
Simplicity within biological complexity
Updated: 2024-05-15T13:32:45Z
Heterogeneous, interconnected, systems-level, molecular data have become increasingly available and key in precision medicine. We need to utilize them to better stratify patients into risk groups, discover new biomarkers and targets, repurpose known and discover new drugs to personalize medical treatment. Existing methodologies are limited and a paradigm shift is needed to achieve quantitative and qualitative breakthroughs. In this perspective paper, we survey the literature and argue for the development of a comprehensive, general framework for embedding of multi-scale molecular network data that would enable their explainable exploitation in precision medicine in linear time.
Network embedding methods map nodes to points in low-dimensional space, so that proximity in the learned space reflects the network's topology-function relationships. They have recently achieved unprecedented performance on hard problems of utilizing few omic data in various biomedical applications. However, research thus far has been limited to special variants of the problems and data, with the performance depending on the underlying topology-function network biology hypotheses, the biomedical applications and evaluation metrics. The availability of multi-omic data, modern graph embedding paradigms and compute power call for a creation and training of efficient, explainable and controllable models, having no potentially dangerous, unexpected behaviour, that make a qualitative breakthrough. We propose to develop a general, comprehensive embedding framework for multi-omic network data, from models to efficient and scalable software implementation, and to apply it to biomedical informatics. It will lead to a paradigm shift in computational and biomedical understanding of data and diseases that will open up ways to solving some of the major bottlenecks in precision medicine and other domains.
Published: 2024-05-15T13:32:45Z
29 pages, 4 figures
Authors: Natasa Przulj, Noel Malod-Dognin