Table of Contents
The application of attention mechanisms in biological sequence analysis
In recent years, attention mechanisms have been widely applied and have advanced significantly in the field of biological sequence analysis. This paper systematically summarizes the fundamental principles of attention mechanisms and their latest research progress in biological sequence analysis. First, the development history of attention mechanisms is introduced, with a focus on classic mechanisms such as self-attention, cross-attention, and multi-head attention, along with their improved variants. Next, a brief overview of the classification and characteristics of bioinformatics databases is provided. Subsequently, the application of attention mechanisms in the analysis of DNA, RNA, and protein sequences is highlighted. In DNA sequence analysis, attention mechanisms have been applied to tasks such as epigenetic analysis and regulatory element identification; in RNA sequence analysis, they play a crucial role in single-cell RNA sequencing, RNA function prediction, and structure prediction; in protein sequence analysis, they are widely used in protein classification, function prediction, structure prediction, site prediction, and interaction prediction. Furthermore, this paper summarizes the applications of attention mechanisms in other biological sequence analysis tasks, such as multi-omics analysis and enzyme analysis. Attention mechanisms can significantly improve the accuracy, interpretability, and computational efficiency of biological sequence analysis, providing powerful computational tools for bioinformatics research.
Yingyue Tang, Wenzheng Bao
DOI:https://doi.org/10.70401/cbm.2026.0014 - May 15, 2026
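As a rough illustration of the self-attention mechanism named in the abstract above, the following minimal sketch applies scaled dot-product self-attention to a one-hot encoded DNA string. It is not taken from the paper: the function names, the toy sequence "ACGA", and the simplification that queries, keys, and values are all the raw embeddings (no learned projection matrices) are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(sequence, alphabet="ACGT"):
    """Encode each base as a one-hot vector over the given alphabet."""
    return [[1.0 if base == symbol else 0.0 for symbol in alphabet]
            for base in sequence]


def self_attention(embeddings):
    """Scaled dot-product self-attention.

    Assumption: Q = K = V = embeddings, i.e. no learned projections,
    which a trained model would normally have.
    """
    dim = len(embeddings[0])
    weights, outputs = [], []
    for query in embeddings:
        # Similarity of this position to every position, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in embeddings]
        row = softmax(scores)
        weights.append(row)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * value[j] for w, value in zip(row, embeddings))
                        for j in range(dim)])
    return weights, outputs


# Hypothetical toy input: positions 0 and 3 hold the same base.
weights, outputs = self_attention(one_hot("ACGA"))
```

In this toy setting each position attends most strongly to positions holding the same base, which is the kind of weight pattern that attention-based interpretability analyses inspect.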
PIONEER: A structure-informed graph neural network for PE/PPE protein identification
Aims: The Pro-Glu (PE) and Pro-Pro-Glu (PPE) protein family of Mycobacterium tuberculosis plays a critical role in virulence, immune evasion, and host-pathogen interactions. However, the high guanine-cytosine content and repetitive sequences of the genes encoding these proteins have long hindered accurate gene identification and functional annotation. This study aims to develop an effective computational framework to improve the identification of PE/PPE proteins.
Methods: We propose PIONEER, a structure-aware deep learning framework that integrates embeddings from the pre-trained protein language model ESM Cambrian (ESMC) with structural features. PIONEER represents proteins as residue-level graphs that encode both sequence semantics and three-dimensional topological structure, enabling effective modelling of hierarchical geometric relationships.
Results: Benchmarking demonstrates that PIONEER outperforms 16 traditional machine learning algorithms and the existing deep learning model across multiple evaluation metrics, including accuracy, Matthews correlation coefficient, and F1 score. Ablation experiments confirm the complementary contributions of ESMC embeddings and secondary structure features to model performance. The t-SNE-based visualization results reveal the contributions of features across different network layers to the identification of PE/PPE proteins.
Conclusion: PIONEER improves the accuracy of PE/PPE protein identification by integrating sequence and structural information within a structure-aware graph learning framework. This method provides an effective computational tool for functional annotation, investigation of pathogenicity mechanisms, and vaccine target discovery in M. tuberculosis.
Heyun Sun, ... Fuyi Li
DOI:https://doi.org/10.70401/cbm.2026.0016 - May 11, 2026
Distilling genomic knowledge into pathology slides for robust cancer survival prediction
Aims: To develop a robust and clinically feasible framework for cancer survival prediction using only histopathology images while leveraging transcriptomic knowledge during training.
Methods: The study proposed Adaptive Multi-modality Knowledge Distillation (AMKD), a framework designed to transfer complementary molecular-level information from transcriptomic data to pathology-based models. The AMKD framework consists of two essential elements. First, a gene-guided pathology enhancement module is designed to inject genomics-aware information from a multimodal teacher into pathology features. Second, an adaptive redundancy reduction loss is introduced to regulate knowledge distillation by accounting for prediction discrepancies between teacher and student models. This design allows the student model to retain biologically meaningful knowledge during training and remain effective with only histopathology data at inference.
Results: Comprehensive experiments on four The Cancer Genome Atlas (TCGA) cancer cohorts demonstrate that AMKD achieves
Conclusion: The proposed AMKD framework provides a clinically practical solution for robust cancer survival analysis when transcriptomic data are unavailable. By adaptively distilling multi-modal knowledge into a pathology-based model, AMKD bridges the gap between research and clinical applicability, enabling scalable and cost-effective prognostic prediction in real-world settings.
Yangfan Xu, ... Runming Wang
DOI:https://doi.org/10.70401/cbm.2026.0015 - April 29, 2026
A deep learning framework with positional attention for modeling enhancer-promoter interactions
Aims: Since distal enhancers regulate target genes through physical contact with proximal promoters, identifying enhancer-promoter interactions (EPIs) is critical to deepening our understanding of gene expression. However, high-throughput experimental methods for identifying EPIs are time-consuming and expensive. Computational methods for predicting EPIs are therefore valuable, but they also face many challenges.
Methods: In this paper, we propose a novel deep learning-based method, EPIPAM, to predict EPIs using only genomic sequences. EPIPAM first uses a deep convolutional neural network to extract high-level sequence features, and then uses a position attention mechanism to compute positional correlation coefficients between subregions of the enhancer and subregions of the promoter, so that the model focuses on the important regions of each.
Results: Benchmarking comparisons on six different cell lines show that EPIPAM performs better than state-of-the-art methods in EPI prediction. More importantly, we notice that, almost without exception, the predictive performance of all methods is poor once training and test data are split by chromosome. We therefore explore a possible reason for this by systematically examining the structure of EPI datasets, and indirectly analyze the difficulty of predicting EPIs using only genomic sequences through ChIA-PET contact datasets.
Conclusion: This study presents a novel deep learning-based method to predict EPIs using only genomic sequences. Although the proposed method achieves higher predictive accuracy, it suffers from several limitations, such as highly selective matching bias, negative sample selection issues, and constraints of pre-trained vectors.
Liping Liu, ... Qinhu Zhang
DOI:https://doi.org/10.70401/cbm.2026.0013 - April 08, 2026
Multi-class pattern discovery for bacterial secretory effectors
Aims: This study aims to develop EffecTri, a comprehensive multi-class prediction framework that accurately identifies bacterial effector proteins secreted by Type III, IV, and VI secretion systems. Current methods often rely on binary classification, overlooking the complexity and interactions among multiple effector classes.
Methods: EffecTri integrates deep contextual embeddings from Evolutionary Scale Modeling with handcrafted descriptors, including Amino Acid Composition and Dipeptide Composition. The performance of the model was rigorously evaluated through comparative descriptor analyses and optimized feature combinations, complemented by Uniform Manifold Approximation and Projection (UMAP) visualization for interpretability.
Results: EffecTri outperformed traditional machine learning methods, achieving a weighted F1-score of 0.850 on an independent test dataset. The fusion of Evolutionary Scale Modeling embeddings with handcrafted descriptors demonstrated superior predictive performance, clearly distinguishing effector classes in UMAP visualizations.
Conclusion: EffecTri represents a robust, interpretable, and accurate computational tool, enhancing the multi-class identification of bacterial secretory effectors and contributing valuable insights into bacterial pathogenic mechanisms.
Jing Li, ... Youyu Wang
DOI:https://doi.org/10.70401/cbm.2026.0012 - March 05, 2026
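The two handcrafted descriptors named in the EffecTri abstract, Amino Acid Composition and Dipeptide Composition, have standard definitions that can be sketched in a few lines. The code below is an illustrative sketch, not the authors' implementation: the function names and the toy sequence are hypothetical, and the handling of non-standard residues (counted in the denominator but given no feature of their own) is an assumption. The fusion with Evolutionary Scale Modeling embeddings described in the abstract is not shown.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return 20 values: the fraction of residues that are each amino acid."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    length = len(sequence)
    return [sequence.count(aa) / length for aa in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Return 400 values: the fraction of overlapping residue pairs of each type."""
    if len(sequence) < 2:
        raise ValueError("sequence must contain at least two residues")
    counts = {}
    # Slide a window of width 2 so overlapping pairs are all counted.
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    total = len(sequence) - 1
    return [counts.get(a + b, 0) / total
            for a, b in product(AMINO_ACIDS, repeat=2)]


# Hypothetical toy sequence, not a real effector protein.
toy_sequence = "MKKLA"
aac = amino_acid_composition(toy_sequence)
dpc = dipeptide_composition(toy_sequence)
features = aac + dpc  # 420-dimensional handcrafted feature vector
```

Concatenating the two descriptors gives a fixed-length vector regardless of protein length, which is what allows such features to be combined with other fixed-length representations before classification.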