GILPI: Graphlet Interaction-Based lncRNA-Protein Interaction Prediction

Authors: Hong-Yi Zhang; Yan Zhou
DIN: IJOER-JUL-2024-1
Abstract

Identifying lncRNA-protein interactions (LPIs) is important for understanding the biological functions and molecular mechanisms of lncRNAs. In this study, we propose GILPI, a computational model that predicts potential LPIs based on Graphlet interactions. First, five LPI datasets were collected. Second, feature vectors of lncRNAs and proteins were extracted from sequence data with pyfeat and BioTriangle, respectively. Third, Pearson's correlation coefficient was applied to these features to calculate the similarity between lncRNAs and the similarity between proteins. Fourth, the Jaccard similarity among lncRNAs and among proteins was calculated from the LPI network, and the corresponding Pearson and Jaccard similarities were averaged to obtain the final lncRNA-lncRNA and protein-protein similarities, which were used to construct the two similarity networks. Finally, lncRNA-protein classification prediction was performed on both networks. GILPI was compared with five state-of-the-art LPI prediction methods using 5-fold cross-validation, and the results show that it has strong LPI classification performance. The case studies suggest possible interactions between NONHSAT021830 and Q9H9S0, n385685 and Q07955, and NONHSAT098243 and P25490. The novelty of GILPI is that it integrates the two similarities to construct each network and then uses Graphlet interactions on the network to link features directly and indirectly, mining potential features and thereby greatly improving the performance of the model.

Keywords
Graphlet interaction; Jaccard similarity; Pearson similarity; lncRNA-protein interaction.
Introduction

1.1 Motivation:

Long non-coding RNAs (lncRNAs) are transcripts longer than 200 nucleotides that lack protein-coding capability [1]. lncRNAs play key roles in biological processes such as gene expression regulation, epigenetic regulation, and cell differentiation [2].

For example, the lncRNAs HOXA-AS2 and SNHG12 have been identified as potential therapeutic targets and biomarkers for human cancers [3]. DLEU1 is closely related to colorectal cancer through activation of KPNA3, the expression of HOTAIR is elevated in lung cancer, and ZFAS1 is closely related to the chemosensitivity of cervical cancer cells [4]. In summary, a growing number of experiments have confirmed that lncRNAs are tumor-related biomolecules. However, the relationship between lncRNAs and known tumor suppressor entities remains largely elusive. There is evidence that lncRNAs exert their biological functions by binding to RNA-binding proteins. Therefore, identifying potential lncRNA-protein interactions (LPIs) contributes to understanding many important biological processes and to treating various complex diseases.

Conclusion

lncRNAs play a crucial role in many biological activities, such as gene transcription and translation. lncRNAs are also involved in numerous diseases, so identifying lncRNA-protein interactions helps clarify the biological functions of lncRNAs, which is important for disease diagnosis and treatment.

In this study, GILPI was built as follows. First, five datasets were collected. Second, features of lncRNAs and proteins were extracted from sequence data using pyfeat and BioTriangle, respectively. Third, Pearson's correlation coefficient was applied to these features to calculate the similarity between lncRNAs and the similarity between proteins. Fourth, the Jaccard similarity among lncRNAs and among proteins was calculated from the LPI network, and the corresponding Pearson and Jaccard similarities were averaged to construct the lncRNA-lncRNA similarity network and the protein-protein similarity network. The experiment was repeated 10 times, and GILPI was compared with five state-of-the-art LPI prediction methods, namely LPI-deepGBDT, LPI-DLDN, LPIEnANNDeep, LPI-EnEDT, and LPI-HyADBS. The results showed that the GILPI prediction model had strong LPI classification performance. GILPI also achieved good results in the case studies.
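The similarity-fusion step described above (Pearson similarity on sequence features, Jaccard similarity on known interaction partners, then a plain average) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `pearson`, `jaccard`, and `fused_similarity`, the toy feature vectors, and the partner sets are all hypothetical, and the pyfeat/BioTriangle feature extraction is assumed to have already produced the vectors. How GILPI treats negative correlations or zero-variance features is not stated in the text, so the sketch simply leaves negative values as they are and returns 0 for constant vectors.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length feature vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # Assumption: a constant vector has undefined correlation; return 0.0 here.
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def jaccard(a, b):
    """Jaccard similarity between two sets of known interaction partners."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fused_similarity(features, partners):
    """Average the feature-based (Pearson) and network-based (Jaccard) similarity
    for every ordered pair of entities; returns a dict keyed by (u, v)."""
    names = list(features)
    sim = {}
    for u in names:
        for v in names:
            p = pearson(features[u], features[v])
            j = jaccard(partners[u], partners[v])
            sim[u, v] = (p + j) / 2
    return sim


# Toy, made-up data: three lncRNAs with 3-dimensional feature vectors and
# the sets of proteins each one is known to interact with.
lnc_features = {
    "lnc1": [1.0, 2.0, 3.0],
    "lnc2": [2.0, 4.0, 6.0],
    "lnc3": [3.0, 2.0, 1.0],
}
lnc_partners = {
    "lnc1": {"P1", "P2"},
    "lnc2": {"P1", "P2"},
    "lnc3": {"P3"},
}

lnc_sim = fused_similarity(lnc_features, lnc_partners)
print(lnc_sim["lnc1", "lnc2"], lnc_sim["lnc1", "lnc3"])
```

The same function would apply to proteins by passing protein feature vectors and, for each protein, the set of lncRNAs it is known to bind. The resulting pairwise scores are the edge weights of the two similarity networks on which the Graphlet-interaction step then operates.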

In future studies, we will first integrate lncRNA- and protein-related datasets from different data sources. Second, we will mine the secondary and tertiary structures of proteins and fuse them into the representation of lncRNA-protein pairs, making it possible to predict the relationship for a single lncRNA-protein pair. Third, graphlets of sizes other than four nodes will be considered in Graphlet interactions to make the acquired features more complete and richer. Finally, computational efficiency will be improved in three ways: using high-performance computing resources such as GPU acceleration and distributed computing to reduce the time of a single run, developing more efficient algorithms that handle large-scale datasets with less computational redundancy, and automating the tuning of model parameters to reduce the time spent on manual adjustment.
