TY - JOUR
T1 - Predicting interactions between pathogen and human proteins based on the relation between sequence length and amino acid composition
AU - Alguwaizani, Saud
AU - Ren, Shulei
AU - Huang, De Shuang
AU - Han, Kyungsook
N1 - Publisher Copyright: © 2021 Bentham Science Publishers.
PY - 2021
Y1 - 2021
AB - Aim: Both bacterial and viral infections involve a large number of protein-protein interactions (PPIs) between a pathogen and its target host. Background: So far, many computational methods have focused on predicting PPIs within the same species rather than PPIs across different species. Methods: From an extensive analysis of PPIs between the bacterium Yersinia pestis and humans, we recently discovered an interesting relation: in many proteins involved in PPIs, amino acid composition varies linearly with sequence length. We built a support vector machine (SVM) model that predicts PPIs between humans and bacteria using two feature types derived from this relation: the amino acid composition group (AACG) and the difference in amino acid composition between host and pathogen proteins. Results: The SVM model achieved high performance in predicting bacteria-human PPIs. In predicting PPIs between humans and Yersinia pestis, where the relation between amino acid composition and sequence length is strong, the model showed an accuracy of 96%, a sensitivity of 94%, and a specificity of 98%. The SVM model was also tested on PPIs between humans and viruses, including Ebola, HCV, and SARS-CoV-2, and performed well. Conclusion: The feature types identified in our study are simple yet powerful for predicting pathogen-human PPIs. Although preliminary, our method will be useful for finding unknown target host proteins or pathogen proteins and for designing in vitro or in vivo experiments.
KW - Ebola
KW - HCV
KW - Machine learning
KW - Pathogen-host interaction
KW - Protein-protein interaction
KW - SARS-CoV-2
KW - Y. pestis
UR - http://www.scopus.com/inward/record.url?scp=85117249424&partnerID=8YFLogxK
U2 - 10.2174/1574893616666210430133846
DO - 10.2174/1574893616666210430133846
M3 - Article
AN - SCOPUS:85117249424
SN - 1574-8936
VL - 16
SP - 799
EP - 806
JO - Current Bioinformatics
JF - Current Bioinformatics
IS - 6
ER -