TY - JOUR
T1 - Prediction of protein-protein interactions based on protein-protein correlation using least squares regression
AU - Huang, De Shuang
AU - Zhang, Lei
AU - Han, Kyungsook
AU - Deng, Suping
AU - Yang, Kai
AU - Zhang, Hongbo
PY - 2014
Y1 - 2014
N2 - Several methods have been developed to transform protein sequences into feature vectors, including auto covariance (AC), conjoint triad (CT), local descriptor (LD), Moran autocorrelation (MA) and normalized Moreau-Broto autocorrelation (NMB). In this paper, we adopt each of these transformation methods to encode the proteins, where AC, CT, LD, MA and NMB are all represented by '+' in a unified manner. A new method, i.e. the combination of least squares regression with '+' (abbreviated as LSR+), is introduced for encoding a protein-protein correlation-based feature representation and an interacting protein pair. Thus there are five different combinations for LSR+ in total, i.e. LSRAC, LSRCT, LSRLD, LSRMA and LSRNMB. We then combined a support vector machine (SVM) approach with LSR+ to predict protein-protein interactions (PPI) and PPI networks. The proposed method has been applied to four datasets, i.e. Saccharomyces cerevisiae, Escherichia coli, Homo sapiens and Caenorhabditis elegans. The experimental results demonstrate that all LSR+ methods outperform many existing representative algorithms. Therefore, LSR+ is a powerful tool to characterize protein-protein correlations and to infer PPI, whilst keeping high performance on prediction of PPI networks.
AB - Several methods have been developed to transform protein sequences into feature vectors, including auto covariance (AC), conjoint triad (CT), local descriptor (LD), Moran autocorrelation (MA) and normalized Moreau-Broto autocorrelation (NMB). In this paper, we adopt each of these transformation methods to encode the proteins, where AC, CT, LD, MA and NMB are all represented by '+' in a unified manner. A new method, i.e. the combination of least squares regression with '+' (abbreviated as LSR+), is introduced for encoding a protein-protein correlation-based feature representation and an interacting protein pair. Thus there are five different combinations for LSR+ in total, i.e. LSRAC, LSRCT, LSRLD, LSRMA and LSRNMB. We then combined a support vector machine (SVM) approach with LSR+ to predict protein-protein interactions (PPI) and PPI networks. The proposed method has been applied to four datasets, i.e. Saccharomyces cerevisiae, Escherichia coli, Homo sapiens and Caenorhabditis elegans. The experimental results demonstrate that all LSR+ methods outperform many existing representative algorithms. Therefore, LSR+ is a powerful tool to characterize protein-protein correlations and to infer PPI, whilst keeping high performance on prediction of PPI networks.
KW - Least square regression
KW - Protein sequences
KW - Protein-protein correlation
KW - Protein-protein interactions
KW - SVM
UR - https://www.scopus.com/pages/publications/84906904242
U2 - 10.2174/1389203715666140724084019
DO - 10.2174/1389203715666140724084019
M3 - Article
C2 - 25059329
AN - SCOPUS:84906904242
SN - 1389-2037
VL - 15
SP - 553
EP - 560
JO - Current Protein and Peptide Science
JF - Current Protein and Peptide Science
IS - 6
ER -