TY - GEN
T1 - Prediction of RNA-Binding residues in proteins using the interaction propensities of amino acids and nucleotides
AU - Shrestha, Rojan
AU - Kim, Jisu
AU - Han, Kyungsook
PY - 2008
Y1 - 2008
N2 - Several machine learning approaches have recently been applied to predicting RNA-binding residues in amino acid sequences. None of them considers the interacting partner (i.e., the RNA) of a given protein when predicting RNA-binding amino acids, so they always predict the same RNA-binding residues for a given protein, even if the protein may bind to different RNA molecules. In this study, we present a support vector machine (SVM) classifier that takes an RNA sequence as well as a protein sequence as input and predicts potential RNA-binding residues in the protein. The interaction propensity between an amino acid and a nucleotide, obtained from an extensive analysis of representative protein-RNA complexes in the Protein Data Bank (PDB), was encoded in the feature vector of the SVM classifier. Four biochemical properties of an amino acid (the side chain pKa value, hydrophobicity index, molecular mass, and accessible surface area) were also encoded in the feature vector. On a dataset of 145 protein sequences and 78 RNA sequences, the SVM classifier achieved a sensitivity of 72.30% and a specificity of 78.03%.
AB - Several machine learning approaches have recently been applied to predicting RNA-binding residues in amino acid sequences. None of them considers the interacting partner (i.e., the RNA) of a given protein when predicting RNA-binding amino acids, so they always predict the same RNA-binding residues for a given protein, even if the protein may bind to different RNA molecules. In this study, we present a support vector machine (SVM) classifier that takes an RNA sequence as well as a protein sequence as input and predicts potential RNA-binding residues in the protein. The interaction propensity between an amino acid and a nucleotide, obtained from an extensive analysis of representative protein-RNA complexes in the Protein Data Bank (PDB), was encoded in the feature vector of the SVM classifier. Four biochemical properties of an amino acid (the side chain pKa value, hydrophobicity index, molecular mass, and accessible surface area) were also encoded in the feature vector. On a dataset of 145 protein sequences and 78 RNA sequences, the SVM classifier achieved a sensitivity of 72.30% and a specificity of 78.03%.
KW - Protein-RNA interactions
KW - RNA-binding residues
KW - Support vector machine
UR - http://www.scopus.com/inward/record.url?scp=56549083198&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-87442-3_16
DO - 10.1007/978-3-540-87442-3_16
M3 - Conference contribution
AN - SCOPUS:56549083198
SN - 3540874402
SN - 9783540874409
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 114
EP - 121
BT - Advanced Intelligent Computing Theories and Applications
T2 - 4th International Conference on Intelligent Computing, ICIC 2008
Y2 - 15 September 2008 through 18 September 2008
ER -