10.6084/M9.FIGSHARE.14955745.V1
Xin Liu
Xin
Liu
Liang Wang
Liang
Wang
Cheng-Hao Liang
Cheng-Hao
Liang
Ya-Ping Lu
Ya-Ping
Lu
Ting Yang
Ting
Yang
Xiao Zhang
Xiao
Zhang
An enhanced methodology for predicting protein-protein interactions between human and hepatitis C virus via ensemble learning algorithms
<p>Hepatitis C virus (HCV) is responsible for a variety of life-threatening human diseases, including liver cirrhosis, chronic hepatitis, fibrosis and hepatocellular carcinoma (HCC). Computational study of protein-protein interactions between human and HCV could accelerate the discovery of antiviral drugs for HCV therapy and might help optimize treatment procedures for HCV infections. In this study, we constructed a prediction model for protein-protein interactions between HCV and human by incorporating features generated from pseudo amino acid compositions; the analysis was carried out at two levels: categories and features. In brief, extra-tree was first used for feature selection, and SVM was then used to build the classification model. After that, the most suitable model for each category and each feature was selected by comparison with three ensemble learning algorithms, namely Random Forest, AdaBoost, and XGBoost. According to our results, among the four categories, profile-based features were the most suitable for building predictive models. The AUC of the model constructed with the XGBoost algorithm reached 92.66% on the independent dataset. Moreover, Distance-based Residue, Physicochemical Distance Transformation and Profile-based Physicochemical Distance Transformation performed much better than the rest of the 17 features. The AUC of the AdaBoost classifier built on Profile-based Physicochemical Distance Transformation reached 93.74% on the independent dataset. Taken together, we propose a model with improved predictive capacity for protein-protein interactions between human and HCV, which could provide a practical reference for further experimental investigation of HCV-related diseases.</p> <p>Communicated by Ramaswamy H. Sarma</p>
Biochemistry
Chemical Sciences not elsewhere classified
Biological Sciences not elsewhere classified
Information Systems not elsewhere classified
Cancer
Infectious Diseases
Plant Biology
Taylor & Francis
2021
2021-07-12
2024-02-15
Dataset
265639 Bytes
10.6084/m9.figshare.14955745
10.1080/07391102.2021.1946429
CC BY 4.0