Predicting TF-DNA Binding Motifs from ChIP-seq Datasets Using the Bag-Based Classifier Combined with a Multi-Fold Learning Scheme

Qinhu Zhang, Dailun Wang, Kyungsook Han, De Shuang Huang

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

The rapid development of high-throughput sequencing technology provides unique opportunities for studying transcription factor binding sites, but also brings new computational challenges. Recently, a series of discriminative motif discovery (DMD) methods have been proposed, offering promising solutions to these challenges. However, because of the high computational cost, most of them resort to approximate schemes that either sacrifice the accuracy of the motif representation or tune motif parameters indirectly. In this paper, we propose a bag-based classifier combined with a multi-fold learning scheme (BCMF) to discover motifs from ChIP-seq datasets. First, BCMF naturally formulates input sequences as labeled bags. Then, a bag-based classifier, combined with a bag feature extraction strategy, is applied to construct the objective function, and a multi-fold learning scheme is used to solve it. Compared with existing DMD tools, BCMF features three improvements: 1) learning the position weight matrix (PWM) directly in a continuous space; 2) representing a positive bag with a feature fused from its k 'most positive' patterns; and 3) applying a more advanced learning scheme. The experimental results on 134 ChIP-seq datasets show that BCMF substantially outperforms existing DMD methods (including DREME, HOMER, XXmotif, motifRG, EDCOD and our previous work).
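The abstract does not give BCMF's formulas, so the following is only a minimal sketch of the general idea of treating a sequence as a bag of windows and fusing its k highest-scoring windows into one feature. The additive PWM scoring rule, the averaging used for fusion, the toy PWM values, and all function names (`pwm_score`, `bag_instances`, `top_k_bag_feature`) are assumptions made for illustration, not the authors' implementation.

```python
BASES = "ACGT"

def pwm_score(pwm, window):
    """Sum the per-position PWM entries for one window (one bag instance)."""
    return sum(pwm[i][BASES.index(base)] for i, base in enumerate(window))

def bag_instances(seq, width):
    """Treat a sequence as a bag: every window of length `width` is an instance."""
    return [seq[i:i + width] for i in range(len(seq) - width + 1)]

def one_hot(window):
    """Encode a window as a width x 4 matrix with columns ordered as in BASES."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in window]

def top_k_bag_feature(pwm, seq, k):
    """Fuse the k highest-scoring windows by averaging their one-hot encodings.

    Averaging is an assumed fusion rule chosen for simplicity.
    """
    width = len(pwm)
    windows = bag_instances(seq, width)
    best = sorted(windows, key=lambda w: pwm_score(pwm, w), reverse=True)[:k]
    feature = [[0.0] * len(BASES) for _ in range(width)]
    for window in best:
        for i, row in enumerate(one_hot(window)):
            for j, value in enumerate(row):
                feature[i][j] += value / len(best)
    return best, feature

# Hypothetical toy PWM favouring the pattern "ACG": +1 for a match, -1 otherwise.
pwm = [[1.0 if b == target else -1.0 for b in BASES] for target in "ACG"]
best, feature = top_k_bag_feature(pwm, "TTACGTACGA", k=2)
print(best)
print(feature)
```

In this toy case both occurrences of `ACG` are selected, so the fused feature equals the one-hot encoding of `ACG`. A real discriminative learner would then update the PWM entries so that such fused features from positive bags score higher than those from negative bags.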

Original language: English
Pages (from-to): 1743-1751
Number of pages: 9
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 18
Issue number: 5
DOIs
State: Published - 2021

Bibliographical note

Publisher Copyright:
© 2004-2012 IEEE.

Keywords

  • ChIP-seq
  • Discriminative motif discovery
  • bag-based classifier
  • multi-fold learning
