Relevance maximization for high-recall retrieval problem: finding all needles in a haystack

Justin Jong Su Song, Wookey Lee

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

The high-recall retrieval problem, which aims to find the full set of relevant documents in a huge result set using effective mining techniques, is particularly relevant to patent information retrieval, legal document retrieval, medical document retrieval, market information retrieval, and literature review. Existing high-recall retrieval methods, however, remain far from satisfactory at retrieving all relevant documents, because they must not only meet high recall and precision thresholds but also minimize the number of reviewed documents. To address this gap, we generalize the problem to a novel high-recall retrieval model, which can be described as finding all needles in a giant haystack. To efficiently compute candidate groups consisting of k relevant documents, we propose dynamic diverse retrieval algorithms specialized for patent searching, with which effective dynamic interactive retrieval can be achieved. Across various types of datasets, the dynamic ranking method shows considerable improvements in time and cost over conventional static ranking approaches.

Original language: English
Pages (from-to): 7734-7757
Number of pages: 24
Journal: Journal of Supercomputing
Volume: 76
Issue number: 10
State: Published - 1 Oct 2020

Bibliographical note

Publisher Copyright:
© 2017, Springer Science+Business Media New York.

Keywords

  • Diversity retrieval
  • High-recall retrieval problem
  • Patent retrieval
