Text, image and vector graphics based appraisal of contemporary documents

Sang Chul Lee, William McFadden, Peter Bajcsy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

We have designed a framework for content-based appraisal of documents. Our motivation is to provide computer-assisted support for answering several appraisal criteria according to the general appraisal guidelines in the National Archives and Records Administration (NARA) 1441 directive. The appraisal criteria led us to investigate (a) finding groups of PDF documents with similar content, (b) ranking documents according to their creation/modification time and digital volume, and (c) detecting inconsistency between ranking and content within a group of related documents. The novelty of our work lies in designing a methodology and a mathematical framework for document appraisal, and in prototyping the framework so that it works with the text, image and vector graphics components of PDF documents. We present example results of grouping, ranking and integrity verification for groups of scientific documents about medical topics.
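
The three steps named in the abstract (grouping, ranking, inconsistency detection) can be illustrated with a minimal sketch. This is not the authors' framework: it assumes text has already been extracted from each PDF, uses plain bag-of-words cosine similarity in place of the paper's text, image and vector graphics comparison, and every name here (term_vector, group_documents, rank_group, flag_inconsistencies, the 0.5 threshold, and the "text", "modified", "size" fields) is hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term frequencies for one document's extracted text."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_documents(docs, threshold=0.5):
    """(a) Greedy grouping: a document joins the first group whose
    first member is at least `threshold` similar; otherwise it
    starts a new group. The threshold is an assumed value."""
    groups = []
    for doc in docs:
        vec = term_vector(doc["text"])
        for group in groups:
            if cosine_similarity(vec, term_vector(group[0]["text"])) >= threshold:
                group.append(doc)
                break
        else:
            groups.append([doc])
    return groups


def rank_group(group):
    """(b) Rank by modification time, breaking ties by size in bytes."""
    return sorted(group, key=lambda d: (d["modified"], d["size"]))


def flag_inconsistencies(ranked, threshold=0.5):
    """(c) Assumed check: flag neighbours in the ranking whose content
    similarity falls below `threshold`, since consecutive versions
    would be expected to resemble each other."""
    flagged = []
    for prev, nxt in zip(ranked, ranked[1:]):
        sim = cosine_similarity(term_vector(prev["text"]), term_vector(nxt["text"]))
        if sim < threshold:
            flagged.append((prev["name"], nxt["name"]))
    return flagged
```

Each document is represented as a dict with "name", "text", "modified" and "size" keys. A real appraisal pipeline would also need PDF parsing and comparison of embedded images and vector graphics, which this sketch leaves out.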

Original language: English
Title of host publication: Proceedings - 7th International Conference on Machine Learning and Applications, ICMLA 2008
Pages: 729-734
Number of pages: 6
State: Published - 2008
Event: 7th International Conference on Machine Learning and Applications, ICMLA 2008 - San Diego, CA, United States
Duration: 11 Dec 2008 to 13 Dec 2008

Publication series

Name: Proceedings - 7th International Conference on Machine Learning and Applications, ICMLA 2008

Conference

Conference: 7th International Conference on Machine Learning and Applications, ICMLA 2008
Country/Territory: United States
City: San Diego, CA
Period: 11/12/08 to 13/12/08
