
Document representation refinement for precise region description

Clausner, C, Pletschacher, S and Antonacopoulos, A 2014, 'Document representation refinement for precise region description', Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage, pp. 9-13.

PDF (published version, 1MB): restricted to repository staff only.

Abstract

Precise description of layout entities (content regions on a page) is crucial for all but the most trivial document analysis and recognition applications. The output of layout analysis methods and state-of-the-art OCR systems varies significantly, from bounding boxes (e.g. Tesseract) to stacks of text line rectangles (e.g. ABBYY FineReader). There is a clear need for a consistent and accurate representation of regions (e.g. text paragraphs, graphics entities etc.) for further processing, correction and performance evaluation (comparison of segmentation results with ground truth regions). This paper describes a method for refinement of document representations by fitting polygons around lower-level layout objects (such as text lines, words and glyphs) in a systematic way that reconstructs region outlines and preserves the fine details of complex layouts. Experimental results on a standard dataset demonstrate the validity and usefulness of the proposed approach.
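The abstract contrasts coarse outputs such as a single bounding box with an outline that follows the constituent text lines. The sketch below illustrates that contrast under simplifying assumptions: the input is a list of axis-aligned text line rectangles `(left, top, right, bottom)` with y growing downward, the lines form one vertical column, and the gap between consecutive lines is closed at its midpoint. The names `bounding_box`, `stack_outline`, `_simplify` and `area` are hypothetical. This rectilinear tracing is a simplified illustration, not the polygon-fitting method described in the paper, which also works at word and glyph level.

```python
from typing import List, Tuple

# Hypothetical types: (left, top, right, bottom) with y growing downward.
Rect = Tuple[int, int, int, int]
Point = Tuple[int, int]


def bounding_box(rects: List[Rect]) -> List[Point]:
    """Coarse representation: one rectangle enclosing every line."""
    if not rects:
        raise ValueError("need at least one rectangle")
    left = min(r[0] for r in rects)
    top = min(r[1] for r in rects)
    right = max(r[2] for r in rects)
    bottom = max(r[3] for r in rects)
    return [(left, top), (right, top), (right, bottom), (left, bottom)]


def _simplify(points: List[Point]) -> List[Point]:
    """Drop repeated vertices and vertices lying on a straight run."""
    pts = list(points)
    changed = True
    while changed and len(pts) > 3:
        changed = False
        for i in range(len(pts)):
            a, b, c = pts[i - 1], pts[i], pts[(i + 1) % len(pts)]
            # Zero cross product: b repeats a, or a, b, c are collinear.
            cross = (b[0] - a[0]) * (c[1] - b[1]) - (b[1] - a[1]) * (c[0] - b[0])
            if cross == 0:
                del pts[i]
                changed = True
                break
    return pts


def stack_outline(rects: List[Rect]) -> List[Point]:
    """Trace a rectilinear polygon around vertically stacked line rectangles."""
    if not rects:
        raise ValueError("need at least one rectangle")
    lines = sorted(rects, key=lambda r: r[1])  # top to bottom
    # Close the gap between consecutive lines at its vertical midpoint.
    tops = [lines[0][1]]
    bottoms = []
    for upper, lower in zip(lines, lines[1:]):
        cut = (upper[3] + lower[1]) // 2
        bottoms.append(cut)
        tops.append(cut)
    bottoms.append(lines[-1][3])

    right_side: List[Point] = []
    left_side: List[Point] = []
    for (left, _, right, _), y0, y1 in zip(lines, tops, bottoms):
        right_side += [(right, y0), (right, y1)]
        left_side += [(left, y0), (left, y1)]
    # Walk down the right edges, then back up the left edges.
    return _simplify(right_side + left_side[::-1])


def area(polygon: List[Point]) -> float:
    """Shoelace formula; the absolute value ignores vertex orientation."""
    s = 0
    for (x0, y0), (x1, y1) in zip(polygon, polygon[1:] + polygon[:1]):
        s += x0 * y1 - x1 * y0
    return abs(s) / 2


if __name__ == "__main__":
    # Indented first line, full middle line, short last line.
    lines = [(20, 0, 100, 10), (0, 12, 100, 22), (0, 24, 60, 34)]
    outline = stack_outline(lines)
    print(outline)
    print(area(outline), area(bounding_box(lines)))
```

For this example the traced outline has eight vertices and covers 2740 square units, while the bounding box covers 3400. The extra 660 units are the blank space beside the indented first line and after the short last line, which is the kind of layout detail a refined region outline is meant to preserve.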

Item Type: Article
Schools: Schools > School of Computing, Science and Engineering > Salford Innovation Research Centre (SIRC)
Journal or Publication Title: Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage
Publisher: ACM, New York
Refereed: Yes
Funders: Other
Depositing User: Professor Apostolos Antonacopoulos
Date Deposited: 28 Jan 2015 11:21
Last Modified: 29 Oct 2015 00:10
URI: http://usir.salford.ac.uk/id/eprint/33525
