Document representation refinement for precise region description

Clausner, C, Pletschacher, S and Antonacopoulos, A 2014, Document representation refinement for precise region description, in: DATeCH 2014: Digital Access to Textual Cultural Heritage 2014, 19th-20th May 2014, Madrid, Spain.

Precise description of layout entities (content regions on a page) is crucial for all but the most trivial document analysis and recognition applications. The output of layout analysis methods and state-of-the-art OCR systems varies significantly, from bounding boxes (e.g. Tesseract) to stacks of text line rectangles (e.g. ABBYY FineReader). There is a clear need for a consistent and accurate representation of regions (e.g. text paragraphs, graphics entities etc.) for further processing, correction and performance evaluation (comparison of segmentation results with ground truth regions). This paper describes a method for refinement of document representations by fitting polygons around lower-level layout objects (such as text lines, words and glyphs) in a systematic way that reconstructs region outlines and preserves the fine details of complex layouts. Experimental results on a standard dataset demonstrate the validity and usefulness of the proposed approach.
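
As a rough illustration of the idea of fitting a polygon around lower-level layout objects, the sketch below builds a rectilinear outline around a stack of text line rectangles. It is not the method from the paper, only a minimal example under stated assumptions: each line is an axis-aligned rectangle `(x0, y0, x1, y1)`, the lines are sorted top to bottom and do not overlap vertically, and the boundary between two consecutive lines is placed at the midpoint of the gap between them. The function names `outline_from_lines` and `polygon_area` are hypothetical.

```python
def outline_from_lines(lines):
    """Return a rectilinear polygon hugging a stack of text line rectangles.

    Assumption: `lines` holds (x0, y0, x1, y1) rectangles sorted top to
    bottom with no vertical overlap. This is an illustrative sketch, not
    the algorithm described in the paper.
    """
    if not lines:
        return []

    # Each line gets a horizontal band; bands meet at the gap midpoints.
    tops = [lines[0][1]]
    bottoms = []
    for upper, lower in zip(lines, lines[1:]):
        boundary = (upper[3] + lower[1]) / 2
        bottoms.append(boundary)
        tops.append(boundary)
    bottoms.append(lines[-1][3])

    # Walk down the right edges, then back up the left edges.
    raw = []
    for (x0, _, x1, _), top, bottom in zip(lines, tops, bottoms):
        raw += [(x1, top), (x1, bottom)]
    for (x0, _, x1, _), top, bottom in reversed(list(zip(lines, tops, bottoms))):
        raw += [(x0, bottom), (x0, top)]

    # Drop repeated points, then points lying on a straight run.
    pts = [p for i, p in enumerate(raw) if p != raw[i - 1]]
    outline = []
    for i, (cx, cy) in enumerate(pts):
        px, py = pts[i - 1]
        nx, ny = pts[(i + 1) % len(pts)]
        cross = (cx - px) * (ny - cy) - (cy - py) * (nx - cx)
        if cross != 0:
            outline.append((cx, cy))
    return outline


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0
    for i, (x0, y0) in enumerate(points):
        x1, y1 = points[(i + 1) % len(points)]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2


# A three-line paragraph whose last line is short.
paragraph = [(10, 0, 100, 10), (10, 12, 100, 22), (10, 24, 60, 34)]
outline = outline_from_lines(paragraph)
```

For this paragraph the outline has six vertices and an area of 2620, compared with 3060 for the plain bounding box, so the empty space to the right of the short last line is no longer counted as part of the region.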

Item Type: Conference or Workshop Item (Paper)
Schools: Schools > School of Computing, Science and Engineering > Salford Innovation Research Centre
Journal or Publication Title: Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage
Publisher: ACM Digital Library
Refereed: Yes
ISBN: 9781450325882
Funders: European Union
Depositing User: Professor Apostolos Antonacopoulos
Date Deposited: 28 Jan 2015 11:21
Last Modified: 15 Feb 2022 18:57
