Clausner, C (ORCID: https://orcid.org/0000-0001-6041-1002), Pletschacher, S (ORCID: https://orcid.org/0000-0003-0541-0968) and Antonacopoulos, A (ORCID: https://orcid.org/0000-0001-9552-0233) 2014, Document representation refinement for precise region description, in: DATeCH 2014: Digital Access to Textual Cultural Heritage 2014, 19th-20th May 2014, Madrid, Spain.
Full text: PDF (Published Version, 1MB), restricted to repository staff only.
Abstract
Precise description of layout entities (content regions on a page) is crucial for all but the most trivial document analysis and recognition applications. The output of layout analysis methods and state-of-the-art OCR systems varies significantly, from bounding boxes (e.g. Tesseract) to stacks of text line rectangles (e.g. ABBYY FineReader). There is a clear need for a consistent and accurate representation of regions (e.g. text paragraphs and graphics) for further processing, correction and performance evaluation (comparison of segmentation results with ground-truth regions). This paper describes a method for refining document representations by systematically fitting polygons around lower-level layout objects (such as text lines, words and glyphs), so that region outlines are reconstructed and the fine details of complex layouts are preserved. Experimental results on a standard dataset demonstrate the validity and usefulness of the proposed approach.
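The abstract contrasts per-line rectangles (as produced by, e.g., ABBYY FineReader) with a single region outline. As a rough illustration only, and not the method described in the paper, the following Python sketch merges a vertically ordered stack of text line rectangles into one isothetic (axis-parallel) polygon. It assumes image coordinates with y growing downward, boxes given as hypothetical `(x0, y0, x1, y1)` tuples, and consecutive lines that overlap horizontally; each gap between lines is split at its midpoint. The names `stack_outline` and `_simplify` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: (x0, y0, x1, y1) in image coordinates, y grows downward.
Box = Tuple[float, float, float, float]
Point = Tuple[float, float]


def _simplify(pts: List[Point]) -> List[Point]:
    """Drop repeated points and vertices that lie in the middle of a straight edge."""
    out: List[Point] = []
    for p in pts:
        if not out or out[-1] != p:
            out.append(p)
    if len(out) > 1 and out[0] == out[-1]:
        out.pop()
    changed = True
    while changed and len(out) > 3:
        changed = False
        for i in range(len(out)):
            prev, cur, nxt = out[i - 1], out[i], out[(i + 1) % len(out)]
            if prev[0] == cur[0] == nxt[0] or prev[1] == cur[1] == nxt[1]:
                out.pop(i)  # cur adds no corner: it sits on a straight edge
                changed = True
                break
    return out


def stack_outline(lines: List[Box]) -> List[Point]:
    """Merge a vertical stack of text line boxes into one rectilinear outline.

    Assumes consecutive lines overlap horizontally, so the union is connected.
    """
    if not lines:
        return []
    lines = sorted(lines, key=lambda b: b[1])  # order top to bottom
    # Horizontal cuts: first top, midpoint of each inter-line gap, last bottom.
    cuts = [lines[0][1]]
    for upper, lower in zip(lines, lines[1:]):
        cuts.append((upper[3] + lower[1]) / 2)
    cuts.append(lines[-1][3])

    pts: List[Point] = []
    # Right-hand side, walking down the page.
    for i, box in enumerate(lines):
        pts.append((box[2], cuts[i]))
        pts.append((box[2], cuts[i + 1]))
    # Left-hand side, walking back up.
    for i in range(len(lines) - 1, -1, -1):
        pts.append((lines[i][0], cuts[i + 1]))
        pts.append((lines[i][0], cuts[i]))
    return _simplify(pts)


if __name__ == "__main__":
    print(stack_outline([(0, 0, 10, 5), (0, 7, 6, 12)]))
```

For a full-width line above a shorter one, `stack_outline([(0, 0, 10, 5), (0, 7, 6, 12)])` returns the six-vertex L-shaped outline `[(10, 0), (10, 6.0), (6, 6.0), (6, 12), (0, 12), (0, 0)]`. Splitting gaps at their midpoint keeps the outline a single closed polygon; a practical tool would also need to handle words, glyphs, overlapping lines and multi-column layouts, which this sketch ignores.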
Item Type: Conference or Workshop Item (Paper)
Schools: School of Computing, Science and Engineering > Salford Innovation Research Centre
Journal or Publication Title: Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage
Publisher: ACM Digital Library
Refereed: Yes
ISBN: 9781450325882
Funders: European Union
Depositing User: Professor Apostolos Antonacopoulos
Date Deposited: 28 Jan 2015 11:21
Last Modified: 15 Feb 2022 18:57
URI: https://usir.salford.ac.uk/id/eprint/33525