Clausner, C, Pletschacher, S and Antonacopoulos, A 2014, 'Document representation refinement for precise region description', Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage, pp. 9-13.
Precise description of layout entities (content regions on a page) is crucial for all but the most trivial document analysis and recognition applications. The output of layout analysis methods and state-of-the-art OCR systems varies significantly, from bounding boxes (e.g. Tesseract) to stacks of text line rectangles (e.g. ABBYY FineReader). There is a clear need for a consistent and accurate representation of regions (e.g. text paragraphs, graphics entities etc.) for further processing, correction and performance evaluation (comparison of segmentation results with ground truth regions). This paper describes a method for refinement of document representations by fitting polygons around lower-level layout objects (such as text lines, words and glyphs) in a systematic way that reconstructs region outlines and preserves the fine details of complex layouts. Experimental results on a standard dataset demonstrate the validity and usefulness of the proposed approach.
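The abstract does not give the fitting procedure, so the following is only a minimal sketch of the general idea: deriving a region outline from lower-level objects instead of using a single bounding box. It is not the authors' method. It assumes axis-aligned text-line rectangles `(left, top, right, bottom)` with y growing downward, lines stacked top to bottom, and each line horizontally overlapping the next. Splitting each inter-line gap at its midpoint is an assumption of this sketch. The names `outline_from_lines`, `simplify`, `shoelace_area` and `bbox_area` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical types: (left, top, right, bottom) with y growing downward.
Rect = Tuple[float, float, float, float]
Point = Tuple[float, float]


def simplify(pts: List[Point]) -> List[Point]:
    """Drop repeated vertices and vertices lying on a straight edge."""
    out: List[Point] = []
    for p in pts:
        if not out or out[-1] != p:
            out.append(p)
    if len(out) > 1 and out[0] == out[-1]:
        out.pop()
    changed = True
    while changed and len(out) > 3:
        changed = False
        for i in range(len(out)):
            a, b, c = out[i - 1], out[i], out[(i + 1) % len(out)]
            # Zero cross product: a, b, c are collinear, so b is redundant.
            if (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]) == 0:
                del out[i]
                changed = True
                break
    return out


def outline_from_lines(lines: List[Rect]) -> List[Point]:
    """Trace a rectilinear polygon around vertically stacked text lines."""
    lines = sorted(lines, key=lambda r: r[1])
    n = len(lines)
    # Band boundaries: each inter-line gap is split at its midpoint (an
    # assumption), so the polygon covers the space between lines.
    cuts = [lines[0][1]]
    cuts += [(lines[i][3] + lines[i + 1][1]) / 2 for i in range(n - 1)]
    cuts.append(lines[-1][3])
    pts: List[Point] = []
    for i, (_, _, right, _) in enumerate(lines):  # right edges, top to bottom
        pts += [(right, cuts[i]), (right, cuts[i + 1])]
    for i in range(n - 1, -1, -1):  # left edges, bottom to top
        left = lines[i][0]
        pts += [(left, cuts[i + 1]), (left, cuts[i])]
    return simplify(pts)


def shoelace_area(poly: List[Point]) -> float:
    """Polygon area via the shoelace formula."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2


def bbox_area(lines: List[Rect]) -> float:
    """Area of the single bounding box enclosing all lines."""
    return (max(r[2] for r in lines) - min(r[0] for r in lines)) * (
        max(r[3] for r in lines) - min(r[1] for r in lines))


if __name__ == "__main__":
    # A paragraph whose last line is shorter than the others.
    para = [(0, 0, 100, 10), (0, 12, 100, 22), (0, 24, 60, 34)]
    poly = outline_from_lines(para)
    print(poly)  # [(100, 0), (100, 23.0), (60, 23.0), (60, 34), (0, 34), (0, 0)]
    print(shoelace_area(poly), bbox_area(para))  # 2960.0 3400
```

In this example the traced L-shaped outline excludes the empty area to the right of the short last line, which a bounding box would wrongly include (2960 versus 3400 square units). That difference is the kind of detail that affects comparison against ground-truth regions. A fuller treatment would also use words or glyphs and handle lines that do not overlap horizontally. Both cases are outside this sketch.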
School: School of Computing, Science and Engineering, Salford Innovation Research Centre (SIRC)
Journal or Publication Title: Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage
Publisher: ACM, New York
Depositing User: Professor Apostolos Antonacopoulos
Date Deposited: 28 Jan 2015 11:21
Last Modified: 29 Oct 2015 00:10