Pletschacher, S (ORCID: https://orcid.org/0000-0003-0541-0968) and Antonacopoulos, A (ORCID: https://orcid.org/0000-0001-9552-0233) 2010, The PAGE (Page Analysis and Ground-Truth Elements) format framework, in: 20th International Conference on Pattern Recognition (ICPR2010), 23rd-26th August 2010, Istanbul, Turkey.
Abstract
There is a plethora of established and proposed document representation formats, but none can adequately support the individual stages within an entire sequence of document image analysis methods (from document image enhancement to layout analysis to OCR) and their evaluation. This paper describes PAGE, a new XML-based page image representation framework that records information on image characteristics (image borders, geometric distortions and corresponding corrections, binarisation, etc.) in addition to layout structure and page content. The suitability of the framework for evaluating entire workflows as well as individual stages has been extensively validated through its use in high-profile applications, such as public contemporary and historical ground-truthed datasets and the ICDAR Page Segmentation competition series.
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Themes: | Subjects outside of the University Themes |
Schools: | Schools > School of Computing, Science and Engineering > Salford Innovation Research Centre |
Journal or Publication Title: | Proceedings of the 20th International Conference on Pattern Recognition (ICPR2010), Istanbul, Turkey, August 23-26, 2010 |
Publisher: | IEEE-CS |
Refereed: | Yes |
ISBN: | 9781424475414 (ebook); 9781424475421 (print); 9780769541099 (CD) |
ISSN: | 1051-4651 |
Depositing User: | Professor Apostolos Antonacopoulos |
Date Deposited: | 07 Oct 2011 10:06 |
Last Modified: | 15 Feb 2022 18:01 |
URI: | https://usir.salford.ac.uk/id/eprint/17827 |