Clausner, C (ORCID: https://orcid.org/0000-0001-6041-1002), Pletschacher, S (ORCID: https://orcid.org/0000-0003-0541-0968) and Antonacopoulos, Apostolos (ORCID: https://orcid.org/0000-0001-9552-0233) 2016, 'Quality prediction system for large-scale digitisation workflows', Proceedings of the 12th IAPR International Workshop on Document Analysis Systems (DAS2016), pp. 138-143.
PDF (256 kB), restricted to repository staff only.
Abstract
The feasibility of large-scale OCR projects can so far be assessed only by running pilot studies on subsets of the target document collections and measuring the success of different workflows against precise ground truth, which can be very costly to produce in the required volume. The premise of this paper is that quality prediction may instead be used to approximate the success of a given OCR workflow. A new system is presented in which a classifier is trained on metadata, image and layout features in combination with measured success rates (based on minimal ground truth). Once trained, the system requires only document images as input to predict a numeric quality score (no ground truth required). It can therefore be applied to any number of similar, unseen documents to assess their suitability for processing with the particular workflow. The usefulness of the system has been validated on a realistic dataset of historical newspaper pages.
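The train-then-predict idea in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the paper's classifier, its actual metadata, image and layout features, and its success measure are not reproduced here. Instead, a plain k-nearest-neighbour regressor is swapped in purely for illustration. The feature columns (resolution in dpi, a noise estimate, a column count), the sample scores, the class `KnnQualityPredictor` and the helper `suitable` with its 0.9 threshold are all hypothetical.

```python
import math
from statistics import mean


def fit_scaler(rows):
    """Per-feature mean and standard deviation, so no feature dominates distances."""
    stats = []
    for col in zip(*rows):
        m = mean(col)
        sd = math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0  # avoid /0
        stats.append((m, sd))
    return stats


def scale(features, stats):
    return [(x - m) / sd for x, (m, sd) in zip(features, stats)]


class KnnQualityPredictor:
    """Hypothetical k-NN regressor standing in for the paper's trained model.

    Trained on feature vectors of pages whose OCR success rate was measured
    against (minimal) ground truth; predicts a score for unseen pages from
    their features alone, with no ground truth needed at prediction time.
    """

    def __init__(self, k=3):
        self.k = k

    def fit(self, feature_rows, measured_scores):
        self.stats = fit_scaler(feature_rows)
        self.train = [(scale(f, self.stats), s)
                      for f, s in zip(feature_rows, measured_scores)]
        return self

    def predict(self, features):
        q = scale(features, self.stats)
        # Average the measured scores of the k most similar training pages.
        nearest = sorted(self.train, key=lambda item: math.dist(q, item[0]))[: self.k]
        return mean(score for _, score in nearest)


def suitable(predictor, features, threshold=0.9):
    """Hypothetical rule: a page suits the workflow if its predicted score meets the threshold."""
    return predictor.predict(features) >= threshold


# Hypothetical training data: [resolution_dpi, noise_estimate, num_columns]
train_X = [
    [300, 0.05, 4],
    [300, 0.10, 5],
    [200, 0.30, 6],
    [150, 0.45, 6],
    [400, 0.02, 3],
]
train_y = [0.95, 0.92, 0.70, 0.55, 0.97]  # hypothetical measured success rates

predictor = KnnQualityPredictor(k=2).fit(train_X, train_y)
clean_page_ok = suitable(predictor, [390, 0.03, 3])   # resembles the clean pages
noisy_page_ok = suitable(predictor, [160, 0.40, 6])   # resembles the noisy pages
```

Standardising the features before measuring distance keeps a large-valued column such as resolution from swamping a small-valued one such as the noise estimate. Any regressor that maps features to a numeric score could take the place of k-NN in this outline.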
| Field | Value |
|---|---|
| Item Type | Article |
| School | School of Computing, Science and Engineering |
| Journal or Publication Title | Proceedings of the 12th IAPR International Workshop on Document Analysis Systems (DAS2016) |
| Publisher | IEEE |
| Funders | European Commission |
| Depositing User | Professor Apostolos Antonacopoulos |
| Date Deposited | 22 Mar 2016 16:14 |
| Last Modified | 15 Feb 2022 20:32 |
| URI | http://usir.salford.ac.uk/id/eprint/38466 |