Europeana newspapers OCR workflow evaluation

Pletschacher, S ORCID: https://orcid.org/0000-0003-0541-0968, Clausner, C ORCID: https://orcid.org/0000-0001-6041-1002 and Antonacopoulos, A ORCID: https://orcid.org/0000-0001-9552-0233 2015, Europeana newspapers OCR workflow evaluation, in: 2015 Workshop on Historical Document Imaging and Processing (HIP2015), August 2015, Nancy, France.

Full text not available from this repository.

Abstract

This paper summarises the final performance evaluation results of the OCR workflow which was employed for large-scale production in the Europeana Newspapers project. It gives a detailed overview of how the involved software performed on a representative dataset of historical newspaper pages (for which ground truth was created) with regard to general text accuracy as well as layout-related factors which have an impact on how the material can be used in specific use scenarios. Specific types of errors are examined and evaluated in order to identify possible improvements related to the employed document image analysis and recognition methods. Moreover, alternatives to the standard production workflow are assessed to determine future directions and give advice on best practice related to OCR projects.

Item Type: Conference or Workshop Item (Paper)
Schools: Schools > School of Computing, Science and Engineering
Journal or Publication Title: Proceedings of the 2015 Workshop on Historical Document Imaging and Processing (HIP2015)
Publisher: ACM Digital Library
Funders: EU Competitiveness and Innovation Framework Programme
Depositing User: S Pletschacher
Date Deposited: 23 Dec 2015 09:02
Last Modified: 27 Aug 2021 20:23
URI: https://usir.salford.ac.uk/id/eprint/37655
