Creating a complete workflow for digitising historical census documents : considerations and evaluation

Clausner, C ORCID: https://orcid.org/0000-0001-6041-1002, Hayes, J, Antonacopoulos, A ORCID: https://orcid.org/0000-0001-9552-0233 and Pletschacher, S ORCID: https://orcid.org/0000-0003-0541-0968 2017, Creating a complete workflow for digitising historical census documents : considerations and evaluation, in: 2017 Workshop on Historical Document Imaging and Processing (HIP2017), 10 November 2017, Kyoto, Japan.

PDF (Author's accepted manuscript) - Accepted Version

Abstract

The 1961 Census of England and Wales was the first UK census to make use of computers. However, only bound volumes and microfilm copies of printouts remain, locking a wealth of information in a form that is practically unusable for research. In this paper, we describe the process of creating the digitisation workflow that was developed as part of a pilot study for the Office for National Statistics. The emphasis of the paper is on the issues originating from the historical nature of the material and how they were resolved. The steps described include image pre-processing, OCR setup, table recognition, post-processing, data ingestion, crowdsourcing, and quality assurance. Evaluation methods and results are presented for all steps.

Item Type: Conference or Workshop Item (Paper)
Schools: Schools > School of Computing, Science and Engineering
Journal or Publication Title: Proceedings of the 2017 Workshop on Historical Document Imaging and Processing (HIP2017)
Publisher: ACM Digital Library
Funders: Office for National Statistics
Depositing User: Mr Christian Clausner
Date Deposited: 20 Nov 2017 14:25
Last Modified: 15 Feb 2022 22:40
URI: https://usir.salford.ac.uk/id/eprint/44371
