The lifecycle of a digital historical document: structure and content
Antonacopoulos, A, Wiszniewski, B, Krawczyk, H and Karatzas, D 2004, The lifecycle of a digital historical document: structure and content, in: ACM Symposium on Document Engineering (DocEng'04), 28-30 October 2004, Milwaukee, Wisconsin, USA.
Abstract
This paper describes the lifecycle of a digital historical document, from template-based structure definition through to content extraction from the scanned pages and its final reconstitution as an electronic document (combining content and semantic information), along with the tools that have been created to realise each stage in the lifecycle. The whole approach is described in the context of different types of typewritten documents relating to prisoners in World War II concentration camps and is the result of a multinational collaboration under the MEMORIAL project, funded (€1.5M) by the European Union (www.memorialproject.info). Extensive tests with historians/archivists and evaluation of the content extraction results indicate the superior performance of the whole semantics-driven approach, both over manual transcription and over the semi-automated application of off-the-shelf OCR and the use of a conventional (text and layout) document format.
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Additional Information: | Publisher: ACM Press |
| Themes: | Subjects / Themes > Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4050 Electronic information resources; Subjects / Themes > Q Science > QA Mathematics > QA075 Electronic computers. Computer science; Subjects outside of the University Themes |
| Schools: | Colleges and Schools > College of Science & Technology; Colleges and Schools > College of Science & Technology > School of Computing, Science and Engineering; Colleges and Schools > College of Science & Technology > School of Computing, Science and Engineering > Data Mining and Pattern Recognition Research Centre |
| Journal or Publication Title: | Proceedings of the ACM Symposium on Document Engineering (DocEng'04), Milwaukee, USA, October 28-30, 2004 |
| Refereed: | Yes |
| Depositing User: | H Kenna |
| Date Deposited: | 05 Jan 2009 14:38 |
| Last Modified: | 10 Oct 2011 14:53 |
| URI: | http://usir.salford.ac.uk/id/eprint/887 |