
Correction of arbitrary geometric artefacts in historical documents

Rahnemoonfar, M 2010, Correction of arbitrary geometric artefacts in historical documents, PhD thesis, Salford: University of Salford.

PDF (66MB). Restricted to repository staff only until 03 October 2014.

    Abstract

    The research presented in this thesis addresses the problem of correcting arbitrary geometric artefacts in historical documents. Geometric distortions may be introduced at any point in a document's life cycle, from when it was first printed to when it is digitised by an imaging device. Such distortions appear as arbitrary warping, folds and page curl, and have detrimental effects on both recognition (OCR) and readability (e.g. for print-on-demand). This thesis critically examines state-of-the-art methods and identifies opportunities for significant improvement. Firstly, the work focuses on the main issues in text line segmentation and proposes a method that is robust in the presence of various geometric distortions, other artefacts common in historical documents, and dense, complex layouts. Secondly, a precise baseline detection method based on geometric features of the parametric model of each segmented line is presented. This method not only takes into account the unexpected geometric distortions that are common in historical document images, but also identifies certain main components of the text line, such as ascenders, descenders and certain decorative marks, and distinguishes between these native (but potentially misleading) components of the line and the global and local distortions of the whole page. Such precise derivation of the baselines (and, in certain instances, the top lines) serves as the building block for a major correction stage, namely the de-warping procedure.
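
    The record does not describe the parametric line model used in the thesis. Purely as an illustration of the general idea, and not as the author's method, a baseline can be modelled as a low-order polynomial fitted to the bottom points of a line's connected components, iteratively discarding points that lie well below the curve (typically descenders). In the Python sketch below, the function name `fit_baseline`, the quadratic degree and the 2-pixel tolerance are all assumptions.

```python
import numpy as np


def fit_baseline(xs, ys, degree=2, tol=2.0, iterations=5):
    """Hypothetical sketch: fit a polynomial baseline y = p(x).

    xs, ys: column and bottom-row coordinates of the glyphs in one line.
    Image rows grow downward, so descenders have a larger y than the
    baseline and therefore a positive residual. Points whose residual
    exceeds `tol` pixels are excluded and the curve is refitted.
    Returns (coefficients, highest degree first; boolean mask of kept points).
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    keep = np.ones(len(xs), dtype=bool)
    coeffs = np.polyfit(xs, ys, degree)
    for _ in range(iterations):
        # Residuals are computed for all points, so a point wrongly
        # dropped in an earlier pass can be re-admitted later.
        residuals = ys - np.polyval(coeffs, xs)
        new_keep = residuals <= tol
        # Stop when the inlier set is stable or too small to fit.
        if new_keep.sum() <= degree or np.array_equal(new_keep, keep):
            break
        keep = new_keep
        coeffs = np.polyfit(xs[keep], ys[keep], degree)
    return coeffs, keep
```

    This sketch only handles points below the curve; a fuller treatment would also need to deal with decorative marks and with very short lines, which it ignores.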
Finally, the proposed de-warping method takes into account both global and local characteristics of the text image and models the smooth deformations between text lines. By building on the proposed line segmentation and baseline detection stages, it copes reliably, robustly and flexibly with a variety of distortions, such as page curl, arbitrary warping and folds.
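
The abstract does not specify how the deformation between text lines is modelled. One simple, hypothetical way to illustrate the idea, and not the thesis's method, is to give each detected baseline a target row, interpolate the vertical offset linearly between neighbouring baselines in every column, and resample the image with nearest-neighbour lookup. The function `dewarp`, its arguments and the piecewise-linear interpolation are assumptions made for this sketch.

```python
import numpy as np


def dewarp(image, baselines, targets):
    """Hypothetical sketch of baseline-driven vertical de-warping.

    image:     2-D array (rows x cols).
    baselines: list of 1-D arrays, sorted top to bottom, where
               baselines[k][x] is the row of text line k at column x.
    targets:   increasing rows at which each baseline should end up.
    """
    h, w = image.shape
    targets = np.asarray(targets, dtype=float)
    out = np.zeros_like(image)
    ys = np.arange(h, dtype=float)
    for x in range(w):
        src = np.array([b[x] for b in baselines], dtype=float)
        # Offset (source row minus target row) is known at each baseline;
        # interpolate it linearly for every output row in this column.
        # np.interp holds the end values constant outside the target range.
        offset = np.interp(ys, targets, src - targets)
        # Nearest-neighbour lookup of the source row, clipped to the image.
        rows = np.clip(np.rint(ys + offset).astype(int), 0, h - 1)
        out[:, x] = image[rows, x]
    return out
```

Holding the offset constant above the first and below the last baseline is a crude choice; realistic page curl would call for a smoother model across lines and proper interpolation of pixel values.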

    Item Type: Thesis (PhD)
    Contributors: Antonacopoulos, A(Supervisor)
    Schools: Colleges and Schools > College of Science & Technology
    Colleges and Schools > College of Science & Technology > School of Computing, Science and Engineering
    Depositing User: Institutional Repository
    Date Deposited: 03 Oct 2012 14:34
    Last Modified: 18 Feb 2014 10:04
    URI: http://usir.salford.ac.uk/id/eprint/26872
