
A new framework for recognition of heavily degraded characters in historical typewritten documents based on semi-supervised clustering

Pletschacher, S, Hu, J and Antonacopoulos, A 2009, A new framework for recognition of heavily degraded characters in historical typewritten documents based on semi-supervised clustering, in: 10th International Conference on Document Analysis and Recognition, 26th - 29th July 2009, Barcelona.

    Abstract

    This paper presents a new semi-supervised clustering framework for the recognition of heavily degraded characters in historical typewritten documents, where off-the-shelf OCR typically fails. The constraints are generated using typographical (collection-independent) domain knowledge and are used to guide both sample (glyph set) partitioning and metric learning. Experimental results using simple features provide encouraging evidence that this approach can lead to significantly improved clustering results compared to simple K-Means clustering, as well as to clustering using a state-of-the-art OCR engine.
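
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how pairwise constraints can guide sample partitioning. It uses a generic constrained K-Means in the style of COP-KMeans, which is not necessarily the method of the paper, and it omits the metric-learning part entirely. The function names, the `init` parameter, the toy 2-D "glyph features", and the constraints themselves are all hypothetical and chosen for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def violates(i, cluster, labels, must_link, cannot_link):
    """Return True if assigning sample i to `cluster` breaks a constraint."""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        # A must-link partner that is already assigned has to share the cluster.
        if other is not None and labels[other] is not None and labels[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        # A cannot-link partner must not already sit in the same cluster.
        if other is not None and labels[other] == cluster:
            return True
    return False


def cop_kmeans(points, k, init, must_link=(), cannot_link=(), iters=20):
    """Constrained K-Means sketch; `init` lists the indices of the k seed points."""
    centers = [points[i] for i in init]
    labels = [None] * len(points)
    for _ in range(iters):
        new_labels = [None] * len(points)
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest; take the first feasible one.
            for c in sorted(range(k), key=lambda c: dist(p, centers[c])):
                if not violates(i, c, new_labels, must_link, cannot_link):
                    new_labels[i] = c
                    break
            else:
                raise ValueError(f"no feasible cluster for sample {i}")
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [points[i] for i, lab in enumerate(labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


# Toy data: sample 4 lies closer to the second group of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (3.0, 3.0)]

unconstrained = cop_kmeans(points, 2, init=(0, 2))
constrained = cop_kmeans(points, 2, init=(0, 2), must_link=[(0, 4)])
```

In this toy run the unconstrained call groups sample 4 with samples 2 and 3, while the must-link constraint between samples 0 and 4 pulls it into the other cluster. In the setting described by the abstract, such constraints would come from typographical domain knowledge rather than being written by hand.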

    Item Type: Conference or Workshop Item (Paper)
    Themes: Subjects outside of the University Themes
    Schools: Colleges and Schools > College of Health & Social Care > School of Nursing, Midwifery, Social Work & Social Sciences > Centre for Social Research
    Publisher: IEEE
    Refereed: Yes
    Date Deposited: 21 Dec 2011 11:37
    Last Modified: 23 May 2014 14:59
    URI: http://usir.salford.ac.uk/id/eprint/19258
