
Measuring performance when positives are rare: relative advantage versus predictive accuracy - a biological case-study

Muggleton, SH, Bryant, CH and Srinivasan, A 2000, 'Measuring performance when positives are rare: relative advantage versus predictive accuracy - a biological case-study', in: Machine learning: ECML 2000: 11th European conference on machine learning, Barcelona, Catalonia, Spain, May 31-June 2 2000, Lecture notes in computer science (1810), Springer, Berlin / Heidelberg, Germany, pp. 300-312.

    Abstract

    This paper presents a new method of measuring performance when positives are rare and investigates whether Chomsky-like grammar representations are useful for learning accurate, comprehensible predictors of members of biological sequence families. The positive-only learning framework of the Inductive Logic Programming (ILP) system CProgol is used to generate a grammar for recognising a class of proteins known as human neuropeptide precursors (NPPs). Performance is measured using both predictive accuracy and a new cost function, Relative Advantage (RA). The RA results show that searching for NPPs by using our best NPP predictor as a filter is more than 100 times more efficient than randomly selecting proteins for synthesis and testing them for biological activity. Predictive accuracy is not a good measure of performance for this domain because it does not discriminate well between NPP recognition models: despite covering varying numbers of (the rare) positives, all the models are awarded a similar (high) score by predictive accuracy because they all exclude most of the abundant negatives.
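
    The point about predictive accuracy can be illustrated with a small sketch. The confusion-matrix counts below are invented for illustration and are not results from the paper, and the `enrichment` function (precision divided by the base rate of positives) is only an assumed stand-in for the idea of comparing a filter against random selection; it is not the paper's definition of Relative Advantage.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all examples that are classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def enrichment(tp, fp, tn, fn):
    """Precision of the filter divided by the base rate of positives.

    Assumption: a simplified proxy for 'how much better than random
    selection', NOT the Relative Advantage function from the paper.
    """
    total = tp + fp + tn + fn
    base_rate = (tp + fn) / total   # chance that a random pick is positive
    precision = tp / (tp + fp)      # chance that a filtered pick is positive
    return precision / base_rate


# Hypothetical data set: 10,000 proteins, of which only 10 are positives.
# Model A covers 2 of the 10 positives; model B covers 8 of them.
# Both wrongly accept 5 of the 9,990 negatives.
model_a = dict(tp=2, fp=5, tn=9985, fn=8)
model_b = dict(tp=8, fp=5, tn=9985, fn=2)

acc_a, acc_b = accuracy(**model_a), accuracy(**model_b)
enr_a, enr_b = enrichment(**model_a), enrichment(**model_b)

print(f"accuracy:   A={acc_a:.4f}  B={acc_b:.4f}")
print(f"enrichment: A={enr_a:.1f}  B={enr_b:.1f}")
```

    Under these made-up counts the two accuracies differ by less than 0.001 even though model B finds four times as many positives, while the ratio-based score separates the models clearly.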

    Item Type: Book Section
    Editors: de Mántaras, RL and Plaza, E
    Additional Information: Paper originally presented at the 11th European Conference on Machine Learning Barcelona, Catalonia, Spain, May 31 – June 2, 2000 Proceedings.
    Uncontrolled Keywords: inductive logic programming
    Themes: Subjects / Themes > Q Science > QA Mathematics > QA075 Electronic computers. Computer science
    Subjects / Themes > Q Science > QH Natural history > QH301 Biology
    Subjects outside of the University Themes
    Schools: Colleges and Schools > College of Science & Technology
    Colleges and Schools > College of Science & Technology > School of Computing, Science and Engineering
    Colleges and Schools > College of Science & Technology > School of Computing, Science and Engineering > Data Mining and Pattern Recognition Research Centre
    Publisher: Springer
    Refereed: Yes
    ISBN: 9783540676027
    Depositing User: Dr Chris H. Bryant
    Date Deposited: 17 Feb 2009 12:16
    Last Modified: 20 Aug 2013 16:56
    URI: http://usir.salford.ac.uk/id/eprint/1764