Skip to the content

Speeding up parsing of biological context-free grammars

Fredouille, D and Bryant, CH 2005, 'Speeding up parsing of biological context-free grammars' , in: Proceedings of the 16th Annual Symposium on Combinatorial pattern matching , Lecture notes in computer science (3537) , Springer, Berlin / Heidelberg, Germany, pp. 241-256.

[img]
Preview
PDF
Download (281kB) | Preview

    Abstract

    Grammars have been shown to be a very useful way to model biological sequences families. As both the quantity of biological sequences and the complexity of the biological grammars increase, generic and efficient methods for parsing are needed. We consider two parsers for context-free grammars: depth-first top-down parser and chart parser; we analyse and compare them, both theoretically and empirically, with respect to biological data. The theoretical comparison is based on a common feature of biological grammars: the gap - a gap is an element of the grammars designed to match any subsequence of the parsed string. The empirical comparison is based on grammars and sequences used by the bioinformatics community. Our conclusions are that: (1) the chart parsing algorithm is significantly faster than the depth-first top-down a lgorithm, (2) designing special treatments in the algorithms for managing gaps is useful, and (3) the way the grammar encodes gaps has to be carefully chosen, when using parsers not optimised for managing gaps, to prevent important increases in running times.

    Item Type: Book Section
    Editors: Apostolico, A, Crochemore, M and Park, K
    Additional Information: Paper originally presented at the 16th Annual Symposium, CPM 2005, Jeju Island, Korea, June 19-22 2005
    Themes: Subjects / Themes > Q Science > QA Mathematics > QA075 Electronic computers. Computer science
    Subjects / Themes > Q Science > QH Natural history > QH301 Biology
    Subjects outside of the University Themes
    Schools: Colleges and Schools > College of Science & Technology
    Colleges and Schools > College of Science & Technology > School of Computing, Science and Engineering
    Colleges and Schools > College of Science & Technology > School of Computing, Science and Engineering > Data Mining and Pattern Recognition Research Centre
    Publisher: Springer
    Refereed: Yes
    Series Name: Lecture notes in computer science
    ISBN: 9783540262015
    Related URLs:
    Depositing User: Dr Chris H. Bryant
    Date Deposited: 16 Feb 2009 14:16
    Last Modified: 20 Aug 2013 16:55
    URI: http://usir.salford.ac.uk/id/eprint/1758

    Actions (login required)

    Edit record (repository staff only)

    Downloads per month over past year

    View more statistics