Is automatic detection of hidden knowledge an anomaly?

Preiss, J (ORCID: https://orcid.org/0000-0002-2158-5832) 2019, 'Is automatic detection of hidden knowledge an anomaly?', BMC Bioinformatics, 20 (Sup 10), p. 251.

PDF - Published Version
Available under License Creative Commons Attribution 4.0.

PDF - Accepted Version
Restricted to Repository staff only

Abstract

Background: The volume of published literature requires researchers to specialize in ever narrower fields, so inferable connections between publications (particularly publications from different domains) can be missed. This has given rise to automatic literature-based discovery (LBD). However, unless heavily filtered, LBD generates more potential new knowledge than can be verified manually, and a further form of selection is required before the results can be passed on to a user. Since a large proportion of the automatically generated hidden knowledge is valid but already generally known, we investigate the hypothesis that non-trivial, interesting hidden knowledge can be treated as an anomaly and identified using anomaly detection approaches.
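The abstract does not say how the hidden connections are generated. A common LBD formulation, assumed here purely for illustration, is Swanson's ABC model: if term A is related to term B and B is related to C, but A and C are never directly related, then A-C is a candidate hidden connection. The sketch below assumes untyped, undirected term pairs as input; the function name `hidden_connections` is hypothetical and this is not presented as the paper's extraction pipeline.

```python
from collections import defaultdict


def hidden_connections(relations):
    """Return candidate hidden A-C links implied by A-B and B-C relations.

    Assumption: relations are untyped, undirected (term, term) pairs.
    The result maps each hidden pair (a, c), with a < c, to the set of
    linking terms b that connect them.
    """
    neighbours = defaultdict(set)
    for x, y in relations:
        if x != y:
            neighbours[x].add(y)
            neighbours[y].add(x)

    hidden = {}
    for b, linked in neighbours.items():
        for a in linked:
            for c in linked:
                # Keep each unordered pair once and skip pairs that are
                # already directly related (i.e. already "known").
                if a < c and c not in neighbours[a]:
                    hidden.setdefault((a, c), set()).add(b)
    return hidden


# Toy relations loosely modelled on Swanson's fish oil / Raynaud's example.
relations = [
    ("fish oil", "blood viscosity"),
    ("blood viscosity", "Raynaud's disease"),
    ("fish oil", "platelet aggregation"),
    ("platelet aggregation", "Raynaud's disease"),
]
for pair, links in sorted(hidden_connections(relations).items()):
    print(pair, "via", sorted(links))
```

Even these four relations yield a second candidate pair (blood viscosity, platelet aggregation) besides the fish oil / Raynaud's link, which hints at how quickly the candidate set outgrows manual verification as the relation graph grows.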

Results: Two experiments are conducted: (1) to avoid errors arising from incorrect relation extraction, the hypothesis is validated using manually annotated relations from a thesaurus; and (2) automatically extracted relations are used to investigate the hypothesis on publication abstracts. Together, these allow us to estimate a potential upper bound and to identify the limitations introduced by automatic relation extraction.

Conclusion: We apply the one-class SVM and isolation forest anomaly detection algorithms to a set of hidden connections, ranking the connections so that outlying (interesting) ones are identified, and show that the approach increases the F1 measure by a factor of 10 while greatly reducing the quantity of hidden knowledge that must be verified manually. We also demonstrate that this result is statistically significant.
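The abstract names the two detectors but not the features or the implementation used. The sketch below is a minimal, from-scratch isolation forest in the style of the original algorithm (Liu, Ting and Zhou, 2008), written with the standard library only. Each tree splits on a random feature at a random threshold, and connections that are isolated after only a few splits receive a higher anomaly score and are ranked first. The two numeric features per connection, the toy data, and the helper names (`isolation_scores`, `rank_connections`, `f1_at_k`) are hypothetical. The F1-at-k helper only illustrates how a ranked list could be scored against a gold set of interesting connections; it is not the paper's evaluation protocol. A one-class SVM is not shown. Library implementations of both detectors exist, for example scikit-learn's `IsolationForest` and `OneClassSVM`.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path(n):
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop if the chosen feature is constant
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, node, depth=0):
    """Depth at which x is isolated, plus c(n) for points left in its leaf."""
    if node[0] == "leaf":
        return depth + avg_path(node[1])
    _, dim, split, left, right = node
    return path_length(x, left if x[dim] < split else right, depth + 1)


def isolation_scores(data, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1]; higher means easier to isolate (more anomalous)."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = max(1, math.ceil(math.log2(psi)))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = avg_path(psi)
    scores = []
    for x in data:
        mean_depth = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm) if norm > 0 else 0.5)
    return scores


def rank_connections(connections, features, **kwargs):
    """Sort candidate connections by anomaly score, most anomalous first."""
    scores = isolation_scores(features, **kwargs)
    order = sorted(range(len(connections)), key=lambda i: scores[i], reverse=True)
    return [(connections[i], scores[i]) for i in order]


def f1_at_k(ranked, gold, k):
    """F1 of the top-k ranked connections against a gold set of interesting ones."""
    top = {conn for conn, _ in ranked[:k]}
    tp = len(top & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(top), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical 2-D features: 20 "typical" connections and one outlier.
features = [(1 + 0.01 * i, 1 - 0.01 * i) for i in range(20)] + [(10.0, 10.0)]
connections = [f"c{i}" for i in range(len(features))]
ranked = rank_connections(connections, features, n_trees=100, seed=0)
print(ranked[:3])
print(f1_at_k(ranked, {"c20"}, k=1))
```

With this toy data the outlier `c20` is separated near the root of almost every tree, so it is ranked first. Because the score depends only on how many random splits are needed to isolate a point, no pairwise distances are computed, which keeps ranking large candidate sets inexpensive.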

Keywords: literature-based discovery; anomaly detection; Unified Medical Language System

Item Type: Article
Schools: School of Computing, Science and Engineering, Salford Innovation Research Centre (SIRC)
Journal or Publication Title: BMC Bioinformatics
Publisher: BioMed Central
ISSN: 1471-2105
Depositing User: USIR Admin
Date Deposited: 01 Feb 2019 12:42
Last Modified: 12 Jun 2019 08:00
URI: http://usir.salford.ac.uk/id/eprint/49923