A cloud-hosted MapReduce architecture for syntactic parsing

Woldemariam, YD, Pletschacher, S, Clausner, C ORCID: https://orcid.org/0000-0001-6041-1002 and Bass, JM ORCID: https://orcid.org/0000-0002-0570-7086 2019, A cloud-hosted MapReduce architecture for syntactic parsing, in: Euromicro Conference on Software Engineering and Advanced Applications.

PDF - Accepted Version

Abstract

Syntactic parsing is a time-consuming task in natural language processing, particularly where a large number of text files are being processed. Parsing algorithms are conventionally designed to operate on a single machine in a sequential fashion and, as a consequence, fail to benefit from high performance and parallel computing resources available on the cloud. We designed and implemented a scalable cloud-based architecture supporting parallel and distributed syntactic parsing for large datasets. The main architecture consists of a syntactic parser (constituency and dependency parsing) and a MapReduce framework running on clusters of machines. The resulting cloud-based MapReduce parsing is able to build a map where syntactic trees of the same input file have the same key and collect them into a single file containing sentences along with their corresponding trees. Our experimental evaluation shows that the architecture scales well with regard to the number of processing nodes and the number of cores per node. In the fastest tested cloud-based setup, the proposed design performs 7 times faster when compared to a local setup. In summary, this study takes an important step toward providing and evaluating a cloud-hosted solution for efficient syntactic parsing of natural language data sets consisting of a large number of files.
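
The keying scheme described in the abstract (trees from the same input file share one key and are collected together with their sentences) can be illustrated with a minimal single-process sketch. This is not the authors' implementation: `toy_parse` is a hypothetical stand-in for a real constituency or dependency parser, the function names `map_phase`, `reduce_phase` and `run_job` are invented for illustration, and it is an assumption here that sentences are one per line and that the collected output is one text blob per input file.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, Tuple


def toy_parse(sentence: str) -> str:
    """Hypothetical placeholder parser: wraps each token in a flat bracketed tree."""
    tokens = sentence.split()
    return "(S " + " ".join(f"(X {t})" for t in tokens) + ")"


def map_phase(
    filename: str, text: str, parse: Callable[[str], str] = toy_parse
) -> Iterator[Tuple[str, Tuple[str, str]]]:
    """Emit (filename, (sentence, tree)) for every non-empty line of the file.

    Using the file name as the key means all trees of one file meet in one reducer.
    """
    for line in text.splitlines():
        sentence = line.strip()
        if sentence:
            yield filename, (sentence, parse(sentence))


def reduce_phase(pairs: Iterable[Tuple[str, Tuple[str, str]]]) -> Dict[str, str]:
    """Group pairs by key and join each group into sentence<TAB>tree lines."""
    grouped = defaultdict(list)
    for key, (sentence, tree) in pairs:
        grouped[key].append(f"{sentence}\t{tree}")
    return {key: "\n".join(rows) for key, rows in grouped.items()}


def run_job(files: Dict[str, str]) -> Dict[str, str]:
    """Run map then reduce sequentially; a cluster would distribute the map calls."""
    pairs = []
    for name, text in files.items():
        pairs.extend(map_phase(name, text))
    return reduce_phase(pairs)
```

Calling `run_job({"a.txt": "The cat sleeps\nDogs bark"})` returns a dictionary with one entry per input file, each holding its sentences paired with their trees. In a real MapReduce deployment the framework would handle the shuffle between the two phases, and the map calls would run in parallel across nodes and cores.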

Item Type: Conference or Workshop Item (Paper)
Schools: Schools > School of Computing, Science and Engineering > Salford Innovation Research Centre
Journal or Publication Title: EUROMICRO 45th Conference on Software Engineering and Advanced Applications (SEAA)
Depositing User: Dr Julian M. Bass
Date Deposited: 03 Jul 2019 08:16
Last Modified: 03 Jul 2019 08:30
URI: http://usir.salford.ac.uk/id/eprint/51714
