Enhancing African low-resource languages: Swahili data for language modelling.

Researchers

Journal

Modalities

Models

Abstract

Language modelling using neural networks requires adequate data to guarantee quality word representation which is important for natural language processing (NLP) tasks. However, African languages, Swahili in particular, have been disadvantaged and most of them are classified as low resource languages because of inadequate data for NLP. In this article, we derive and contribute unannotated Swahili dataset, Swahili syllabic alphabet and Swahili word analogy dataset to address the need for language processing resources especially for low resource languages. Therefore, we derive the unannotated Swahili dataset by pre-processing raw Swahili data using a Python script, formulate the syllabic alphabet and develop the Swahili word analogy dataset based on an existing English dataset. We envisage that the datasets will not only support language models but also other NLP downstream tasks such as part-of-speech tagging, machine translation and sentiment analysis.
© 2020 The Authors. Published by Elsevier Inc.

Show Full Text

Enhancing African low-resource languages: Swahili data for language modelling.

Researchers

Journal

Modalities

Models

Abstract

Semi-Supervised Bidirectional Long Short-Term Memory and Conditional Random Fields Model for Named-Entity Recognition Using Embeddings from Language Models Representations.

Clinical Concept Extraction with Lexical Semantics to Support Automatic Annotation.

Detection of Depression Severity Using Bengali Social Media Posts on Mental Health: Study Using Natural Language Processing Techniques.

Part-of-Speech tagging enhancement to natural language processing for Thai wh-question classification with deep learning.

Leave a Reply Cancel reply

Researchers

Journal

Modalities

Models

Abstract

Similar Posts

Leave a Reply Cancel reply