Self-Supervised Deep Learning Models for Low-Resource NLP Applications

Authors

  • Pardeep Kaur, Assistant Professor, Faculty of BCA, Dunes College, Kutch, Gandhi Dham, Gujarat, India

DOI:

https://doi.org/10.71143/5nh5zh32

Abstract

Deep learning and the availability of large annotated datasets have driven remarkable progress in Natural Language Processing (NLP). However, a great many low-resource languages lack the labelled corpora needed to train models under supervision. This digital divide prevents equitable NLP development across language communities. Self-supervised learning (SSL) has emerged in recent years as a paradigm that leverages large amounts of unlabelled text to learn powerful representations with little or no manual annotation. This paper reviews self-supervised deep learning models for low-resource NLP tasks. It begins by defining the principles of SSL and distinguishing this approach from supervised and unsupervised learning. We describe how objectives such as masked language modeling, contrastive learning, and autoregressive modeling underlie modern pre-trained transformers such as BERT, GPT, and mBERT. Particular attention is paid to multilingual and cross-lingual SSL schemes that allow knowledge transfer from high-resource to low-resource languages. Low-resource tasks, including machine translation, sentiment analysis, speech-to-text, and information retrieval, are reviewed. Benchmark studies show that SSL models can achieve substantial gains in accuracy and generalization even with small labelled samples. Computational cost, representational bias, and the evaluation of morphologically rich and under-documented languages remain problematic, however. The study notes certain limitations, such as over-reliance on high-resource pretraining data, inequity between linguistic groups, and the difficulty of deploying large-scale SSL models in resource-constrained settings. Future directions include lightweight multilingual models, federated learning for NLP, and the combination of SSL with symbolic linguistic knowledge. Self-supervised deep learning bridges an essential gap between high-resource and low-resource languages and represents a valuable step toward inclusive, global NLP innovation.
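
To make the masked language modeling objective concrete, the following is a minimal sketch (not taken from the paper) of querying the multilingual mBERT checkpoint through the Hugging Face transformers library; the checkpoint name bert-base-multilingual-cased and the example sentence are illustrative assumptions, not details from the study.

# Minimal sketch: masked language modeling with a multilingual
# pre-trained transformer (mBERT), one of the SSL objectives
# discussed in the abstract. Assumes the Hugging Face
# `transformers` library is installed; checkpoint and sentence
# are illustrative.
from transformers import pipeline

# Load a fill-mask pipeline backed by multilingual BERT.
fill_mask = pipeline("fill-mask", model="bert-base-multilingual-cased")

# The [MASK] token asks the model to predict the hidden word,
# using representations learned purely from unlabelled text.
for prediction in fill_mask("Paris is the [MASK] of France."):
    print(f"{prediction['token_str']}: {prediction['score']:.3f}")

Because the pre-training objective requires no labels, the same pre-trained encoder can subsequently be fine-tuned on a small labelled sample for a downstream low-resource task such as sentiment analysis, which is the transfer pattern the review examines.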


Published

26-02-2026

How to Cite

Pardeep Kaur. (2026). Self-Supervised Deep Learning Models for Low-Resource NLP Applications. International Journal of Research and Review in Applied Science, Humanities, and Technology, 3(1), 57-61. https://doi.org/10.71143/5nh5zh32