Enriching the Transfer Learning with Pre-Trained Lexicon Embedding for Low-Resource Neural Machine Translation

Source: Tsinghua Science and Technology (Journal of Tsinghua University, Natural Science Edition, English edition) | Cited by: 0

Authors: Mieradilijiang Maimaiti, Yang Liu, Huanbo Luan, Maosong Sun

Affiliation: Institute for Artificial Intelligence, Beijing National Research Center for Information Science and T

Publication date: 2022, Issue 1

Keywords: artificial intelligence; natural language processing; neural network; machine trans


Abstract

Most State-Of-The-Art (SOTA) Neural Machine Translation (NMT) systems today achieve outstanding results based only on large parallel corpora. Large-scale parallel corpora for high-resource languages are easily obtainable. However, the translation quality of NMT for morphologically rich languages is still unsatisfactory, mainly because of the data sparsity problem encountered in Low-Resource Languages (LRLs). In the low-resource NMT paradigm, Transfer Learning (TL) has become one of the most efficient methods. Still, it is difficult for a model trained on high-resource languages to capture the information of both the parent and the child models; moreover, the initially trained model contains only the lexicon features and word embeddings of the parent model, not the features of the child languages. In this work, we address this issue by proposing a language-independent Hybrid Transfer Learning (HTL) method for LRLs that shares lexicon embeddings between parent and child languages without leveraging back translation or manually injecting noise. First, we train the parent model on the High-Resource Languages (HRLs) with their vocabularies. Then, we combine the parent and child language pairs using oversampling to train a hybrid model initialized from the previously trained parent model. Finally, we fine-tune the morphologically rich child model starting from the hybrid model. In addition, we report some interesting findings about the original TL approach. Experimental results show that our model consistently outperforms five SOTA methods on two languages, Azerbaijani (Az) and Uzbek (Uz). Our approach is also practical and significantly better, achieving improvements of up to 4.94 and 4.84 BLEU points for the low-resource child language pairs Az → Zh and Uz → Zh, respectively.
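The three-stage schedule described in the abstract (parent → hybrid → child) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the joint source-side vocabulary is only one assumed way to share lexicon embeddings between parent and child languages, the oversampling ratio (child data repeated up to the parent corpus size) is an assumption, and `build_shared_vocab`, `oversample`, `hybrid_corpus`, and `train_stage` are hypothetical helpers. `train_stage` is a placeholder that records which stage initialized which; it does not train anything.

```python
import random
from typing import Optional

Pair = tuple[str, str]  # (source sentence, target sentence), whitespace-tokenized


def build_shared_vocab(*corpora: list[Pair]) -> dict[str, int]:
    """Hypothetical helper: one source-side vocabulary over parent AND child data,
    so both languages index rows of a single shared embedding table (assumption)."""
    tokens = {tok for corpus in corpora for src, _ in corpus for tok in src.split()}
    vocab = {"<unk>": 0}
    for tok in sorted(tokens):  # sorted for a deterministic id assignment
        vocab[tok] = len(vocab)
    return vocab


def oversample(pairs: list[Pair], target_size: int, seed: int = 0) -> list[Pair]:
    """Repeat the small corpus whole, then top up with a random sample (no replacement)."""
    reps, rem = divmod(target_size, len(pairs))
    return pairs * reps + random.Random(seed).sample(pairs, rem)


def hybrid_corpus(parent: list[Pair], child: list[Pair], seed: int = 0) -> list[Pair]:
    """Mix parent data with child data oversampled to the parent's size, then shuffle."""
    mixed = parent + oversample(child, len(parent), seed)
    random.Random(seed).shuffle(mixed)
    return mixed


def train_stage(name: str, data: list[Pair], vocab: dict[str, int],
                init: Optional[dict] = None) -> dict:
    """Placeholder for a real NMT training run (hypothetical): it only tracks the
    initialization lineage and sizes the embedding table from the shared vocab."""
    history = (init["history"] if init else []) + [name]
    return {"history": history, "embedding_rows": len(vocab), "examples_seen": len(data)}


# Toy data standing in for real parent (high-resource) and child (low-resource) pairs.
parent = [(f"hp{i} w", f"t{i}") for i in range(6)]
child = [(f"lc{i} w", f"t{i}") for i in range(2)]

vocab = build_shared_vocab(parent, child)
parent_model = train_stage("parent", parent, vocab)
hybrid_model = train_stage("hybrid", hybrid_corpus(parent, child), vocab, init=parent_model)
child_model = train_stage("child", child, vocab, init=hybrid_model)

print(child_model["history"])  # ['parent', 'hybrid', 'child']
```

Oversampling the child pairs to the parent's size keeps the low-resource data from being swamped during the hybrid stage; the ratio the paper actually uses is not stated in the abstract, so treat it as a tunable assumption.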

Other articles

Event Temporal Relation Extraction with Attention Mechanism and Graph Neural Network

Event temporal relation extraction is an important part of natural language processing. Many models are being used in this task with the development of deep learning. However, most of the existing methods cannot accurately obtain the degree of association be

Journal article

Keywords: temporal relation extraction; neural network; attention mechanism; graph attention

A Dynamic and Deadline-Oriented Road Pricing Mechanism for Urban Traffic Management

Road pricing is an urban traffic management mechanism to reduce traffic congestion. Currently, most of the road pricing systems based on predefined charging tolls fail to consider the dynamics of urban traffic flows and travelers' demands on the arrival

Journal article

Keywords: road pricing; traffic congestion alleviation; deep reinforcement learning

Mutation Testing for Integer Overflow in Ethereum Smart Contracts

Integer overflow is a common vulnerability in Ethereum Smart Contracts (ESCs) and often causes huge economic losses. Smart contracts cannot be changed once they are deployed on the blockchain and thus demand further testing. Mutation testing is a fault-based t

Journal article

Keywords: blockchain; Ethereum Smart Contracts (ESCs); integer overflow; mutation testing

Two-Stage Lesion Detection Approach Based on Dimension-Decomposition and 3D Context

Lesion detection in Computed Tomography (CT) images is a challenging task in the field of computer-aided diagnosis. An important issue is to locate the lesion area accurately. As a branch of Convolutional Neural Networks (CNNs), 3D Context-Enhanced (3DCE)

Journal article

Keywords: lesion detection; Computed Tomography (CT); dimension-decomposition; 3D context; com

Increasing Momentum-Like Factors: A Method for Reducing Training Errors on Multiple GPUs

In distributed training, increasing batch size can improve parallelism, but it can also bring many difficulties to the training process and cause training errors. In this work, we investigate the occurrence of training errors in theory and train ResNet-50 on

Journal article

Keywords: multiple Graphics Processing Units (GPUs); batch size; training error; distributed

Metabolite-Disease Association Prediction Algorithm Combining DeepWalk and Random Forest

Identifying the association between metabolites and diseases will help us understand the pathogenesis of diseases, which has great significance in diagnosing and treating diseases. However, traditional biometric methods are time-consuming and expensive. Accor

Journal article

Keywords: DeepWalk; random forest; metabolite-disease associations; molecular fingerprint sim

Sensitivity of N400 Effect During Speech Comprehension Under the Uni- and Bi-Modality Conditions

N400 is an objective electrophysiological index of semantic processing in the brain. This study focuses on the sensitivity of the N400 effect during speech comprehension under the uni- and bi-modality conditions. Varying the Signal-to-Noise Ratio (SNR) of speech si

Journal article

Keywords: audio-visual speech; auditory noise; audio-visual integration; Signal-to-Noise Rati

IDEA: A Utility-Enhanced Approach to Incomplete Data Stream Anonymization

The prevalence of missing values in the data streams collected in real environments makes them impossible to ignore in the privacy preservation of data streams. However, the development of most privacy preservation methods does not consider missing values. A

Journal article

Keywords: anonymization; generalization; incomplete data streams; privacy preservation; utilit

CAN: Effective Cross Features by Global Attention Mechanism and Neural Network for Ad Click Predictio

Online advertising click-through rate (CTR) prediction is aimed at predicting the probability of a user clicking an ad, and it has undergone considerable development in recent years. One of the hot topics in this area is the construction of feature interact

Journal article

Keywords: click-through rate prediction; global attention mechanism; feature interaction; neu

SIGNGD with Error Feedback Meets Lazily Aggregated Technique: Communication-Efficient Algorithms for

The proliferation of massive datasets has led to significant interest in distributed algorithms for solving large-scale machine learning problems. However, the communication overhead is a major bottleneck that hampers the scalability of distributed machine

Journal article

Keywords: distributed learning; communication-efficient algorithm; convergence analysis
