Harvest Uyghur-Chinese Aligned-Sentences Bitexts from Multilingual Sites Based on Word Embedding

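Source: The 16th China National Conference on Computational Linguistics and the 5th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data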

【Authors】: ShaoLin Zhu, Xiao Li, YaTing Yang, Lei Wang, ChengGang Mi

【Affiliation】: University of Chinese Academy of Sciences, Beijing, China

【Publication date】: 2017, Issue 7

【Keywords】: bilingual parallel data; word embedding; resource-scarce languages

Excerpt from the paper

Obtaining bilingual parallel data from multilingual websites is a long-standing research problem, and such data is highly beneficial for resource-scarce languages. In this paper, we present an approach for obtaining parallel data based on word embeddings; our model relies only on a small bilingual lexicon. Our approach benefits from recent advances in continuous word representations, which can capture more contextual information than traditional methods. Our experiments show that sizable, high-precision parallel Uyghur-Chinese data can be obtained even when only a limited bilingual lexicon is available.
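The excerpt does not describe the model itself, so the following is only a minimal sketch of one common way to use word embeddings with a small seed lexicon, not the authors' method. It assumes a linear map between the two embedding spaces fitted by least squares on the seed pairs, and scores sentence pairs by the cosine similarity of averaged word vectors. All tokens (`ug_book`, `zh_book`, ...), the 2-dimensional vectors, and the function names are hypothetical toy values chosen for illustration.

```python
import math

# Toy 2-d embeddings (hypothetical values, not real Uyghur or Chinese vectors).
SRC_VECS = {"ug_book": (1.0, 0.0), "ug_water": (0.0, 1.0),
            "ug_read": (1.0, 1.0), "ug_drink": (1.0, -1.0)}
TGT_VECS = {"zh_book": (0.0, 1.0), "zh_water": (-1.0, 0.0),
            "zh_read": (-1.0, 1.0), "zh_drink": (1.0, 1.0)}

# Small seed lexicon: the only bilingual supervision used in this sketch.
SEED_LEXICON = [("ug_book", "zh_book"), ("ug_water", "zh_water")]


def fit_linear_map(pairs):
    """Least-squares 2x2 matrix W minimising ||X W - Y|| (normal equations)."""
    xs = [SRC_VECS[s] for s, _ in pairs]
    ys = [TGT_VECS[t] for _, t in pairs]
    a = [[sum(x[i] * x[j] for x in xs) for j in range(2)] for i in range(2)]
    b = [[sum(x[i] * y[j] for x, y in zip(xs, ys)) for j in range(2)]
         for i in range(2)]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-12:
        raise ValueError("seed lexicon does not determine the mapping")
    inv = [[a[1][1] / det, -a[0][1] / det],
           [-a[1][0] / det, a[0][0] / det]]
    # W = (X^T X)^{-1} X^T Y
    return [[sum(inv[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply_map(vec, w):
    """Project a source-space row vector into the target space."""
    return tuple(sum(vec[k] * w[k][j] for k in range(2)) for j in range(2))


def sentence_vector(tokens, table):
    """Average the vectors of the tokens found in the embedding table."""
    known = [table[t] for t in tokens if t in table]
    if not known:
        raise ValueError("no known tokens in sentence")
    return tuple(sum(v[i] for v in known) / len(known) for i in range(2))


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match(src_tokens, candidates, w):
    """Return (index, score) of the candidate closest to the mapped source."""
    mapped = apply_map(sentence_vector(src_tokens, SRC_VECS), w)
    scores = [cosine(mapped, sentence_vector(c, TGT_VECS)) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


W = fit_linear_map(SEED_LEXICON)
CANDIDATES = [["zh_read", "zh_book"], ["zh_drink", "zh_water"]]
best_index, best_score = best_match(["ug_read", "ug_book"], CANDIDATES, W)
print(best_index, round(best_score, 3))
```

In a setting like the one the excerpt describes, a similarity threshold on the best score would typically decide whether a candidate pair is kept, which is how such a pipeline can trade recall for precision. The threshold value would have to be tuned on held-out data.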
