Semi-supervised Learning for Mongolian Morphological Segmentation

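Source: The 15th China National Conference on Computational Linguistics (CCL2016) and the 4th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data (NLP-NABD | Citations: 0 | Uploaded by: a1lan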

【Authors】

Zhenxin Yang [1], Miao Li [2], Lei Chen [2], Weihui Zeng [2], Yi Gao [3], Sha Fu [3]

【Institution】

Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031, China

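【Published in】

The 15th China National Conference on Computational Linguistics (CCL2016) and the 4th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data (NLP-NABD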

【Publication date】

2016

Excerpt from the paper

Unlike previous Mongolian morphological segmentation methods, which rely on large labeled training data or on complicated rules devised by linguists, we explore a novel semi-supervised method for a practical application, namely statistical machine translation (SMT). The method targets a low-resource learning setting in which a small amount of labeled data and a large amount of unlabeled data are available. First, a CRF-based supervised learner is trained on the small labeled data set to predict morpheme boundaries. Then, a lexicon-based segmentation model, which uses the small labeled data set as heuristic information, draws on the abundant unlabeled data to compensate for the weaknesses of the first step. Finally, we present several error correction models to revise the segmentation results. Experimental results show that our method improves segmentation results compared with purely supervised learning. In addition, we integrate the morphological segmentation results into Chinese-Mongolian SMT and achieve satisfactory performance compared with the baseline.
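The abstract does not say how morpheme boundaries are encoded for the CRF. A common way to frame this kind of task is character-level sequence labeling, and the minimal Python sketch below assumes a two-tag scheme ("B" for a character that begins a morpheme, "I" for any other character). The function names and the romanized example word are illustrative assumptions, not taken from the paper.

```python
from typing import List


def morphemes_to_labels(morphemes: List[str]) -> List[str]:
    """Label each character 'B' if it starts a morpheme, otherwise 'I'.

    Assumed tagging scheme; the paper's actual label set is not given.
    """
    labels: List[str] = []
    for morpheme in morphemes:
        if not morpheme:
            raise ValueError("empty morpheme")
        labels.append("B")
        # Every remaining character of the morpheme is an inside character.
        labels.extend("I" * (len(morpheme) - 1))
    return labels


def labels_to_morphemes(word: str, labels: List[str]) -> List[str]:
    """Rebuild a segmentation from per-character labels."""
    if len(word) != len(labels):
        raise ValueError("word and labels must have the same length")
    morphemes: List[str] = []
    for char, label in zip(word, labels):
        # Start a new morpheme on 'B', or at the first character
        # even if a model mistakenly predicted 'I' there.
        if label == "B" or not morphemes:
            morphemes.append(char)
        else:
            morphemes[-1] += char
    return morphemes


if __name__ == "__main__":
    gold = ["nom", "uud"]  # illustrative romanized stem + suffix
    labels = morphemes_to_labels(gold)
    print(labels)
    print(labels_to_morphemes("".join(gold), labels))
```

Under this framing, the labeled data supplies training pairs of characters and tags, and a predicted tag sequence is decoded back into morphemes with `labels_to_morphemes`. The lexicon-based and error correction steps described in the abstract would then operate on these decoded segmentations.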
