【Affiliation】 Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031, China
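【Source】 The 15th China National Conference on Computational Linguistics (CCL 2016) and the 4th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data (NLP-NABD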
Excerpt from the paper
Previous Mongolian morphological segmentation methods rely on large labeled training sets or on complicated rules devised by linguists. In contrast, we explore a novel semi-supervised method aimed at a practical application, statistical machine translation (SMT), in a low-resource learning setting where a small amount of labeled data and a large amount of unlabeled data are available. First, a CRF-based supervised learner is trained on the small labeled set to predict morpheme boundaries. Then, a lexicon-based segmentation model, which uses the small labeled set as heuristic information, draws on the abundant unlabeled data to compensate for the weaknesses of the first step. Finally, we present several error correction models that revise the segmentation results. Experimental results show that our method improves segmentation results over purely supervised learning. In addition, we integrate the morphological segmentation results into Chinese-Mongolian SMT and achieve satisfactory performance relative to the baseline.
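The abstract does not specify the CRF features, the lexicon model, or the error correction rules, so the following is only a minimal sketch of two generic building blocks such a pipeline might use: casting segmentation as per-character boundary labeling (the form a CRF tagger typically consumes), and a greedy longest-suffix lexicon segmenter seeded from labeled words. All function names, the `min_stem` parameter, and the toy transliterated strings are assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Set


def to_boundary_labels(morphemes: List[str]) -> List[str]:
    """Label each character 'B' if it starts a morpheme, otherwise 'I'.

    Assumes every morpheme is a non-empty string.
    """
    labels: List[str] = []
    for m in morphemes:
        labels.extend(["B"] + ["I"] * (len(m) - 1))
    return labels


def from_boundary_labels(word: str, labels: List[str]) -> List[str]:
    """Rebuild the morpheme list from a word and its B/I labels."""
    morphemes: List[str] = []
    for ch, lab in zip(word, labels):
        if lab == "B" or not morphemes:
            morphemes.append(ch)      # start a new morpheme
        else:
            morphemes[-1] += ch       # extend the current morpheme
    return morphemes


def build_suffix_lexicon(labeled: List[List[str]]) -> Set[str]:
    """Hypothetical heuristic: every non-initial morpheme is a known suffix."""
    return {m for seg in labeled for m in seg[1:]}


def lexicon_segment(word: str, suffixes: Set[str], min_stem: int = 2) -> List[str]:
    """Greedily strip the longest known suffix while the stem stays long enough."""
    tail: List[str] = []
    while True:
        best = ""
        for s in suffixes:
            if (word.endswith(s)
                    and len(word) - len(s) >= min_stem
                    and len(s) > len(best)):
                best = s
        if not best:
            break
        tail.insert(0, best)
        word = word[: -len(best)]
    return [word] + tail


# Toy labeled data (illustrative strings only, not a linguistic analysis).
labeled_words = [["nom", "ud"], ["ger", "tu"]]
suffix_lexicon = build_suffix_lexicon(labeled_words)

labels = to_boundary_labels(["nom", "ud"])
restored = from_boundary_labels("nomud", labels)
segmented = lexicon_segment("nomudtu", suffix_lexicon)
```

In a fuller system, the B/I labels would be the training targets for a CRF, and the lexicon segmenter's suffix set could be expanded from unlabeled text; how the paper combines the two stages and corrects errors is not stated in this excerpt.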