Addressing Domain Adaptation for Chinese Word Segmentation with Instances-based Transfer Learning

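Source: The 17th China National Conference on Computational Linguistics and the 6th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data (CCL 2018)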
  Recent studies have shown that neural networks are effective for Chinese Word Segmentation (CWS). However, these models are constrained by the domain and size of the training corpus and do not adapt well to new domains. In this paper, we propose a novel instance-transferring method, which uses valuable annotated target-domain instances to improve CWS across domains. Specifically, we introduce semantic similarity computation based on character-based n-gram embeddings to select instances. Furthermore, training sentences similar to the instances are used to help annotate them. Experimental results show that our method can effectively boost cross-domain segmentation performance. We achieve state-of-the-art results on Internet literature datasets and results competitive with the best reported on micro-blog datasets.
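The similarity-based selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulation, so everything below is an assumption: sentences are represented as the mean of their character bigram vectors, similarity is cosine similarity, and `toy_embedding` is a hypothetical hash-based stand-in for trained n-gram embeddings. It is not the authors' implementation.

```python
import hashlib
import math

DIM = 16  # hypothetical embedding size, chosen only for this sketch


def char_ngrams(sentence, n=2):
    """Return the character n-grams of a sentence."""
    if len(sentence) < n:
        return [sentence] if sentence else []
    return [sentence[i:i + n] for i in range(len(sentence) - n + 1)]


def toy_embedding(ngram):
    """Deterministic stand-in for a trained n-gram embedding (assumption)."""
    digest = hashlib.sha256(ngram.encode("utf-8")).digest()
    return [(b - 127.5) / 127.5 for b in digest[:DIM]]


def sentence_vector(sentence, n=2):
    """Average the n-gram vectors of a sentence into one vector."""
    grams = char_ngrams(sentence, n)
    if not grams:
        return [0.0] * DIM
    vecs = [toy_embedding(g) for g in grams]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_instances(query, candidates, k=2, n=2):
    """Return the k candidates most similar to the query as (score, sentence) pairs."""
    q = sentence_vector(query, n)
    scored = [(cosine(q, sentence_vector(c, n)), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    pool = ["中文分词很重要", "今天天气不错", "领域自适应的中文分词"]
    for score, sent in select_instances("中文分词", pool, k=2):
        print(f"{score:.3f}\t{sent}")
```

With real embeddings, `toy_embedding` would be replaced by a lookup into a trained character n-gram embedding table; the ranking logic would stay the same.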