Coping with Problems of Unicoded Traditional Mongolian

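Source: The 15th China National Conference on Computational Linguistics (CCL2016) and the 4th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data (NLP-NABD)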

Authors: Boli Wang [1], Xiaodong Shi [2], Yidong Chen [1]

Affiliation: Department of Cognitive Science, Xiamen University, Xiamen, China

Publication date: 2016

Abstract

The Unicode encoding of Traditional Mongolian has a serious problem: several pairs of vowels that share the same glyph but differ in pronunciation are assigned different codes. We demonstrate the severity of the problem with examples from our Mongolian corpus and propose two ways to alleviate it: first, a publicly available Mongolian input method that helps users choose the correct encoding; second, a normalization method that addresses the data sparseness caused by the proliferation of homographs. Experiments on search engines and statistical machine translation show that our methods are effective.
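The normalization idea can be illustrated with a minimal sketch. This is not the authors' implementation: the mapping table is an assumption chosen for illustration (it folds the Unicode letters U onto O and UE onto OE, two vowel pairs that are commonly written with the same glyph), and the names HOMOGRAPH_FOLD, normalize, and build_index are hypothetical. The paper's actual rule set is not given in the abstract and may differ.

```python
# Illustrative sketch only: the folding table below is an assumption,
# not the mapping used in the paper.
HOMOGRAPH_FOLD = {
    "\u1824": "\u1823",  # MONGOLIAN LETTER U  -> MONGOLIAN LETTER O
    "\u1826": "\u1825",  # MONGOLIAN LETTER UE -> MONGOLIAN LETTER OE
}
_FOLD_TABLE = str.maketrans(HOMOGRAPH_FOLD)


def normalize(text: str) -> str:
    """Fold visually identical vowels onto one representative code point."""
    return text.translate(_FOLD_TABLE)


def build_index(words):
    """Group surface spellings under their normalized key.

    Spellings that look the same on screen but were typed with different
    code points end up under one key, which reduces sparseness when the
    key is used for search or as a translation-model token.
    """
    index = {}
    for word in words:
        index.setdefault(normalize(word), set()).add(word)
    return index


if __name__ == "__main__":
    # Two encodings of a visually identical two-letter string (BA + O / BA + U).
    variants = ["\u182A\u1823", "\u182A\u1824"]
    index = build_index(variants)
    print(len(index))  # number of distinct normalized keys
```

The folding is lossy with respect to pronunciation, so a sketch like this is suited to building matching keys (for example in a search index) rather than to rewriting the stored text.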
