A Method to Chinese-Vietnamese Bilingual Metallurgy Term Extraction Based on a Pivot Language

Source: The 6th CCF Big Data Academic Conference (第六届中国计算机学会大数据学术会议)
To address the scarcity of Chinese-Vietnamese aligned bilingual corpora in the metallurgy field, a method for Chinese-Vietnamese bilingual metallurgy term extraction based on a pivot language is proposed. First, unithood and termhood features are selected and fed into a trained conditional random field (CRF) model to identify and extract Chinese metallurgy terms. Second, a phrase-based statistical machine translation model is used to generate a Chinese-English phrase table and an English-Vietnamese phrase table. Following the pivot mapping idea, a Chinese-Vietnamese phrase table is then inferred with English as the pivot. Finally, the previously extracted Chinese metallurgy terms are used to filter the Chinese-Vietnamese phrase table, which yields a Chinese-Vietnamese bilingual metallurgy term base. Experiments show that the proposed method achieves an accuracy of 69.45%. The results indicate that, in the absence of a Chinese-Vietnamese aligned bilingual corpus, the method is an effective solution to the difficult problem of Chinese-Vietnamese bilingual metallurgy term extraction.
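The pivot mapping and filtering steps can be illustrated with a minimal sketch. It assumes the common triangulation rule, in which the Chinese-Vietnamese score is the sum over shared English phrases of p(en | zh) * p(vi | en); the abstract does not state the exact scoring formula, so this rule, the function names `pivot_phrase_table` and `filter_by_terms`, the `min_prob` threshold, and the toy probabilities are all illustrative assumptions rather than the authors' implementation.

```python
from collections import defaultdict


def pivot_phrase_table(zh_en, en_vi):
    """Infer a zh->vi table by summing over shared English pivot phrases.

    Assumed scoring rule (not confirmed by the paper):
        p(vi | zh) = sum_en p(en | zh) * p(vi | en)
    """
    zh_vi = defaultdict(lambda: defaultdict(float))
    for zh, en_probs in zh_en.items():
        for en, p_en_given_zh in en_probs.items():
            # Only English phrases present in both tables act as pivots.
            for vi, p_vi_given_en in en_vi.get(en, {}).items():
                zh_vi[zh][vi] += p_en_given_zh * p_vi_given_en
    return {zh: dict(vis) for zh, vis in zh_vi.items()}


def filter_by_terms(zh_vi, zh_terms, min_prob=0.0):
    """Keep only entries whose Chinese side is an extracted term.

    For each term, the highest-scoring Vietnamese candidate is kept,
    provided its score reaches the hypothetical threshold min_prob.
    """
    term_base = {}
    for zh in zh_terms:
        candidates = zh_vi.get(zh)
        if not candidates:
            continue  # the term never reached Vietnamese through the pivot
        vi, prob = max(candidates.items(), key=lambda item: item[1])
        if prob >= min_prob:
            term_base[zh] = (vi, prob)
    return term_base


# Toy phrase tables with made-up probabilities, for illustration only.
zh_en = {
    "钢": {"steel": 0.9, "iron": 0.1},
    "高炉": {"blast furnace": 1.0},
    "你好": {"hello": 1.0},
}
en_vi = {
    "steel": {"thép": 0.8, "sắt": 0.2},
    "iron": {"sắt": 0.9, "gang": 0.1},
    "blast furnace": {"lò cao": 1.0},
    "hello": {"xin chào": 1.0},
}

zh_vi = pivot_phrase_table(zh_en, en_vi)

# Stand-in for the output of the CRF term extractor described in the abstract.
zh_terms = ["钢", "高炉", "转炉"]
term_base = filter_by_terms(zh_vi, zh_terms)

for zh, (vi, prob) in term_base.items():
    print(f"{zh}\t{vi}\t{prob:.2f}")
```

In this toy run the general-domain entry "你好" is dropped because it is not in the extracted term list, and "转炉" is skipped because no pivot path reaches a Vietnamese phrase. Keeping only the single best candidate per term is a design choice of the sketch; a real system might keep the top k candidates or apply a tuned threshold.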