Each synonym set in 《同义词词林》 (Tongyici Cilin, hereafter Cilin) corresponds to a unique semantic class code. The basic assumption of this paper is that when the words of a synonym set occur in text, the content words that co-occur before and after them are statistically similar. Preliminary experiments show that, although distribution-based clustering works by a completely different mechanism from the one the Cilin compilers used to delimit synonym sets, the resulting word clusters agree with the Cilin semantic classes at an average rate of over 80%. The significance of this study is twofold: it proposes a method for quantitatively analyzing the word classifications that linguists make by linguistic intuition, and it lays a foundation for using linguistic knowledge in automatic word sense tagging by computer.
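The distributional assumption can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the paper: the toy English corpus, the window size of 2, the class labels "A" and "B" (not real Cilin codes), cosine similarity as the distance, and nearest-neighbour agreement as a simple stand-in for the consistency rate, whose exact definition the abstract does not give. A real experiment would use a segmented Chinese corpus and keep only content words as context.

```python
from collections import Counter
from math import sqrt

def context_vector(target, sentences, window=2):
    """Count the words that co-occur within `window` positions of `target`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def nearest_neighbour_agreement(classes, sentences, window=2):
    """Fraction of words whose most similar word carries the same class label.

    Illustrative stand-in only; not the paper's consistency measure.
    """
    vectors = {w: context_vector(w, sentences, window) for w in classes}
    hits = 0
    for w in classes:
        best = max((v for v in classes if v != w),
                   key=lambda v: cosine(vectors[w], vectors[v]))
        hits += classes[best] == classes[w]
    return hits / len(classes)

# Hypothetical toy corpus, already tokenized.
sentences = [
    ["children", "eat", "fresh", "apples"],
    ["children", "eat", "fresh", "pears"],
    ["workers", "repair", "old", "trucks"],
    ["workers", "repair", "old", "buses"],
]
# Hypothetical class labels standing in for thesaurus class codes.
classes = {"apples": "A", "pears": "A", "trucks": "B", "buses": "B"}

print(nearest_neighbour_agreement(classes, sentences))
```

In this toy setting, words with the same label share their context words, so each word's nearest neighbour falls in its own class. On real data the agreement would depend heavily on corpus size, window width, and how context words are filtered.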