Unsupervised WSD by Finding the Predominant Sense Using Context as a Dynamic Thesaurus

Source: Journal of Computer Science & Technology
We present and analyze an unsupervised method for Word Sense Disambiguation (WSD). Our work is based on the method presented by McCarthy et al. in 2004 for finding the predominant sense of each word in the entire corpus. Their maximization algorithm allows weighted terms (similar words) from a distributional thesaurus to accumulate a score for each sense of the ambiguous word; that is, the sense with the highest score is chosen based on votes from a weighted list of terms related to the ambiguous word. This list is obtained using the distributional similarity method proposed by Lin Dekang for building a thesaurus. In the method of McCarthy et al., every occurrence of the ambiguous word uses the same thesaurus, regardless of the context in which the word occurs. Our method instead accounts for context when determining the sense of an ambiguous word: it builds the list of distributionally similar words from the syntactic context of each occurrence. We obtain a top precision of 77.54%, versus 67.10% for the original method, tested on SemCor. We also analyze the effect of the number of weighted terms on the tasks of finding the Most Frequent Sense (MFS) and WSD, and experiment with several corpora for building the Word Space Model.
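The voting scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sense labels, the term lists, their weights, and the `toy_similarity` table are all hypothetical, and dividing each vote by the term's total similarity across senses is an assumption of this sketch that the abstract does not specify. The same scoring function is called twice, once with a corpus-wide term list and once with a term list built for one particular context, which is the difference between the two methods.

```python
from typing import Callable, Dict, List, Tuple

def score_senses(
    senses: List[str],
    weighted_terms: List[Tuple[str, float]],
    sense_similarity: Callable[[str, str], float],
) -> Dict[str, float]:
    """Accumulate a weighted vote from every related term for every sense."""
    scores = {sense: 0.0 for sense in senses}
    for term, weight in weighted_terms:
        sims = {sense: sense_similarity(sense, term) for sense in senses}
        total = sum(sims.values())
        if total == 0.0:
            continue  # this term carries no evidence for any sense
        for sense in senses:
            # Assumption: each term splits its weight across senses
            # in proportion to its similarity to each sense.
            scores[sense] += weight * sims[sense] / total
    return scores

def predominant_sense(
    senses: List[str],
    weighted_terms: List[Tuple[str, float]],
    sense_similarity: Callable[[str, str], float],
) -> str:
    """Return the sense with the highest accumulated score."""
    scores = score_senses(senses, weighted_terms, sense_similarity)
    return max(scores, key=scores.get)

# Hypothetical toy data: two senses of "bank" and a hand-made similarity table.
SENSES = ["bank.finance", "bank.river"]
TOY_SIM = {
    ("bank.finance", "money"): 0.9, ("bank.river", "money"): 0.1,
    ("bank.finance", "loan"): 0.8, ("bank.river", "loan"): 0.2,
    ("bank.finance", "shore"): 0.1, ("bank.river", "shore"): 0.9,
    ("bank.finance", "water"): 0.2, ("bank.river", "water"): 0.8,
}

def toy_similarity(sense: str, term: str) -> float:
    """Stand-in for a real sense-to-word similarity measure."""
    return TOY_SIM.get((sense, term), 0.0)

# One corpus-wide list, reused for every occurrence of the word.
GLOBAL_TERMS = [("money", 0.9), ("loan", 0.8), ("shore", 0.3)]
# A list built for a single occurrence, e.g. "sat on the bank of the stream".
CONTEXT_TERMS = [("shore", 0.9), ("water", 0.7), ("money", 0.1)]

global_choice = predominant_sense(SENSES, GLOBAL_TERMS, toy_similarity)
context_choice = predominant_sense(SENSES, CONTEXT_TERMS, toy_similarity)
```

With these made-up numbers the corpus-wide list selects the financial sense for every occurrence, while the context-specific list selects the river sense for that occurrence. In a real system the similarity function would come from a lexical resource and the term lists from a distributional model.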