Term-Dependent Confidence Normalisation for Out-of-Vocabulary Spoken Term Detection

Source: Journal of Computer Science & Technology
An important component of a spoken term detection (STD) system is the estimation of confidence measures for hypothesised detections. A potential problem with the widely used lattice-based confidence estimation, however, is that the confidence scores are treated uniformly for all search terms, regardless of how much the terms differ in phonetic or linguistic properties. This problem is particularly evident for out-of-vocabulary (OOV) terms, which tend to exhibit high intra-term diversity.
To address the impact of term diversity on confidence measures, we propose in this work a term-dependent normalisation technique which compensates for term diversity in confidence estimation. We first derive an evaluation-metric-oriented normalisation that optimises the evaluation metric by compensating for the diverse occurrence rates among terms, and then propose a linear bias compensation and a discriminative compensation to deal with the bias problem that is inherent in lattice-based confidence measurement and from which the Term Specific Threshold (TST) approach suffers. We tested the proposed technique on speech data from the multi-party meeting domain with two state-of-the-art STD systems, based on phonemes and words respectively. The experimental results demonstrate that the confidence normalisation approach leads to a significant performance improvement in STD, particularly for OOV terms with phoneme-based systems.
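The abstract does not state any formula, so the following is only an illustrative sketch of the term-specific threshold idea it refers to, not the authors' method. It assumes the commonly cited threshold derived from the term-weighted value metric: a detection with confidence p is accepted when p exceeds beta * N / (T + (beta - 1) * N), where N is the expected number of true occurrences of the term (estimated here as the sum of its confidences), T is the amount of speech in seconds, and beta = 999.9 as in the NIST STD 2006 evaluation. The function names and example values are hypothetical.

```python
def term_specific_threshold(confidences, speech_seconds, beta=999.9):
    """Decision threshold for one search term (illustrative, assumed form).

    confidences    : posterior-like scores in [0, 1] for every hypothesised
                     detection of this term
    speech_seconds : total duration of the searched audio, in seconds
    beta           : cost/value weight of the evaluation metric (assumed 999.9)
    """
    # Expected number of true occurrences, estimated from the scores themselves.
    n_true_est = sum(confidences)
    return beta * n_true_est / (speech_seconds + (beta - 1.0) * n_true_est)


def decide(confidences, speech_seconds, beta=999.9):
    """Return a YES/NO decision for each detection of a single term."""
    thr = term_specific_threshold(confidences, speech_seconds, beta)
    return [c >= thr for c in confidences]


if __name__ == "__main__":
    hour = 3600.0
    rare = [0.3]            # one weak hit for a rare term
    frequent = [0.9] * 10   # many strong hits for a frequent term
    print(term_specific_threshold(rare, hour))
    print(term_specific_threshold(frequent, hour))
    print(decide([0.9, 0.1], hour))
```

Under these assumptions the threshold rises with the expected occurrence count, so a rare term can be accepted at a lower confidence than a frequent one. Because the expected count is itself computed from the confidence scores, any systematic bias in those scores carries over into the threshold, which is the kind of bias the abstract says the proposed compensation steps are meant to address.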