To manage and retrieve DVD content, the subtitle text must be processed, so the text rendered in the DVD subtitle-stream images has to be extracted and recognized to obtain plain-text data. Based on a study and analysis of the DVD file structure and the storage structure of subtitle-stream data, this paper describes the structure of subtitles, private stream 1, and VOB files, together with the techniques for accessing them. For English subtitles, it proposes a basic method covering DVD subtitle-stream separation, picture extraction and decoding, character segmentation, sample training, and text recognition. With only simple data training, the method can quickly and automatically generate the English subtitle text of a DVD.
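As a rough illustration of the stream-separation step, the following minimal sketch pulls subtitle (subpicture) packets out of VOB data. It is not the paper's implementation. It assumes the standard MPEG-2 PES packet layout (start code 00 00 01, stream ID 0xBD for private stream 1, a 2-byte packet length, and a header-length byte at offset 8) and the common DVD convention that the first payload byte is a substream ID, with 0x20 to 0x3F denoting subpicture streams. The function name `iter_subpicture_payloads` is hypothetical, and the simple byte search may need sector-aligned parsing on real discs.

```python
import struct

PRIVATE_STREAM_1 = 0xBD           # PES stream ID carrying DVD subpictures and some audio
PES_START = b"\x00\x00\x01" + bytes([PRIVATE_STREAM_1])


def iter_subpicture_payloads(data: bytes):
    """Yield (substream_id, payload) for each subpicture PES packet in `data`.

    Assumption: `data` is a chunk of an MPEG-2 program stream (VOB) and the
    subpicture substream IDs lie in 0x20..0x3F.
    """
    pos = 0
    while True:
        pos = data.find(PES_START, pos)
        # Need at least the 9 fixed bytes: start code (4), length (2),
        # two flag bytes, and the header-data-length byte.
        if pos < 0 or pos + 9 > len(data):
            return
        pes_len = struct.unpack_from(">H", data, pos + 4)[0]
        header_len = data[pos + 8]
        payload_start = pos + 9 + header_len
        packet_end = pos + 6 + pes_len      # length counts bytes after the length field
        if packet_end > len(data) or payload_start >= packet_end:
            pos += 4                        # truncated or bogus match: keep scanning
            continue
        substream_id = data[payload_start]
        if 0x20 <= substream_id <= 0x3F:
            # Strip the substream ID byte; the rest is subpicture data.
            yield substream_id, data[payload_start + 1:packet_end]
        pos = packet_end
```

The payloads of one substream, concatenated in order, would then form the subpicture units that the picture-decoding step works on; reassembling and decoding those units is outside the scope of this sketch.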