HDFS is currently the most typical cloud storage platform. Its high fault tolerance, scalability, and low-cost storage make it well suited to storing large-scale datasets. However, HDFS is inefficient at receiving and storing massive numbers of small files that arrive continuously, at high speed, and with high concurrency. To address this problem, an optimization scheme called RSMSF is proposed. In this method, a file cache server continuously receives files from the front end, adds identification information to each file, and places it in the corresponding file queue. When a file queue reaches a certain window threshold, the files in that queue are sent, according to a consistent hashing algorithm, to the corresponding file processing server, where they are merged and finally uploaded to HDFS. Experiments show that RSMSF reduces file processing time and lowers the file loss rate, while also reducing memory overhead in HDFS and saving storage space.
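The queue, threshold, and dispatch steps described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the window threshold is measured (a file count is assumed here), how queues are keyed (a hypothetical `queue_id` derived from the identification information is assumed), or what the merged file format is (plain concatenation with an offset index is assumed). All class and function names are hypothetical, and the upload to HDFS is left out.

```python
import bisect
import hashlib
from collections import defaultdict


def _hash(key: str) -> int:
    """Map a string to a point on the ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, servers, vnodes=64):
        # Each server is placed on the ring at `vnodes` pseudo-random points.
        self._points = sorted(
            (_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self._keys = [p for p, _ in self._points]

    def lookup(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        i = bisect.bisect_right(self._keys, _hash(key)) % len(self._keys)
        return self._points[i][1]


def merge(files):
    """Concatenate (name, data) pairs; return the blob and an offset index."""
    blob, index, offset = bytearray(), {}, 0
    for name, data in files:
        index[name] = (offset, len(data))  # where this file sits in the blob
        blob += data
        offset += len(data)
    return bytes(blob), index


class CacheServer:
    """Buffers incoming files per queue and flushes a queue when it is full."""

    def __init__(self, ring, window=3):
        self.ring = ring
        self.window = window          # assumed: threshold is a file count
        self.queues = defaultdict(list)
        self.dispatched = []          # stands in for the network send

    def receive(self, queue_id, name, data):
        queue = self.queues[queue_id]
        queue.append((name, data))
        if len(queue) >= self.window:
            # Pick the processing server for this queue, then merge its files.
            # In the described design the merge runs on that server; here it
            # is done inline to keep the sketch self-contained.
            server = self.ring.lookup(queue_id)
            self.dispatched.append((server, queue_id, merge(queue)))
            self.queues[queue_id] = []


ring = HashRing(["proc-1", "proc-2", "proc-3"])
cache = CacheServer(ring, window=3)
for n in range(3):
    cache.receive("sensor-A", f"f{n}.dat", b"ab")
```

After the third file arrives, the `sensor-A` queue is flushed as a single merged blob. Because the ring uses consistent hashing, adding or removing a processing server only remaps the queues whose ring segment is affected, so most queues keep going to the same server.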