Buffering strategies in HDFS environment with STORM framework
19 November 2015
The Hadoop Distributed File System (HDFS) is the de facto file system in vanilla Hadoop and in most open source Hadoop distributions. However, HDFS suffers from the so-called small file problem: the NameNode keeps the metadata of every file in memory, so a large number of files much smaller than the HDFS block size limits its scalability and performance. Storm is a parallelized stream processing framework. In this paper we evaluate buffering strategies to mitigate the small file problem, using Storm as the source of small files.