Buffering strategies in HDFS environment with STORM framework

19 November 2015

The Hadoop Distributed File System (HDFS) is the de facto file system in vanilla Hadoop and in most open source Hadoop distributions. However, HDFS suffers from the so-called small file problem: large numbers of files much smaller than the HDFS block size burden the NameNode, which keeps the metadata for every file in memory. Storm is a parallelized stream processing framework. In this paper we evaluate buffering strategies to mitigate the small file problem. In our evaluations, Storm is the source of the small files.
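
To illustrate what buffering means in this setting, here is a minimal Python sketch of one generic approach: collect small records in memory and hand them on as a single larger batch once a size or age threshold is reached. It is not taken from the paper and is not necessarily one of the strategies evaluated there. The class name `SizeOrAgeBuffer`, the parameters `max_bytes` and `max_age_s`, and the `flush` callback are all hypothetical; the callback stands in for whatever would actually write the batch to HDFS.

```python
import time
from typing import Callable, List, Optional


class SizeOrAgeBuffer:
    """Hypothetical buffer: gathers small records and emits one larger batch."""

    def __init__(self, flush: Callable[[bytes], None], max_bytes: int,
                 max_age_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush = flush          # assumed sink, e.g. a function writing one HDFS file
        self._max_bytes = max_bytes  # flush once this many bytes are buffered
        self._max_age_s = max_age_s  # flush once the oldest record is this old
        self._clock = clock          # injectable so the age rule can be tested
        self._chunks: List[bytes] = []
        self._size = 0
        self._first_at: Optional[float] = None

    def add(self, record: bytes) -> None:
        """Buffer one record and flush if either threshold is reached."""
        if self._first_at is None:
            self._first_at = self._clock()
        self._chunks.append(record)
        self._size += len(record)
        too_big = self._size >= self._max_bytes
        too_old = self._clock() - self._first_at >= self._max_age_s
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        """Emit everything buffered so far as one batch; no-op when empty."""
        if not self._chunks:
            return
        self._flush(b"".join(self._chunks))
        self._chunks = []
        self._size = 0
        self._first_at = None


# Usage: seven 3-byte records with a 10-byte threshold yield two batches
# instead of seven small writes.
written: List[bytes] = []
buf = SizeOrAgeBuffer(written.append, max_bytes=10, max_age_s=60.0)
for _ in range(7):
    buf.add(b"abc")
buf.flush()  # drain the remainder, e.g. on shutdown
```

In this sketch the age threshold is only checked when a new record arrives, so an idle stream would also need a periodic call to `flush()`. The trade-off is the usual one for buffering: larger thresholds produce fewer, bigger files but increase latency and the amount of data lost if the process fails before a flush.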