Hive Optimization: Automatically Merging Small Output Files

2015-07-16 12:08:00

1. First, set the threshold that defines a "small file" in hive-site.xml.



<property>
  <name>hive.merge.smallfiles.avgsize</name>
  <value>536870912</value>
  <description>When the average output file size of a job is less than this number, Hive will start an additional map-reduce job to merge the output files into bigger files. This is only done for map-only jobs if hive.merge.mapfiles is true, and for map-reduce jobs if hive.merge.mapredfiles is true.</description>
</property>
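
To make the threshold concrete, here is a minimal Python sketch of the comparison the description refers to. It is only an illustration of the arithmetic, not Hive's actual implementation; the helper names `average_size` and `below_threshold` and the sample file sizes are assumptions made up for this example.

```python
# Illustrative only: models the rule stated in the property description,
# not Hive's internal code.
SMALLFILES_AVGSIZE = 536870912  # value of hive.merge.smallfiles.avgsize (512 MiB)
MIB = 1024 * 1024


def average_size(file_sizes):
    """Return the mean output file size in bytes (0 when there are no files)."""
    return sum(file_sizes) / len(file_sizes) if file_sizes else 0


def below_threshold(file_sizes, threshold=SMALLFILES_AVGSIZE):
    """True when the average output file size is less than the threshold."""
    return bool(file_sizes) and average_size(file_sizes) < threshold


many_small = [10 * MIB] * 200  # hypothetical job output: 200 files of 10 MiB
few_large = [1024 * MIB] * 2   # hypothetical job output: 2 files of 1 GiB

print(below_threshold(many_small))  # True: average is 10 MiB, under 512 MiB
print(below_threshold(few_large))   # False: average is 1 GiB, over 512 MiB
```

With a threshold of 512 MiB, the first job would be a candidate for the extra merge job and the second would not.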


2. Merge the small output files of map-only MapReduce jobs.



<property>
  <name>hive.merge.mapfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a map-only job</description>
</property>


3. Merge the small output files of MapReduce jobs that include a reduce phase.



<property>
  <name>hive.merge.mapredfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a map-reduce job</description>
</property>
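
The three settings work together: the job type decides which switch applies, and the average-size threshold decides whether the merge is actually triggered. The Python sketch below models that interaction as described in the property descriptions. It is a simplified illustration rather than Hive's real logic, and `MergeSettings` and `should_merge` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class MergeSettings:
    """Hypothetical container mirroring the three hive-site.xml properties."""
    smallfiles_avgsize: int = 536870912  # hive.merge.smallfiles.avgsize
    merge_mapfiles: bool = True          # hive.merge.mapfiles
    merge_mapredfiles: bool = True       # hive.merge.mapredfiles


def should_merge(settings, has_reduce_phase, avg_output_size):
    """Model of the documented rule: the switch for the job type must be on
    and the average output file size must be below the threshold."""
    if has_reduce_phase:
        enabled = settings.merge_mapredfiles
    else:
        enabled = settings.merge_mapfiles
    return enabled and avg_output_size < settings.smallfiles_avgsize


settings = MergeSettings()
ten_mib = 10 * 1024 * 1024

# Map-only job with 10 MiB average output: merge is triggered.
print(should_merge(settings, has_reduce_phase=False, avg_output_size=ten_mib))

# Same job with hive.merge.mapfiles turned off: no merge.
map_only_off = MergeSettings(merge_mapfiles=False)
print(should_merge(map_only_off, has_reduce_phase=False, avg_output_size=ten_mib))
```

Note that turning off `hive.merge.mapfiles` in this model affects only map-only jobs; jobs with a reduce phase are still governed by `hive.merge.mapredfiles`.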