1. First, set the threshold that defines a "small file" in hive-site.xml (here 536870912 bytes, i.e. 512 MB).
   hive.merge.smallfiles.avgsize
   536870912
   When the average output file size of a job is less than this number, Hive will start an additional map-reduce job to merge the output files into bigger files. This is only done for map-only jobs if hive.merge.mapfiles is true, and for map-reduce jobs if hive.merge.mapredfiles is true.
2. Enable merging of small output files for map-only jobs.
   hive.merge.mapfiles
   true
   Merge small files at the end of a map-only job
3. Enable merging of small output files for jobs that include a reduce phase.
   hive.merge.mapredfiles
   true
   Merge small files at the end of a map-reduce job
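
Put together, the three settings above might look like the following hive-site.xml fragment. This is a sketch that assumes the usual Hadoop-style property layout (a name, a value and a description per property) and that the entries are placed inside the file's existing configuration element; the names and values are the ones listed above.

```xml
<!-- Goes inside the existing <configuration> element of hive-site.xml -->

<!-- Threshold: merge when the average output file size is below 512 MB -->
<property>
  <name>hive.merge.smallfiles.avgsize</name>
  <value>536870912</value>
  <description>When the average output file size of a job is less than this number, Hive will start an additional map-reduce job to merge the output files into bigger files.</description>
</property>

<!-- Merge small files produced by map-only jobs -->
<property>
  <name>hive.merge.mapfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a map-only job</description>
</property>

<!-- Merge small files produced by jobs with a reduce phase -->
<property>
  <name>hive.merge.mapredfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a map-reduce job</description>
</property>
```

The same properties can also be set for a single session with Hive's SET command, for example `SET hive.merge.mapredfiles=true;`, which is handy for trying the behavior on one query before changing the site-wide configuration.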