Hive中表的关联顺序对生成MapReduce作业数的影响案例(一)

2014-11-24 17:41:27 · 作者: · 浏览: 3

Hive中在做多表关联时,由于Hive的SQL优化引擎还不够强大,表的关联顺序不同往往导致产生不同数量的MapReduce作业数。这时就需要通过分析执行计划对SQL进行调整,以获得最少的MapReduce作业数。举一个例子(案例基于Hive 0.6.0):


create table ljn1(


k1 bigint,


k2 String,


v1 int


);



create table ljn2(


k1 bigint,


v2 int


);



create table ljn3(


k1 bigint,


v3 int


);



create table ljn4(


k1 bigint,


v4 int


);



create table ljn5(


k1 bigint,


v5 int


);



create table ljn6(


k2 string,


v6 int


);



然后看一下下面这个SQL的执行计划:


explain


select a.v1


from


ljn1 a


left outer join ljn2 b on (a.k1 = b.k1)


left outer join ljn3 c on (a.k1 = c.k1)


left outer join ljn4 d on (a.k1 = d.k1)


left outer join ljn6 e on (a.k2 = e.k2)


left outer join ljn5 f on (a.k1 = f.k1);




STAGE DEPENDENCIES:


Stage-5 is a root stage


Stage-1 depends on stages: Stage-5


Stage-2 depends on stages: Stage-1


Stage-0 is a root stage



STAGE PLANS:


Stage: Stage-5


Map Reduce


Alias -> Map Operator Tree:


a


TableScan


alias: a


Reduce Output Operator


key expressions:


expr: k1


type: bigint


sort order: +


Map-reduce partition columns:


expr: k1


type: bigint


tag: 0


value expressions:


expr: k1


type: bigint


expr: k2


type: string


expr: v1


type: int


b


TableScan


alias: b


Reduce Output Operator


key expressions:


expr: k1


type: bigint


sort order: +


Map-reduce partition columns:


expr: k1


type: bigint


tag: 1


Reduce Operator Tree:


Join Operator


condition map:


Left Outer Join0 to 1


condition expressions:


0 {VALUE._col0} {VALUE._col1} {VALUE._col2}


1


handleSkewJoin: false


outputColumnNames: _col0, _col1, _col2


File Output Operator


compressed: true


GlobalTableId: 0


table:


output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat



Stage: Stage-1


Map Reduce


Alias -> Map Operator Tree:


$INTNAME


Reduce Output Operator


key expressions:


expr: _col0


type: bigint


sort order: +


Map-reduce partition columns:


expr: _col0


type: bigint


tag: 0


value expressions:


expr: _col1


type: string


expr: _col2


type: int


c


TableScan


alias: c


Reduce Output Operator


key expressions:


expr: k1


type: bigint


sort order: +


Map-reduce partition columns:


expr: k1


type: bigint


tag: 1


d


TableScan


alias: d


Reduce Output Operator


key expressions:


expr: k1


type: bigint


sort order: +


Map-reduce partition columns:


expr: k1


type: bigint


tag: 2


f


TableScan


alias: f


Reduce Output Operator


key expressions:


expr: k1


type: bigint


sort order: +


Map-reduce partition columns:


expr: k1


type: bigint


tag: 3


Reduce Operator Tree:


Join Operator


condition map:


Left Outer Join0 to 1


Left Outer Join0 to 2


Left Outer Join0 to 3


condition expressions:


0 {VALUE._col3} {VALUE._col4}


1


2


3


handleSkewJoin: false


outputColumnNames: _col3, _col4


File Output Operator


compresse