Hive中在做多表关联时,由于Hive的SQL优化引擎还不够强大,表的关联顺序不同往往导致产生不同数量的MapReduce作业数。这时就需要通过分析执行计划对SQL进行调整,以获得最少的MapReduce作业数。举一个例子(案例基于Hive 0.6.0):
create table ljn1(
k1 bigint,
k2 String,
v1 int
);
create table ljn2(
k1 bigint,
v2 int
);
create table ljn3(
k1 bigint,
v3 int
);
create table ljn4(
k1 bigint,
v4 int
);
create table ljn5(
k1 bigint,
v5 int
);
create table ljn6(
k2 string,
v6 int
);
然后看一下下面这个SQL的执行计划:
explain
select a.v1
from
ljn1 a
left outer join ljn2 b on (a.k1 = b.k1)
left outer join ljn3 c on (a.k1 = c.k1)
left outer join ljn4 d on (a.k1 = d.k1)
left outer join ljn6 e on (a.k2 = e.k2)
left outer join ljn5 f on (a.k1 = f.k1);
STAGE DEPENDENCIES:
Stage-5 is a root stage
Stage-1 depends on stages: Stage-5
Stage-2 depends on stages: Stage-1
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-5
Map Reduce
Alias -> Map Operator Tree:
a
TableScan
alias: a
Reduce Output Operator
key expressions:
expr: k1
type: bigint
sort order: +
Map-reduce partition columns:
expr: k1
type: bigint
tag: 0
value expressions:
expr: k1
type: bigint
expr: k2
type: string
expr: v1
type: int
b
TableScan
alias: b
Reduce Output Operator
key expressions:
expr: k1
type: bigint
sort order: +
Map-reduce partition columns:
expr: k1
type: bigint
tag: 1
Reduce Operator Tree:
Join Operator
condition map:
Left Outer Join0 to 1
condition expressions:
0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
1
handleSkewJoin: false
outputColumnNames: _col0, _col1, _col2
File Output Operator
compressed: true
GlobalTableId: 0
table:
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
Stage: Stage-1
Map Reduce
Alias -> Map Operator Tree:
$INTNAME
Reduce Output Operator
key expressions:
expr: _col0
type: bigint
sort order: +
Map-reduce partition columns:
expr: _col0
type: bigint
tag: 0
value expressions:
expr: _col1
type: string
expr: _col2
type: int
c
TableScan
alias: c
Reduce Output Operator
key expressions:
expr: k1
type: bigint
sort order: +
Map-reduce partition columns:
expr: k1
type: bigint
tag: 1
d
TableScan
alias: d
Reduce Output Operator
key expressions:
expr: k1
type: bigint
sort order: +
Map-reduce partition columns:
expr: k1
type: bigint
tag: 2
f
TableScan
alias: f
Reduce Output Operator
key expressions:
expr: k1
type: bigint
sort order: +
Map-reduce partition columns:
expr: k1
type: bigint
tag: 3
Reduce Operator Tree:
Join Operator
condition map:
Left Outer Join0 to 1
Left Outer Join0 to 2
Left Outer Join0 to 3
condition expressions:
0 {VALUE._col3} {VALUE._col4}
1
2
3
handleSkewJoin: false
outputColumnNames: _col3, _col4
File Output Operator
compresse