[Hive]编写non-deterministic的UDF时遇到的谓词下推(Predicate pushdown)错误分析

2014-11-24 17:41:27 · 作者: · 浏览: 1

遇到这样一个问题:


hive> desc ljn001;


OK


name string


value int


hive> select * from ljn001;


OK


wang5 92


zhang3 87


li4 73


查询SQL如下:


select name


from


(select name,row_number(1) as rn


from


(select *


from ljn001 order by value


) a


) a


where rn = 1;


但是结果却意想不到:


OK


wang5


竟然把value最大的给取出来了!不筛选rn再看一下:


select name,rn


from


(select name,row_number(1) as rn


from


(select *


from ljn001 order by value


) a


) a;


OK


li4 1


zhang3 2


wang5 3


明明是li4对应的1,为什么筛选rn = 1却得到的是wang5 ?


看一下执行计划,豁然开朗了:


explain select name


from


(select name,row_number(1) as rn


from


(select *


from ljn001 order by value


) a


) a


where rn = 1;


OK


STAGE DEPENDENCIES:


Stage-1 is a root stage


Stage-0 is a root stage


STAGE PLANS:


Stage: Stage-1


Map Reduce


Alias -> Map Operator Tree:


a:a:ljn001


TableScan


alias: ljn001


Filter Operator


predicate:


expr: (row_number(1) = 1)


type: boolean


Select Operator


expressions:


expr: name


type: string


expr: value


type: int


outputColumnNames: _col0, _col1


Reduce Output Operator


key expressions:


expr: _col1


type: int


sort order: +


tag: -1


value expressions:


expr: _col0


type: string


expr: _col1


type: int


Reduce Operator Tree:


Extract


Select Operator


expressions:


expr: _col0


type: string


expr: row_number(1)


type: bigint


outputColumnNames: _col0, _col1


Filter Operator


predicate:


expr: (_col1 = 1)


type: boolean


Select Operator


expressions:


expr: _col0


type: string


outputColumnNames: _col0


File Output Operator


compressed: true


GlobalTableId: 0


table:


output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat


Stage: Stage-0


Fetch Operator


limit: -1


相关阅读