Alex's Hadoop Beginner Tutorial: Lesson 10, Getting Started with Hive (Part 4)

2015-02-03 11:44:36
EN
Time taken: 0.727 seconds, Fetched: 5 row(s)
Sample the rows of one bucket out of 4 buckets:
hive> select * from b_student tablesample(bucket 1 out of 4 on id);
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1406097234796_0041, Tracking URL = http://hadoop01:8088/proxy/application_1406097234796_0041/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1406097234796_0041
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-12-08 17:35:56,995 Stage-1 map = 0%,  reduce = 0%
2014-12-08 17:36:06,783 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.9 sec
2014-12-08 17:36:07,845 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.9 sec
MapReduce Total cumulative CPU time: 2 seconds 900 msec
Ended Job = job_1406097234796_0041
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 2.9 sec   HDFS Read: 482 HDFS Write: 22 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 900 msec
OK
4	jolly	2014-09-10	CN
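Why does only the row with id 4 come back? TABLESAMPLE(BUCKET x OUT OF y ON id) keeps the rows whose hashed column value falls into bucket x. The Python sketch below illustrates that selection rule under two assumptions that hold for older Hive versions like the one used here (newer releases with bucketing version 2 hash differently): the hash of an INT column is the value itself, and the 0-based bucket index is (hash & Integer.MAX_VALUE) % y. The function name sample_bucket and the id list 1 to 5 are hypothetical; only id 4 comes from the output above.

```python
def sample_bucket(ids, x, y):
    """Return the ids that TABLESAMPLE(BUCKET x OUT OF y ON id) would keep.

    Assumption (older Hive): for an INT column the hash is the value itself,
    and the 0-based bucket index is (hash & 0x7FFFFFFF) % y. BUCKET x is 1-based.
    """
    if not 1 <= x <= y:
        raise ValueError("bucket x must be between 1 and y")
    kept = []
    for value in ids:
        # Mask off the sign bit, like hash & Integer.MAX_VALUE in Java
        bucket = (value & 0x7FFFFFFF) % y
        # BUCKET 1 corresponds to remainder 0, BUCKET 2 to remainder 1, ...
        if bucket == x - 1:
            kept.append(value)
    return kept


# Hypothetical ids 1..5: only 4 lands in bucket 1 of 4, because 4 % 4 == 0
print(sample_bucket([1, 2, 3, 4, 5], 1, 4))
```

Under this rule, BUCKET 2 OUT OF 4 would return ids 1 and 5 instead.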

External tables

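An external table is a table whose data is not stored by Hive itself. The data can live in HBase, for example, and Hive merely maps onto it. I'll use HBase as the example.
First, create an HBase table named employee with a column family info, and put a few rows into it: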
hbase(main):005:0> create 'employee','info'  
0 row(s) in 0.4740 seconds  
  
=> Hbase::Table - employee  
hbase(main):006:0> put 'employee',1,'info:id',1  
0 row(s) in 0.2080 seconds  
  
hbase(main):008:0> scan 'employee'  
ROW                                      COLUMN+CELL                                                                                                             
 1                                       column=info:id, timestamp=1417591291730, value=1                                                                        
1 row(s) in 0.0610 seconds  
  
hbase(main):009:0> put 'employee',1,'info:name','peter'  
0 row(s) in 0.0220 seconds  
  
hbase(main):010:0> scan 'employee'  
ROW                                      COLUMN+CELL                                                                                                             
 1                                       column=info:id, timestamp=1417591291730, value=1                                                                        
 1                                       column=info:name, timestamp=1417591321072, value=peter                                                                  
1 row(s) in 0.0450 seconds  
  
hbase(main):011:0> put 'employee',2,'info:id',2  
0 row(s) in 0.0370 seconds  
  
hbase(main):012:0> put 'employee',2,'info:name','paul'  
0 row(s) in 0.0180 seconds  
  
hbase(main):013:0> scan 'employee'  
ROW                                      COLUMN+CELL                                                                                                             
 1                                       column=info:id, timestamp=1417591291730, value=1                                                                        
 1                                       column=info:name, timestamp=1417591321072, value=peter                                                                  
 2                                       column=info:id, timestamp=1417591500179, value=2                                                                        
 2                                       column=info:name, timestamp=1417591512075, value=paul                                                                   
2 row(s) in 0.0440 seconds 

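Then create an external table in Hive that maps onto the HBase table. In hbase.columns.mapping, ":key" maps the HBase row key to the first Hive column, and info:id and info:name map to the remaining Hive columns in order: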
hive> CREATE EXTERNAL TABLE h_employee(key int, id int, name string)   
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'  
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, info:id,info:name")  
    > TBLPROPERTIES ("hbase.table.name" = "employee");  
OK  
Time taken: 0.324 seconds  
hive> select * from h_employee;  
OK  
1   1   peter  
2   2   paul  
Time taken: 1.129 seconds, Fetched: 2 row(s)
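To make the positional mapping concrete, here is a minimal Python sketch of how one HBase row is projected onto the Hive columns (key, id, name) according to the mapping string used above. It is only an illustration, not the HBaseStorageHandler's actual code: the function name map_hbase_row is hypothetical, conversion to the declared Hive types (such as INT) is omitted, and a missing cell is returned as None to mirror how Hive shows NULL.

```python
def map_hbase_row(row_key, cells, mapping):
    """Project one HBase row onto Hive columns following hbase.columns.mapping.

    row_key: the HBase row key.
    cells:   dict of "family:qualifier" -> value for that row.
    Mapping entries are matched to Hive columns by position; ":key" is the row key.
    Type conversion to the Hive column types is deliberately left out.
    """
    result = []
    for entry in mapping.split(","):
        entry = entry.strip()  # this sketch tolerates the space after ":key," in the DDL
        if entry == ":key":
            result.append(row_key)
        else:
            # A cell that does not exist in HBase shows up as NULL in Hive
            result.append(cells.get(entry))
    return result


mapping = ":key, info:id,info:name"
print(map_hbase_row("1", {"info:id": "1", "info:name": "peter"}, mapping))
print(map_hbase_row("2", {"info:id": "2", "info:name": "paul"}, mapping))
```

The two calls reproduce the two rows returned by select * from h_employee, as strings.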

Query syntax

For the detailed syntax, see the official manual (https://cwiki.apache.org/confluence/display/Hive/Tutorial). Here I'll only point out a few things that are somewhat unusual.

Limiting the number of rows

To display only x rows you still use LIMIT, for example:
hive> select * from h_employee limit 1
    > ;
OK
1	1	peter
Time taken: 0.284 seconds, Fetched: 1 row(s)
However, in the Hive version used here there is no way to specify a starting row, i.e. an offset.
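A common workaround is to number the rows in a fixed order and then filter on that number. In HiveQL this is usually written as a subquery using ROW_NUMBER() OVER (ORDER BY ...) with a WHERE filter on the row number. Hive 2.0.0 and later also accept the two-argument form LIMIT offset, rows. The Python sketch below only illustrates the paging logic on an in-memory list, using the two rows from h_employee. The function name page is hypothetical.

```python
def page(rows, offset, count, key):
    """Emulate LIMIT offset, count on an in-memory list.

    Sort by key, number the rows starting at 1 (like ROW_NUMBER()), then keep
    the rows with offset < row_number <= offset + count.
    """
    # A deterministic order is required; without it, pages could overlap or skip rows
    ordered = sorted(rows, key=key)
    return [row for rn, row in enumerate(ordered, start=1) if offset < rn <= offset + count]


employees = [(2, 2, "paul"), (1, 1, "peter")]
# Skip the first row by key and return the next one
print(page(employees, 1, 1, key=lambda r: r[0]))
```

The sort matters. Hive makes no guarantee about row order without ORDER BY, so an offset with no explicit ordering would not reliably pick the same rows.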
Class dismissed!