A Summary of Commonly Used Hive SQL Functions


2018-11-29 02:03:06

These are the Hive SQL functions I use most often at work, collected here for reference.

1. nvl(x, y): returns y if x is NULL, otherwise returns x.
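
A quick way to see nvl in action. This sketch assumes a Hive version that allows SELECT without a FROM clause (0.13 or later); the literal values are only illustrative.

```sql
-- The first argument is NULL, so the fallback is returned.
SELECT nvl(NULL, 'fallback');     -- fallback

-- The first argument is not NULL, so it is returned unchanged.
SELECT nvl('value', 'fallback');  -- value
```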

2. string A || string B || ... (same as the concat function)

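3. T decode(condition, value1, result1, value2, result2, ..., valueN, resultN, default)
Compares condition with each value in turn and returns the result paired with the first match; if nothing matches, returns default.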
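
The decode form above follows Oracle's convention. In stock Apache Hive the built-in decode(binary, charset) does something different (character-set decoding), so where the Oracle-style form is unavailable, the same branching can be sketched with a standard CASE expression. The literal values below are made up for illustration.

```sql
-- Rough equivalent of decode('B', 'A', 'excellent', 'B', 'good', 'other')
SELECT CASE 'B'
         WHEN 'A' THEN 'excellent'
         WHEN 'B' THEN 'good'
         ELSE 'other'          -- plays the role of the default value
       END;                    -- good
```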

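4. int instr(string, str[, start][, appear])
Returns the position at which str occurs in string. start is the position to begin searching from (optional, default 1, where 1 is the first character). appear selects which occurrence to look for (optional, default 1). Returns 0 if there is no match. Both start and appear must be integers greater than 0.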
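
As far as I know, Apache Hive's built-in instr takes only two arguments, instr(str, substr), so the start and appear parameters above may depend on your distribution. A minimal sketch with the two-argument form, plus locate for choosing a start position:

```sql
-- Position of the first occurrence (1-based); 0 means "not found".
SELECT instr('foobarbar', 'bar');      -- 4
SELECT instr('foobarbar', 'xyz');      -- 0

-- locate(substr, str, pos) starts searching at position pos.
SELECT locate('bar', 'foobarbar', 5);  -- 7
```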

5. if(boolean testCondition, T valueTrue, T valueFalse)
Returns valueTrue when testCondition is true, and valueFalse otherwise.
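
A small illustration with literal conditions (the values are arbitrary):

```sql
-- The condition is true, so the second argument is returned.
SELECT if(1 < 2, 'yes', 'no');  -- yes

-- The condition is false, so the third argument is returned.
SELECT if(1 > 2, 'yes', 'no');  -- no
```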

6. int datediff(string enddate, string startdate)
Returns the number of days from startdate to enddate: datediff('2009-03-01', '2009-02-27') = 2

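7. mod(n1, n2)
Returns the remainder of n1 divided by n2. Supported types are Byte, Short, Integer, Long, Float, and Double.
The sign of the result follows n1. Two things to watch out for when using this function:
1. Applying mod to fractional values can lose precision; for example, mod(3.1415926535897384626, 3.1415926535897384627) returns 0.0.
2. An integer argument larger than MAX_LONG is automatically promoted to Double. The function still computes a result, but the result is a floating-point value.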
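
A small sketch of the sign rule, written with the % operator on the assumption that it behaves the same way as mod here. pmod is shown for contrast, since it returns a non-negative result for a positive divisor.

```sql
-- The sign of the remainder follows the dividend (the first operand).
SELECT 7 % 3;        -- 1
SELECT -7 % 3;       -- -1
SELECT 7 % -3;       -- 1

-- pmod folds the result into the non-negative range.
SELECT pmod(-7, 3);  -- 2
```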

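8. next_day()
Returns the date of the next given weekday after the date argument. day is a number from 1 to 7, standing for Sunday through Saturday. The result is formatted as "YYYY-MM-DD".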
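
In Apache Hive 1.2.0 and later, the built-in next_day takes the weekday as a name (for example 'TU' or 'TUESDAY') rather than a number, so the numeric form above presumably comes from a distribution-specific function. A sketch with the stock signature:

```sql
-- 2015-01-14 is a Wednesday; the next Tuesday after it is 2015-01-20.
SELECT next_day('2015-01-14', 'TU');  -- 2015-01-20
```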

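9. to_number()
Converts the given string to a number; the string must consist entirely of digits. (to_number in Oracle is complex, with variable arguments and support for many types. Only integer types are currently supported here: short, int, and long.)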

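10. String to_char(date, format)
Converts the date value date to a string. The format of date is fixed as yyyy-mm-dd hh24:mi:ss:ff3; the output format is given by format. The formats currently supported (case-insensitive) are:
yyyymmdd: year, month, day
yyyymm: year, month
mm: month
dd: day
yyyy-mm-dd
yyyy-mm
yyyymmddhh24miss: year, month, day, hour, minute, second (24-hour clock)
yyyy-mm-dd hh24:mi:ss
hh24miss
yyyymmddhh24missff3: year, month, day, hour, minute, second, millisecond (24-hour clock)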
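
As far as I know, to_char is not part of stock Apache Hive. Where it is unavailable, date_format (Hive 1.2.0 and later) covers similar ground, with the caveat that it uses Java SimpleDateFormat patterns, in which MM means month and mm means minutes. The sample timestamp is arbitrary.

```sql
-- Year, month, day without separators.
SELECT date_format('2015-04-08 13:34:12', 'yyyyMMdd');  -- 20150408

-- Year and month only.
SELECT date_format('2015-04-08 13:34:12', 'yyyy-MM');   -- 2015-04

-- Time of day on a 24-hour clock (HH = hour, mm = minute, ss = second).
SELECT date_format('2015-04-08 13:34:12', 'HHmmss');    -- 133412
```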

11.last_day(date) 
Returns the last day of the month extracted from the provided date value argument. 
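date is a string in "yyyy-MM-dd" format (a string that merely begins with that form is accepted). The returned day is between 1 and 31; NULL is returned if the input is invalid. yyyy-MM-dd is the agreed input format.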
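
A sketch assuming the built-in last_day of Apache Hive 1.1.0 and later, which returns the full date of the month's last day as a string; wrapping it in day() gives the 1 to 31 number.

```sql
-- Full date of the last day of the month.
SELECT last_day('2015-03-14');       -- 2015-03-31

-- Day-of-month number only (2016 is a leap year).
SELECT day(last_day('2016-02-10'));  -- 29
```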

12.add_months( date, n )
Returns a date plus n months. 
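date is the starting date (before the n months have been added). n is the number of months to add to date. Note: date currently has to be a string in "yyyy-MM-dd" or "yyyyMMdd" format (anything that begins with one of these forms is accepted; otherwise NULL is returned, and the same applies to the functions below). The returned string uses the same format.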
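
A short sketch assuming the built-in add_months of Apache Hive 1.1.0 and later with "yyyy-MM-dd" input. When the target month is shorter, the day is clamped to that month's last day.

```sql
-- September has 30 days, so the 31st becomes the 30th.
SELECT add_months('2009-08-31', 1);   -- 2009-09-30

-- A negative n moves backwards.
SELECT add_months('2015-01-15', -2);  -- 2014-11-15
```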

13. cast(expr as <type>)   
Converts the results of the expression expr to <type>
cast('1' as BIGINT) will convert the string '1' to its integral representation.
A null is returned if the conversion does not succeed.

14.date_add(string startdate, int days) 
Add a number of days to startdate: date_add('2008-12-31', 1) = '2009-01-01'

15.date_sub(string startdate, int days)
Subtract a number of days from startdate: date_sub('2008-12-31', 1) = '2008-12-30'

16.concat(string|binary A, string|binary B...)
 e.g. concat('foo', 'bar') results in 'foobar'. Note that this function can take any number of input strings.
 
17.concat_ws(string SEP, string A, string B...)
Return type: string
Description: returns the input strings joined together, with SEP as the separator between them.
Example:
hive> select concat_ws(',','abc','def','gh') from lxw_dual;
abc,def,gh

18.parse_url(url, partToExtract[, key]) - extracts a part from a URL
Parses a URL string. Valid values for partToExtract are HOST, PATH, QUERY, REF, PROTOCOL, FILE, AUTHORITY, and USERINFO.
Examples:
* parse_url('http://facebook.com/path/p1.php?query=1', 'HOST') returns 'facebook.com'
* parse_url('http://facebook.com/path/p1.php?query=1', 'PATH') returns '/path/p1.php'
* parse_url('http://facebook.com/path/p1.php?query=1', 'QUERY') returns 'query=1'
A key can be supplied to return one specific query parameter, for example:
* parse_url('http://facebook.com/path/p1.php?query=1', 'QUERY', 'query') returns '1'
* parse_url('http://facebook.com/path/p1.php?query=1#Ref', 'REF') returns 'Ref'
* parse_url('http://facebook.com/path/p1.php?query=1#Ref', 'PROTOCOL') returns 'http'

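19. regexp_extract(str, regexp[, idx]) - extracts a group that matches regexp
A regular-expression extraction function for strings.
-- It is somewhat similar to substring(str from 'regexp').
Parameters:
str is the string to parse.
regexp is the regular expression.
idx selects which part of the match to return; the default is 1.
0 returns the text matched by the whole regular expression.
1 returns the text matched by the first () group in the regular expression, and so on.
Caveat:
idx must not be greater than the number of () groups in the expression; otherwise an error is raised.
Examples:
select regexp_extract('x=a3&x=18abc&x=2&y=3&x=4','x=([0-9]+)([a-z]+)',0) from default.dual;
Result:
x=18abc
select regexp_extract('x=a3&x=18abc&x=2&y=3&x=4','x=([0-9]+)([a-z]+)',1) from default.dual;
Result:
18
select regexp_extract('x=a3&x=18abc&x=2&y=3&x=4','x=([0-9]+)([a-z]+)',2) from default.dual;
Result:
abc
The pattern here has only two () groups, so idx >= 3 raises an error.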
Regular-expression extraction function: regexp_extract
Syntax: regexp_extract(string subject, string pattern, int index)
Return type: string
Description: matches subject against the regular expression pattern and returns the group selected by index. Note that escape characters are needed in some cases.
Examples:
hive> select regexp_extract('foothebar', 'foo(.*)(bar)', 1) from dual;
the
hive> select regexp_extract('foothebar', 'foo(.*)(bar)', 2) from dual;
bar
hive> select regexp_extract('foothebar', 'foo(.*)(bar)', 0) from dual;
foothebar

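20. regexp_replace
Syntax: regexp_replace(string A, string B, string C)
Return type: string
Description: replaces every part of string A that matches the Java regular expression B with C. Note that escape characters are needed in some cases.
Example:
hive> select regexp_replace('foobar', 'oo|ar', '') from dual;
fb

21. substr, substring
Syntax: substr(string A, int start, int len), substring(string A, int start, int len)
Return type: string
Description: returns the substring of A that starts at position start and is len characters long. A negative start counts from the end of the string.
Examples:
hive> select substr('abcde',3,2) from dual;
cd
hive> select substring('abcde',-2,2) from dual;
de

22. Date function tips
FROM_UNIXTIME(UNIX_TIMESTAMP()) plays the same role as SYSDATE in Oracle: it returns the current time of the Hive server dynamically.
Example: hive> SELECT FROM_UNIXTIME(UNIX_TIMESTAMP()) AS cur_time FROM employee LIMIT 1;
TO_DATE extracts the date part from the system time obtained this way.
Example: hive> SELECT TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP())) AS cur_date FROM employee LIMIT 1;

23. Parsing and lookup: LATERAL VIEW
LATERAL VIEW is used together with a table-generating function such as EXPLODE() to expand the values of a map or array into rows. It skips rows in which the exploded column is NULL; to keep those rows, use LATERAL VIEW OUTER (Hive 0.12.0 and later).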
Example (set up a test row):
hive> INSERT INTO TABLE employee
SELECT 'Steven' AS name, array(null) AS work_place, named_struct("sex","Male","age",30) AS sex_age, map("Python", 90) AS skills_score, map("R&D",array('Developer')) AS depart_title
FROM employee LIMIT 1;
hive> SELECT name, work_place, skills_score FROM employee;

Example (without OUTER):
hive> SELECT name, workplace, skills, score FROM employee LATERAL VIEW explode(work_place) wp AS workplace LATERAL VIEW explode(skills_score) ss AS skills, score;

Example (with OUTER):
hive> SELECT name, workplace, skills, score FROM employee LATERAL VIEW OUTER explode(work_place) wp AS workplace LATERAL VIEW explode(skills_score) ss AS skills, score;

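Date-related functions:
to_date: converts a date-time value to a date
select to_date('2015-04-02 13:34:12');
Output: 2015-04-02
from_unixtime: converts a Unix timestamp to a formatted time in the current time zone
select from_unixtime(1323308943,'yyyyMMdd');
Output: 20111208
unix_timestamp: returns the current Unix timestamp, or the timestamp for a given time string
select unix_timestamp();
Output: 1430816254 (varies with the time the query is run)
select unix_timestamp('2015-04-30 13:51:20');
Output: 1430373080 (depends on the session time zone; this value corresponds to UTC+8)
year: returns the year of a date
select year('2015-04-02 11:32:12');
Output: 2015
month: returns the month of a date
select month('2015-12-02 11:32:12');
Output: 12
day: returns the day of the month of a date
select day('2015-04-13 11:32:12');
Output: 13
hour: returns the hour of a date-time value
select hour('2015-04-13 11:32:12');
Output: 11
minute: returns the minute of a date-time value
select minute('2015-04-13 11:32:12');
Output: 32
second: returns the second of a date-time value
select second('2015-04-13 11:32:56');
Output: 56
weekofyear: returns the week of the year in which a date falls
select weekofyear('2015-05-05 12:11:1');
Output: 19
datediff: returns the number of days from the start date (second argument) to the end date (first argument)
select datediff('2015-04-09','2015-04-01');
Output: 8
date_sub: returns the date n days before the given date
select date_sub('2015-04-09',4);
Output: 2015-04-05
date_add: returns the date n days after the given date
select date_add('2015-04-09',4);
Output: 2015-04-13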

