设为首页 加入收藏

TOP

Hive解析带有数组的Json字符串
2018-12-06 01:07:44 】 浏览:349
Tags:Hive 解析 有数 Json 字符串

要处理的数据如下:

{
"IP": "192.168.1.1",
"appName": "sichuan_yunyingyong",
"customEvent": [
{
"eventName": "xx1",
"du": "xx",
"timestamp": "1480521763049",
"eventParams": {
"ContentID": "yixiuge",
"account": "13856976635",
"networkType": "WIFI",
"result": "0",
"type": "11"
}
},
{
"eventName": "xx2",
"du": "xx",
"timestamp": "1480521763049",
"eventParams": {
"ContentID": "yixiuge",
"account": "13856976636",
"networkType": "WIFI",
"result": "0",
"type": "11"
}
}
]
}

创建一个表tab_json,表中包含一个类型为String的字段json对应该字符串。
接下来要用到几个函数,先介绍一下lateral view,一般配合UDTF使用,能将一行变成多行输出。
JSON_TUPLE(str, path1, path2 ..., pathN)根据输入的参数提取字符串

posexplode(array)把数组变成(pos, json) 的键值对

select
j1.j1_ip,
j1.j1_appName,
j2.j2_customEvent_json
FROM tab_json s
lateral view json_tuple(s.json, 'IP', 'appName', 'customEvent') j1 as j1_ip, j1_appName, j1_customEvent

lateral view posexplode(split(regexp_replace(regexp_replace(j1.j1_customEvent,'\\}\\,\\{','\\}\\|\\|\\{'),'\\[|\\]',''), '\\|\\|')) j2 as j2_customEvents_pos, j2_customEvent_json

执行结果为:

192.168.1.1 sichuan_yunyingyong {"eventName":"xx1","du":"xx","timestamp":"1480521763049","eventParams":{"ContentID":"yixiuge","account":"13856976635","networkType":"WIFI","result":"0","type":"11"}}

192.168.1.1 sichuan_yunyingyong {"eventName":"xx2","du":"xx","timestamp":"1480521763049","eventParams":{"ContentID":"yixiuge","account":"13856976636","networkType":"WIFI","result":"0","type":"11"}}

这里把json array的格式通过替换变成了 {json1} || {json2} , 再去掉数组的括号,最后根据 || 来拆开,形成了一个有两个元素的数组,接着 posexplode 在把数组变成(pos, json) 的键值对,pos记录了元素的位置,json就是实际的json数据,这样一条数据就变成了两条了

那么现在要获取IP , appName , account 就很简单了:

select
j1.j1_ip,
j1.j1_appName,
j4.j4_account
FROM tab_json s
lateral view json_tuple(s.json, 'IP', 'appName', 'customEvent') j1 as j1_ip, j1_appName, j1_customEvent
lateral view posexplode(split(regexp_replace(regexp_replace(j1.j1_customEvent,'\\}\\,\\{','\\}\\|\\|\\{'),'\\[|\\]',''), '\\|\\|')) j2 as j2_customEvents_pos, j2_customEvent_json
lateral view json_tuple(j2.j2_customEvent_json, 'eventParams') j3 as j3_eventParams
lateral view json_tuple(j3.j3_eventParams, 'account') j4 as j4_account
结果如下:
192.168.1.1 sichuan_yunyingyong 13856976635

192.168.1.1 sichuan_yunyingyong 13856976636


上面的例子json array有两个元素,如果你只关注其中一个元素,那么可以如下操作
lateral view posexplode(array(split(regexp_replace(regexp_replace(j1.j1_customEvent,'\\}\\,\\{','\\}\\|\\|\\{'),'\\[|\\]',''), '\\|\\|')[1])) j2 as j2_customEvents_pos, j2_customEvent_json
split 跟上数组下标,就能取出某个元素,由于posexplode只接受 array类型的参数,可以使用array函数转换成对应的数组,这样就只有一条数据了,结果如下
192.168.1.1 sichuan_yunyingyong 13856976636

参考:https://blog.csdn.net/lfq1532632051/article/details/63262519


】【打印繁体】【投稿】【收藏】 【推荐】【举报】【评论】 【关闭】 【返回顶部
上一篇hive 中转义符使用问题 下一篇使用嵌套子查询优化hive的SQL

最新文章

热门文章

Hot 文章

Python

C 语言

C++基础

大数据基础

linux编程基础

C/C++面试题目