Exporting HBase data to text, CSV, HTML and other file formats
2019-01-19 13:45:01
Copyright notice: please credit the source when reposting: https://blog.csdn.net/seashouwang/article/details/79408944

Requirement:

Query by time range, area code and similar conditions, and export from HBase the terminal log records carrying the latest and earliest collection times.

Approach:

1. Use HBase's built-in export/import tool to dump the queried terminal MAC data to a target directory:

hbase org.apache.hadoop.hbase.mapreduce.Driver export <table name> <directory>

For example: hbase org.apache.hadoop.hbase.mapreduce.Driver export LogTerminal /home/hbase

This exports the entire table, with no way to filter on our query conditions, and the output is split across multiple files, which is awkward to process, so this route is set aside for now.
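That said, the stock Export job does accept optional version-count and time-range arguments, which gives at least coarse, time-based filtering. A sketch (the output path and the epoch-millisecond timestamps are illustrative values, not from the original post):

```shell
# Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
# Dump one version of each cell written within the given time window:
hbase org.apache.hadoop.hbase.mapreduce.Export \
  LogTerminal /home/hbase/LogTerminal_export 1 1515340800000 1515427200000

# The matching Import tool loads the exported SequenceFiles back into an existing table:
hbase org.apache.hadoop.hbase.mapreduce.Import \
  LogTerminal_copy /home/hbase/LogTerminal_export
```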

2. Filter the export with a query in the HBase shell:

scan 'LOG20180108',{COLUMNS => 'INFO',LIMIT=>1,FILTER=>"(PrefixFilter('T')) AND (SingleColumnValueFilter('INFO','AreaCode',=,'binary:610103'))"}

Redirect the output to a file:

echo "scan 'LOG20180108',{COLUMNS => ['INFO'],LIMIT=>1,FILTER=>\"(PrefixFilter('T')) AND (SingleColumnValueFilter('INFO','AreaCode',=,'binary:610103'))\"}" | hbase shell > myText.csv
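Note that piping through `hbase shell` this way captures the shell banner and status lines along with the data, so the resulting file is not clean CSV. A small filter can reshape the raw scan output into row,column,value records (the function name is ours, and the parsing is a sketch based on the usual output shape `<row>  column=<family:qualifier>, timestamp=<ts>, value=<value>`, which can vary across HBase versions):

```shell
# scan_to_csv: turn default `hbase shell` scan output on stdin
# into "row,column,value" CSV on stdout.
scan_to_csv() {
  awk '/column=/{
    row = $1
    match($0, /column=[^,]+/); col = substr($0, RSTART + 7, RLENGTH - 7)
    match($0, /value=.*/);     val = substr($0, RSTART + 6)
    printf "%s,%s,%s\n", row, col, val
  }'
}

# Intended use (requires a running cluster):
#   echo "scan 'LOG20180108',{COLUMNS => ['INFO']}" | hbase shell | scan_to_csv > myText.csv
```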


3. Export through Hive:

1. Map the HBase table onto a temporary Hive table.

2. Load the temporary table's rows into the real Hive tables.

3. Export the real tables into the target database.

The script is as follows:

#!/bin/bash

# Resolve the input dates; with no arguments, default to yesterday.
get_time(){
    if [ 3 -eq $# ]
    then
        date=`date -d "+0 day $1" +%Y%m%d`
        enddate=`date -d "+1 day $2" +%Y%m%d`
    elif [ 0 -eq $# ]
    then
        echo "No arguments given; defaulting to building yesterday's data."
        echo "For a batch build, pass a date range and area code: [$0 yyyy-mm-dd yyyy-mm-dd areaCode]."
        date=`date -d "+0 day yesterday" +%Y%m%d`
        enddate=`date +%Y%m%d`
    else
        echo "Invalid arguments."
        echo "For a batch build, pass a date range and area code: [$0 yyyy-mm-dd yyyy-mm-dd areaCode]."
        echo "To build yesterday's data, run [$0] with no arguments."
    fi
}

# Create the Hive tables that will store the exported data.
hive_table(){
    echo "create hive table start......."
    hive -S -e "DROP TABLE IF EXISTS LogTerminal;
CREATE TABLE LogTerminal(rowkey string,TerminalID string,TerminalMac string,DeviceType string, Power string,Channel string,MaxPower string,TimeNear string,LonNear string,LatNear string,RouteMac string,SSID string, SSIDs string,SecurityType string,RealType string, RealCode string,TerminalType int,PcBrand string, Phone string,IMSI string,IMEI string,OS string,CustomerStartTime string,OffLineTime string, Model string,CoordinateX string,CoordinateY string, OffLineLon string,OffLineLat string,PcIP string, RouteType string,SessionID string,ProtoType string,CyberCode string,IsMove string,IsElectric string, SafeState string,GPIOState string,Serial string, GuildID string,Time string,ManufacturerCode string,AreaCode string,UnitCode string,MachineCode string, SystemType string,DATASOURCEID string,Lon string, Lat string,LatLon string,RESOURCETYPE string, AccessSystemID string,InterfaceID string,InterfaceGroupID string,WriterTime string )COMMENT 'LogTerminal Table' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS PARQUET; "
    hive -S -e "DROP TABLE IF EXISTS LogTerminal_min;
CREATE TABLE LogTerminal_min(rowkey string,TerminalID string,TerminalMac string,DeviceType string, Power string,Channel string,MaxPower string,TimeNear string,LonNear string,LatNear string,RouteMac string,SSID string, SSIDs string,SecurityType string,RealType string, RealCode string,TerminalType int,PcBrand string, Phone string,IMSI string,IMEI string,OS string,CustomerStartTime string,OffLineTime string, Model string,CoordinateX string,CoordinateY string, OffLineLon string,OffLineLat string,PcIP string, RouteType string,SessionID string,ProtoType string,CyberCode string,IsMove string,IsElectric string, SafeState string,GPIOState string,Serial string, GuildID string,Time string,ManufacturerCode string,AreaCode string,UnitCode string,MachineCode string, SystemType string,DATASOURCEID string,Lon string, Lat string,LatLon string,RESOURCETYPE string, AccessSystemID string,InterfaceID string,InterfaceGroupID string,WriterTime string )COMMENT 'LogTerminal Table' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS PARQUET; "
    hive -S -e "DROP TABLE IF EXISTS LogTerminal_max;
CREATE TABLE LogTerminal_max(rowkey string,TerminalID string,TerminalMac string,DeviceType string, Power string,Channel string,MaxPower string,TimeNear string,LonNear string,LatNear string,RouteMac string,SSID string, SSIDs string,SecurityType string,RealType string, RealCode string,TerminalType int,PcBrand string, Phone string,IMSI string,IMEI string,OS string,CustomerStartTime string,OffLineTime string, Model string,CoordinateX string,CoordinateY string, OffLineLon string,OffLineLat string,PcIP string, RouteType string,SessionID string,ProtoType string,CyberCode string,IsMove string,IsElectric string, SafeState string,GPIOState string,Serial string, GuildID string,Time string,ManufacturerCode string,AreaCode string,UnitCode string,MachineCode string, SystemType string,DATASOURCEID string,Lon string, Lat string,LatLon string,RESOURCETYPE string, AccessSystemID string,InterfaceID string,InterfaceGroupID string,WriterTime string )COMMENT 'LogTerminal Table' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS PARQUET; "
    echo "create hive table end......."
}

# Create the temporary terminal-log table mapped onto the HBase table (dropping any
# previous definition), then insert the rows matching the filter into the real table.
hive_task(){
    echo "hive task $1 $2 $3 ..."
    DATA_FORMAT=`date -d "$1" +%Y-%m-%d`
    TABLE_NAME=LOG$1
    AREA_CODE=$3
    echo $DATA_FORMAT
    echo $TABLE_NAME
    echo $AREA_CODE
    hive -S -e "DROP TABLE IF EXISTS TempLogTerminal;
CREATE EXTERNAL TABLE TempLogTerminal(key string,TerminalID string,TerminalMac string,DeviceType string, Power string,Channel string,MaxPower string,TimeNear string,LonNear string,LatNear string,RouteMac string,SSID string, SSIDs string,SecurityType string,RealType string, RealCode string,TerminalType int,PcBrand string, Phone string,IMSI string,IMEI string,OS string,CustomerStartTime string,OffLineTime string, Model string,CoordinateX string,CoordinateY string, OffLineLon string,OffLineLat string,PcIP string, RouteType string,SessionID string,ProtoType string,CyberCode string,IsMove string,IsElectric string, SafeState string,GPIOState string,Serial string, GuildID string,Time string,ManufacturerCode string,AreaCode string,UnitCode string,MachineCode string,SystemType string,DATASOURCEID string,Lon string, Lat string,LatLon string,RESOURCETYPE string, AccessSystemID string,InterfaceID string,InterfaceGroupID string,WriterTime string) ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES('hbase.columns.mapping'=':key,INFO:TerminalID,INFO:TerminalMac,INFO:DeviceType,INFO:Power,INFO:Channel,INFO:MaxPower,INFO:TimeNear,INFO:LonNear,INFO:LatNear,INFO:RouteMac,INFO:SSID,INFO:SSIDs,INFO:SecurityType,INFO:RealType,INFO:RealCode,INFO:TerminalType,INFO:PcBrand,INFO:Phone,INFO:IMSI,INFO:IMEI,INFO:OS,INFO:CustomerStartTime,INFO:OffLineTime,INFO:Model,INFO:CoordinateX,INFO:CoordinateY,INFO:OffLineLon,INFO:OffLineLat,INFO:PcIP,INFO:RouteType,INFO:SessionID,INFO:ProtoType,INFO:CyberCode,INFO:IsMove,INFO:IsElectric,INFO:SafeState,INFO:GPIOState,INFO:Serial,INFO:GuildID,INFO:Time,INFO:ManufacturerCode,INFO:AreaCode,INFO:UnitCode,INFO:MachineCode,INFO:SystemType,INFO:DATASOURCEID,INFO:Lon,INFO:Lat,INFO:LatLon,INFO:RESOURCETYPE,INFO:AccessSystemID,INFO:InterfaceID,INFO:InterfaceGroupID,INFO:WriterTime') TBLPROPERTIES('hbase.table.name'='$TABLE_NAME') ;
INSERT $2 TABLE logterminal SELECT key as rowkey,TerminalID,TerminalMac,DeviceType,Power,Channel,MaxPower,TimeNear,LonNear,LatNear,RouteMac,SSID,SSIDs,SecurityType,RealType,RealCode,TerminalType,PcBrand, Phone,IMSI,IMEI,OS,CustomerStartTime,OffLineTime,Model,CoordinateX,CoordinateY,OffLineLon,OffLineLat,PcIP,RouteType,SessionID,ProtoType,CyberCode,IsMove,IsElectric, SafeState,GPIOState,Serial,GuildID,Time,ManufacturerCode,AreaCode,UnitCode,MachineCode,SystemType,DATASOURCEID,Lon,Lat,LatLon,RESOURCETYPE,AccessSystemID,InterfaceID,InterfaceGroupID,WriterTime
FROM TempLogTerminal WHERE RESOURCETYPE=32 AND AreaCode='$AREA_CODE';"
    echo "hive task end......."
}
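Stripped of the 54-column list, the mapping pattern inside hive_task reduces to the sketch below (the table name and three columns are an illustrative subset, not the real schema): one Hive column per mapped HBase cell, with `:key` bound to the row key and the rest to `INFO`-family qualifiers.

```shell
# Minimal HBase-backed external Hive table (sketch).
hive -S -e "
CREATE EXTERNAL TABLE TempLogSample(key string, AreaCode string, Time string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES('hbase.columns.mapping' = ':key,INFO:AreaCode,INFO:Time')
TBLPROPERTIES('hbase.table.name' = 'LOG20180108');"
```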

# SQL Server DDL for the two target tables (kept here as a string for reference).
sqlserver_table(){
"
if exists (select * from sysobjects where name='LogTerminal_min')
    drop table LogTerminal_min
CREATE TABLE LogTerminal_min(
    rowkey nvarchar(200),TerminalID bigint,TerminalMac nvarchar(200),DeviceType bigint,
    Power bigint,Channel bigint,MaxPower bigint,
    TimeNear nvarchar(200),LonNear float,LatNear float,
    RouteMac nvarchar(200),SSID nvarchar(200),
    SSIDs nvarchar(200),SecurityType nvarchar(200),RealType bigint,
    RealCode nvarchar(200),TerminalType bigint,PcBrand nvarchar(200),
    Phone nvarchar(200),IMSI nvarchar(200),IMEI nvarchar(200),
    OS nvarchar(200),CustomerStartTime nvarchar(200),OffLineTime nvarchar(200),
    Model nvarchar(200),CoordinateX nvarchar(200),CoordinateY nvarchar(200),
    OffLineLon float,OffLineLat float,PcIP nvarchar(200),
    RouteType bigint,SessionID nvarchar(200),ProtoType bigint,
    CyberCode nvarchar(200),IsMove bigint,IsElectric bigint,
    SafeState bigint,GPIOState bigint,Serial nvarchar(200),
    GuildID nvarchar(200),Time nvarchar(200),ManufacturerCode nvarchar(200),
    AreaCode nvarchar(200),UnitCode nvarchar(200),MachineCode nvarchar(200),
    SystemType nvarchar(200),DATASOURCEID bigint,Lon float,
    Lat float,LatLon nvarchar(200),RESOURCETYPE bigint,
    AccessSystemID bigint,InterfaceID bigint,InterfaceGroupID bigint,
    WriterTime nvarchar(200)
);
if exists (select * from sysobjects where name='LogTerminal_max')
    drop table LogTerminal_max
CREATE TABLE LogTerminal_max(
    rowkey nvarchar(200),TerminalID bigint,TerminalMac nvarchar(200),DeviceType bigint,
    Power bigint,Channel bigint,MaxPower bigint,
    TimeNear nvarchar(200),LonNear float,LatNear float,
    RouteMac nvarchar(200),SSID nvarchar(200),
    SSIDs nvarchar(200),SecurityType nvarchar(200),RealType bigint,
    RealCode nvarchar(200),TerminalType bigint,PcBrand nvarchar(200),
    Phone nvarchar(200),IMSI nvarchar(200),IMEI nvarchar(200),
    OS nvarchar(200),CustomerStartTime nvarchar(200),OffLineTime nvarchar(200),
    Model nvarchar(200),CoordinateX nvarchar(200),CoordinateY nvarchar(200),
    OffLineLon float,OffLineLat float,PcIP nvarchar(200),
    RouteType bigint,SessionID nvarchar(200),ProtoType bigint,
    CyberCode nvarchar(200),IsMove bigint,IsElectric bigint,
    SafeState bigint,GPIOState bigint,Serial nvarchar(200),
    GuildID nvarchar(200),Time nvarchar(200),ManufacturerCode nvarchar(200),
    AreaCode nvarchar(200),UnitCode nvarchar(200),MachineCode nvarchar(200),
    SystemType nvarchar(200),DATASOURCEID bigint,Lon float,
    Lat float,LatLon nvarchar(200),RESOURCETYPE bigint,
    AccessSystemID bigint,InterfaceID bigint,InterfaceGroupID bigint,
    WriterTime nvarchar(200)
);
"
}
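Note that sqlserver_table as written only expands a string literal; nothing in the script actually executes the DDL. One way to run it would be through Microsoft's sqlcmd client, assuming the SQL is saved to a file first (both the client availability and the filename are our assumptions; the connection details are the ones used in the Sqoop step):

```shell
# Hypothetical: execute the table-creation DDL against SQL Server with sqlcmd.
# create_logterminal.sql is assumed to hold the SQL string from sqlserver_table.
sqlcmd -S 192.168.2.219 -U nsmc53 -P 123456 -d WFBDCMain -i create_logterminal.sql
```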

# Load the Hive data into SQL Server via Sqoop.
import_sqlserver(){
    echo " import sqlserver start......."
    sqoop export --connect 'jdbc:sqlserver://192.168.2.219; username=nsmc53; password=123456;database=WFBDCMain' --table LogTerminal_min --hcatalog-database default --hcatalog-table LogTerminal_min --num-mappers 5;
    sqoop export --connect 'jdbc:sqlserver://192.168.2.219; username=nsmc53; password=123456;database=WFBDCMain' --table LogTerminal_max --hcatalog-database default --hcatalog-table LogTerminal_max --num-mappers 5;
    echo " import sqlserver end......."
}
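After the export, a quick sanity check is to compare row counts on both sides. `sqoop eval` can run an ad-hoc query against the target database (a sketch reusing the connection string above):

```shell
# Count rows on the SQL Server side via sqoop eval ...
sqoop eval --connect 'jdbc:sqlserver://192.168.2.219; username=nsmc53; password=123456;database=WFBDCMain' \
  --query 'SELECT COUNT(*) FROM LogTerminal_min'

# ... and the corresponding Hive-side count.
hive -S -e 'SELECT COUNT(*) FROM LogTerminal_min;'
```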

# Export the Hive data to a local directory.
export_hive_local(){
    echo " export hive to local start......."
    mkdir -p /home/hive/min
    mkdir -p /home/hive/max
    hive -S -e "\
insert overwrite local directory '/home/hive/min' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select a.* from LogTerminal a inner join(select TerminalMac,min(time) time from LogTerminal group by TerminalMac) b on a.TerminalMac = b.TerminalMac and a.time = b.time order by a.TerminalMac ;\
insert overwrite local directory '/home/hive/max' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select a.* from LogTerminal a inner join(select TerminalMac,max(time) time from LogTerminal group by TerminalMac) b on a.TerminalMac = b.TerminalMac and a.time = b.time order by a.TerminalMac ;"
    # Export the Hive data to a csv file.
    hive -e " set hive.cli.print.header=true;select a.* from LogTerminal a inner join(select TerminalMac,min(time) time from LogTerminal group by TerminalMac) b on a.TerminalMac = b.TerminalMac and a.time = b.time order by a.TerminalMac ;" >> /home/hive/logterminal-min.csv
    echo "export hive to local end......."
}

# Filter the Hive data into the min/max result tables.
export_hive(){
    echo " export hive to sqlserver start......."
    # Row with the earliest time per terminal MAC.
    hive -S -e "INSERT OVERWRITE TABLE LogTerminal_min select a.* from LogTerminal a inner join(select TerminalMac,min(time) time from LogTerminal group by TerminalMac) b on a.TerminalMac = b.TerminalMac and a.time = b.time order by a.TerminalMac ;";
    # Row with the latest time per terminal MAC.
    hive -S -e "INSERT OVERWRITE TABLE LogTerminal_max select a.* from LogTerminal a inner join(select TerminalMac,max(time) time from LogTerminal group by TerminalMac) b on a.TerminalMac = b.TerminalMac and a.time = b.time order by a.TerminalMac ;";
    echo "export hive to sqlserver end......."
}
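The self-join-on-aggregate used in export_hive is the classic greatest-row-per-group pattern. On Hive 0.11+ the same result can be sketched with a window function instead, which avoids scanning LogTerminal twice (use `ORDER BY time DESC` for the latest row; note the extra `rn` column means this form suits a plain SELECT rather than the INSERTs above without further column pruning):

```shell
hive -S -e "
SELECT t.* FROM (
  SELECT a.*, row_number() OVER (PARTITION BY TerminalMac ORDER BY time) rn
  FROM LogTerminal a
) t
WHERE t.rn = 1;"
```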

main(){
    # Create the Hive table LogTerminal.
    echo " create hive table LogTerminal......."
    hive_table
    # Resolve the date range to build.
    get_time $*
    Style="OVERWRITE"
    date1=$date
    while [[ $date1 < $enddate ]]
    do
        echo "$date1"
        # Create the temporary table TempLogTerminal mapped onto that day's HBase table
        # (dropping any previous definition) and load the matching rows into LogTerminal.
        echo " exec hive_task....... $date1 $Style $3 "
        hive_task $date1 $Style $3
        date1=`date -d "+1 day $date1" +%Y%m%d`
        Style="INTO"
    done
    # Export from the Hive log table LogTerminal into the min/max tables.
    echo " export hive to sqlserver ......."
    export_hive
    echo " import sqlserver ......."
    import_sqlserver
    echo " query logterminal end......."
}

main $*
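Assuming the script above is saved as export_logterminal.sh (the filename is ours), the two supported invocations look like this:

```shell
# Default build: yesterday's data, no arguments.
./export_logterminal.sh

# Batch build: start date, end date, and the area code picked up as $3 by hive_task.
./export_logterminal.sh 2018-01-08 2018-01-10 610103
```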

Running the script above loads the HBase data into SQL Server; the result can then be queried through a SQL Server client tool and exported from there as text, CSV, HTML and other formats:

select * from WFBDCMain.dbo.LogTerminal_min

select * from WFBDCMain.dbo.LogTerminal_max


The Hive-to-CSV export below came from an online search, with a few modifications:

#!/bin/bash
mkdir -p /tmp/project_1010/t_test
hive -e "use default;
insert overwrite local directory '/tmp/project_1010/t_test/'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
select t.* from LogTerminal t"

# Generate the header row, strip the table-name prefix ('logterminal.') from the
# column names, and write it as the first line of the final csv file.
hive -e 'use default; SET hive.cli.print.header=true; SELECT * FROM LogTerminal LIMIT 0' | sed -e 's/\t/|/g;s/logterminal\.//g' > /home/hive/logterminal-min.csv

# Append the part files (000000_0, 000001_0, ...) to the final csv file.
cat /tmp/project_1010/t_test/* >> /home/hive/logterminal-min.csv

# Post-process the final csv with sed, substituting as required.
sed -i 's/\\N/NULL/g' /home/hive/logterminal-min.csv
sed -i "s/|'|/|NULL|/g" /home/hive/logterminal-min.csv
sed -i 's/|"|/|NULL|/g' /home/hive/logterminal-min.csv
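The three sed substitutions can be bundled into a small stdin filter and checked on sample rows (the function name is ours): Hive's `\N` null marker and empty single- or double-quoted fields all become the literal NULL.

```shell
# clean_row: apply the same replacements as the sed passes above to stdin.
clean_row() {
  sed -e 's/\\N/NULL/g' -e "s/|'|/|NULL|/g" -e 's/|"|/|NULL|/g'
}

# Example:
#   printf 'a|\\N|b\n' | clean_row    # prints a|NULL|b
```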

When exporting directly from Hive was tested, the generated CSV could not be split into columns correctly (among other unresolved issues), so the data is routed through SQL Server instead. If you know a better approach, please leave a comment so we can learn from each other.

Script download:

http://download.csdn.net/download/seashouwang/10264445


References:

http://blog.csdn.net/qq_27078095/article/details/56482010

https://www.cnblogs.com/szw-blog/p/5512349.html

http://blog.csdn.net/javajxz008/article/details/61173213

http://www.cnblogs.com/husky/p/6422001.html

https://my.oschina.net/wangjiankui/blog/497658

https://www.2cto.com/net/201708/673854.html

http://www.cnblogs.com/husky/p/6764317.html

