How to Use the Spark SQL JDBC Server
2018-12-06 17:26:47
Introduction


Spark SQL provides JDBC connectivity, which is useful for connecting business intelligence (BI) tools to a Spark cluster and for sharing a cluster across multiple users. The JDBC server runs as a standalone Spark driver program that can be shared by multiple clients. Any client can cache tables in memory, query them, and so on, and the cluster resources and cached data are shared among all of them.


Spark SQL’s JDBC server corresponds to HiveServer2 in Hive. It is also known as the “Thrift server” since it uses the Thrift communication protocol. Note that the JDBC server requires Spark to be built with Hive support.


Runtime environment


Cluster environment: CDH 5.3.0


The specific JAR versions are:


Spark version: 1.2.0-cdh5.3.0


Hive version: 0.13.1-cdh5.3.0


Hadoop version: 2.5.0-cdh5.3.0


Starting the JDBC server


cd /etc/spark/conf
ln -s /etc/hive/conf/hive-site.xml hive-site.xml
cd /opt/cloudera/parcels/CDH/lib/spark/
chmod -R 777 logs/
cd /opt/cloudera/parcels/CDH/lib/spark/sbin
./start-thriftserver.sh --master yarn --hiveconf hive.server2.thrift.port=10000
Connecting to the JDBC server with Beeline


cd /opt/cloudera/parcels/CDH/lib/spark/bin
beeline -u jdbc:hive2://hadoop04:10000


[root@hadoop04 bin]# beeline -u jdbc:hive2://hadoop04:10000
scan complete in 2ms
Connecting to jdbc:hive2://hadoop04:10000
Connected to: Spark SQL (version 1.2.0)
Driver: Hive JDBC (version 0.13.1-cdh5.3.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 0.13.1-cdh5.3.0 by Apache Hive
0: jdbc:hive2://hadoop04:10000>
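
The port passed to start-thriftserver.sh through hive.server2.thrift.port has to be the same port that appears in the Beeline URL. A minimal sketch of one way to keep the two in sync is to define the host and port once and derive both command lines from them. The variable and function names here (THRIFT_HOST, THRIFT_PORT, thrift_start_cmd, beeline_connect_cmd) are made up for illustration; the host and port values come from the session above. The functions only print the commands, they do not run them.

```shell
#!/usr/bin/env bash
# Assumption: host and port as shown in the Beeline session in this article.
THRIFT_HOST="hadoop04"
THRIFT_PORT="10000"

# Print the server start command (to be run from the Spark sbin directory).
thrift_start_cmd() {
  echo "./start-thriftserver.sh --master yarn --hiveconf hive.server2.thrift.port=${THRIFT_PORT}"
}

# Print the matching Beeline connect command.
beeline_connect_cmd() {
  echo "beeline -u jdbc:hive2://${THRIFT_HOST}:${THRIFT_PORT}"
}

thrift_start_cmd
beeline_connect_cmd
```

Changing THRIFT_PORT in one place then changes both the server side and the client side together.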
Working with Beeline


Within the Beeline client, you can use standard HiveQL commands to create, list, and query tables. You can find the full details of HiveQL in the Hive Language Manual, but here we show a few common operations.


CREATE TABLE IF NOT EXISTS mytable (key INT, value STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';


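#Table used in the rest of this article. It reuses the name mytable, so run either this statement or the one above, not both.
create table mytable(name string,addr string,status string) row format delimited fields terminated by '#';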


#Load a local file
load data local inpath '/external/tmp/data.txt' into table mytable;


#Load a file from HDFS
load data inpath 'hdfs://ju51nn/external/tmp/data.txt' into table mytable;


describe mytable;


explain select * from mytable where name = '张三';


select * from mytable where name = '张三';


cache table mytable;


select count(*) total,count(distinct addr) num1,count(distinct status) num2 from mytable where addr='gz';

uncache table mytable;
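
Beeline can also run statements from a file with its -f option instead of typing them interactively. The following is a sketch under some assumptions: the JDBC URL is the one from the session above, the statements are a subset of the ones in this section (with "if not exists" added so the script can be re-run), and RUN_BEELINE is a hypothetical switch introduced here so that the script only prints the command unless a live server is available.

```shell
#!/usr/bin/env bash
# Assumption: URL taken from the Beeline session shown in this article.
JDBC_URL="jdbc:hive2://hadoop04:10000"

# Write the statements to a script file; each statement ends with ';'.
sql_file="$(mktemp)"
cat > "$sql_file" <<'EOF'
create table if not exists mytable(name string,addr string,status string) row format delimited fields terminated by '#';
load data local inpath '/external/tmp/data.txt' into table mytable;
cache table mytable;
select count(*) total,count(distinct addr) num1,count(distinct status) num2 from mytable where addr='gz';
uncache table mytable;
EOF

# Print the command; execute it only when RUN_BEELINE=1 is set.
echo "beeline -u $JDBC_URL -f $sql_file"
if [ "${RUN_BEELINE:-0}" = "1" ]; then
  beeline -u "$JDBC_URL" -f "$sql_file"
fi
```

Keeping the statements in a file makes the missing-semicolon mistake easy to spot, since every statement has to be terminated before Beeline will execute it.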
Sample data


张三#广州#学生
李四#贵州#教师
王五#武汉#讲师
赵六#成都#学生
lisa#广州#学生
lily#gz#studene
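

To check what the aggregate query with addr='gz' should return for this sample data without a cluster, the same filter and distinct counts can be approximated locally with awk. This is only a sanity-check sketch, not how Spark SQL evaluates the query; the function name summarize_addr and the temporary file are made up for illustration, and the rows are copied from the sample above.

```shell
#!/usr/bin/env bash
# Recreate the sample data locally (same '#'-delimited rows as above).
data_file="$(mktemp)"
cat > "$data_file" <<'EOF'
张三#广州#学生
李四#贵州#教师
王五#武汉#讲师
赵六#成都#学生
lisa#广州#学生
lily#gz#studene
EOF

# Emulates: select count(*) total, count(distinct addr) num1,
#           count(distinct status) num2 from mytable where addr=<addr>
# Usage: summarize_addr <addr> <file>; prints "total num1 num2".
summarize_addr() {
  awk -F'#' -v addr="$1" '
    $2 == addr { total++; addrs[$2] = 1; statuses[$3] = 1 }
    END {
      n1 = 0; for (a in addrs) n1++
      n2 = 0; for (s in statuses) n2++
      printf "%d %d %d\n", total, n1, n2
    }' "$2"
}

summarize_addr gz "$data_file"   # prints: 1 1 1
```

Only the row for lily has addr equal to gz, so all three counts are 1.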
Standalone Spark SQL Shell


Spark SQL also supports a simple shell you can use as a single process: spark-sql


It is mainly intended for local development. In a shared cluster environment, use the JDBC server instead.


cd /opt/cloudera/parcels/CDH/lib/spark/bin
./spark-sql