Hive on Spark Setup (CDH)

Hive and Spark versions are strongly coupled: each Hive release is built against a specific Spark version.

Apache Hive / Spark version compatibility:

Hive version    Spark version
master          2.3.0
3.0.x           2.3.0
2.3.x           2.0.0
2.2.x           1.6.0
2.1.x           1.6.0
2.0.x           1.5.0
1.2.x           1.3.1
1.1.x           1.2.0
For the CDH Hive/Spark version mapping, see:

http://archive.cloudera.com/cdh5/cdh/5/

Preparing the build environment

Download Scala 2.11

Download link


# add environment variables
vim /etc/profile
export SCALA_HOME=/root/scala-2.11.12
export PATH=$PATH:$SCALA_HOME/bin
Download Maven (version 3.3 or later)

Download link


# add environment variables
vim /etc/profile
export MAVEN_HOME=/root/apache-maven-3.5.3
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m -XX:MaxPermSize=2048M"
export PATH=$PATH:$MAVEN_HOME/bin

source /etc/profile
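A quick sanity check after sourcing the profile (version output will vary with the exact releases installed):

# verify Scala and Maven are on the PATH
scala -version   # expect Scala 2.11.x
mvn -version     # expect Maven 3.3+ and the JDK in use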

Download the Spark source code and compile it

Download link

Check the Hadoop version:
hadoop version
Then compile against that Hadoop version:
# pin the Hadoop version and build without the Hive jars
./make-distribution.sh --name hadoop2-without-hive --tgz -Pyarn -Phadoop-provided -Phadoop-2.6 -Porc-provided -Dhadoop.version=2.6.0-cdh5.14.2

The build produces spark-1.6.0-bin-hadoop2-without-hive.tgz.
Extract spark-1.6.0-bin-hadoop2-without-hive.tgz to a directory (e.g. /root/spark-1.6.0-bin-hadoop2-without-hive), as in the sketch below.
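A minimal extraction sketch, assuming the tarball sits in the current directory and /root is the target:

# unpack the build output into /root
tar -zxf spark-1.6.0-bin-hadoop2-without-hive.tgz -C /root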

Add the Spark configuration files
  • Spark directories on HDFS
sudo -u hdfs hdfs dfs -mkdir -p /spark/jars
sudo -u hdfs hdfs dfs -mkdir -p /spark/log/event-log
# upload the assembly jar
hdfs dfs -put /root/spark-1.6.0-bin-hadoop2-without-hive/lib/spark-assembly-1.6.0-cdh5.14.2-hadoop2.6.0-cdh5.14.2.jar /user/root
sudo -u hdfs hdfs dfs -mv /user/root/spark-assembly-1.6.0-cdh5.14.2-hadoop2.6.0-cdh5.14.2.jar /spark/jars
sudo -u hdfs hdfs dfs -chown hdfs /spark/jars/spark-assembly-1.6.0-cdh5.14.2-hadoop2.6.0-cdh5.14.2.jar
sudo -u hdfs hdfs dfs -chmod 777 /spark/jars/spark-assembly-1.6.0-cdh5.14.2-hadoop2.6.0-cdh5.14.2.jar
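A quick check that the assembly jar landed where the configuration below expects it:

sudo -u hdfs hdfs dfs -ls /spark/jars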
  • spark-env.sh
vim spark-1.6.0-bin-hadoop2-without-hive/conf/spark-env.sh
export JAVA_HOME=/usr/java/default
export SPARK_HOME=/root/spark-1.6.0-bin-hadoop2-without-hive
export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:$HADOOP_HOME/lib/native
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$HADOOP_HOME/lib/*
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=17777 -Dspark.history.fs.logDirectory=hdfs://xiwu-cluster/spark/log/event-log"
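With SPARK_HISTORY_OPTS set, the history server bundled under the distribution's sbin directory can be started; a sketch assuming the install path used above:

# serve the HDFS event logs on the configured port 17777
/root/spark-1.6.0-bin-hadoop2-without-hive/sbin/start-history-server.sh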
  • spark-defaults.conf
vim spark-1.6.0-bin-hadoop2-without-hive/conf/spark-defaults.conf
spark.yarn.archive hdfs://xiwu-cluster/spark/jars/spark-assembly-1.6.0-cdh5.14.2-hadoop2.6.0-cdh5.14.2.jar
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs://xiwu-cluster/spark/log/event-log
spark.serializer        org.apache.spark.serializer.KryoSerializer
spark.driver.memory     1g
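Before pointing Hive at this build, it is worth smoke-testing Spark on YARN directly. A minimal sketch using the bundled SparkPi example (the exact examples jar name under lib/ depends on the build options, hence the glob):

# run SparkPi in yarn-cluster mode; the result appears in the driver's YARN logs
/root/spark-1.6.0-bin-hadoop2-without-hive/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  /root/spark-1.6.0-bin-hadoop2-without-hive/lib/spark-examples-*.jar 10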

Modify hive-site.xml

<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
<property>
  <name>hive.enable.spark.execution.engine</name>
  <value>true</value>
</property>
<property>
  <name>spark.home</name>
  <value>/root/spark-1.6.0-bin-hadoop2-without-hive</value>
</property>
<property>
  <name>spark.yarn.jar</name>
  <value>hdfs://xiwu-cluster/spark/jars/spark-assembly-1.6.0-cdh5.14.2-hadoop2.6.0-cdh5.14.2.jar</value>
</property>
<property>
  <name>spark.master</name>
  <value>yarn-cluster</value>
</property>
<property>
  <name>spark.serializer</name>
  <value>org.apache.spark.serializer.KryoSerializer</value>
</property>
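To verify the integration end to end, force the Spark engine in a Hive session and run a query that launches a job; some_table here is a placeholder for any existing table:

# this should launch a Spark application on YARN, visible in the ResourceManager UI
hive -e "set hive.execution.engine=spark; select count(*) from some_table;"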

Official documentation
Spark build guide
Hive on Spark guide
