Related articles
1. Spark learning - Hadoop installation and startup
2. Spark learning - Spark installation and startup
Pre-installation preparation
1. First prepare three servers: one master and two slaves.
172.18.101.157 spark-master
172.18.101.162 spark-slave1
172.18.132.162 spark-slave2
2. Set up passwordless SSH login
1. Generate the private and public keys
[root@spark-master data]# ssh-keygen -t rsa
Keep pressing Enter; the key pair is generated at the end:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:xmlOlzYrf0obAn5T5rN+nI39yf+lXTibaETnu72yEN8 root@spark-master
The key's randomart image is:
+---[RSA 2048]----+
| |
| |
| |
| . . .. . |
| . S Bo o |
| . * * o+ o. |
| . * *+ *oEo|
| . * =Oo=B+|
| .*=..B*X|
+----[SHA256]-----+
[root@spark-master data]#
2. Copy the public key to the slave machines
[root@spark-master data]# scp /root/.ssh/id_rsa.pub root@172.18.101.162:/data/
[root@spark-master data]# scp /root/.ssh/id_rsa.pub root@172.18.132.162:/data/
Log in to each slave and append the public key to the authorized_keys file:
[root@spark-slave1 ~]# cd /data
[root@spark-slave1 data]# cat id_rsa.pub >> /root/.ssh/authorized_keys
Then verify passwordless login from the master:
[root@spark-master ~]# ssh root@172.18.101.162
Last login: Fri Jun 22 20:07:35 2018 from 172.18.101.157
Welcome to Alibaba Cloud Elastic Compute Service !
[root@spark-master ~]# ssh root@172.18.132.162
Last login: Fri Jun 22 20:07:31 2018 from 172.18.101.157
Welcome to Alibaba Cloud Elastic Compute Service !
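Alternatively, if the ssh-copy-id tool is available on the master (an assumption about your environment, not a step from the original setup), the copy-and-append steps can be done in a single command per slave:
ssh-copy-id root@172.18.101.162
ssh-copy-id root@172.18.132.162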
3. Install the JDK
vim /etc/profile
Configure the environment variables:
export JAVA_HOME=/opt/jdk8
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
Make the environment variables take effect:
source /etc/profile
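To confirm the new variables are picked up, you can check the Java version (a quick sanity check; the exact output depends on which JDK build is installed under /opt/jdk8, the path used above):
java -version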
Installing Hadoop
1. Configure the network
Change the hosts file on all nodes to:
vim /etc/hosts
127.0.0.1 localhost
172.18.101.157 spark-master
172.18.101.162 spark-slave1
172.18.132.162 spark-slave2
2. Set the Hadoop environment variables
export HADOOP_HOME=/usr/local/hadoop-2.7.6
export PATH=$PATH:$HADOOP_HOME/bin
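Assuming the Hadoop 2.7.6 archive has already been unpacked to /usr/local/hadoop-2.7.6 and the two lines above have been added to /etc/profile, a quick sanity check that the PATH is correct:
source /etc/profile
hadoop version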
3. Modify the configuration files
Five configuration files need to be modified, all located in the etc/hadoop directory under the installation path:
core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and the slaves file.
core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://spark-master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
    </property>
</configuration>
hdfs-site.xml
<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>file:/home/hadoop/hdfs/name</value>
        <description>Where the NameNode stores the HDFS namespace metadata</description>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>file:/home/hadoop/hdfs/data</value>
        <description>Physical storage location of data blocks on the DataNodes</description>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
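The paths referenced in core-site.xml and hdfs-site.xml above are local directories on each node. Creating them up front is optional (an extra precaution, not a step from the original guide), for example:
mkdir -p /home/hadoop/tmp /home/hadoop/hdfs/name /home/hadoop/hdfs/data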
mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>spark-master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>spark-master:19888</value>
    </property>
</configuration>
yarn-site.xml
<configuration>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log.server.url</name>
        <value>http://spark-master:19888/jobhistory/logs/</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>spark-master:8099</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>spark-master</value>
    </property>
</configuration>
The slaves file configures the worker nodes:
spark-slave1
spark-slave2
4. Copy the Hadoop installation files to the slave nodes
tar -cvf hadoop.tar hadoop-2.7.6/
[root@spark-master local]# scp hadoop.tar root@spark-slave1:/usr/local/
[root@spark-master local]# scp hadoop.tar root@spark-slave2:/usr/local/
Log in to the two slaves and extract the archive in the current directory:
[root@spark-slave1 local]# tar -xvf hadoop.tar
5. Initialize the NameNode
hdfs namenode -format
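The format command is run once, on the master only. If it ever needs to be re-run (not part of the original steps; only as a recovery measure), clear the data directories configured in hdfs-site.xml on every node first to avoid a cluster ID mismatch:
rm -rf /home/hadoop/hdfs/name/* /home/hadoop/hdfs/data/* /home/hadoop/tmp/*
hdfs namenode -format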
6. Start the services
[root@spark-master hadoop-2.7.6]# sbin/start-all.sh
spark-master: starting namenode, logging to /usr/local/hadoop-2.7.6/logs/hadoop-root-namenode-spark-master.out
spark-slave2: starting datanode, logging to /usr/local/hadoop-2.7.6/logs/hadoop-root-datanode-spark-slave2.out
spark-slave1: starting datanode, logging to /usr/local/hadoop-2.7.6/logs/hadoop-root-datanode-spark-slave1.out
spark-slave2: /usr/local/hadoop-2.7.6/bin/hdfs: line 304: /opt/jdk1.8.0_144/bin/java: No such file or directory
spark-slave1: /usr/local/hadoop-2.7.6/bin/hdfs: line 304: /opt/jdk1.8.0_144/bin/java: No such file or directory
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.7.6/logs/hadoop-root-secondarynamenode-spark-master.out
starting yarn daemons
The log above says the JDK path does not exist on the slave machines. This happens because the JDK on the slaves is installed under a different path than on the master. Log in to each slave, edit etc/hadoop/hadoop-env.sh under the Hadoop installation directory, and set JAVA_HOME to the JDK path that actually exists on that slave, for example:
export JAVA_HOME=/opt/jdk1.8.0_144
Start the services again:
[root@spark-master hadoop-2.7.6]# sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [spark-master]
spark-master: starting namenode, logging to /usr/local/hadoop-2.7.6/logs/hadoop-root-namenode-spark-master.out
spark-slave1: starting datanode, logging to /usr/local/hadoop-2.7.6/logs/hadoop-root-datanode-spark-slave1.out
spark-slave2: starting datanode, logging to /usr/local/hadoop-2.7.6/logs/hadoop-root-datanode-spark-slave2.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.7.6/logs/hadoop-root-secondarynamenode-spark-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.7.6/logs/yarn-root-resourcemanager-spark-master.out
spark-slave1: starting nodemanager, logging to /usr/local/hadoop-2.7.6/logs/yarn-root-nodemanager-spark-slave1.out
spark-slave2: starting nodemanager, logging to /usr/local/hadoop-2.7.6/logs/yarn-root-nodemanager-spark-slave2.out
Check the running processes
After startup you can check the running Java processes with the jps command.
The processes that should be running on the master:
[root@spark-master hadoop-2.7.6]# jps
7360 NameNode
7778 ResourceManager
7592 SecondaryNameNode
The processes that should be running on the slaves:
[root@spark-slave1 ~]# jps
17061 NodeManager
16951 DataNode
With this, Hadoop has been installed successfully.
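Besides jps, the web UIs can also be used to verify the cluster. The port 8099 comes from yarn.resourcemanager.webapp.address configured above; 50070 is the default NameNode web port in Hadoop 2.x, assuming it was not changed:
curl -s http://spark-master:8099/cluster | head
curl -s http://spark-master:50070 | head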
HDFS commands
Now that the Hadoop installation is complete, we can use HDFS commands to create and list directories:
[root@spark-master ~]# hadoop fs -mkdir /test
[root@spark-master ~]# hadoop fs -ls /
Found 2 items
drwxr-xr-x - root supergroup 0 2018-06-23 15:11 /test
drwxr-xr-x - root supergroup 0 2018-06-23 15:08 /usr
Common commands:
hadoop fs -ls path // list the files in a directory
hadoop fs -mkdir path // create a directory
hadoop fs -rm path // delete a file (use -rm -r for a directory)
hadoop fs -rmdir path // delete an empty directory
hadoop fs -put localfile path // upload a local file to the given HDFS directory
hadoop fs -cat filename // print the contents of a file
hadoop fs -get HDFSfile localfilename // download an HDFS file to the local machine as localfilename
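A short end-to-end example of the commands above, using the /test directory created earlier (the file names are only for illustration):
echo "hello hadoop" > /tmp/hello.txt
hadoop fs -put /tmp/hello.txt /test
hadoop fs -ls /test
hadoop fs -cat /test/hello.txt
hadoop fs -get /test/hello.txt /tmp/hello.copy.txt
hadoop fs -rm /test/hello.txt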