Hadoop Pseudo-Distributed Setup
1. Prepare the OS environment
Java environment; passwordless SSH
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
ssh node01 — creates the ~/.ssh directory
ssh-keygen — generates the public/private key pair
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ssh-copy-id nodeX
ssh-copy-id node01 — appends the public key to authorized_keys on node01
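With several nodes, the key distribution above can be scripted. A minimal dry-run sketch, assuming the node names used in this guide; it only collects and prints the ssh-copy-id plan, so on a real cluster drop the PLAN indirection and call ssh-copy-id directly:

```shell
# Dry-run sketch: build the list of ssh-copy-id commands for every node.
# Node names are this guide's examples; run ssh-keygen once beforehand.
PLAN=""
for n in node01 node02 node03 node04; do
  PLAN="$PLAN
ssh-copy-id $n"
done
echo "$PLAN"
```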
2. Upload and extract the tarball
tar zxvf hadoop-2.6.5.tar.gz -C /opt/sxt/
Edit the configuration files under /opt/sxt/hadoop-2.6.5/etc/hadoop
Set JAVA_HOME explicitly in hadoop-env.sh, mapred-env.sh, and yarn-env.sh, because remote (non-interactive) logins do not source /etc/profile
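Pinning JAVA_HOME in those three scripts can be done with sed. A hedged sketch: the JDK path is an assumption, and HADOOP_CONF defaults to a scratch directory with stub files so the sketch runs anywhere; on a real node point it at /opt/sxt/hadoop-2.6.5/etc/hadoop:

```shell
# Hard-code JAVA_HOME in the env scripts (the JDK path below is an assumption).
HADOOP_CONF=${HADOOP_CONF:-$(mktemp -d)}   # use /opt/sxt/hadoop-2.6.5/etc/hadoop on a real node
JDK=${JDK:-/usr/java/jdk1.8.0}
for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
  # Create a stub if the file is missing, so the sketch is self-contained.
  [ -f "$HADOOP_CONF/$f" ] || echo 'export JAVA_HOME=${JAVA_HOME}' > "$HADOOP_CONF/$f"
  # Replace the JAVA_HOME export with an absolute path.
  sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=$JDK|" "$HADOOP_CONF/$f"
done
```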
Pseudo-Distributed Setup
3. Edit the configuration files
core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://node01:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/sxt/hadoop/local</value>
</property>
hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node01:50090</value>
</property>
slaves
node01
4. Format the NameNode: hdfs namenode -format
5. Start HDFS: start-dfs.sh
6. File operation commands
hdfs dfs -mkdir /user
hdfs dfs -ls /user
hdfs dfs -mkdir /user/root
hdfs dfs -D dfs.blocksize=1048576 -put hadoop-2.6.5.tar.gz /user/root
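With dfs.blocksize=1048576 (1 MiB), HDFS splits the file into ceil(size / blocksize) blocks, so a file of a few hundred MiB yields that many 1 MiB blocks. A quick arithmetic check; the file size below is an assumed example, not the real tarball size:

```shell
# Ceiling division: number of HDFS blocks for a given file size.
BLOCKSIZE=1048576                 # 1 MiB = 1024 * 1024 bytes
FILESIZE=199635269                # assumed example size in bytes
BLOCKS=$(( (FILESIZE + BLOCKSIZE - 1) / BLOCKSIZE ))
echo "$BLOCKS blocks"             # the last block holds the remainder, < 1 MiB
```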
Fully Distributed Setup
1. On every node:
Install the JDK
Configure the environment variables
Set up passwordless SSH login
The control node scp's its own id_rsa.pub to the other nodes (saved e.g. as ~/node01.pub)
cat ~/node01.pub >> ~/.ssh/authorized_keys
mkdir /opt/sxt
Configure /etc/hosts (hostname-to-IP mapping for all nodes)
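Every node needs the same hostname-to-IP mapping. A sketch of the /etc/hosts entries, with assumed example addresses; HOSTS defaults to a temp file here so the sketch is self-contained, but on a real machine point it at /etc/hosts:

```shell
# Append cluster name-resolution entries (the IPs are assumed examples).
HOSTS=${HOSTS:-$(mktemp)}   # use /etc/hosts on a real node
cat >> "$HOSTS" <<'EOF'
192.168.9.11 node01
192.168.9.12 node02
192.168.9.13 node03
192.168.9.14 node04
EOF
```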
2. On one node:
Edit the Hadoop configuration files (see the core-site.xml and hdfs-site.xml sections below)
3. Distribute the package to the other nodes
cd /opt/sxt
scp -r hadoop-2.6.5 node02:`pwd`
scp -r hadoop-2.6.5 node03:`pwd`
scp -r hadoop-2.6.5 node04:`pwd`
core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://node01:9000</value>
</property>
This directory holds the data of the NameNode, DataNodes, and SecondaryNameNode; there is no need to create it yourself, HDFS creates it automatically.
<property>
<name>hadoop.tmp.dir</name>
<value>/var/sxt/hadoop/full</value>
</property>
hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node02:50090</value>
</property>
slaves
node02
node03
node04
4. Format the NameNode (on node01): hdfs namenode -format
Running the start command creates the persistent directories on the DataNodes, under the path configured via hadoop.tmp.dir in core-site.xml
5. start-dfs.sh
6. Verify with jps on every node; web UI at node01:50070
HA Setup
hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>node01:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>node02:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>node01:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>node02:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node01:8485;node02:8485;node03:8485/mycluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/var/zf/hadoop/ha/jn</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
core-site.xml
Note: the hadoop.tmp.dir setting must be changed to /var/sxt/hadoop-2.6/ha
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>node02:2181,node03:2181,node04:2181</value>
</property>
ZooKeeper Configuration
1. Configure ZooKeeper
conf/zoo.cfg
dataDir=/var/sxt/zk
server.1=node02:2888:3888
server.2=node03:2888:3888
server.3=node04:2888:3888
In the dataDir (/var/sxt/zk):
echo 1 > myid  # the number follows the node plan: server.1 → node02, server.2 → node03, server.3 → node04
The zoo.cfg file must be identical on every node in the cluster!
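The myid number must agree with the server.N line for this host in zoo.cfg, so deriving it from the file avoids mismatches. A sketch that defaults to a scratch copy of zoo.cfg so it runs anywhere; on a live node set ZOOCFG to the real conf/zoo.cfg, DATADIR to the dataDir, and HOST to $(hostname):

```shell
# Derive this node's myid from the server.N= lines in zoo.cfg.
ZOOCFG=${ZOOCFG:-$(mktemp)}
DATADIR=${DATADIR:-$(mktemp -d)}
HOST=${HOST:-node03}                 # normally HOST=$(hostname)
# Stub config so the sketch is self-contained; mirrors the zoo.cfg above.
[ -s "$ZOOCFG" ] || cat > "$ZOOCFG" <<'EOF'
dataDir=/var/sxt/zk
server.1=node02:2888:3888
server.2=node03:2888:3888
server.3=node04:2888:3888
EOF
# Extract N from "server.N=<this host>:..."
ID=$(sed -n "s/^server\.\([0-9][0-9]*\)=$HOST:.*/\1/p" "$ZOOCFG")
echo "$ID" > "$DATADIR/myid"
echo "myid for $HOST: $ID"
```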
Start the ZooKeeper cluster
Start: zkServer.sh start
Check status: zkServer.sh status
Stop: zkServer.sh stop
1. Starting the HA cluster requires starting the JournalNodes first
hadoop-daemon.sh start journalnode (run on each of the three JournalNode hosts: node01, node02, node03)
2. Pick one NameNode and format it
hdfs namenode -format
hadoop-daemon.sh start namenode
3. On the other NN: sync the metadata; this must happen after the first NameNode has been started
hdfs namenode -bootstrapStandby
4. After formatting the ZooKeeper state, start HDFS with start-dfs.sh
$ZOOKEEPER/bin/zkCli.sh
Inspect the root znode: ls /
Format on a NameNode: hdfs zkfc -formatZK
stop-dfs.sh && start-dfs.sh
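The first-time HA start order matters: JournalNodes → format → first NN → bootstrapStandby → formatZK → start-dfs.sh. A dry-run sketch that just records which command runs on which host (hostnames follow this guide); on a real cluster, execute each command on the listed host instead of printing:

```shell
# Dry-run plan of the first HA startup; each line is "[host(s)] command".
PLAN=""
step() { PLAN="$PLAN
[$1] $2"; }
step "node01 node02 node03" "hadoop-daemon.sh start journalnode"
step "node01"               "hdfs namenode -format"
step "node01"               "hadoop-daemon.sh start namenode"
step "node02"               "hdfs namenode -bootstrapStandby"
step "node01"               "hdfs zkfc -formatZK"
step "node01"               "start-dfs.sh"
echo "$PLAN"
```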