samza是一个分布式的流式数据处理框架(streaming processing),它是基于Kafka消息队列来实现类实时的流式数据处理的。(准确的说,samza是通过模块化的形式来使用kafka的,因此可以构架在其他消息队列框架上,但出发点和默认实现是基于kafka)
Apache Kafka主要是用来控制发消息的
Apache Hadoop YARN会提供错误信息,隔离处理器,安全和资源管理.
本文将介绍怎么在 Ubuntu 14.04 的32位 系统上安装Samza.
安装准备:
要安装和配置Apache-Samza,需要以下东西
JDK 1.7
maven2
kafka
yarn
zookeeper
# apt-get install curl gem
下载并设置JDK路径:
我们需要安装JDK并设置好其环境变量.
# cd /usr/java # wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/7u79-b15/jdk-7u79-linux-i586.tar.gz" # tar xzf jdk-7u79-linux-i586.tar.gz
解压并设置好JAVA_HOME路径
# tar -zxvf jdk-7u79-linux-i586.tar.gz # JAVA_HOME=/usr/java/jdk1.7.0_79 # export JAVA_HOME # PATH=$JAVA_HOME/bin:$PATH # export PATH
把上面的加入到 ~/.bashrc 和 /etc/bashrc文件去
安装Maven2:
接下来下载安装maven
# wget https://launchpad.net/~bneijt/+archive/ubuntu/ppa/+build/2139203/+files/maven3_3.0.1-0~ppa2_all.deb
# dpkg -i maven3_3.0.1-0~ppa2_all.deb
检查maven版本好
# mvn3 -version
Apache Maven 3.0.1 (r1038046; 2010-11-23 16:28:32+0530)
Java version: 1.7.0_79
Java home: /usr/java/jdk1.7.0_79/jre
Default locale: en_IN, platform encoding: UTF-8
OS name: "linux" version: "3.8.0-29-generic" arch: "i386" Family: "unix"
Java version: 1.7.0_79
Java home: /usr/java/jdk1.7.0_79/jre
Default locale: en_IN, platform encoding: UTF-8
OS name: "linux" version: "3.8.0-29-generic" arch: "i386" Family: "unix"
安装Hello- Samza :
我们就按照在 / usr /local 文件夹下面把
# cd /usr/local
把hello-samza复制进来,
# git clone git://git.apache.org/samza-hello-samza.git hello-samza
本项目中含有一个"grid"的脚本,其中有hello-samza变量,有了这个你可以搞定一切了. 使用它可以安装 Kafka, Yarn和Zookeeper.
执行下面的命令,
# cd /usr/local/hello-samza
root@dev:/usr/local/hello-samza# bin/grid install kafka
EXECUTING: install kafka
Downloading kafka_2.10-0.8.2.1.tgz...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
15 15.4M 15 2406k 0 0 304k 0 0:00:51 0:00:07 0:00:44 443k
Downloading kafka_2.10-0.8.2.1.tgz...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
15 15.4M 15 2406k 0 0 304k 0 0:00:51 0:00:07 0:00:44 443k
root@dev:/usr/local/hello-samza# bin/grid install yarn
EXECUTING: install yarn
Downloading hadoop-2.6.1.tar.gz...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
77 187M 77 145M 0 0 239k 0 0:13:23 0:10:22 0:03:01 204k
Downloading hadoop-2.6.1.tar.gz...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
77 187M 77 145M 0 0 239k 0 0:13:23 0:10:22 0:03:01 204k
root@dev:/usr/local/hello-samza# bin/grid install zookeeper
EXECUTING: install zookeeper
Downloading zookeeper-3.4.3.tar.gz...
% Total % Received % Xferd Average Speed Time Time Time Current
&
Downloading zookeeper-3.4.3.tar.gz...
% Total % Received % Xferd Average Speed Time Time Time Current
&