Big Data Study: Introduction to Flume and Its Installation (Part 1)

2017-09-19 13:12:35

Flume

Lab environment:

shiyanlou

- CentOS 6.6, 64-bit

- JDK 1.7.0_55, 64-bit

- Hadoop 1.1.2

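Introduction to Flume

Flume is a log collection system originally developed at Cloudera and now an Apache project. It lets you plug customized data senders into a logging system to collect data, and it can apply simple processing to that data before writing it to a variety of (customizable) data receivers.

Flume is a distributed, reliable, and highly available system for collecting, aggregating, and transporting large volumes of log data.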

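Flume features

- Reliability: data reliability, offered at three levels: end-to-end, store on failure, and best effort.

- Scalability: Flume's three main components (collector, master, and storage tier) are all scalable.

- Manageability: ZooKeeper and gossip keep configuration data consistent and highly available, and multiple masters are supported.

- Extensibility: Flume is written in Java, so users can add new functionality to it.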

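Flume architecture

The most important abstraction is the data flow, which describes the path data takes from where it is produced, through transport and processing, to where it is finally written.

In the architecture diagram, the solid lines are data flows.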

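Agents collect data: an agent is where a data flow originates in Flume, and it forwards the flow it produces to a collector. Collectors, in turn, aggregate data, usually producing a larger flow.

Flume can collect data from sources such as console, RPC (Thrift-RPC), text (files), tail (UNIX tail), syslog (the syslog logging system, in both TCP and UDP modes), and exec (command execution).

Likewise, Flume's data receivers can be console, text (files), dfs (HDFS files), RPC (Thrift-RPC), syslogTCP (the syslog logging system over TCP), and others.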

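Data collection has two main working modes:

- Push sources: an external system actively pushes data into Flume, as with RPC and syslog.

- Polling sources: Flume fetches data from the external system, usually by polling, as with text and exec.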
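
To make the two modes more concrete, here is a minimal sketch in the Flume 1.x properties format that the installation section of this article uses. It writes a scratch configuration with one push-style source (`syslogtcp`, where an external syslog daemon connects to Flume) and one pull-style source (`exec`, where Flume itself runs a command and reads its output). The agent name `a1`, the source names `r_push` and `r_poll`, the channel `c1`, port 5140, and the log path are all assumptions chosen for illustration.

```shell
# Write a sketch of two source definitions to a scratch file.
# All names, the port, and the log path below are hypothetical.
conf_sketch=$(mktemp)
cat > "$conf_sketch" <<'EOF'
a1.sources = r_push r_poll
a1.channels = c1

# Push source: an external syslog daemon sends events to this TCP port.
a1.sources.r_push.type = syslogtcp
a1.sources.r_push.host = localhost
a1.sources.r_push.port = 5140
a1.sources.r_push.channels = c1

# Pull-style source: Flume runs the command and reads what it prints.
a1.sources.r_poll.type = exec
a1.sources.r_poll.command = tail -F /var/log/example.log
a1.sources.r_poll.channels = c1

a1.channels.c1.type = memory
EOF

# List the source and channel types that were defined.
grep '\.type' "$conf_sketch"
```

The difference shows up in who initiates the transfer: the `syslogtcp` source only listens, while the `exec` source actively starts a process to obtain data.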

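Note that in Flume, agent and collector form one pair of concepts, while source and sink form another.

Source and sink emphasize the characteristics of the sending and receiving ends (such as data format and encoding), whereas agent and collector describe functional roles.

The Flume Master manages the configuration of data flows. Flume Masters synchronize data among themselves using a gossip protocol.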

The agent, collector, and master roles described above come from the original Flume 0.9.x architecture (often called Flume OG). The 1.x line (Flume NG), which includes the 1.5.2 release installed in the next section, runs standalone agents built from sources, channels, and sinks, with no separate collector or master.

Installing and deploying Flume

Download page

https://flume.apache.org/download.html

cd /home/shiyanlou/install-pack

tar -xzf flume-1.5.2-bin.tar.gz

mv apache-flume-1.5.2-bin /app/flume-1.5.2

sudo vi /etc/profile

export FLUME_HOME=/app/flume-1.5.2

export FLUME_CONF_DIR=$FLUME_HOME/conf

export PATH=$PATH:$FLUME_HOME/bin

source /etc/profile

echo $PATH
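
Reading the output of `echo $PATH` by eye is easy to get wrong, so a small check can help confirm that the variables exported in /etc/profile took effect. The function below is a sketch only; `check_flume_env` is a hypothetical helper name, not part of Flume.

```shell
# Hypothetical helper: report whether the Flume environment variables look sane.
check_flume_env() {
    # FLUME_HOME must be set and non-empty.
    if [ -z "${FLUME_HOME:-}" ]; then
        echo "FLUME_HOME is not set"
        return 1
    fi
    # FLUME_HOME must point at an existing directory.
    if [ ! -d "$FLUME_HOME" ]; then
        echo "FLUME_HOME does not point to a directory: $FLUME_HOME"
        return 1
    fi
    # $FLUME_HOME/bin must appear as a full entry in PATH.
    case ":$PATH:" in
        *":$FLUME_HOME/bin:"*) ;;
        *) echo "PATH does not contain $FLUME_HOME/bin"; return 1 ;;
    esac
    echo "Flume environment looks OK"
}
```

After `source /etc/profile`, calling `check_flume_env` in the same shell prints either a confirmation or the first problem it finds.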

cd /app/flume-1.5.2/conf

cp flume-env.sh.template flume-env.sh

sudo vi flume-env.sh

JAVA_HOME=/app/lib/jdk1.7.0_55

JAVA_OPTS="-Xms100m -Xmx200m -Dcom.sun.management.jmxremote"

cp flume-conf.properties.template flume-conf.properties

sudo vi flume-conf.properties

# The configuration file needs to define the sources, the channels and the sinks.

# Sources, channels and sinks are defined per agent, in this case called 'a1'

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# For each one of the sources, the type is defined

a1.sources.r1.type = netcat

a1.sources.r1.bind = localhost

a1.sources.r1.port = 44444

#The channel can be defined as follows.

a1.sources.r1.channels = c1

# Each sink's type must be defined

a1.sinks.k1.type = logger

#Specify the channel the sink should use

a1.sinks.k1.channel = c1

# Each channel's type is defined.

a1.channels.c1.type = memory

# Other config values specific to each type of channel(sink or source)

# can be defined as well

# In this case, it specifies the capacity of the memory channel

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100
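
In this file a source is bound with the plural key `channels`, while a sink is bound with the singular key `channel`, and both must name a channel listed in `a1.channels`. A mismatch here is easy to make, so the following sketch checks the wiring with awk. `check_channel_bindings` is a hypothetical helper, and it assumes simple `key = value` lines like the ones above.

```shell
# Hypothetical helper: verify that every source/sink of an agent refers
# only to channels declared in "<agent>.channels".
# Usage: check_channel_bindings <properties-file> <agent-name>
check_channel_bindings() {
    awk -v agent="$2" '
        # Skip comment lines and blank lines.
        /^[ \t]*(#|$)/ { next }
        {
            # Split "key = value" and trim surrounding whitespace.
            split($0, kv, "=")
            key = kv[1]; val = kv[2]
            gsub(/[ \t]/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
        }
        # Remember the channels the agent declares.
        key == agent ".channels" {
            n = split(val, names, /[ \t]+/)
            for (i = 1; i <= n; i++) declared[names[i]] = 1
            next
        }
        # Remember every channel a source (channels) or sink (channel) uses.
        key ~ ("^" agent "\\.(sources|sinks)\\.[^.]+\\.channels?$") {
            n = split(val, names, /[ \t]+/)
            for (i = 1; i <= n; i++) used[names[i]] = key
        }
        END {
            bad = 0
            for (c in used) if (!(c in declared)) {
                print used[c] " refers to undeclared channel " c
                bad = 1
            }
            exit bad
        }
    ' "$1"
}
```

Running `check_channel_bindings flume-conf.properties a1` in the conf directory prints nothing and returns 0 when the wiring is consistent.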

cd /app/flume-1.5.2

./bin/flume-ng agent --conf ./conf/ --conf-file ./conf/flume-conf.properties --name a1 -Dflume.root.logger=INFO,console
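
The value passed to `--name` has to match the agent prefix used in the properties file (`a1` here); otherwise none of the configured components belong to the agent being started. The wrapper below is a sketch of guarding against that mistake before launching. `agent_defined` and `start_agent` are hypothetical helper names, and the launch line simply repeats the command above.

```shell
# Hypothetical helper: succeed only if the properties file declares
# sources for the given agent name.
agent_defined() {
    grep -Eq "^[[:space:]]*$2\.sources[[:space:]]*=" "$1"
}

# Hypothetical wrapper: start the agent only when its name is defined
# in the configuration file. Run it from the Flume installation directory.
start_agent() {
    conf_file=$1
    agent=$2
    if ! agent_defined "$conf_file" "$agent"; then
        echo "agent '$agent' is not defined in $conf_file" >&2
        return 1
    fi
    ./bin/flume-ng agent --conf ./conf/ --conf-file "$conf_file" \
        --name "$agent" -Dflume.root.logger=INFO,console
}
```

For the configuration in this article the call would be `start_agent ./conf/flume-conf.properties a1`.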

The following test cannot be run on shiyanlou:

Open another terminal

#sudo yum install telnet

telnet localhost 44444

hello world

In the original terminal, you can see the message sent from telnet arrive.
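
Where telnet is not available, the same test can probably be done with bash alone, since bash can open TCP connections through its `/dev/tcp` redirection feature. The sketch below assumes the shell is bash built with that feature enabled; `send_line` is a hypothetical helper name. The netcat source normally answers each received line with `OK`, which the function prints if it arrives within five seconds.

```shell
# Hypothetical helper: send one line to a TCP port using bash's /dev/tcp
# redirection (a bash feature, not a real device file), then print one
# line of reply if the server sends any.
# Usage: send_line <host> <port> <message>
send_line() {
    host=$1
    port=$2
    message=$3
    (
        # Open a read/write connection on file descriptor 3.
        exec 3<>"/dev/tcp/$host/$port" || exit 1
        # Send the message followed by a newline.
        printf '%s\n' "$message" >&3
        # Wait up to 5 seconds for a one-line reply.
        if read -r -t 5 reply <&3; then
            printf '%s\n' "$reply"
        fi
    ) 2>/dev/null
}
```

With the agent from this article running, `send_line localhost 44444 "hello world"` plays the role of the telnet session; the function returns a non-zero status when the connection cannot be opened.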

cd /app/flume-1.5.2/conf

cp flume-conf.properties.template flume-conf2.properties

sudo vi flume-conf2.properties

a1.sources = r1

a1.sinks = k1

a1.channels = c1

a1.sources.r1.type = exec

a1.sources.r1.channels = c1

a1.sources.r1.command = tail -F /app/hadoop-1
