Implementing wordCount with Spark + HDFS
2019-04-22 00:13:29
Tags: Spark, HDFS, wordCount

First, start the environment:


1. Start HDFS

[root@master conf]# start-dfs.sh

2. Then start Spark

[root@master spark-2.2.0]# sbin/start-all.sh --master spark://master.hadoop:7077

[root@master spark-2.2.0]# bin/spark-shell --master spark://master.hadoop:7077

[root@master spark-2.2.0]# bin/spark-shell --master spark://slave3.hadoop:7077
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/08/30 13:59:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable     
18/08/30 13:59:39 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://192.168.1.5:4040
Spark context available as 'sc' (master = spark://slave3.hadoop:7077, app id = app-20180830135927-0000).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/
         
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

Running this command changes the cluster as follows: the shell itself starts on the master, while each worker starts a CoarseGrainedExecutorBackend process.

Processes on the master machine:

[root@master spark-2.2.0]# jps
8643 DataNode
8548 NameNode
9205 Jps
8874 Master

Processes on a worker machine:

[root@slave1 conf]# jps
3472 DataNode
3540 SecondaryNameNode
3798 Worker
3862 CoarseGrainedExecutorBackend
3958 Jps

Write the wordcount program:

First, upload the data to be processed to HDFS. I uploaded mine to the /spark directory.

sc.textFile("hdfs://master.hadoop:9000/spark").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_).sortBy(_._2).collect

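The job splits each line on spaces, maps every word to a (word, 1) pair, sums the counts per word, sorts by count in ascending order, and finally collects the result to the client, where it is displayed as an array. The output looks like this: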

scala> sc.textFile("hdfs://slave3.hadoop:9000/a.txt").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_).sortBy(_._2).collect
res1: Array[(String, Int)] = Array((jim,1), (jarry,1), (wo,1), (ni,1), (hello,4))

scala>
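
To see what each step of the pipeline does without a cluster, the same chain can be sketched on a plain Scala collection. This is only a local approximation: `LocalWordCount`, `wordCount`, and the sample lines are made up for illustration (the real contents of a.txt are not shown here), and `groupBy` plus a sum stands in for Spark's `reduceByKey`.

```scala
// Local sketch of the word-count pipeline using plain Scala collections (no Spark required).
object LocalWordCount {

  // Hypothetical helper: counts words and sorts ascending by count, mirroring the Spark job.
  def wordCount(lines: Seq[String]): Seq[(String, Int)] =
    lines
      .flatMap(_.split(" "))   // split every line into words
      .map((_, 1))             // pair each word with an initial count of 1
      .groupBy(_._1)           // group pairs by word (local stand-in for reduceByKey)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the counts per word
      .toSeq
      .sortBy(_._2)            // ascending by count, like sortBy(_._2) on the RDD

  def main(args: Array[String]): Unit = {
    // Made-up sample input, not the actual file from the article.
    val lines = Seq("hello jim", "hello jarry", "hello ni", "hello wo")
    wordCount(lines).foreach(println)
  }
}
```

With this sample input, (hello,4) comes last because the sort is ascending. The order among the words that occur once is not guaranteed, since it depends on how the intermediate map iterates.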
