A Brief Introduction to spark-shell Usage (Scala) (Part 1)
2017-10-09 13:25:49
Tags: spark-shell, basic usage, introduction, Scala

>> Original post from Tijun's blog: http://www.cnblogs.com/tijun/ <<

1. Starting the spark-shell

./bin/spark-shell
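
By default this starts a local session. spark-shell also accepts the same options as spark-submit for attaching to a cluster or sizing resources; a small sketch, where the master URL and memory size are placeholders:

./bin/spark-shell --master local[4]                    # local mode with 4 worker threads
./bin/spark-shell --master yarn --executor-memory 2g   # run against a YARN cluster (placeholder size)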

The :help command lists the available REPL commands:

scala> :help
All commands can be abbreviated, e.g., :he instead of :help.
:edit <id>|<line>        edit history
:help [command]          print this summary or command-specific help
:history [num]           show the history (optional num is commands to show)
:h? <string>             search the history
:imports [name name ...] show import history, identifying sources of names
:implicits [-v]          show the implicits in scope
:javap <path|class>      disassemble a file or class name
:line <id>|<line>        place line(s) at the end of history
:load <path>             interpret lines in a file
:paste [-raw] [path]     enter paste mode or paste a file
:power                   enable power user mode
:quit                    exit the interpreter
:replay [options]        reset the repl and replay all previous commands
:require <path>          add a jar to the classpath
:reset [options]         reset the repl to its initial state, forgetting all session entries
:save <path>             save replayable session to a file
:sh <command line>       run a shell command (result is implicitly => List[String])
:settings <options>      update compiler options, if possible; see reset
:silent                  disable/enable automatic printing of results
:type [-v] <expr>        display the type of an expression without evaluating it
:kind [-v] <expr>        display the kind of expression's type
:warnings                show the suppressed warnings from the most recent line which had any
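
For example, :type shows the static type of an expression without evaluating it (a small illustration):

scala> :type 1 + 1
Int

:paste is similarly handy for entering multi-line class or method definitions as a single block.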


2. Using spark to load a file and create a Dataset

scala> val textFile = spark.read.textFile("hdfs://cluster1/input/README.txt")
textFile: org.apache.spark.sql.Dataset[String] = [value: string]
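
A Dataset supports transformations directly, and .rdd converts it when the RDD API is needed. A minimal sketch, with the REPL output omitted:

scala> val lineLengths = textFile.map(line => line.length)   // Dataset[Int] of per-line lengths
scala> val asRdd = textFile.rdd                              // the underlying RDD[String]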

3. Using sc to load a file and create an RDD

scala> val textFile=sc.textFile("hdfs://cluster1/input/README.txt")
textFile: org.apache.spark.rdd.RDD[String] = hdfs://cluster1/input/README.txt MapPartitionsRDD[1] at textFile at <console>:24
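
sc.textFile also accepts local paths with a file:// prefix; the path below is hypothetical:

scala> val localFile = sc.textFile("file:///tmp/README.txt")   // hypothetical local path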

4. Counting how many lines (items) textFile contains

scala> textFile.count()    // Number of items in this RDD
res0: Long = 31

5. Viewing the first item

scala> textFile.first()   // First item in this RDD
res1: String = For the latest information about Hadoop, please visit our website at:
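
A related action is take(n), which returns the first n items; a quick illustration:

scala> textFile.take(3)   // first three lines, as an Array[String]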

The examples above are all quite simple; next comes a complete word count.

6. Word count

scala> val wordsRdd=textFile.flatMap(line=>line.split(" "))
wordsRdd: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at flatMap at <console>:26

scala> val kvsRdd=wordsRdd.map(word=>(word,1))
kvsRdd: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[3] at map at <console>:28

scala> val countRdd=kvsRdd.reduceByKey(_+_)
countRdd: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:30

scala> countRdd.collect()
res2: Array[(String, Int)] = Array((under,1), (this,3), (distribution,2), (Technology,1), (country,1), (is,1), (Jetty,1), (currently,1), (permitted.,1), (check,1), (have,1), (Security,1), (U.S.,1), (with,1), (BIS,1), (This,1), (mortbay.org.,1), ((ECCN),1), (using,2), (security,1), (Department,1), (export,1), (reside,1), (any,1), (algorithms.,1), (from,1), (re-export,2), (has,1), (SSL,1), (Industry,1), (Administration,1), (details,1), (provides,1), (http:
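
To make the result easier to read, you can sort by count before collecting, or save it back to HDFS; the output path below is hypothetical:

scala> countRdd.sortBy(_._2, ascending = false).take(10)             // ten most frequent words
scala> countRdd.saveAsTextFile("hdfs://cluster1/output/wordcount")   // hypothetical output path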