VI. Introduction to the Spark API
2019-01-06 13:26:51
Tags: Spark, API, introduction
Spark machine learning: browsing the API
Official Spark API documentation (Javadoc):
http://spark.apache.org/docs/1.6.2/api/java/index.html
http://spark.apache.org/docs/2.2.0/api/java/index.html

1) RDD support is the foundation of Spark; 2) consult the API documentation for whatever your task requires.

I. Spark's functional modules
Spark SQL
Spark GraphX
Spark Streaming
Spark ML
Spark MLlib

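II. Commonly used machine-learning APIs
ml: takes a DataFrame as input (typically produced through Spark SQL)
mllib: takes an ordinary RDD as input (for example, data loaded from HDFS)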


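Example: given records of userId (user ID), productId (product ID) and a rating, recommend products to each user.

Collaborative filtering is used to find other products a user is likely to be interested in.
Common algorithm: ALS (alternating least squares)
org.apache.spark.ml.recommendation.ALS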

Supervised classification: org.apache.spark.mllib.classification
The training records (for example, users) must be labeled in advance.

Unsupervised classification (clustering): mllib.clustering, used in the same way
KMeans

Decision trees: mllib.tree

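Graph computation: org.apache.spark.graphx
org.apache.spark.sql: suppose our data has been loaded into MySQL. How do we bring it into Spark, run machine learning for prediction and statistical analysis, and then write the results to HDFS?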

IV. API extensions
Data can be read from MySQL, Oracle and similar databases:
org.apache.spark.sql
org.apache.spark.sql.api.java
org.apache.spark.sql.expressions
org.apache.spark.sql.hive
org.apache.spark.sql.hive.execution
org.apache.spark.sql.jdbc
org.apache.spark.sql.sources
org.apache.spark.sql.types
org.apache.spark.sql.util

org.apache.spark.streaming: stream processing
org.apache.spark.streaming.flume
org.apache.spark.streaming.kafka
org.apache.spark.streaming.kinesis
org.apache.spark.streaming.mqtt
org.apache.spark.streaming.receiver
org.apache.spark.streaming.scheduler
org.apache.spark.streaming.twitter
org.apache.spark.streaming.util
org.apache.spark.streaming.zeromq

ml: takes a DataFrame as input (typically produced through Spark SQL)
org.apache.spark.ml
org.apache.spark.ml.attribute
org.apache.spark.ml.classification
org.apache.spark.ml.clustering
org.apache.spark.ml.evaluation
org.apache.spark.ml.feature
org.apache.spark.ml.param
org.apache.spark.ml.recommendation
org.apache.spark.ml.regression
org.apache.spark.ml.source.libsvm
org.apache.spark.ml.tree
org.apache.spark.ml.tuning
org.apache.spark.ml.util

mllib: takes an ordinary RDD as input (for example, data loaded from HDFS)
org.apache.spark.mllib.classification
org.apache.spark.mllib.clustering
org.apache.spark.mllib.evaluation
org.apache.spark.mllib.feature
org.apache.spark.mllib.fpm
org.apache.spark.mllib.linalg
org.apache.spark.mllib.linalg.distributed
org.apache.spark.mllib.optimization
org.apache.spark.mllib.pmml
org.apache.spark.mllib.random
org.apache.spark.mllib.rdd
org.apache.spark.mllib.recommendation
org.apache.spark.mllib.regression
org.apache.spark.mllib.stat
org.apache.spark.mllib.stat.distribution
org.apache.spark.mllib.stat.test
org.apache.spark.mllib.tree
org.apache.spark.mllib.tree.configuration
org.apache.spark.mllib.tree.impurity
org.apache.spark.mllib.tree.loss
org.apache.spark.mllib.tree.model
org.apache.spark.mllib.util
