6 Spark API Overview
2019-01-06 13:26:51
Tags: Spark, API, introduction
Spark machine learning: browsing the API
Official Spark API documentation (Java):
http://spark.apache.org/docs/1.6.2/api/java/index.html
http://spark.apache.org/docs/2.2.0/api/java/index.html

1. RDD support is the foundation of Spark. 2. Look up the API according to what you need.

1 Spark modules
Spark SQL
Spark GraphX
Spark Streaming
Spark ML
Spark MLlib

2 Commonly used machine-learning APIs
ml: input is a DataFrame (typically obtained through Spark SQL)
mllib: input is an ordinary RDD (for example, loaded from HDFS)


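Example: given records of userId (user ID), productId (product ID) and rating, recommend products to users.

Collaborative filtering finds other products a user is likely to be interested in.
Common algorithm: ALS (alternating least squares)
org.apache.spark.ml.recommendation.ALS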
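The following plain-Java sketch shows the alternating-least-squares idea without a Spark cluster. It is not Spark's implementation. Each rating r_ij is approximated by u_i * v_j. With the product factors held fixed, every user factor has a closed-form regularized least-squares solution. The roles then swap, and the two steps alternate.

The sketch makes these assumptions, which go beyond the original text:
- rank 1, meaning one number per user and per product (Spark's ALS learns rank-k vectors);
- a hypothetical 3x2 rating matrix with one unknown entry;
- every factor starts at 1.0;
- the regularization weight is lambda = 0.01.

The class and method names are illustrative only.

```java
import java.util.Arrays;

/**
 * Rank-1 alternating least squares on a tiny rating matrix.
 * Teaching sketch only: Spark's ALS uses rank-k factors and distributes the solves.
 */
public class AlsSketch {
    // Hypothetical ratings: rows = users, columns = products; NaN marks an unknown rating.
    static final double[][] RATINGS = {
        {1.0, 2.0},
        {2.0, 4.0},
        {3.0, Double.NaN}   // user 2 has not rated product 1
    };

    /** Runs ALS and returns {userFactors, productFactors}. */
    static double[][] train(double[][] r, int iterations, double lambda) {
        int users = r.length, products = r[0].length;
        double[] u = new double[users];
        double[] v = new double[products];
        Arrays.fill(u, 1.0);   // assumed initial value for every factor
        Arrays.fill(v, 1.0);
        for (int it = 0; it < iterations; it++) {
            // Fix product factors: u_i = sum_j r_ij v_j / (sum_j v_j^2 + lambda), observed j only.
            for (int i = 0; i < users; i++) {
                double num = 0, den = lambda;
                for (int j = 0; j < products; j++) {
                    if (!Double.isNaN(r[i][j])) { num += r[i][j] * v[j]; den += v[j] * v[j]; }
                }
                u[i] = num / den;
            }
            // Fix user factors: same closed form with the roles of users and products swapped.
            for (int j = 0; j < products; j++) {
                double num = 0, den = lambda;
                for (int i = 0; i < users; i++) {
                    if (!Double.isNaN(r[i][j])) { num += r[i][j] * u[i]; den += u[i] * u[i]; }
                }
                v[j] = num / den;
            }
        }
        return new double[][] {u, v};
    }

    /** Predicted rating of user 2 for product 1 (the missing entry). */
    static double predictMissing() {
        double[][] f = train(RATINGS, 100, 0.01);
        return f[0][2] * f[1][1];
    }

    public static void main(String[] args) {
        System.out.printf("predicted rating for user 2, product 1: %.3f%n", predictMissing());
    }
}
```

The observed ratings are exactly rank 1: users 0 and 1 both rate product 1 twice as high as product 0. The prediction for the missing entry should therefore come out close to 6. Spark's ALS applies the same alternation to rank-k factors and spreads the per-user and per-product solves across the cluster.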

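Supervised classification: org.apache.spark.mllib.classification
The training data must be labeled in advance (for example, each user tagged with a known class).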
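The next plain-Java sketch illustrates what "labeled in advance" means. It trains a logistic regression, one of the model families in mllib.classification, by batch gradient descent on a few labeled examples. It is not Spark's code.

Assumptions in this sketch:
- LabeledExample is a hypothetical stand-in for mllib's LabeledPoint (in org.apache.spark.mllib.regression), which is a label plus a feature vector.
- The data, the learning rate of 0.5 and the 1,000 steps are arbitrary choices.

```java
import java.util.List;

/** Logistic regression trained by batch gradient descent on labeled examples (teaching sketch). */
public class LabeledClassifierSketch {
    /** Hypothetical stand-in for a labeled record: a 0/1 label plus a feature vector. */
    static final class LabeledExample {
        final double label;
        final double[] features;
        LabeledExample(double label, double... features) {
            this.label = label;
            this.features = features;
        }
    }

    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    /** Linear score w . x + bias; the last weight is the bias term. */
    static double score(double[] w, double[] x) {
        double z = w[w.length - 1];
        for (int k = 0; k < x.length; k++) z += w[k] * x[k];
        return z;
    }

    /** Returns learned weights; the last entry is the bias. */
    static double[] train(List<LabeledExample> data, int steps, double learningRate) {
        int d = data.get(0).features.length;
        double[] w = new double[d + 1];
        for (int s = 0; s < steps; s++) {
            double[] grad = new double[d + 1];
            for (LabeledExample ex : data) {
                // Log-loss gradient: (predicted probability - known label) * feature.
                double err = sigmoid(score(w, ex.features)) - ex.label;
                for (int k = 0; k < d; k++) grad[k] += err * ex.features[k];
                grad[d] += err;
            }
            for (int k = 0; k <= d; k++) w[k] -= learningRate * grad[k] / data.size();
        }
        return w;
    }

    static double predict(double[] w, double... x) { return sigmoid(score(w, x)) >= 0.5 ? 1.0 : 0.0; }

    /** Hypothetical training set: label 1 when the single feature is positive. */
    static final List<LabeledExample> DATA = List.of(
        new LabeledExample(0.0, -2.0), new LabeledExample(0.0, -1.0),
        new LabeledExample(1.0, 1.0), new LabeledExample(1.0, 2.0));

    public static void main(String[] args) {
        double[] w = train(DATA, 1000, 0.5);
        System.out.println("predict(3.0)  = " + predict(w, 3.0));
        System.out.println("predict(-3.0) = " + predict(w, -3.0));
    }
}
```

The known labels are what make this supervised: every update is driven by the gap between the predicted probability and the label. On this data the model should classify 3.0 as 1.0 and -3.0 as 0.0.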

Unsupervised classification (clustering): org.apache.spark.mllib.clustering, used in much the same way
KMeans
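Clustering needs no labels. This plain-Java sketch of the standard k-means iteration (Lloyd's algorithm) alternates two steps until the assignments stop changing:
1. Assign every point to its nearest centroid.
2. Move each centroid to the mean of its assigned points.

Assumptions in this sketch:
- The points are 2-D hypothetical sample data.
- Centroids are initialised from the first k points so the result is reproducible. Spark's KMeans defaults to k-means|| initialisation instead.

This is not Spark's code.

```java
import java.util.Arrays;

/** Lloyd's k-means iteration on small in-memory points (teaching sketch, not Spark's code). */
public class KMeansSketch {
    // Hypothetical sample: two visibly separate groups of 2-D points.
    static final double[][] SAMPLE = {
        {1, 1}, {1, 2}, {2, 1},
        {9, 9}, {9, 10}, {10, 9}
    };

    // Squared Euclidean distance (sufficient for comparing which centroid is nearer).
    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int k = 0; k < a.length; k++) { double d = a[k] - b[k]; s += d * d; }
        return s;
    }

    /** Returns k centroids, initialised from the first k points. */
    static double[][] fit(double[][] points, int k, int maxIterations) {
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[c].clone();
        int[] assignment = new int[points.length];
        for (int it = 0; it < maxIterations; it++) {
            // Assignment step: each point joins its nearest centroid.
            boolean changed = false;
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (dist2(points[p], centroids[c]) < dist2(points[p], centroids[best])) best = c;
                }
                if (assignment[p] != best) { assignment[p] = best; changed = true; }
            }
            if (!changed && it > 0) break; // assignments stable: converged
            // Update step: move each centroid to the mean of its points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int d = 0; d < dim; d++) sums[assignment[p]][d] += points[p][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // keep an empty cluster's previous centroid
                for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        for (double[] c : fit(SAMPLE, 2, 20)) System.out.println(Arrays.toString(c));
    }
}
```

On this sample the sketch converges to the centroids (4/3, 4/3) and (28/3, 28/3). These are the means of the two visible groups.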

Decision trees: org.apache.spark.mllib.tree
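Decision trees in mllib.tree grow by choosing, at each node, the split that most reduces an impurity measure. Gini impurity, 1 - sum_k p_k^2, is one of the options in org.apache.spark.mllib.tree.impurity.

The plain-Java sketch below computes Gini impurity for 0/1 labels. It then scans thresholds on a single sorted feature for the split with the lowest weighted impurity. It illustrates the criterion only: Spark also bins continuous features and handles many features, classes and tree levels. The data and names are hypothetical.

```java
import java.util.Arrays;

/** Gini impurity and the best single threshold split on one feature (teaching sketch). */
public class GiniSplitSketch {
    /** Gini impurity 1 - p0^2 - p1^2 for 0/1 labels in labels[from, to). */
    static double gini(int[] labels, int from, int to) {
        int n = to - from;
        if (n == 0) return 0.0;
        int ones = 0;
        for (int i = from; i < to; i++) ones += labels[i];
        double p1 = (double) ones / n, p0 = 1.0 - p1;
        return 1.0 - p0 * p0 - p1 * p1;
    }

    /**
     * Returns {threshold, weightedGini} for the best split "x <= threshold".
     * Assumes xs is sorted ascending and labels[i] belongs to xs[i].
     */
    static double[] bestSplit(double[] xs, int[] labels) {
        int n = xs.length;
        double bestThreshold = Double.NaN, bestImpurity = Double.POSITIVE_INFINITY;
        for (int cut = 1; cut < n; cut++) {
            if (xs[cut] == xs[cut - 1]) continue; // equal values cannot be separated
            // Impurity of the two children, weighted by how many examples each receives.
            double weighted = (cut * gini(labels, 0, cut) + (n - cut) * gini(labels, cut, n)) / n;
            if (weighted < bestImpurity) {
                bestImpurity = weighted;
                bestThreshold = (xs[cut - 1] + xs[cut]) / 2.0; // midpoint between neighbours
            }
        }
        return new double[] {bestThreshold, bestImpurity};
    }

    public static void main(String[] args) {
        double[] xs = {1.0, 2.0, 3.0, 4.0};
        int[] labels = {0, 0, 1, 1};
        System.out.println("gini(all) = " + gini(labels, 0, labels.length));
        System.out.println("best split = " + Arrays.toString(bestSplit(xs, labels)));
    }
}
```

The program prints `gini(all) = 0.5` and `best split = [2.5, 0.0]`. Splitting at 2.5 leaves two pure children.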

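Graph computation: org.apache.spark.graphx
org.apache.spark.sql: covers the common workflow in which data has been imported into MySQL: load it into Spark, run machine learning for prediction and statistical analysis, then write the results to HDFS.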

4 API extensions
Data can be read from databases such as MySQL and Oracle (via JDBC):
org.apache.spark.sql
org.apache.spark.sql.api.java
org.apache.spark.sql.expressions
org.apache.spark.sql.hive
org.apache.spark.sql.hive.execution
org.apache.spark.sql.jdbc
org.apache.spark.sql.sources
org.apache.spark.sql.types
org.apache.spark.sql.util

org.apache.spark.streaming: stream processing
org.apache.spark.streaming.flume
org.apache.spark.streaming.kafka
org.apache.spark.streaming.kinesis
org.apache.spark.streaming.mqtt
org.apache.spark.streaming.receiver
org.apache.spark.streaming.scheduler
org.apache.spark.streaming.twitter
org.apache.spark.streaming.util
org.apache.spark.streaming.zeromq

ml: input is a DataFrame (typically obtained through Spark SQL)
org.apache.spark.ml
org.apache.spark.ml.attribute
org.apache.spark.ml.classification
org.apache.spark.ml.clustering
org.apache.spark.ml.evaluation
org.apache.spark.ml.feature
org.apache.spark.ml.param
org.apache.spark.ml.recommendation
org.apache.spark.ml.regression
org.apache.spark.ml.source.libsvm
org.apache.spark.ml.tree
org.apache.spark.ml.tuning
org.apache.spark.ml.util

mllib: input is an ordinary RDD (for example, loaded from HDFS)
org.apache.spark.mllib.classification
org.apache.spark.mllib.clustering
org.apache.spark.mllib.evaluation
org.apache.spark.mllib.feature
org.apache.spark.mllib.fpm
org.apache.spark.mllib.linalg
org.apache.spark.mllib.linalg.distributed
org.apache.spark.mllib.optimization
org.apache.spark.mllib.pmml
org.apache.spark.mllib.random
org.apache.spark.mllib.rdd
org.apache.spark.mllib.recommendation
org.apache.spark.mllib.regression
org.apache.spark.mllib.stat
org.apache.spark.mllib.stat.distribution
org.apache.spark.mllib.stat.test
org.apache.spark.mllib.tree
org.apache.spark.mllib.tree.configuration
org.apache.spark.mllib.tree.impurity
org.apache.spark.mllib.tree.loss
org.apache.spark.mllib.tree.model
org.apache.spark.mllib.util

