Spark MLlib Normalization (StandardScaler, MinMaxScaler, Normalizer)
Tags:spark mllib
Copyright notice: this is the blogger's original article and may not be reproduced without the blogger's permission. https://blog.csdn.net/mlljava1111/article/details/52931231

[Image in the original post: the sample `test` table used in the code below]

  // Read the test table: column 0 is id, column 1 is name, the rest are numeric features
  import org.apache.spark.ml.linalg.{Vector, Vectors}   // Spark 2.x; on Spark 1.x the vectors come from org.apache.spark.mllib.linalg
  val testdata = sql("select * from test")
    .map(row => row.toSeq.map(_.toString).toArray)
    .map(cols => (cols(0), cols(1), Vectors.dense(cols.drop(2).map(_.toDouble))))
    .toDF("id", "name", "features")
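
If the `test` Hive table is not available, an equivalent DataFrame can be built directly in the shell. This is only a sketch for reproducing the outputs below: the feature values (5.0, 1.0), (10.0, 1.5), (15.0, 2.0) are reverse-engineered from the scaled results shown later, the `name` values are placeholders, and it assumes Spark 2.x with `spark.implicits._` in scope (as in spark-shell).

  import org.apache.spark.ml.linalg.Vectors
  import spark.implicits._

  // Hypothetical stand-in for the `test` table; values inferred from the outputs below
  val testdata = Seq(
    ("1001", "n1", Vectors.dense(5.0, 1.0)),
    ("1002", "n2", Vectors.dense(10.0, 1.5)),
    ("1003", "n3", Vectors.dense(15.0, 2.0))
  ).toDF("id", "name", "features")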

StandardScaler

  import org.apache.spark.ml.feature.StandardScaler

  // Standardize each feature to zero mean and unit standard deviation
  // (setWithMean(true) subtracts the mean; the std used is the sample standard deviation)
  val scaler1 = new StandardScaler().setInputCol("features").setOutputCol("scaledFeatures")
    .setWithMean(true).setWithStd(true)
    .fit(testdata)
  val scaledData = scaler1.transform(testdata)

  val featuresdatatran = scaledData.map(row => (row.getAs[String]("id"), row.getAs[Vector]("scaledFeatures")))

  featuresdatatran.collect()
#(1001,[-1.0,-1.0]), 
#(1002,[0.0,0.0]), 
#(1003,[1.0,1.0])
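
As a quick sanity check on these numbers (using the feature values inferred above): StandardScaler divides by the sample standard deviation, so for the first feature the mean is 10.0 and the sample std is 5.0, which maps 5, 10, 15 to -1.0, 0.0, 1.0. A minimal hand-rolled version of that calculation:

  // Hand-rolled standardization of the first (inferred) feature column: 5.0, 10.0, 15.0
  val xs = Seq(5.0, 10.0, 15.0)
  val mean = xs.sum / xs.size                                                   // 10.0
  val std = math.sqrt(xs.map(x => math.pow(x - mean, 2)).sum / (xs.size - 1))   // 5.0 (sample std)
  xs.map(x => (x - mean) / std)                                                 // List(-1.0, 0.0, 1.0)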

MinMaxScaler

  import org.apache.spark.ml.feature.MinMaxScaler

  // Rescale each feature linearly to [0, 1] (the default [min, max] range)
  val scaler = new MinMaxScaler().setInputCol("features").setOutputCol("scaledFeatures")

  val scalerModel = scaler.fit(testdata)
  val scaledData = scalerModel.transform(testdata)
  // scaledData.printSchema()
  // val featuresdata = scaledData.select($"scaledFeatures")
  val featuresdatatran = scaledData.map(row => (row.getAs[String]("id"), row.getAs[Vector]("scaledFeatures")))

  featuresdatatran.collect()
#(1001,[0.0,0.0]), 
#(1002,[0.5,0.5]), 
#(1003,[1.0,1.0])
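
MinMaxScaler applies x' = (x - min) / (max - min) per feature, scaled into the configured [min, max] range ([0, 1] by default). With the inferred first-feature values 5, 10, 15 this reproduces the 0.0 / 0.5 / 1.0 column above:

  // Hand-rolled min-max scaling of the first (inferred) feature column: 5.0, 10.0, 15.0
  val xs = Seq(5.0, 10.0, 15.0)
  val (lo, hi) = (xs.min, xs.max)
  xs.map(x => (x - lo) / (hi - lo))   // List(0.0, 0.5, 1.0)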

Normalizer

  // RDD-based API: this Normalizer is org.apache.spark.mllib.feature.Normalizer and works
  // on org.apache.spark.mllib.linalg vectors; it uses the L2 norm (p = 2) by default
  import org.apache.spark.mllib.feature.Normalizer
  import org.apache.spark.mllib.linalg.Vectors
  val testdata = sql("select * from test")
    .map(row => row.toSeq.map(_.toString).toArray)
    .map(cols => (cols(0), cols(1), Vectors.dense(cols.drop(2).map(_.toDouble))))
  // Keep the id next to each normalized vector so collect() shows (id, vector) pairs as below
  val featuresdatatran = testdata.map { case (id, _, features) => (id, new Normalizer().transform(features)) }

featuresdatatran.collect()
#(1001,[0.9805806756909202,0.19611613513818404]), 
#(1002,[0.9889363528682975,0.14834045293024462]), 
#(1003,[0.9912279006826347,0.13216372009101796])
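
Unlike the two scalers above, Normalizer works per row, dividing each vector by its p-norm (p = 2 by default, so every row ends up with unit Euclidean length). Checking the first row with the inferred vector [5.0, 1.0]: its L2 norm is sqrt(26) ≈ 5.099, giving roughly 0.98058 and 0.19612, which matches the first output above:

  // Hand-rolled L2 normalization of the first (inferred) row vector [5.0, 1.0]
  val v = Array(5.0, 1.0)
  val l2 = math.sqrt(v.map(x => x * x).sum)   // sqrt(26) ≈ 5.0990
  v.map(_ / l2)                                // Array(0.98058..., 0.19612...)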