Copyright notice: This is an original article by the author; reproduction without permission is prohibited. https://blog.csdn.net/stark_summer/article/details/48379837
Tachyon 0.7.1 pseudo-distributed cluster installation and testing:
http://blog.csdn.net/stark_summer/article/details/48321605
According to the official documentation, Spark 1.4.x is compatible with Tachyon 0.6.4, while the latest Tachyon 0.7.1 is compatible with Spark 1.5.x. The versions used here are Spark 1.4.1 and Tachyon 0.7.1.
Integrating Tachyon with HDFS
Edit tachyon-env.sh (the under-filesystem address here is hdfs://master:8020, as referenced again below):
export TACHYON_UNDERFS_ADDRESS=hdfs://master:8020
-Dtachyon.data.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/data
Upload the data files to HDFS:
hadoop fs -put /home/cluster/data/test/bank/ /data/spark/
hadoop fs -ls /data/spark/bank/
Found 3 items
-rw-r--r--   3 wangyue supergroup    4610348 2015-09-11 20:02 /data/spark/bank/bank-full.csv
-rw-r--r--   3 wangyue supergroup       3864 2015-09-11 20:02 /data/spark/bank/bank-names.txt
-rw-r--r--   3 wangyue supergroup     461474 2015-09-11 20:02 /data/spark/bank/bank.csv
Read the /data/spark/bank/bank-full.csv file through Tachyon:
val bankFullFile = sc.textFile("tachyon://master:19998/data/spark/bank/bank-full.csv/bank-full.csv")
2015-09-11 20:08:20,136 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(177384) called with curMem=630803, maxMem=257918238
2015-09-11 20:08:20,137 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_3 stored as values in memory (estimated size 173.2 KB, free 245.2 MB)
2015-09-11 20:08:20,154 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(17665) called with curMem=808187, maxMem=257918238
2015-09-11 20:08:20,155 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_3_piece0 stored as bytes in memory (estimated size 17.3 KB, free 245.2 MB)
2015-09-11 20:08:20,156 INFO [sparkDriver-akka.actor.default-dispatcher-2] storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Added broadcast_3_piece0 in memory on localhost:41040 (size: 17.3 KB, free: 245.9 MB)
2015-09-11 20:08:20,157 INFO [main] spark.SparkContext (Logging.scala:logInfo(59)) - Created broadcast 3 from textFile at <console>:21
bankFullFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[7] at textFile at <console>:21
Then trigger a count:
bankFullFile.count()
But it fails with errors like the following:
2015-09-11 21:34:31,494 WARN [Executor task launch worker-6] (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,489 WARN [Executor task launch worker-7] (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
... (the same "Read nothing" warning repeats many more times)
The error looks strange. Does anyone know what causes it?
However, the file is visible in the Tachyon filesystem:
./bin/tachyon tfs ls /data/spark/bank/bank-full.csv/
4502.29 KB  09-11-2015 20:09:02:078  Not In Memory  /data/spark/bank/bank-full.csv/bank-full.csv
Meanwhile, bank-full.csv in HDFS looks like this:
hadoop fs -ls /data/spark/bank/
Found 3 items
-rw-r--r--   3 wangyue supergroup    4610348 2015-09-11 20:02 /data/spark/bank/bank-full.csv
-rw-r--r--   3 wangyue supergroup       3864 2015-09-11 20:02 /data/spark/bank/bank-names.txt
-rw-r--r--   3 wangyue supergroup     461474 2015-09-11 20:02 /data/spark/bank/bank.csv
In fact, Tachyon itself loaded the bank-full.csv file and stored it in its own filesystem namespace: tachyon://master:19998/data/spark/bank/bank-full.csv/bank-full.csv
This comes from the configuration in Tachyon's conf/tachyon-env.sh: with export TACHYON_UNDERFS_ADDRESS=hdfs://master:8020, paths under tachyon://master:19998 resolve to the corresponding paths in HDFS.
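The path mapping described above can be sketched as simple string manipulation (illustrative only: the object and method names are made up for this sketch, not Tachyon API calls; the addresses are the ones from the configuration above):

```scala
// Sketch of how a tachyon:// path maps onto the configured under-filesystem.
object UnderFsMapping {
  // Values taken from the tachyon-env.sh configuration shown earlier.
  val underFsAddress = "hdfs://master:8020"
  val tachyonMaster  = "tachyon://master:19998"

  // Strip the Tachyon scheme/authority, re-anchor the path on the under-FS.
  def toUnderFsPath(tachyonPath: String): String = {
    require(tachyonPath.startsWith(tachyonMaster), s"not a $tachyonMaster path")
    underFsAddress + tachyonPath.stripPrefix(tachyonMaster)
  }

  def main(args: Array[String]): Unit = {
    println(toUnderFsPath("tachyon://master:19998/data/spark/bank/bank-full.csv"))
    // -> hdfs://master:8020/data/spark/bank/bank-full.csv
  }
}
```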
All right, then as a workaround: read the file via HDFS first, then save it to Tachyon.
scala> val bankfullfile = sc.textFile("/data/spark/bank/bank-full.csv")
scala> bankfullfile.count
res0: Long = 45212
scala> bankfullfile.saveAsTextFile("tachyon://master:19998/data/spark/bank/newbankfullfile")
Unfinished, to be continued~