Apache Spark ecosystem
2019-02-11 01:04:34
Tags: Apache Spark ecosystem


References

1. Apache Spark Ecosystem – Complete Spark Components Guide
2. Apache Spark Ecosystem
3. edureka posts about Spark
5. Spark SQL Tutorial – Understanding Spark SQL With Examples
6. Spark Streaming Tutorial – Sentiment Analysis Using Apache Spark
7. Spark MLlib – Machine Learning Library Of Apache Spark
8. Spark GraphX Tutorial – Graph Analytics In Apache Spark
4. Spark Tutorial: Real Time Cluster Computing Framework
Notes (on reference 4):
Real Time Processing Framework
Real Time Analytics
Why Spark when Hadoop is already there
What is Apache Spark
Spark Features
Getting Started with Spark (installing Spark 2.0 on Ubuntu)
Using Spark with Hadoop
Main concepts of Spark: SparkSession (the unified entry point introduced in Spark 2.0, subsuming SparkContext), Data Sources, RDDs (Resilient Distributed Datasets), DataFrames (conceptually equivalent to a table in a relational database), and other libraries; a short sketch follows this list.
Spark Components
Use Case: Earthquake Detection using Spark
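
A minimal Scala sketch of the concepts listed above, assuming Spark 2.x on the classpath; local mode and names like `spark-concepts-sketch` are illustrative only:

```scala
import org.apache.spark.sql.SparkSession

object SparkConceptsSketch {
  def main(args: Array[String]): Unit = {
    // SparkSession: the single entry point in Spark 2.0+,
    // wrapping the older SparkContext / SQLContext.
    val spark = SparkSession.builder()
      .appName("spark-concepts-sketch")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()

    // RDD: a low-level, immutable, fault-tolerant distributed collection.
    val rdd = spark.sparkContext.parallelize(Seq(1, 2, 3, 4))
    println(s"RDD sum: ${rdd.reduce(_ + _)}")

    // DataFrame: conceptually equivalent to a relational table.
    import spark.implicits._
    val df = Seq(("alice", 34), ("bob", 28)).toDF("name", "age")
    df.filter($"age" > 30).show()

    spark.stop()
  }
}
```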
This blog is the first in an upcoming Apache Spark series, which will include Spark Streaming, Spark Interview Questions, Spark MLlib, and others.
We can see that Real Time Processing of Big Data is ingrained in every aspect of our lives. From fraud detection in banking to live surveillance systems in government, from automated machines in healthcare to live prediction systems in the stock market, everything around us revolves around processing big data in near real time.

Hadoop ===> batch processing. Hadoop is based on batch processing of big data: the data is stored over a period of time and then processed in bulk.
Spark ===> real-time processing framework: data keeps being generated over time, and Spark processes it as it arrives (a streaming sketch follows).
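
To make the contrast concrete, here is a minimal Structured Streaming sketch in Scala, assuming Spark 2.x; the socket source and `localhost:9999` are placeholder test settings (e.g. fed by `nc -lk 9999`), not part of the original tutorial. A batch job would read a finished dataset once; here the counts keep updating as new lines arrive:

```scala
import org.apache.spark.sql.SparkSession

object StreamingVsBatchSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("streaming-vs-batch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Real-time style: read lines as they arrive on a test socket.
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost") // placeholder test source
      .option("port", 9999)
      .load()

    // Word count over the unbounded stream.
    val counts = lines.as[String]
      .flatMap(_.split("\\s+"))
      .groupBy("value")
      .count()

    // Results are updated continuously, not produced once at the end.
    counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()
      .awaitTermination()
  }
}
```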

[Figure: Spark Tutorial – Differences between Hadoop and Spark]



[Figure: Spark Tutorial – Spark Features]


Hadoop Integration: Apache Spark provides smooth compatibility with Hadoop. This is a boon for all the Big Data engineers who started their careers with Hadoop. Spark is a potential replacement for the MapReduce functions of Hadoop, and it has the ability to run on top of an existing Hadoop cluster using YARN for resource scheduling.


[Figure: Spark Tutorial – Spark Features]

Hadoop components can be used alongside Spark in the following ways (a submission sketch follows the list):
HDFS: Spark can run on top of HDFS to leverage the distributed, replicated storage.
MapReduce: Spark can be used along with MapReduce in the same Hadoop cluster, or separately as a processing framework.
YARN: Spark applications can be made to run on YARN (Hadoop NextGen).
Batch & Real Time Processing: MapReduce and Spark are used together, with MapReduce for batch processing and Spark for real-time processing.
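
A minimal sketch of running Spark against existing Hadoop infrastructure, assuming an already-configured YARN/HDFS cluster; the HDFS path is a placeholder, and in practice the master is usually supplied via `spark-submit --master yarn` rather than hard-coded:

```scala
import org.apache.spark.sql.SparkSession

object SparkOnHadoopSketch {
  def main(args: Array[String]): Unit = {
    // Run on an existing Hadoop cluster: YARN schedules the
    // executors, HDFS provides the replicated storage.
    val spark = SparkSession.builder()
      .appName("spark-on-hadoop-sketch")
      .master("yarn") // normally passed via spark-submit --master yarn
      .getOrCreate()

    // Placeholder HDFS path; replace with a real file on your cluster.
    val logs = spark.read.textFile("hdfs:///user/demo/input/events.log")
    println(s"Lines stored in HDFS: ${logs.count()}")

    spark.stop()
  }
}
```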


[Figure: Use Case – Flow diagram of Earthquake Detection using Apache Spark]