
Spark Cluster Computing Framework

What is Spark

Spark is an open source cluster computing system that aims to make data analytics
fast — both fast to run and fast to write.

To run programs faster, Spark provides primitives for in-memory cluster
computing: your job can load data into memory and query it repeatedly much quicker than with
disk-based systems like Hadoop MapReduce.
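
As a minimal sketch of that pattern (assuming the `sc` SparkContext provided by the Spark shell, and an illustrative HDFS path), a dataset can be cached in memory once and then queried repeatedly without touching disk again:

    // Load a text file and keep it in cluster memory; the HDFS path is illustrative.
    val logs = sc.textFile("hdfs://namenode:9000/logs/access.log").cache()

    // The first action materializes the data in memory...
    val errorCount = logs.filter(line => line.contains("ERROR")).count()

    // ...so later queries over the same data are served from memory, not disk.
    val warnCount = logs.filter(line => line.contains("WARN")).count()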

To make programming faster, Spark is integrated into the
Scala language, letting you manipulate distributed
datasets like local collections. You can also use Spark interactively to query
big data from the Scala interpreter.
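
The collection-like style looks roughly like this (again a sketch, assuming the Spark shell's `sc`):

    // A distributed dataset built from a local Scala collection.
    val nums = sc.parallelize(1 to 1000000)

    // map, filter, and reduce read like operations on a local collection,
    // but execute in parallel across the cluster.
    val sumOfEvenSquares = nums.filter(_ % 2 == 0).map(n => n.toLong * n).reduce(_ + _)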

What can it do

Spark was initially developed for two applications where keeping data in memory
helps: iterative
algorithms, which are common in machine learning, and interactive
data mining. In both cases, Spark can outperform Hadoop MapReduce by 30x.
However, you can use Spark for general data processing too.
Check out our example jobs.
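
To illustrate why in-memory caching helps iterative algorithms, here is a hedged sketch of gradient descent fitting a one-parameter linear model (the toy data and learning rate are made up for the example; `sc` is again the shell's SparkContext):

    // Training data: (x, y) pairs, cached because every iteration re-reads them.
    val data = sc.parallelize(Seq((1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8))).cache()

    var w = 0.0                  // parameter of the model y ≈ w * x
    val learningRate = 0.01
    for (_ <- 1 to 100) {
      // One full pass over the in-memory data per iteration; no disk I/O after the first pass.
      val gradient = data.map { case (x, y) => (w * x - y) * x }.reduce(_ + _)
      w -= learningRate * gradient
    }
    println(s"fitted slope w = $w")   // converges to roughly 2.0 on this toy data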

Spark runs on the Apache Mesos cluster manager, letting
it coexist with Hadoop. It can also read any data source supported by Hadoop.
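
Reading a Hadoop-hosted source is a one-liner; the sketch below (illustrative HDFS URL, Spark shell assumed) counts words in a file stored on HDFS:

    // Plain text on HDFS is the simplest Hadoop-supported source.
    val file = sc.textFile("hdfs://namenode:9000/data/input.txt")

    // A classic word count over the Hadoop-hosted data.
    val counts = file.flatMap(_.split("\\s+")).map(word => (word, 1)).reduceByKey(_ + _)
    counts.take(10).foreach(println)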

Who uses it

Spark was developed in the UC Berkeley AMP Lab.
It's used by several groups of researchers at Berkeley to run large-scale applications such
as spam filtering, natural language processing, and road traffic prediction. It's also used to accelerate
data analytics at Conviva, Klout, Quantifind, and other companies.
Spark is open source under a BSD license,
so download it to check it out!

