In distributed mode, Spark uses a master/slave architecture with one central coordinator and many distributed workers. The central coordinator is called the driver. The driver communicates with a potentially large number of distributed workers called executors. The driver runs in its own Java process, and each executor is a separate Java process. A driver and its executors are together termed a Spark application.
The Driver
- Converting a user program into tasks
- Scheduling tasks on executors
Executors
- Running the tasks that make up the application and returning results to the driver
- Providing in-memory storage for RDDs that are cached by user programs
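The division of labor between the driver and the executors can be illustrated with a toy model in plain Python. This is only a sketch and not Spark's API: `ToyDriver`, `ToyExecutor`, and the round-robin scheduling are hypothetical names and simplifications chosen for illustration, and threads stand in for the separate executor processes.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyExecutor:
    """Hypothetical stand-in for one executor: runs tasks and caches partitions."""

    def __init__(self, executor_id):
        self.executor_id = executor_id
        self.cache = {}  # partition index -> data kept in memory (stand-in for RDD caching)

    def run_task(self, partition_index, partition, func):
        # Keep the partition in memory, then compute and return the result.
        self.cache[partition_index] = partition
        return func(partition)


class ToyDriver:
    """Hypothetical stand-in for the driver: builds tasks and schedules them."""

    def __init__(self, executors):
        self.executors = executors

    def run_job(self, data, num_partitions, func):
        # Convert the user program into tasks, one per partition.
        partitions = [data[i::num_partitions] for i in range(num_partitions)]
        # Schedule tasks on executors (simple round-robin, an assumption of this sketch).
        with ThreadPoolExecutor(max_workers=len(self.executors)) as pool:
            futures = [
                pool.submit(self.executors[i % len(self.executors)].run_task, i, part, func)
                for i, part in enumerate(partitions)
            ]
            # Executors return their results to the driver.
            return [f.result() for f in futures]


executors = [ToyExecutor(0), ToyExecutor(1)]
driver = ToyDriver(executors)
partial_sums = driver.run_job(list(range(10)), num_partitions=4, func=sum)
total = sum(partial_sums)
```

Each executor ends up holding the partitions it processed in its own `cache`, which loosely mirrors how cached RDD partitions live in executor memory rather than in the driver.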
Cluster Manager
Spark depends on a cluster manager to launch executors and, in certain cases, to launch the driver. The cluster manager is a pluggable component in Spark. This allows Spark to run on top of different external managers, such as YARN and Mesos, as well as its built-in Standalone cluster manager.
Spark can run both drivers and executors on the YARN worker nodes.
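Because the cluster manager is pluggable, the same application can usually be pointed at a different manager just by changing the master URL passed to spark-submit. The small helper below is a sketch that only assembles the command line. `build_spark_submit` is a hypothetical helper, and the host names, ports, and `my_app.py` are placeholders. The `--master yarn` form assumes Spark 2.0 or later.

```python
import shlex


def build_spark_submit(master, app, deploy_mode="client", app_args=()):
    """Return the spark-submit argument list for a given cluster manager."""
    return ["spark-submit", "--master", master, "--deploy-mode", deploy_mode, app, *app_args]


# Master URLs for the three cluster managers named above.
# Host names and ports are placeholders, not real endpoints.
masters = {
    "yarn": "yarn",
    "mesos": "mesos://mesos-host:5050",
    "standalone": "spark://master-host:7077",
}

commands = {name: build_spark_submit(url, "my_app.py") for name, url in masters.items()}

# In cluster deploy mode on YARN, the cluster manager launches the driver as well,
# so both the driver and the executors run on YARN worker nodes.
yarn_cluster = build_spark_submit("yarn", "my_app.py", deploy_mode="cluster")
yarn_cluster_line = shlex.join(yarn_cluster)
```

In client deploy mode, by contrast, the driver runs in the process that invoked spark-submit and only the executors are launched by the cluster manager.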
The procedure for running a Spark application
- The user submits an application using spark-submit.
- spark-submit launches the driver program and invokes the main() method specified by the user.
- The driver program contacts the cluster manager to ask for resources to launch executors.
- The cluster manager launches executors on behalf of the driver program.
- The driver process runs through the user application. Based on the RDD actions and transformations in the program, the driver sends work to executors in the form of tasks.
- Tasks are run on executor processes to compute and save results.
- If the driver’s main() method exits or it calls SparkContext.stop(), the driver terminates the executors and releases resources from the cluster manager.
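The resource lifecycle in the steps above (request executors, run tasks, release on exit) can be sketched as a toy model in plain Python. Nothing here is Spark's API: `ToyClusterManager` and `ToyContext` are hypothetical stand-ins for the cluster manager and SparkContext, and executors are represented only by labels.

```python
class ToyClusterManager:
    """Hypothetical stand-in for a cluster manager that hands out executors."""

    def __init__(self, capacity):
        self.free_slots = capacity

    def launch_executors(self, count):
        # Grant as many executors as capacity allows.
        granted = min(count, self.free_slots)
        self.free_slots -= granted
        return [f"executor-{i}" for i in range(granted)]

    def release(self, executors):
        self.free_slots += len(executors)


class ToyContext:
    """Hypothetical stand-in for SparkContext, owned by the driver."""

    def __init__(self, manager, num_executors):
        self.manager = manager
        # The driver asks the cluster manager for executors.
        self.executors = manager.launch_executors(num_executors)

    def run(self, tasks):
        # Send each task to an executor (round-robin) and collect the results.
        return [(self.executors[i % len(self.executors)], task()) for i, task in enumerate(tasks)]

    def stop(self):
        # Terminate executors and hand resources back to the cluster manager.
        self.manager.release(self.executors)
        self.executors = []


def main(manager):
    ctx = ToyContext(manager, num_executors=2)
    try:
        return ctx.run([lambda: 1 + 1, lambda: 2 * 3, lambda: 10 - 4])
    finally:
        # Mirrors the last step: resources are released when main() finishes.
        ctx.stop()


manager = ToyClusterManager(capacity=4)
results = main(manager)
```

The `try`/`finally` in `main` is a design choice of this sketch: it makes the release step run even if a task raises, which is the behavior the final step of the procedure describes for a driver that exits.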
References
<<Learning Spark>>