Big Data & Spark Introduction

Spark Ecosystem
  • The user submits an application to Spark; the driver program then takes control of the application.
  • The SparkContext breaks the application into a logical plan, which the DAG (directed acyclic graph) scheduler converts into a physical plan. At this point the stages and tasks have been identified.
  • The DAG scheduler runs solely on the driver program; it is created after the SparkContext, following creation of the TaskScheduler and SchedulerBackend. The DAG scheduler handles computation of the DAG, preferred locations to run tasks, and recovery when shuffle outputs are lost. It typically sends requests to the task scheduler to create the tasks, and it keeps track of which hosts hold each RDD (resilient distributed dataset) so that the same computation is not re-run.
  • The task scheduler takes the tasks submitted by the DAG scheduler and executes them on worker nodes with the help of the cluster manager. The DAG scheduler manages the state of the tasks and RDDs.
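The stage/task breakdown described above can be sketched with a toy model. This is not Spark's actual implementation (the real DAGScheduler is far more involved); it is a minimal illustration of the key idea that narrow transformations (map, filter, flatMap) stay within one stage, while a wide dependency that needs a shuffle (e.g. reduceByKey) starts a new stage. The `Op` class and `split_into_stages` function are illustrative names, not Spark APIs.

```python
from dataclasses import dataclass

@dataclass
class Op:
    """One transformation in a linear RDD lineage (toy model)."""
    name: str
    wide: bool = False  # True if the op requires a shuffle

def split_into_stages(lineage):
    """Group a chain of ops into stages, cutting at each shuffle boundary,
    mimicking (very roughly) how the DAG scheduler derives stages."""
    stages, current = [], []
    for op in lineage:
        if op.wide and current:
            stages.append(current)  # close the stage before the shuffle
            current = []
        current.append(op.name)
    if current:
        stages.append(current)
    return stages

# A word-count-like lineage: the shuffle in reduceByKey splits it in two.
lineage = [
    Op("textFile"),
    Op("flatMap"),
    Op("map"),
    Op("reduceByKey", wide=True),  # shuffle boundary -> new stage
    Op("collect"),
]
print(split_into_stages(lineage))
```

Each inner list corresponds to one stage whose tasks the task scheduler would then ship to worker nodes; the cut before `reduceByKey` is the shuffle boundary mentioned in the bullets above.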

About 12 years of professional experience, working across multiple domains: ML/DL research, Software Engineering, Data Engineering, and Data Science.

charantej thota
