How is a plan split into jobs and stages in Spark?


I'm trying to understand how a query plan is split into jobs in Spark, i.e. how Spark decides to split the workflow into jobs and stages.

My query is a simple join between a big table (22 GB) and a small table (300 MB):

select key0 from bigtable, smalltable where importedkey = keysmalltable

and I execute it with both a ShuffledHashJoin and a BroadcastHashJoin.
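
Roughly, my setup looks like the following sketch (the file paths are placeholders, and forcing each strategy via a broadcast() hint and spark.sql.autoBroadcastJoinThreshold is just one way to do it; depending on the Spark version the shuffle-based plan may show up as SortMergeJoin instead of ShuffledHashJoin):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    object JoinPlanComparison {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("join-plan-comparison")
          .getOrCreate()

        // Placeholder paths; the real tables are ~22 GB and ~300 MB.
        val bigTable   = spark.read.parquet("/data/bigtable")
        val smallTable = spark.read.parquet("/data/smalltable")

        // Variant 1: disable automatic broadcasting so Spark falls back to a
        // shuffle-based join (ShuffledHashJoin or SortMergeJoin, depending on
        // the version and configuration).
        spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
        val shuffled = bigTable
          .join(smallTable, bigTable("importedkey") === smallTable("keysmalltable"))
          .select("key0")
        shuffled.explain()   // physical plan shows the shuffle-based join
        shuffled.count()     // run it and look at the jobs/stages in the Spark UI

        // Variant 2: hint that the small table should be broadcast.
        val broadcasted = bigTable
          .join(broadcast(smallTable), bigTable("importedkey") === smallTable("keysmalltable"))
          .select("key0")
        broadcasted.explain() // physical plan shows BroadcastHashJoin
        broadcasted.count()

        spark.stop()
      }
    }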

For the ShuffledHashJoin there is 1 job:

(Spark UI screenshot)

and 4 stages:

(Spark UI screenshot)

For the BroadcastHashJoin there are 3 jobs:

(Spark UI screenshot)

and 4 stages:

(Spark UI screenshots)

Is this division into jobs caused by the "collect" on the driver that is required when using BroadcastHashJoin?
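
One way I can try to correlate the extra jobs with the broadcast is to log every job as it gets scheduled. A minimal sketch for spark-shell (where the spark session is predefined), using a SparkListener:

    import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

    // Print every job that gets scheduled, with its stage count. Running the
    // shuffle variant and the broadcast variant with this listener attached
    // shows which jobs appear only in the broadcast case.
    spark.sparkContext.addSparkListener(new SparkListener {
      override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
        println(s"job ${jobStart.jobId} started with ${jobStart.stageInfos.size} stage(s)")
      }
    })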

