How is a plan split into jobs and stages in Spark?


I'm trying to understand how a query plan is split into jobs in Spark, i.e. how Spark decides to split the workflow into jobs and stages.

My query is a simple join between a big table (22 GB) and a small table (300 MB):

select key0 from bigtable, smalltable where importedkey = keysmalltable

and I executed it with both a ShuffledHashJoin and a BroadcastHashJoin.
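For context, here is a minimal sketch of roughly how the two plans can be produced (the table and column names follow the query above; using the broadcast() hint and the spark.sql.autoBroadcastJoinThreshold setting to pick the strategy is my assumption, and the non-broadcast plan may come out as ShuffledHashJoin or SortMergeJoin depending on the Spark version and configuration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object JoinPlanSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("join-plan-sketch").getOrCreate()

    // Hypothetical: the two tables are registered in the catalog, as in the query above.
    val bigtable   = spark.table("bigtable")     // ~22 GB
    val smalltable = spark.table("smalltable")   // ~300 MB

    // Broadcast variant: hint Spark to broadcast the small side.
    val bhj = bigtable
      .join(broadcast(smalltable), bigtable("importedkey") === smalltable("keysmalltable"))
      .select(bigtable("key0"))
    bhj.explain()   // physical plan should contain BroadcastHashJoin

    // Shuffle variant: disable auto-broadcast so the small side is shuffled instead.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
    val shj = bigtable
      .join(smalltable, bigtable("importedkey") === smalltable("keysmalltable"))
      .select(bigtable("key0"))
    shj.explain()   // ShuffledHashJoin or SortMergeJoin, depending on version/config

    spark.stop()
  }
}
```

Running each variant and checking the Jobs and Stages tabs of the Spark UI gives the counts below.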

For the ShuffledHashJoin there is 1 job:

[Spark UI screenshot: jobs view]

and 4 stages:

[Spark UI screenshot: stages view]

For the BroadcastHashJoin there are 3 jobs:

[Spark UI screenshot: jobs view]

and 4 stages:

[Spark UI screenshot: stages view]


Is this division into jobs caused by the "collect" on the driver that is required when a BroadcastHashJoin is used?
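What I suspect is happening is roughly the hand-rolled equivalent below (plain RDD code, not Spark's actual internals; the data and names are made up): collecting the small side to the driver is a job of its own, separate from the map-side join over the big table.

```scala
import org.apache.spark.sql.SparkSession

object ManualBroadcastJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("manual-broadcast-join").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical stand-ins for the two tables: (importedkey, key0) and (keysmalltable, value).
    val big   = sc.parallelize(Seq((1, "b1"), (2, "b2"), (3, "b3")))
    val small = sc.parallelize(Seq((1, "s1"), (2, "s2")))

    // Collect the small side to the driver and build a lookup table -- this is its own job.
    val smallMap = small.collectAsMap()

    // Ship the lookup table to every executor once.
    val bcast = sc.broadcast(smallMap)

    // Map-side join: probe the broadcast table per record, no shuffle of the big side.
    // The actual job runs only when an action (here, collect) is called.
    val joined = big.flatMap { case (k, key0) =>
      bcast.value.get(k).map(v => (key0, v))
    }

    joined.collect().foreach(println)
    spark.stop()
  }
}
```

If that intuition is right, materializing the small table on the driver would account for at least one of the extra jobs I see with the BroadcastHashJoin.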

