How is a plan split into jobs and stages in Spark?


I'm trying to understand how a query plan is split into jobs in Spark, i.e., how Spark decides to split a workflow into jobs and stages.

My query is a simple join between a big table (22 GB) and a small table (300 MB):

select key0 from bigtable, smalltable where importedkey = keysmalltable

I executed it with both a ShuffledHashJoin and a BroadcastHashJoin.
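As a sketch of how the two plans can be reproduced (the table and column names come from the query above; the threshold values and app name are assumptions, and whether the shuffle-based plan is a ShuffledHashJoin or a SortMergeJoin depends on the Spark version and settings), adjusting `spark.sql.autoBroadcastJoinThreshold` switches Spark between the two strategies, and `explain()` shows which one was chosen:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder().appName("join-plan").getOrCreate()
import spark.implicits._

// With the default threshold (10 MB), a 300 MB table is NOT broadcast,
// so Spark falls back to a shuffle-based join.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10L * 1024 * 1024)
val shuffled = spark.sql(
  "select key0 from bigtable, smalltable where importedkey = keysmalltable")
shuffled.explain()  // physical plan shows a shuffle-based join

// Raising the threshold above 300 MB (or using the broadcast() hint)
// makes Spark ship the small table to every executor instead.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 400L * 1024 * 1024)
val broadcasted = spark.table("bigtable")
  .join(broadcast(spark.table("smalltable")),
        $"importedkey" === $"keysmalltable")
  .select("key0")
broadcasted.explain()  // physical plan shows a BroadcastHashJoin
```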

The ShuffledHashJoin produces 1 job:

[Spark UI screenshot: jobs]

and 4 stages:

[Spark UI screenshot: stages]

The BroadcastHashJoin produces 3 jobs:

[Spark UI screenshot: jobs]

and 4 stages:

[Spark UI screenshots: stages]

Is this division into jobs caused by the "collect" on the driver that is required when using a BroadcastHashJoin?

