How is a query plan split into jobs and stages in Spark?
I'm trying to understand how a query plan is split into jobs in Spark, i.e. how Spark decides to divide the workflow into jobs and stages.
My query is a simple join between a big table (22 GB) and a small table (300 MB):
select key0 from bigtable, smalltable where importedkey = keysmalltable
and I executed it once with a ShuffledHashJoin and once with a BroadcastHashJoin.
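For reference, a minimal sketch (Scala; the Parquet paths are placeholders and the column names are taken from the query above) of how the two variants can be reproduced. Disabling spark.sql.autoBroadcastJoinThreshold makes Spark pick a shuffle-based join (ShuffledHashJoin or SortMergeJoin, depending on the version), while broadcast() forces the broadcast plan:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object JoinComparison {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("join-plan-comparison").getOrCreate()

    // Hypothetical paths; any source producing the two tables works.
    val bigtable   = spark.read.parquet("/data/bigtable")     // ~22 GB
    val smalltable = spark.read.parquet("/data/smalltable")   // ~300 MB

    // Variant 1: disable auto-broadcast so Spark uses a shuffle-based join.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
    val shuffled = bigtable
      .join(smalltable, bigtable("importedkey") === smalltable("keysmalltable"))
      .select("key0")
    shuffled.explain()      // physical plan: shuffle-based join
    shuffled.count()        // action -> jobs and stages appear in the Spark UI

    // Variant 2: force the small side to be broadcast.
    val broadcasted = bigtable
      .join(broadcast(smalltable), bigtable("importedkey") === smalltable("keysmalltable"))
      .select("key0")
    broadcasted.explain()   // physical plan: BroadcastHashJoin
    broadcasted.count()

    spark.stop()
  }
}
```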
With the ShuffledHashJoin, the Spark UI shows 1 job and 4 stages.
With the BroadcastHashJoin, it shows 3 jobs and 4 stages.
Is this division into jobs caused by the collect() on the driver that is required when using a BroadcastHashJoin?
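For illustration, here is a sketch with plain RDD code and made-up stand-in data (not Spark's internal implementation) of the steps a broadcast join conceptually performs; the collect() of the small side is an action of its own, which is the kind of thing that would show up as a separate job in the UI:

```scala
import org.apache.spark.sql.SparkSession

object BroadcastJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("broadcast-join-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Stand-ins for the two tables: (joinKey, payload) pairs.
    val big   = sc.parallelize((1 to 100000).map(i => (i % 1000, s"big-$i")))
    val small = sc.parallelize((0 until 1000).map(i => (i, s"small-$i")))

    // Step 1: collect() pulls the small side to the driver -- an action, hence a job.
    val smallMap = small.collect().toMap

    // Step 2: the driver ships the hash map to every executor.
    val smallBc = sc.broadcast(smallMap)

    // Step 3: each partition of the big side probes the broadcast map locally,
    // so the join itself needs no shuffle.
    val joined = big.flatMap { case (k, v) =>
      smallBc.value.get(k).map(sv => (k, v, sv)).toList
    }
    println(joined.count())   // another action -> another job

    spark.stop()
  }
}
```

Whether the SQL BroadcastHashJoin literally calls collect() on the driver or builds the broadcast relation differently may depend on the Spark version; the sketch only shows why extra driver-side actions translate into extra jobs.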