How to load a specific Hive partition into a DataFrame in Spark 1.6?


From Spark 1.6 onwards, as per the official documentation, we cannot add specific Hive partitions to a DataFrame.

Until Spark 1.5 the following used to work, and the DataFrame contained the entity column as well as the data, as shown below:

dataframe df = hivecontext.read().format("orc").load("path/to/table/entity=xyz") 

However, this does not work in Spark 1.6.

If I give the base path as follows, the resulting DataFrame does not contain the entity column, which I want, as shown below:

dataframe df = hivecontext.read().format("orc").load("path/to/table/")  

How can I load a specific Hive partition into a DataFrame? What was the driver behind removing this feature?

I believe it was efficient. Is there an alternative way to achieve this in Spark 1.6?

As per my understanding, Spark 1.6 loads all the partitions, and filtering for specific partitions afterwards is not efficient: it hits memory limits and throws GC (garbage collection) errors, because thousands of partitions are loaded into memory instead of only the specific partition.
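For clarity, here is a rough sketch of the load-everything-then-filter approach I am describing (assuming the entity partition column is even discoverable from the base path; paths and the column name are the illustrative ones from above):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

import static org.apache.spark.sql.functions.col;

public class PartitionFilterSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("partition-filter-sketch");
        JavaSparkContext sc = new JavaSparkContext(conf);
        HiveContext hiveContext = new HiveContext(sc.sc());

        // Load from the base path; partition discovery has to enumerate
        // every entity=... directory before any filter is applied.
        DataFrame all = hiveContext.read().format("orc").load("path/to/table/");

        // Filter down to one partition afterwards.
        DataFrame xyzOnly = all.filter(col("entity").equalTo("xyz"));

        xyzOnly.show();
    }
}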

Please guide. Thanks in advance.

To load a specific partition into a DataFrame using Spark 1.6, do the following: first set the basePath option, and then give the path of the partition that needs to be loaded:

dataframe df = hivecontext.read().format("orc").                option("basepath", "path/to/table/").                load("path/to/table/entity=xyz") 

So the above code will load only the specific partition into the DataFrame, and the entity partition column will be present in the result.
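As a quick sanity check (a sketch reusing the df from the snippet above): printSchema() should now list entity alongside the data columns, and show() should return rows only from the entity=xyz directory.

// df is the DataFrame loaded with the basePath option, as in the snippet above
df.printSchema();   // the entity partition column should appear in the schema
df.show();          // rows should come only from the entity=xyz partition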

