How to load a specific Hive partition into a DataFrame in Spark 1.6?


From Spark 1.6 onwards, per the official docs, you cannot load a specific Hive partition into a DataFrame by pointing directly at the partition path.

Up to Spark 1.5 the following used to work, and the resulting DataFrame had the entity column with its data, as shown below:

DataFrame df = hiveContext.read().format("orc").load("path/to/table/entity=xyz");

However, this does not work in Spark 1.6.

If I give the base path instead, as follows, the resulting DataFrame does not contain the entity column that I want:

DataFrame df = hiveContext.read().format("orc").load("path/to/table/");

How can I load a specific Hive partition into a DataFrame? What was the driver behind removing this feature?

I believe it was efficient. Is there an alternative way to achieve this in Spark 1.6?

As per my understanding, Spark 1.6 loads all the partitions, and if I then filter for specific partitions it is not efficient: it hits memory and throws GC (garbage collection) errors, because thousands of partitions are loaded into memory rather than only the specific partition.

Please guide. Thanks in advance.

To load a specific partition into a DataFrame using Spark 1.6, you have to do the following: first set the basePath option, then give the path of the partition that needs to be loaded.

DataFrame df = hiveContext.read().format("orc")
        .option("basePath", "path/to/table/")
        .load("path/to/table/entity=xyz");

The above code loads only the specific partition into the DataFrame, and because basePath points at the table root, partition discovery still adds the entity column.
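The reason basePath matters can be sketched as follows: Spark's partition discovery derives partition columns from `col=value` directory segments found between the base path and the files actually read. This is a minimal stdlib-only illustration of that idea, not Spark's actual implementation; the helper name `partitionColumns` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionDiscovery {

    // Hypothetical sketch: every path segment below basePath of the form
    // "col=value" is treated as a partition column. With basePath set to
    // the table root, loading "path/to/table/entity=xyz" still yields the
    // "entity" column; without it, there is no relative segment to parse.
    static Map<String, String> partitionColumns(String basePath, String loadPath) {
        Map<String, String> cols = new LinkedHashMap<>();
        String rel = loadPath.substring(basePath.length());
        for (String segment : rel.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                cols.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return cols;
    }

    public static void main(String[] args) {
        Map<String, String> cols =
                partitionColumns("path/to/table/", "path/to/table/entity=xyz");
        System.out.println(cols); // prints {entity=xyz}
    }
}
```

This also explains why the pre-1.6 one-argument load lost the column: when the load path itself is taken as the base path, the `entity=xyz` segment is no longer below it, so no partition column can be inferred.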

