How to load a specific Hive partition into a DataFrame in Spark 1.6?


From Spark 1.6 onwards, as per the official documentation, we cannot add specific Hive partitions to a DataFrame.

Until Spark 1.5 the following used to work, and the DataFrame contained the entity column as well as the data, as shown below:

dataframe df = hivecontext.read().format("orc").load("path/to/table/entity=xyz") 

However, this does not work in Spark 1.6.

If I give the base path as follows, the resulting DataFrame does not contain the entity column, which I want, as shown below:

dataframe df = hivecontext.read().format("orc").load("path/to/table/")  

How can I load a specific Hive partition into a DataFrame? What was the driver behind removing this feature?

I believe it was efficient. Is there an alternative way to achieve this in Spark 1.6?

As per my understanding, Spark 1.6 loads all the partitions, and filtering for specific partitions afterwards is not efficient: it hits memory limits and throws GC (garbage collection) errors, because thousands of partitions are loaded into memory instead of only the specific partition.
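For clarity, here is a rough sketch of the load-everything-then-filter approach I am describing (assuming the entity partition column is even discoverable from the base path; paths and the column name are the illustrative ones from above):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

import static org.apache.spark.sql.functions.col;

public class PartitionFilterSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("partition-filter-sketch");
        JavaSparkContext sc = new JavaSparkContext(conf);
        HiveContext hiveContext = new HiveContext(sc.sc());

        // Load from the base path; partition discovery has to enumerate
        // every entity=... directory before any filter is applied.
        DataFrame all = hiveContext.read().format("orc").load("path/to/table/");

        // Filter down to one partition afterwards.
        DataFrame xyzOnly = all.filter(col("entity").equalTo("xyz"));

        xyzOnly.show();
    }
}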

Please guide. Thanks in advance.

To load a specific partition into a DataFrame using Spark 1.6, do the following: first set the basePath option, and then give the path of the partition that needs to be loaded:

dataframe df = hivecontext.read().format("orc").                option("basepath", "path/to/table/").                load("path/to/table/entity=xyz") 

So the above code will load only the specific partition into the DataFrame, and the entity partition column will be present in the result.
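As a quick sanity check (a sketch reusing the df from the snippet above): printSchema() should now list entity alongside the data columns, and show() should return rows only from the entity=xyz directory.

// df is the DataFrame loaded with the basePath option, as in the snippet above
df.printSchema();   // the entity partition column should appear in the schema
df.show();          // rows should come only from the entity=xyz partition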

