How to load a specific Hive partition into a DataFrame in Spark 1.6?
From Spark 1.6 onwards, as per the official documentation, we cannot load specific Hive partitions into a DataFrame.
Up to Spark 1.5 the following used to work, and the DataFrame would contain the entity column and the data, as shown below:
dataframe df = hivecontext.read().format("orc").load("path/to/table/entity=xyz")
However, this does not work in Spark 1.6.
If I give the base path like the following, it does not contain the entity column, which I want in the DataFrame, as shown below:
dataframe df = hivecontext.read().format("orc").load("path/to/table/")
How can I load a specific Hive partition into a DataFrame? What was the driver behind removing this feature?
I believe the old behaviour was efficient. Is there an alternative way to achieve this in Spark 1.6?
As per my understanding, Spark 1.6 loads all the partitions, and if I then filter for specific partitions it is not efficient: it hits memory and throws GC (garbage collection) errors, because thousands of partitions are loaded into memory instead of only the specific partition.
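For reference, a minimal sketch of that filter-after-load approach (the one described above as inefficient), assuming an existing HiveContext named hiveContext and the same path/to/table/ layout partitioned by entity:

import org.apache.spark.sql.DataFrame;

// Load the whole table from the base path; partition discovery adds the
// "entity" column to the schema, but every partition directory is discovered.
DataFrame all = hiveContext.read().format("orc").load("path/to/table/");

// Filter down to the wanted partition after the fact.
DataFrame xyzOnly = all.filter(all.col("entity").equalTo("xyz"));
xyzOnly.show();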
Please guide me. Thanks in advance.
To load a specific partition into a DataFrame using Spark 1.6, do the following: first set the basePath option
and then give the path of the partition that needs to be loaded:
dataframe df = hivecontext.read().format("orc"). option("basepath", "path/to/table/"). load("path/to/table/entity=xyz")
So the above code loads only the specific partition into the DataFrame, and the DataFrame still contains the entity column.
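For completeness, a self-contained sketch of this approach with the Spark 1.6 Java API; the class and variable names are illustrative, and an ORC table at path/to/table/ partitioned by entity is assumed from the question:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class LoadSinglePartition {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("LoadSinglePartition");
        JavaSparkContext sc = new JavaSparkContext(conf);
        HiveContext hiveContext = new HiveContext(sc);

        // basePath tells Spark where partition discovery should start,
        // so the entity column is kept even though only one partition
        // directory is actually read.
        DataFrame df = hiveContext.read()
                .format("orc")
                .option("basePath", "path/to/table/")
                .load("path/to/table/entity=xyz");

        df.printSchema(); // schema includes the entity partition column
        df.show();

        sc.stop();
    }
}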