Scala - Unable to view data of Hive tables after update in Spark
Case: I have an ORC table hivetest with transactional set to true. I loaded it in the Spark shell and viewed the data:

scala> val rdd = objHiveContext.sql("select * from hivetest")
scala> rdd.show()

I was able to view the data.
I then went to the Hive shell (or Ambari) and updated the table, for example:

hive> update hivetest set name='test';   -- done, success
hive> select * from hivetest;            -- able to view the updated data
Now when I come back to Spark and run the query again, I cannot view any data except the column names:

scala> val rdd1 = objHiveContext.sql("select * from hivetest")
scala> rdd1.show()

This time the columns are printed but no data comes back.
Issue 2: I am unable to update the table from Spark SQL. When I run

scala> objHiveContext.sql("update hivetest set name='test'")

I get the error below:
org.apache.spark.sql.AnalysisException: Unsupported language features in query:
insert into hivetest values(1,'sudhir','software',1,'it')
TOK_QUERY 0, 0,17, 0
  TOK_FROM 0, -1,17, 0
    TOK_VIRTUAL_TABLE 0, -1,17, 0
      TOK_VIRTUAL_TABREF 0, -1,-1, 0
        TOK_ANONYMOUS 0, -1,-1, 0
      TOK_VALUES_TABLE 1, 6,17, 28
        TOK_VALUE_ROW 1, 7,17, 28
          1 1, 8,8, 28
          'sudhir' 1, 10,10, 30
          'software' 1, 12,12, 39
          1 1, 14,14, 50
          'it' 1, 16,16, 52
  TOK_INSERT 1, 0,-1, 12
    TOK_INSERT_INTO 1, 0,4, 12
      TOK_TAB 1, 4,4, 12
        TOK_TABNAME 1, 4,4, 12
          hivetest 1, 4,4, 12
    TOK_SELECT 0, -1,-1, 0
      TOK_SELEXPR 0, -1,-1, 0
        TOK_ALLCOLREF 0, -1,-1, 0

scala.NotImplementedError: no parse rules for:
  TOK_VIRTUAL_TABLE 0, -1,17, 0
    TOK_VIRTUAL_TABREF 0, -1,-1, 0
      TOK_ANONYMOUS 0, -1,-1, 0
    TOK_VALUES_TABLE 1, 6,17, 28
      TOK_VALUE_ROW 1, 7,17, 28
        1 1, 8,8, 28
        'sudhir' 1, 10,10, 30
        'software' 1, 12,12, 39
        1 1, 14,14, 50
        'it' 1, 16,16, 52
at org.apache.spark.sql.hive.HiveQl$.nodeToRelation(HiveQl.scala:1235)
This error is from the INSERT statement; I get the same sort of error for the UPDATE statement as well.
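(For reference: the trace shows Spark SQL's HiveQL parser has no parse rules for the VALUES clause, and UPDATE fails the same way; the one insert form this parser does accept is INSERT INTO TABLE ... SELECT. A minimal sketch, where some_staging_table is a hypothetical table holding the rows to append:)

scala> // INSERT ... SELECT is parsed; INSERT ... VALUES and UPDATE are not
scala> objHiveContext.sql("insert into table hivetest select * from some_staging_table")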
Have you tried objHiveContext.refreshTable("hivetest")?
Spark SQL aggressively caches Hive metastore data. If an update happens outside of Spark SQL, you might experience unexpected results because Spark SQL's version of the Hive metastore is out of date.
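A minimal sketch of the fix, using the objHiveContext from the question:

scala> // invalidate the cached metadata for hivetest, then query again
scala> objHiveContext.refreshTable("hivetest")
scala> val rdd2 = objHiveContext.sql("select * from hivetest")
scala> rdd2.show()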
Here's more info:

http://spark.apache.org/docs/latest/sql-programming-guide.html#metadata-refreshing
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.hive.HiveContext
The docs mention Parquet, but this applies to ORC and other file formats as well. With JSON, for example, if you add new files to the directory outside of Spark SQL, you'll need to call hiveContext.refreshTable() within Spark SQL to see the new data.
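For example (json_logs is a hypothetical table whose backing JSON files live in an external directory):

scala> objHiveContext.sql("select count(*) from json_logs").show()  // stale count
scala> // ... new JSON files are added to the table's directory outside Spark SQL ...
scala> objHiveContext.refreshTable("json_logs")  // drop the cached metadata
scala> objHiveContext.sql("select count(*) from json_logs").show()  // now sees the new files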