Convert null values to empty array in Spark DataFrame


I have a Spark DataFrame with one column that is an array of integers. The column is nullable because it comes from a left outer join. I want to convert the null values to an empty array so that I don't have to deal with nulls later.
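
For context, here is a minimal sketch of how such a frame can arise (the data and the `spark` session name are made up, just to make the problem reproducible):

// assumes an existing SparkSession called spark
import spark.implicits._

// hypothetical input tables; the left outer join leaves mycol = null for id = 2
val left = Seq(1, 2).toDF("id")
val right = Seq((1, Array(10, 20))).toDF("id", "mycol")
val df = left.join(right, Seq("id"), "left_outer")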

I thought I could do it like this:

import org.apache.spark.sql.functions.when

val mycol = df("mycol")
df.withColumn("mycol", when(mycol.isNull, Array[Int]()).otherwise(mycol))

However, this results in the following exception:

java.lang.RuntimeException: Unsupported literal type class [I [I@5ed25612
    at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:49)
    at org.apache.spark.sql.functions$.lit(functions.scala:89)
    at org.apache.spark.sql.functions$.when(functions.scala:778)

Apparently array types are not supported by the when function. Is there another easy way to convert the null values?

In case it is relevant, here is the schema for this column:

|-- mycol: array (nullable = true)
|    |-- element: integer (containsNull = false)

You can use a UDF:

import org.apache.spark.sql.functions.udf

val array_ = udf(() => Array.empty[Int])

combined with when or coalesce:

df.withcolumn("mycol", when(mycol.isnull, array_()).otherwise(mycol)) df.withcolumn("mycol", coalesce(mycol, array_())).show 

For array literals you can use the array function:

import org.apache.spark.sql.functions.{array, lit}

df.withColumn("foo", array(lit(1), lit(2)))

but unfortunately it won't work here, since you cannot specify the element type of an empty array this way.
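
As a side note (not something this answer relies on), if you are on Spark 2.2 or later, typedLit can build a typed empty-array literal directly, so the UDF becomes unnecessary; a sketch:

import org.apache.spark.sql.functions.{coalesce, typedLit}

// typedLit keeps the Scala type, producing an array<int> literal
df.withColumn("mycol", coalesce(df("mycol"), typedLit(Array.empty[Int])))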

