Convert null values to empty array in Spark DataFrame
I have a Spark data frame with one column that is an array of integers. The column is nullable because it comes from a left outer join. I want to convert the null values to an empty array so that I don't have to deal with nulls later.
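To make this concrete, here is a minimal reproduction along these lines (the column names and values are made up for illustration, not my actual data):

// Hypothetical setup: a left outer join that leaves nulls in the array column.
// Assumes a SparkSession named `spark` (use sqlContext.implicits._ on older Spark).
import spark.implicits._

val left = Seq(1, 2, 3).toDF("id")
val right = Seq((1, Seq(10, 11)), (3, Seq(30))).toDF("id", "mycol")

// id = 2 has no match on the right, so its mycol ends up null.
val df = left.join(right, Seq("id"), "left_outer")
df.printSchema()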
I thought I could do it like this:
val mycol = df("mycol")

df.withColumn("mycol", when(mycol.isNull, Array[Int]()).otherwise(mycol))
However, this results in the following exception:
java.lang.RuntimeException: Unsupported literal type class [I [I@5ed25612
    at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:49)
    at org.apache.spark.sql.functions$.lit(functions.scala:89)
    at org.apache.spark.sql.functions$.when(functions.scala:778)
Apparently array types are not supported by the when function. Is there some other easy way to convert the null values?
In case it is relevant, here is the schema for this column:
|-- mycol: array (nullable = true)
|    |-- element: integer (containsNull = false)
You can use a UDF:
import org.apache.spark.sql.functions.udf

val array_ = udf(() => Array.empty[Int])
combined with when or coalesce:
df.withcolumn("mycol", when(mycol.isnull, array_()).otherwise(mycol)) df.withcolumn("mycol", coalesce(mycol, array_())).show
For array literals you can use the array function:
import org.apache.spark.sql.functions.{array, lit}

df.withColumn("foo", array(lit(1), lit(2)))
but unfortunately it won't work here, since you cannot specify the element type of an empty array this way.
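As an aside, and not part of the approach above: if a newer Spark version is available (typedLit was added around 2.2, if I remember correctly), a typed empty-array literal can replace the UDF entirely:

// Sketch assuming Spark 2.2+: typedLit builds a correctly typed empty array
// literal, so no UDF is needed there.
import org.apache.spark.sql.functions.{coalesce, typedLit}

df.withColumn("mycol", coalesce(df("mycol"), typedLit(Array.empty[Int])))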