04-28-2015 01:03 PM
04-28-2015 01:06 PM
You can use HiveQL's cast() type conversion function to cast an element of a nested map to a string. In PySpark:
from pyspark.sql import Row
df = sqlContext.createDataFrame([Row(a={'b': 1})])
casted = df.selectExpr("cast(a['b'] AS STRING)")
or in Scala as follows:
val df = Seq((Map("a" -> 1))).toDF("a")
df.selectExpr("cast(a['a'] AS STRING)")
02-01-2017 07:57 AM
If your df is registered as a table you can also do this with a SQL call:
df.createOrReplaceTempView("table")
casted = spark.sql('''
SELECT CAST(a['b'] AS STRING)
FROM table
''')
It's more code in the simple case, but I've found that when this is embedded in a much more complex query, the SQL form can be friendlier from a readability standpoint.
03-15-2017 12:26 PM
You could also use withColumn() to do it without Spark SQL, although the performance may differ; the open question is whether creating a new column takes more time than the Spark SQL route.
Something like:
import org.apache.spark.sql.types.IntegerType
val dfNew = df.withColumn("newColName", df("originalColName").cast(IntegerType))
  .drop("originalColName").withColumnRenamed("newColName", "originalColName")
Create the new column by casting from the original, drop the original, then rename the new column back to the original name. A bit roundabout, but it works.
04-19-2018 09:33 PM
Is it safe to cast a column that contains null values?
03-19-2020 07:24 AM
I am trying to store a dataframe as a table in Databricks and am encountering the following error; can someone help?
"TypeError: field date: Can not merge type <class 'pyspark.sql.types.StringType'> and <class 'pyspark.sql.types.DoubleType'>"