04-28-2015 01:06 PM
You can use HiveQL's cast() type conversion function to cast an element of a nested map in Python as follows:
from pyspark.sql import Row
df = sqlContext.createDataFrame([Row(a={'b': 1})])
result = df.selectExpr("cast(a['b'] AS STRING)")  # "result" avoids shadowing the builtin str
or in Scala as follows:
val df = Seq(Map("a" -> 1)).toDF("a")
df.selectExpr("cast(a['a'] AS STRING)")
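If you would rather stay in the DataFrame API than write a SQL expression, the same cast works through the Column API. A minimal PySpark sketch, assuming the toy DataFrame from the Python example above:
from pyspark.sql import Row
df = sqlContext.createDataFrame([Row(a={'b': 1})])
# getItem pulls the map value; cast accepts a type name as a string
result = df.select(df["a"].getItem("b").cast("string"))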
02-01-2017 07:57 AM
If your df is registered as a table, you can also do this with a SQL call:
df.createOrReplaceTempView("table")
result = spark.sql('''
SELECT CAST(a['b'] AS STRING)
FROM table
''')
It's more code in the simple case, but I have found that when this is combined into a much more complex query, the SQL format can be friendlier from a readability standpoint.
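A small usage note on the SQL route, assuming the same toy DataFrame from the first reply: aliasing the cast keeps the output column name readable.
df.createOrReplaceTempView("table")
result = spark.sql('''
    SELECT CAST(a['b'] AS STRING) AS b_str
    FROM table
''')
result.show()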
03-15-2017 12:26 PM
You could also use withColumn() to do it without Spark SQL, although the performance will likely differ. The open question is whether creating a new column takes more time than using Spark SQL.
Something like:
import org.apache.spark.sql.types.IntegerType
val dfNew = df.withColumn("newColName", df("originalColName").cast(IntegerType))
  .drop("originalColName").withColumnRenamed("newColName", "originalColName")
Create the new column, casting from the original column, drop the original, then rename the new column back to the original name. A bit roundabout, but looks like it could work.
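Worth noting that withColumn() replaces an existing column when the supplied name already exists, so the drop/rename dance can be skipped entirely. A minimal PySpark sketch of the same idea (the column name is illustrative):
from pyspark.sql.functions import col
# withColumn overwrites in place because "originalColName" already exists
df2 = df.withColumn("originalColName", col("originalColName").cast("int"))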
04-19-2018 09:33 PM
Is it safe to cast a column that contains null values?
03-19-2020 07:24 AM
I am trying to store a dataframe as a table in Databricks and am encountering the following error; can someone help?
"TypeError: field date: Can not merge type <class 'pyspark.sql.types.StringType'> and <class 'pyspark.sql.types.DoubleType'>"