04-28-2015 01:03 PM
04-28-2015 01:06 PM
You can use HiveQL's cast() type conversion function to cast an element of a nested map to a string. In PySpark:
from pyspark.sql import Row
df = sqlContext.createDataFrame([Row(a={'b': 1})])
casted = df.selectExpr("cast(a['b'] AS STRING)")
or in Scala as follows:
val df = Seq((Map("a" -> 1))).toDF("a")
df.selectExpr("cast(a['a'] AS STRING)")
02-01-2017 07:57 AM
If your df is registered as a table you can also do this with a SQL call:
df.createOrReplaceTempView("table")
casted = spark.sql('''
SELECT CAST(a['b'] AS STRING)
FROM table
''')
It's more code in the simple case, but I've found that when this is embedded in a much more complex query, the SQL form can be friendlier from a readability standpoint.
03-15-2017 12:26 PM
You could also use withColumn() to do it without Spark SQL, although the performance may differ; the open question is whether creating a new column takes more time than the Spark SQL route.
Something like:
import org.apache.spark.sql.types.IntegerType
val dfNew = df.withColumn("newColName", df("originalColName").cast(IntegerType))
  .drop("originalColName").withColumnRenamed("newColName", "originalColName")
Create the new column by casting from the original, drop the original, then rename the new column back to the original name. A bit roundabout, but it works.
04-19-2018 09:33 PM
Is it safe to cast a column that contains null values?
03-19-2020 07:24 AM
I am trying to store a dataframe as a table in Databricks and am encountering the following error; can someone help?
"TypeError: field date: Can not merge type <class 'pyspark.sql.types.StringType'> and <class 'pyspark.sql.types.DoubleType'>"