03-18-2022 10:24 AM
What I am doing:
spark_df = spark.createDataFrame(dfnew)  # Spark infers the column types here
spark_df.write.saveAsTable("default.test_table")
This automatically infers the data types, and it is working right now. But what if a data type cannot be inferred, or is inferred incorrectly? I am mostly concerned about doubles, ints, and bigints.
I tested casting, but it doesn't work on Databricks (the snippet below uses Scala syntax, which fails in PySpark):
spark_df = spark.createDataFrame(dfnew.select(dfnew("Year").cast(IntegerType).as("Year")))
Is there a way to feed a DDL schema to a Spark DataFrame on Databricks? Or should I not use Spark to create the table?
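Note: spark.createDataFrame also accepts a DDL-formatted string as its schema argument, which pins the types up front instead of relying on inference. A minimal sketch, with illustrative column names and types:
# Schema supplied as a DDL string; the names and types here are examples only
ddl = "Year INT, Amount DOUBLE, Id BIGINT"
spark_df = spark.createDataFrame([(2022, 9.99, 1)], schema=ddl)
spark_df.printSchema()
# root
#  |-- Year: integer (nullable = true)
#  |-- Amount: double (nullable = true)
#  |-- Id: long (nullable = true)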
03-20-2022 07:45 AM
Hi @Raie A: spark.createDataFrame will create the DataFrame with the wider data types, for example Long for Int/BigInt; it depends on the use case. If you have already created a table and want to overwrite or append data to it, you need to explicitly cast each column to match the table's column types.
One option is to define a case class and convert the DataFrame to a Dataset (Scala):
import spark.implicits._  // provides the encoder for the case class
case class Person(id: Int, name: String)
val ds = dfnew.as[Person]
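In PySpark, the explicit-cast-before-append approach described above might look like the following sketch; the Amount and Id columns and their target types are hypothetical, and the table name is taken from the original question:
from pyspark.sql.functions import col
from pyspark.sql.types import IntegerType, DoubleType, LongType

# Cast each column to the type the existing table expects, then append
spark_df = (
    spark_df
    .withColumn("Year", col("Year").cast(IntegerType()))
    .withColumn("Amount", col("Amount").cast(DoubleType()))  # hypothetical column
    .withColumn("Id", col("Id").cast(LongType()))            # hypothetical column
)
spark_df.write.mode("append").saveAsTable("default.test_table")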
03-20-2022 07:47 AM
from pyspark.sql.types import IntegerType
from pyspark.sql.functions import col

# Year is created as a string column here, then cast explicitly to integer
dfnew = spark.createDataFrame([("2022",), ("2021",), ("2020",)], ["Year"])
dfnew = dfnew.withColumn("Year", col("Year").cast(IntegerType()))
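To verify the cast took effect before writing the table, check the schema (expected output shown as comments):
dfnew.printSchema()
# root
#  |-- Year: integer (nullable = true)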
03-21-2022 10:23 AM
Thanks @Hubert Dudek! I changed the syntax, imported the data types, and casting is working!