I have a date column in month-year format and I am trying to convert it into dd-mm-yyyy format in PySpark. For example, the column has values like Jan-2019, Feb-2020, Mar-2020, and the output I am expecting is 01/01/2019, 01/02/2020, 01/03/2020. Here...
Hi @vikram sinhha, we haven't heard from you since the last response from @Suteja Kanuri. Kindly share the requested information with us so that we can provide you with a solution. Thanks and Regards
I am deleting data from the curated path based on a date column and then appending the staged data to it on each run, using the script below. My fear is that if a network issue occurs just after the delete operation, the job could stop before it appends the staged da...
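One common way to close that gap is to replace the separate delete-then-append with a single atomic overwrite using Delta Lake's `replaceWhere` option, so either the whole replacement commits or nothing changes. A sketch under assumed names (the path `/mnt/curated/events`, the column `load_date`, and the DataFrame `staged_df` are all hypothetical); it is a fragment, not a self-contained script, since it needs an existing Spark + Delta environment:

```python
# Sketch: atomically replace one date slice of a Delta table.
# Unlike DELETE followed by append, this is a single transaction,
# so a mid-job failure cannot leave the slice deleted but empty.
(staged_df.write
    .format("delta")
    .mode("overwrite")
    .option("replaceWhere", "load_date = '2023-01-15'")  # hypothetical predicate
    .save("/mnt/curated/events"))                        # hypothetical path
```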
I have a table whose full scan takes ~20 minutes on my cluster. The table has a "Time" TIMESTAMP column and a "day" DATE column; the latter is computed (manually) as "Time" truncated to the day and is used for partitioning. I query the table using a predicate ...
Hi @Vladimir Ryabtsev, because you are creating a Delta table, I think you are seeing a performance improvement because of dynamic partition pruning. According to the documentation, "Partition pruning can take place at query compilation time wh...
I'm having some issues with creating a dataframe with a date column. Could I know what is wrong?

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType
from pyspark.sql.types import DateType, FloatType

spark = SparkSession.bui...
```
Hi @Kaniz Fatma, I actually changed the date format to 'M/d/Y' and it didn't throw any errors. I found that my CSV file had dates like '3/1/2022'. Could that be the issue? But some dates were also like '12/1/2022', so I'm kind of confused.