Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
On each run, I delete data from the curated path based on a date column and then append the staged data to it, using the script below. My fear is that if a network issue occurs just after the delete operation and the job stops before it appended the staged da...
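One way to avoid a gap between the delete and the append is to do both in a single commit. The sketch below is not the poster's script (which is truncated above). It assumes the curated path is a Delta table and uses Delta's `replaceWhere` write option with overwrite mode, so the rows for one date are swapped in one transaction. The function name `replace_date_atomically` and its parameters are hypothetical.

```python
def replace_date_atomically(staged_df, curated_path, date_column, run_date):
    """Replace the rows for one date in a Delta table in a single commit.

    Assumptions (not from the original post): `staged_df` is a Spark
    DataFrame holding only rows for `run_date`, and `curated_path` points
    to an existing Delta table.
    """
    # Only rows matching this predicate are replaced; other dates are kept.
    predicate = f"{date_column} = '{run_date}'"
    (
        staged_df.write
        .format("delta")
        .mode("overwrite")                  # overwrite, limited by replaceWhere
        .option("replaceWhere", predicate)  # delete + append as one commit
        .save(curated_path)
    )
    return predicate
```

If the job dies before the commit, the table should still show the previous version, because the old rows are only dropped when the new ones are committed. Note that Delta may reject the write if `staged_df` contains rows that do not match the predicate.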
I have the following 3 dataframes: I want to append df_forecast to each of df2_CA and df2_USA using a for-loop. However, when I run my code, df_forecast is not appended: df2_CA and df2_USA appear exactly as shown above. Here’s the code: df_list=[df2_CA,...
@Jack Homareau Can you try the union functionality with dataframes? https://sparkbyexamples.com/pyspark/pyspark-union-and-unionall/ Then try to fill the NaNs with the desired values.
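The question's code is truncated, so the following is only a guess at the cause: if the loop reassigns its loop variable, the original names never see the result, because a union or concat returns a new object. In this minimal sketch, tuples of rows stand in for DataFrames and all values are made-up placeholders.

```python
# Tuples stand in for DataFrames here: like a DataFrame union/concat,
# `+` on tuples returns a new object and leaves the operands untouched.
df_forecast = (("2024-01", 10),)   # placeholder rows, not real data
df2_CA = (("2023-12", 7),)
df2_USA = (("2023-12", 9),)

# Pitfall: `df = ...` only rebinds the loop variable, so df2_CA and
# df2_USA still point at their original, un-appended objects.
for df in [df2_CA, df2_USA]:
    df = df + df_forecast
after_loop = (df2_CA, df2_USA)

# Fix: build the new objects and bind them back to the original names.
df2_CA, df2_USA = [df + df_forecast for df in (df2_CA, df2_USA)]
```

With real DataFrames the same pattern would apply, with `+` swapped for the union or concat call, and the results assigned back to the names (or stored in a dict keyed by country).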
Use case: Read data from the source table using Spark Structured Streaming (round the clock). Apply the transformation logic and finally merge the dataframe into the target table. If there is any failure during the transformation or merge, the Databricks job should...
Team, I am struggling with a unique issue. I am not sure whether my understanding is wrong or this is a bug in Spark. I am reading a stream from Event Hubs (Extract). Pivoting and aggregating the above dataframe (Transformation). This is a WATERMARKED...
Hello, I am developing an application that runs multiple processes that write their results to a common delta table as blind appends. According to the docs I've read online: https://docs.databricks.com/delta/concurrency-control.html#protocolchangedex...
I think you are right: mergeSchema will change the schema of the table. But if you both write to that same table with different schemas, which one will it be? Can you check whether both of you actually write the same schema, or remove mergeSchema?