Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Is the limit per "table/dataframe" or for all tables/dataframes put together? The driver collects the data from all executors (those that hold the respective table or DataFrame) and distributes it to all executors. When will the memory be released in bo...
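For a Spark SQL broadcast join the cleanup is automatic (Spark's context cleaner drops the broadcast once it is no longer referenced), but with explicit broadcast variables the caller can release the memory directly. A minimal PySpark sketch; the lookup data is hypothetical:

# Hypothetical lookup data; explicit broadcast variables can be released manually,
# unlike the broadcasts Spark creates internally for SQL broadcast joins.
lookup = spark.sparkContext.broadcast({"a": 1, "b": 2})

# ... use lookup.value inside tasks ...

lookup.unpersist()  # drop the cached copies on the executors
lookup.destroy()    # also release the driver-side copy; lookup is unusable afterwards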
I have a statement like this with PySpark:

target_tbl.alias("target") \
    .merge(stage_df.hint("broadcast").alias("source"), merge_join_expr) \
    .whenMatchedUpdateAll() \
    .whenNotMatchedInsertAll() \
    .w...
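The rest of the chain is cut off above. For reference, a minimal self-contained sketch of a complete version of this merge, assuming the chain finishes with execute() and that the table name and join condition are placeholders:

from delta.tables import DeltaTable

target_tbl = DeltaTable.forName(spark, "dev.silver.target")  # hypothetical table name
merge_join_expr = "target.id = source.id"                    # hypothetical join key

(target_tbl.alias("target")
    .merge(stage_df.hint("broadcast").alias("source"), merge_join_expr)
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())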
Hi @Nikolay Chalkanov, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best ans...
I am trying to do a streaming merge between Delta tables using this guide - https://docs.delta.io/latest/delta-update.html#upsert-from-streaming-queries-using-foreachbatch

Our Code Sample (Java):

Dataset<Row> sourceDf = sparkSession
...
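The Java sample above is truncated. To keep the examples in one language, here is a minimal PySpark sketch of the same foreachBatch upsert pattern from the linked guide; the paths and the join key are placeholder assumptions:

from delta.tables import DeltaTable

def upsert_to_delta(micro_batch_df, batch_id):
    # Merge one micro-batch of source rows into the target Delta table.
    target = DeltaTable.forPath(spark, "/tmp/delta/target")  # hypothetical path
    (target.alias("t")
        .merge(micro_batch_df.alias("s"), "t.id = s.id")     # hypothetical key
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(spark.readStream.format("delta").load("/tmp/delta/source")  # hypothetical path
    .writeStream
    .foreachBatch(upsert_to_delta)
    .outputMode("update")
    .start())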
Yes, there are a couple of limitations. Please find the details below:
> It will not perform a broadcast join if the table has 512 million or more rows
> It will not perform a broadcast join if the table is larger than 8 GB
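Given those hard limits, one defensive pattern is to apply the broadcast hint only when the source is known to be small enough, so oversized sources fall back to Spark's default join strategy instead of failing. A minimal sketch; stage_df, target_tbl, merge_join_expr, and the row-count guard are illustrative assumptions:

# Guard the hint against the 512-million-row limit; note that count()
# itself triggers a Spark job on the source DataFrame.
row_count = stage_df.count()
source = stage_df.hint("broadcast") if row_count < 512_000_000 else stage_df

(target_tbl.alias("target")
    .merge(source.alias("source"), merge_join_expr)
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())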