05-12-2023 06:17 AM
I have a PySpark statement like this:
target_tbl.alias("target")\
.merge(stage_df.hint("broadcast").alias("source"), merge_join_expr)\
.whenMatchedUpdateAll()\
.whenNotMatchedInsertAll()\
.whenNotMatchedBySourceDelete()\
.execute()
The broadcast hint is not being applied. I get the following warning:
Join hint ignored: This query has a join hint '(strategy=broadcast)' that is not associated with any join operator and will thus be ignored. Investigate the query to see if the hint is placed correctly.
This video says that this approach works:
https://www.youtube.com/watch?v=o2k9PICWdx0&t=797s
Labels: BroadcastJoin, Delta, Pyspark
Accepted Solutions
05-13-2023 09:04 AM
@Nikolay Chalkanov :
The message says that the join hint is not associated with any join operator. This can happen if the hint is not placed correctly, or if the query contains no join operator that can take advantage of the hint.
In the code you posted, the join is performed by the merge method. According to the Delta Lake documentation, the merge method does not support broadcast hints:
https://docs.delta.io/latest/delta-update.html#joins
It is therefore expected that the hint is ignored.
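The video you mentioned might be using a different method to perform the join, one that supports broadcast hints. You could try the join method instead of merge and see if the broadcast hint is accepted. Note that target_tbl is a DeltaTable, so it has to be converted with toDF() before it can be joined, and that a join only produces a DataFrame: it does not apply the update, insert, and delete actions that merge does. Selecting only the source columns avoids duplicate column names in the output:
target_tbl.toDF().alias("target")\
.join(stage_df.hint("broadcast").alias("source"), join_expr, "inner")\
.select("source.*")\
.write.format("delta").mode("overwrite").save("output_delta")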
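To check whether a broadcast hint was actually picked up, one option is to look at the physical plan that explain() prints. The following is only a sketch: the helper names plan_uses_broadcast and explain_to_string are made up for illustration, and it assumes a PySpark DataFrame whose explain() prints the plan to standard output.

```python
import contextlib
import io

# Physical operators Spark uses for broadcast-based joins.
BROADCAST_OPERATORS = ("BroadcastHashJoin", "BroadcastNestedLoopJoin")


def plan_uses_broadcast(plan_text: str) -> bool:
    """Return True if the plan text mentions a broadcast join operator."""
    return any(op in plan_text for op in BROADCAST_OPERATORS)


def explain_to_string(df) -> str:
    """Capture what df.explain() prints (assumes a PySpark DataFrame)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        df.explain()
    return buffer.getvalue()


# Usage (needs a running Spark session; names taken from the thread):
# joined = target_tbl.toDF().alias("target").join(
#     stage_df.hint("broadcast").alias("source"), join_expr, "inner")
# print(plan_uses_broadcast(explain_to_string(joined)))
```

If the check returns False for the plain join as well, the hint is probably being dropped before it reaches the join, which would point to a placement problem rather than to merge itself.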
Keep in mind that broadcast hints have performance implications: the broadcast side is collected on the driver and copied to every executor, so it has to fit in memory there. Depending on the size of your data and the resources available, it might not be the best option.
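As a rough sanity check before forcing a broadcast, the size of the smaller side can be compared with Spark's automatic broadcast threshold (spark.sql.autoBroadcastJoinThreshold, 10 MB by default). This is a minimal sketch: the function name is_broadcast_candidate is hypothetical, and the size estimate is assumed to be supplied by the caller.

```python
# Spark's default for spark.sql.autoBroadcastJoinThreshold is 10 MB.
DEFAULT_BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


def is_broadcast_candidate(
    estimated_bytes: int,
    threshold_bytes: int = DEFAULT_BROADCAST_THRESHOLD_BYTES,
) -> bool:
    """Return True if a table of this estimated size is within the threshold.

    A negative threshold mirrors Spark's convention that -1 disables
    automatic broadcasting.
    """
    if threshold_bytes < 0:
        return False
    return 0 <= estimated_bytes <= threshold_bytes


# Example: a 5 MB staging table is within the default threshold,
# while a 50 MB one is not.
print(is_broadcast_candidate(5 * 1024 * 1024))
print(is_broadcast_candidate(50 * 1024 * 1024))
```

An explicit hint bypasses this threshold, so the check is only a guide to whether forcing the broadcast is reasonable for stage_df.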
05-13-2023 06:06 PM
Hi @Nikolay Chalkanov
Thank you for posting your question in our community! We are happy to assist you.
To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?
This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!