Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

azera
by New Contributor II
  • 1623 Views
  • 2 replies
  • 2 kudos

Stream-stream window join after time window aggregation not working in 13.1

Hey, I'm trying to perform time window aggregation in two different streams followed by a stream-stream window join, as described here. I'm running Databricks Runtime 13.1, exactly as advised. However, when I'm reproducing the following code: clicksWindow = c... (see the sketch after this thread)

Latest Reply
Happyfield7
New Contributor II
  • 2 kudos

Hey, I'm currently facing the same problem, so I would like to know if you've made any progress in resolving this issue.

1 More Replies
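The thread above is about reproducing the pattern from the Structured Streaming guide: a time window aggregation on each stream, followed by a stream-stream join on the resulting window column. A minimal sketch of that pattern is below; the stream sources, table names, and event-time columns are placeholders, not the poster's actual code.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # already available in a Databricks notebook

# Assumed streaming sources with event-time columns "clickTime" / "impressionTime".
clicks = spark.readStream.table("events.clicks")
impressions = spark.readStream.table("events.impressions")

# Time window aggregation on each stream, with a watermark on the event-time column.
clicksWindow = (
    clicks
    .withWatermark("clickTime", "1 hour")
    .groupBy(F.window("clickTime", "1 hour"))
    .count()
)

impressionsWindow = (
    impressions
    .withWatermark("impressionTime", "1 hour")
    .groupBy(F.window("impressionTime", "1 hour"))
    .count()
)

# Stream-stream join of the two aggregated streams on their time window.
joined = clicksWindow.join(impressionsWindow, "window", "inner")

query = (
    joined.writeStream
    .format("memory")                 # illustrative sink only
    .queryName("clicks_vs_impressions")
    .outputMode("append")
    .start()
)
```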
Aviral-Bhardwaj
by Esteemed Contributor III
  • 7315 Views
  • 3 replies
  • 25 kudos

Understanding Joins in PySpark/Databricks

In PySpark, a `join` operation combines rows from two or more datasets based on a common key. It allows you to merge data from different sources into a single dataset and potentially perform transformations on... (see the sketch after this thread)

Latest Reply
Meghala
Valued Contributor II
  • 25 kudos

very informative

2 More Replies
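As a quick companion to the post above, here is a minimal, hedged illustration of joining two DataFrames on a common key in PySpark. The DataFrames and column names are made up for demonstration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

employees = spark.createDataFrame(
    [(1, "Alice", 10), (2, "Bob", 20), (3, "Cara", 30)],
    ["emp_id", "name", "dept_id"],
)
departments = spark.createDataFrame(
    [(10, "Engineering"), (20, "Finance")],
    ["dept_id", "dept_name"],
)

# Inner join: keeps only rows whose dept_id exists on both sides.
inner = employees.join(departments, on="dept_id", how="inner")

# Left join: keeps every employee; unmatched departments come back as nulls.
left = employees.join(departments, on="dept_id", how="left")

inner.show()
left.show()
```

The `how` argument also accepts "right", "full", "semi", and "anti", which cover the other common join types.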
DK03
by Contributor
  • 1690 Views
  • 2 replies
  • 2 kudos
Latest Reply
UmaMahesh1
Honored Contributor III
  • 2 kudos

As @Werner Stinckens said, it would be OK. But generally, joins on decimal columns are not recommended, as other factors come into play, like precision, length, etc... Also, when you are joining on decimal columns, be sure to check out the abs value of... (see the sketch after this thread)

1 More Replies
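The precaution in the reply above, normalising decimal precision and scale on both sides before joining, can look like this minimal sketch. The data, column names, and chosen precision are assumptions for illustration.

```python
from decimal import Decimal
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DecimalType

spark = SparkSession.builder.getOrCreate()

# Two tables whose join keys have mismatched decimal precision/scale.
orders = spark.createDataFrame(
    [("o1", Decimal("19.9900"))],
    StructType([
        StructField("order_id", StringType()),
        StructField("amount", DecimalType(18, 4)),
    ]),
)
payments = spark.createDataFrame(
    [("p1", Decimal("19.99"))],
    StructType([
        StructField("payment_id", StringType()),
        StructField("amount", DecimalType(12, 2)),
    ]),
)

# Cast both keys to one agreed precision/scale so the comparison is exact.
orders_n = orders.withColumn("amount_key", F.col("amount").cast(DecimalType(18, 2)))
payments_n = payments.withColumn("amount_key", F.col("amount").cast(DecimalType(18, 2)))

joined = orders_n.join(payments_n, on="amount_key", how="inner")
joined.show()
```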
datatello
by New Contributor II
  • 1784 Views
  • 3 replies
  • 1 kudos

Exponentially slower joins using PySpark

I'm new to PySpark, but I've stumbled across an odd issue when I perform joins, where the action seems to take exponentially longer every time I add a new join to a function I'm writing. I'm trying to join a dataset of ~3 million records to one of ~17... (see the sketch after this thread)

Latest Reply
Vidula
Honored Contributor
  • 1 kudos

Hi @Lee Bevers, hope all is well! Just wanted to check in on whether you were able to resolve your issue; if so, would you be happy to share the solution or mark an answer as best? Else, please let us know if you need more help. We'd love to hear from you. Thanks!

2 More Replies
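One frequent cause of the behaviour described in the post above is that Spark is lazy: each additional join extends the logical plan, so analysing and optimising the whole chain gets more expensive with every step. A hedged sketch of one mitigation, truncating the lineage between joins, is below; the function, DataFrames, and join key are placeholders rather than the poster's code.

```python
from pyspark.sql import DataFrame

def join_all(base: DataFrame, others: list[DataFrame], key: str) -> DataFrame:
    """Left-join several DataFrames onto a base, truncating lineage between steps."""
    result = base
    for other in others:
        result = result.join(other, on=key, how="left")
        # localCheckpoint() materialises the intermediate result and cuts its
        # logical plan, so the next join does not re-optimise everything upstream.
        result = result.localCheckpoint()
    return result

# Hypothetical usage:
# base = spark.table("main.base_table")              # assumed table
# extras = [spark.table(t) for t in ["t1", "t2"]]    # assumed tables
# wide = join_all(base, extras, key="id")
```

Caching the intermediate result, or checkpointing it to durable storage, are alternatives with different trade-offs; localCheckpoint keeps the data on the executors only.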