Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

azera
by New Contributor II
  • 1623 Views
  • 2 replies
  • 2 kudos

Stream-stream window join after time window aggregation not working in 13.1

Hey, I'm trying to perform time window aggregation in two different streams followed by a stream-stream window join, as described here. I'm running Databricks Runtime 13.1, exactly as advised. However, when I'm reproducing the following code: clicksWindow = c... (see the sketch after this thread)

Latest Reply
Happyfield7
New Contributor II
  • 2 kudos

Hey, I'm currently facing the same problem, so I would like to know if you've made any progress in resolving this issue.

1 More Replies
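The thread above is about reproducing the pattern from the Structured Streaming guide: a time window aggregation on each stream, followed by a stream-stream join on the resulting window column. A minimal sketch of that pattern is below; the stream sources, table names, and event-time columns are placeholders, not the poster's actual code.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # already available in a Databricks notebook

# Assumed streaming sources with event-time columns "clickTime" / "impressionTime".
clicks = spark.readStream.table("events.clicks")
impressions = spark.readStream.table("events.impressions")

# Time window aggregation on each stream, with a watermark on the event-time column.
clicksWindow = (
    clicks
    .withWatermark("clickTime", "1 hour")
    .groupBy(F.window("clickTime", "1 hour"))
    .count()
)

impressionsWindow = (
    impressions
    .withWatermark("impressionTime", "1 hour")
    .groupBy(F.window("impressionTime", "1 hour"))
    .count()
)

# Stream-stream join of the two aggregated streams on their time window.
joined = clicksWindow.join(impressionsWindow, "window", "inner")

query = (
    joined.writeStream
    .format("memory")                 # illustrative sink only
    .queryName("clicks_vs_impressions")
    .outputMode("append")
    .start()
)
```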
Aviral-Bhardwaj
by Esteemed Contributor III
  • 7315 Views
  • 3 replies
  • 25 kudos

Understanding Joins in PySpark/Databricks

In PySpark, a `join` operation combines rows from two or more datasets based on a common key. It allows you to merge data from different sources into a single dataset and potentially perform transformations on... (see the sketch after this thread)

Latest Reply
Meghala
Valued Contributor II
  • 25 kudos

very informative

2 More Replies
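As a quick companion to the post above, here is a minimal, hedged illustration of joining two DataFrames on a common key in PySpark. The DataFrames and column names are made up for demonstration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

employees = spark.createDataFrame(
    [(1, "Alice", 10), (2, "Bob", 20), (3, "Cara", 30)],
    ["emp_id", "name", "dept_id"],
)
departments = spark.createDataFrame(
    [(10, "Engineering"), (20, "Finance")],
    ["dept_id", "dept_name"],
)

# Inner join: keeps only rows whose dept_id exists on both sides.
inner = employees.join(departments, on="dept_id", how="inner")

# Left join: keeps every employee; unmatched departments come back as nulls.
left = employees.join(departments, on="dept_id", how="left")

inner.show()
left.show()
```

The `how` argument also accepts "right", "full", "semi", and "anti", which cover the other common join types.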
DK03
by Contributor
  • 1690 Views
  • 2 replies
  • 2 kudos
Latest Reply
UmaMahesh1
Honored Contributor III
  • 2 kudos

As @Werner Stinckens said, it would be OK. But generally, joins on decimal columns are not recommended, as other factors come into play, like precision, length, etc... Also, when you are joining on decimal columns, be sure to check out the abs value of... (see the sketch after this thread)

1 More Replies
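The precaution in the reply above, normalising decimal precision and scale on both sides before joining, can look like this minimal sketch. The data, column names, and chosen precision are assumptions for illustration.

```python
from decimal import Decimal
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DecimalType

spark = SparkSession.builder.getOrCreate()

# Two tables whose join keys have mismatched decimal precision/scale.
orders = spark.createDataFrame(
    [("o1", Decimal("19.9900"))],
    StructType([
        StructField("order_id", StringType()),
        StructField("amount", DecimalType(18, 4)),
    ]),
)
payments = spark.createDataFrame(
    [("p1", Decimal("19.99"))],
    StructType([
        StructField("payment_id", StringType()),
        StructField("amount", DecimalType(12, 2)),
    ]),
)

# Cast both keys to one agreed precision/scale so the comparison is exact.
orders_n = orders.withColumn("amount_key", F.col("amount").cast(DecimalType(18, 2)))
payments_n = payments.withColumn("amount_key", F.col("amount").cast(DecimalType(18, 2)))

joined = orders_n.join(payments_n, on="amount_key", how="inner")
joined.show()
```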
datatello
by New Contributor II
  • 1784 Views
  • 3 replies
  • 1 kudos

Exponentially slower joins using PySpark

I'm new to PySpark, but I've stumbled across an odd issue when I perform joins, where the action seems to take exponentially longer every time I add a new join to a function I'm writing. I'm trying to join a dataset of ~3 million records to one of ~17... (see the sketch after this thread)

Latest Reply
Vidula
Honored Contributor
  • 1 kudos

Hi @Lee Bevers, hope all is well! Just wanted to check in on whether you were able to resolve your issue; if so, would you be happy to share the solution or mark an answer as best? Else, please let us know if you need more help. We'd love to hear from you. Thanks!

2 More Replies
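One frequent cause of the behaviour described in the post above is that Spark is lazy: each additional join extends the logical plan, so analysing and optimising the whole chain gets more expensive with every step. A hedged sketch of one mitigation, truncating the lineage between joins, is below; the function, DataFrames, and join key are placeholders rather than the poster's code.

```python
from pyspark.sql import DataFrame

def join_all(base: DataFrame, others: list[DataFrame], key: str) -> DataFrame:
    """Left-join several DataFrames onto a base, truncating lineage between steps."""
    result = base
    for other in others:
        result = result.join(other, on=key, how="left")
        # localCheckpoint() materialises the intermediate result and cuts its
        # logical plan, so the next join does not re-optimise everything upstream.
        result = result.localCheckpoint()
    return result

# Hypothetical usage:
# base = spark.table("main.base_table")              # assumed table
# extras = [spark.table(t) for t in ["t1", "t2"]]    # assumed tables
# wide = join_all(base, extras, key="id")
```

Caching the intermediate result, or checkpointing it to durable storage, are alternatives with different trade-offs; localCheckpoint keeps the data on the executors only.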