Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

MikeK_
by New Contributor II
  • 26229 Views
  • 6 replies
  • 0 kudos

Resolved! SQL Update Join

Hi, I'm importing some data and stored procedures from SQL Server into Databricks. I noticed that updates with joins are not supported in Spark SQL; what's the alternative I can use? Here's what I'm trying to do: update t1 set t1.colB=CASE WHEN t2.c...

Latest Reply
LyderIversen
New Contributor II
  • 0 kudos

Hi! This is way late, but did you ever find a solution to the CROSS APPLY-part of your question? Is it possible to do CROSS APPLY in Spark SQL, or is there something you can use instead?

5 More Replies
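
For reference on the thread above: on Delta tables the usual alternative to UPDATE ... JOIN is MERGE INTO, which lets the update reference a second table. A minimal sketch, assuming hypothetical Delta tables t1 and t2 joined on a placeholder id column (only colB appears in the original excerpt):

    # Hedged sketch: MERGE INTO as a substitute for an UPDATE with a JOIN.
    # t1, t2, id and colA are placeholder names, not taken from the full original post.
    spark.sql("""
        MERGE INTO t1
        USING t2
        ON t1.id = t2.id
        WHEN MATCHED THEN
          UPDATE SET t1.colB = CASE WHEN t2.colA > 0 THEN t2.colA ELSE t1.colB END
    """)

For the CROSS APPLY follow-up, the closest Spark SQL constructs are LATERAL VIEW explode(...) for the common array-expansion case and, on newer runtimes, LATERAL subqueries in the FROM clause.
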
Daba
by New Contributor III
  • 4917 Views
  • 5 replies
  • 5 kudos

DLT streaming table and LEFT JOIN

I'm trying to build a gold-level streaming live table based on two streaming silver live tables with a left join. This attempt fails with the following error: "Append mode error: Stream-stream LeftOuter join between two streaming DataFrame/Datasets is not suppo...

Latest Reply
Daba
New Contributor III
  • 5 kudos

Thanks Fatma, I do understand the need for watermarks, but I'm just wondering if this is supported by SQL syntax?

4 More Replies
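
On the watermark question above: whether watermarks can be declared in DLT SQL depends on the pipeline/runtime version, so below is a hedged sketch of the equivalent in the DLT Python API. The table names silver_a/silver_b, the key column id, and the timestamp column event_ts are placeholders:

    import dlt
    from pyspark.sql.functions import expr

    # Hedged sketch: an append-mode stream-stream LEFT JOIN needs watermarks on
    # both inputs plus a time-range condition in the join predicate.
    @dlt.table
    def gold_joined():
        a = spark.readStream.table("LIVE.silver_a").withWatermark("event_ts", "10 minutes").alias("a")
        b = spark.readStream.table("LIVE.silver_b").withWatermark("event_ts", "10 minutes").alias("b")
        return a.join(
            b,
            expr("a.id = b.id AND "
                 "b.event_ts BETWEEN a.event_ts - INTERVAL 15 MINUTES AND a.event_ts"),
            "left")
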
pramalin
by New Contributor
  • 2128 Views
  • 3 replies
  • 2 kudos
Latest Reply
shan_chandra
Esteemed Contributor
  • 2 kudos

@prudhvi ramalingam - Please refer to the example code below: import org.apache.spark.sql.functions.expr val person = Seq( (0, "Bill Chambers", 0, Seq(100)), (1, "Matei Zaharia", 1, Seq(500, 250, 100)), (2, "Michael Armbrust", 1, Seq(250,...

2 More Replies
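
For readers of the truncated Scala snippet above, here is the same array-membership join pattern sketched in PySpark, using the sample people from the excerpt; the lookup values and anything after the cut-off are assumptions:

    from pyspark.sql.functions import expr

    # Hedged sketch: join rows whose array column contains the other table's key.
    person = spark.createDataFrame(
        [(0, "Bill Chambers", 0, [100]),
         (1, "Matei Zaharia", 1, [500, 250, 100]),
         (2, "Michael Armbrust", 1, [250, 100])],
        ["id", "name", "graduate_program", "spark_status"])

    spark_status = spark.createDataFrame(
        [(500, "Vice President"), (250, "PMC Member"), (100, "Contributor")],
        ["id", "status"])

    joined = (person.withColumnRenamed("id", "personId")
              .join(spark_status, expr("array_contains(spark_status, id)")))
    joined.show()
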
Mado
by Valued Contributor II
  • 2471 Views
  • 1 reply
  • 2 kudos

Resolved! How to get a snapshot of a streaming delta table as a static table?

Hi, assume that I have a streaming Delta table. Is there any way to get a snapshot of the streaming table as a static table? The reason is that I need to join this streaming table with a static table by: output = output.join(country_information, ["Country"], ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @Mohammad Saber, yes, you can try this approach. Create the snapshot with a timestamp: snapshot_time = "2022-10-01 00:00:00" spark.sql(f"CREATE TABLE snapshot_table_at_time AS SELECT * FROM streaming_table TIMESTAMP AS OF '{snapshot_time}'") Then, yo...
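
A hedged sketch of the snapshot-then-join approach (Delta time travel uses TIMESTAMP AS OF for timestamp strings and VERSION AS OF for version numbers); streaming_table and country_information follow the thread and stand in for the real names:

    # Hedged sketch: read a point-in-time snapshot of the Delta table behind the
    # stream, then join it to the static lookup table as a normal batch DataFrame.
    snapshot_time = "2022-10-01 00:00:00"
    snapshot_df = spark.sql(
        f"SELECT * FROM streaming_table TIMESTAMP AS OF '{snapshot_time}'")

    output = snapshot_df.join(country_information, ["Country"], "left")
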

Carlton
by New Contributor III
  • 3360 Views
  • 5 replies
  • 14 kudos

I would like to know why CROSS JOIN fails recognize columns

Whenever I apply a CROSS JOIN to my Databricks SQL query, I get a message letting me know that a column does not exist, but I'm not sure if the issue is with the CROSS JOIN. For example, the code should identify characters such as http, https, ://, / ...

Latest Reply
Shalabh007
Honored Contributor
  • 14 kudos

@CARLTON PATTERSON Since you have given the alias "tt" to your table "basecrmcbreport.organizations", you have to access the corresponding columns in the format tt.<column_name>. In your code on line #4, try accessing the column 'homepage_u...

4 More Replies
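
To illustrate the aliasing point in the reply above, a minimal sketch; homepage_url is a hypothetical column name (the excerpt is truncated) and the second table in the cross join is a stand-in:

    # Hedged sketch: once a table is aliased as tt, its columns must be qualified
    # with that alias rather than the original table name.
    spark.sql("""
        SELECT tt.homepage_url, p.pattern
        FROM basecrmcbreport.organizations AS tt
        CROSS JOIN (SELECT '://' AS pattern) AS p
        WHERE tt.homepage_url LIKE '%http%'
    """)
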
Anonymous
by Not applicable
  • 16699 Views
  • 4 replies
  • 4 kudos

Resolved! Spark is not able to resolve the columns correctly when joins data frames

Hello all, I'm using PySpark (Python 3.8) over Spark 3.0 on Databricks. When running this DataFrame join: next_df = days_currencies_matrix.alias('a').join( data_to_merge.alias('b'), [ days_currencies_matrix.dt == data_to_merge.RATE_DATE, days...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Alessio Palma - Howdy! My name is Piper, and I'm a moderator for the community. Would you be happy to mark whichever answer solved your issue so other members may find the solution more quickly?

3 More Replies
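
A hedged sketch of how the resolution issue in that thread is usually avoided: once both DataFrames are aliased, build the join condition and later projections through the aliases with col("a.…")/col("b.…") style references so Spark knows which side each column belongs to. The second join key and the selected columns are placeholders:

    from pyspark.sql.functions import col

    # Hedged sketch: reference columns through the aliases instead of the original
    # DataFrame objects to avoid ambiguous-column errors after the join.
    next_df = (
        days_currencies_matrix.alias("a")
        .join(data_to_merge.alias("b"),
              [col("a.dt") == col("b.RATE_DATE"),
               col("a.currency") == col("b.CURRENCY")],   # second key is a placeholder
              "left")
        .select(col("a.dt"), col("a.currency"), col("b.RATE"))   # placeholder projection
    )
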
kruhly
by New Contributor II
  • 29511 Views
  • 12 replies
  • 0 kudos

Resolved! Is there a better method to join two dataframes and not have a duplicated column?

I would like to keep only one of the columns used to join the dataframes. Using select() after the join does not seem straightforward because the real data may have many columns or the column names may not be known. A simple example is below: llist = [(...

Latest Reply
TejuNC
New Contributor II
  • 0 kudos

This is expected behavior. The DataFrame.join method is equivalent to a SQL join like this: SELECT * FROM a JOIN b ON joinExprs. If you want to ignore duplicate columns, just drop them or select the columns of interest afterwards. If you want to disambiguate, you c...

11 More Replies
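
A short sketch of the two options mentioned in that thread (DataFrame and column names are placeholders):

    # Hedged sketch: joining on a list of column names keeps a single copy of the
    # join key, so no duplicate column appears in the result.
    joined = df_left.join(df_right, on=["name"], how="inner")

    # Alternative: join with an explicit condition, then drop one side's copy.
    joined2 = (df_left.join(df_right, df_left["name"] == df_right["name"])
               .drop(df_right["name"]))
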
ChristianKeller
by New Contributor II
  • 13242 Views
  • 6 replies
  • 0 kudos

Two stage join fails with java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainLongDictionary

Sometimes the error is part of "org.apache.spark.SparkException: Exception thrown in awaitResult:". The error source is the step where we extract, for the second time, the rows where the data is updated. We can count the rows, but we cannot display or w...

Latest Reply
activescott
New Contributor III
  • 0 kudos

Thanks Lleido. I eventually found I had inadvertently changed the schema of a partitioned DataFrame by narrowing a column's type from a long to an integer. While a rather obvious cause of the problem in hindsight, it was terribly di...

5 More Replies
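
A hedged sketch of the kind of fix the last reply points at: keep the column's type consistent across writes (or rewrite the affected partitions) so old and new Parquet files agree. The table path and column name are placeholders:

    from pyspark.sql.functions import col
    from pyspark.sql.types import LongType

    # Hedged sketch: cast the narrowed column back to its original wider type
    # before writing, so every Parquet part file shares one schema.
    df_fixed = df.withColumn("user_id", col("user_id").cast(LongType()))
    df_fixed.write.mode("overwrite").partitionBy("dt").parquet("/mnt/example/events")
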
vida
by Contributor II
  • 10499 Views
  • 8 replies
  • 0 kudos

My Spark SQL join is very slow - what can I do to speed it up?

It's taking 10-12 minutes - can I make it faster?

Latest Reply
vida
Contributor II
  • 0 kudos

ANALYZE is not needed with Parquet tables that use the Databricks Parquet package. That is the default now when you use .saveAsTable(), but if you use a different output format, it's possible that ANALYZE may not work yet.

7 More Replies
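
Beyond ANALYZE, a common lever for slow joins in these threads is broadcasting the smaller side so the join avoids a shuffle. A hedged sketch with placeholder DataFrame names:

    from pyspark.sql.functions import broadcast

    # Hedged sketch: force a broadcast-hash join by broadcasting the small
    # dimension table; the key column name is a placeholder.
    result = large_fact_df.join(broadcast(small_dim_df), on="key", how="inner")

    # SQL equivalent via a hint:
    # SELECT /*+ BROADCAST(d) */ * FROM fact f JOIN dim d ON f.key = d.key
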