Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

MikeK_
by New Contributor II
  • 26229 Views
  • 6 replies
  • 0 kudos

Resolved! SQL Update Join

Hi, I'm importing some data and stored procedures from SQL Server into Databricks. I noticed that updates with joins are not supported in Spark SQL; what's the alternative I can use? Here's what I'm trying to do: update t1 set t1.colB=CASE WHEN t2.c...

Latest Reply
LyderIversen
New Contributor II
  • 0 kudos

Hi! This is way late, but did you ever find a solution to the CROSS APPLY-part of your question? Is it possible to do CROSS APPLY in Spark SQL, or is there something you can use instead?

5 More Replies
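
For reference on the thread above: on Delta tables the usual alternative to UPDATE ... JOIN is MERGE INTO, which lets the update reference a second table. A minimal sketch, assuming hypothetical Delta tables t1 and t2 joined on a placeholder id column (only colB appears in the original excerpt):

    # Hedged sketch: MERGE INTO as a substitute for an UPDATE with a JOIN.
    # t1, t2, id and colA are placeholder names, not taken from the full original post.
    spark.sql("""
        MERGE INTO t1
        USING t2
        ON t1.id = t2.id
        WHEN MATCHED THEN
          UPDATE SET t1.colB = CASE WHEN t2.colA > 0 THEN t2.colA ELSE t1.colB END
    """)

For the CROSS APPLY follow-up, the closest Spark SQL constructs are LATERAL VIEW explode(...) for the common array-expansion case and, on newer runtimes, LATERAL subqueries in the FROM clause.
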
Daba
by New Contributor III
  • 4917 Views
  • 5 replies
  • 5 kudos

DLT streaming table and LEFT JOIN

I'm trying to build a gold-level streaming live table based on two streaming silver live tables with a left join. This attempt fails with the following error: "Append mode error: Stream-stream LeftOuter join between two streaming DataFrame/Datasets is not suppo...

Latest Reply
Daba
New Contributor III
  • 5 kudos

Thanks Fatma, I do understand the need for watermarks, but I'm just wondering if this is supported by SQL syntax?

4 More Replies
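
On the watermark question above: whether watermarks can be declared in DLT SQL depends on the pipeline/runtime version, so below is a hedged sketch of the equivalent in the DLT Python API. The table names silver_a/silver_b, the key column id, and the timestamp column event_ts are placeholders:

    import dlt
    from pyspark.sql.functions import expr

    # Hedged sketch: an append-mode stream-stream LEFT JOIN needs watermarks on
    # both inputs plus a time-range condition in the join predicate.
    @dlt.table
    def gold_joined():
        a = spark.readStream.table("LIVE.silver_a").withWatermark("event_ts", "10 minutes").alias("a")
        b = spark.readStream.table("LIVE.silver_b").withWatermark("event_ts", "10 minutes").alias("b")
        return a.join(
            b,
            expr("a.id = b.id AND "
                 "b.event_ts BETWEEN a.event_ts - INTERVAL 15 MINUTES AND a.event_ts"),
            "left")
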
pramalin
by New Contributor
  • 2128 Views
  • 3 replies
  • 2 kudos
Latest Reply
shan_chandra
Esteemed Contributor
  • 2 kudos

@prudhvi ramalingam - Please refer to the example code below: import org.apache.spark.sql.functions.expr val person = Seq( (0, "Bill Chambers", 0, Seq(100)), (1, "Matei Zaharia", 1, Seq(500, 250, 100)), (2, "Michael Armbrust", 1, Seq(250,...

2 More Replies
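
For readers of the truncated Scala snippet above, here is the same array-membership join pattern sketched in PySpark, using the sample people from the excerpt; the lookup values and anything after the cut-off are assumptions:

    from pyspark.sql.functions import expr

    # Hedged sketch: join rows whose array column contains the other table's key.
    person = spark.createDataFrame(
        [(0, "Bill Chambers", 0, [100]),
         (1, "Matei Zaharia", 1, [500, 250, 100]),
         (2, "Michael Armbrust", 1, [250, 100])],
        ["id", "name", "graduate_program", "spark_status"])

    spark_status = spark.createDataFrame(
        [(500, "Vice President"), (250, "PMC Member"), (100, "Contributor")],
        ["id", "status"])

    joined = (person.withColumnRenamed("id", "personId")
              .join(spark_status, expr("array_contains(spark_status, id)")))
    joined.show()
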
Mado
by Valued Contributor II
  • 2471 Views
  • 1 reply
  • 2 kudos

Resolved! How to get a snapshot of a streaming delta table as a static table?

Hi, assume that I have a streaming Delta table. Is there any way to get a snapshot of the streaming table as a static table? The reason is that I need to join this streaming table with a static table by: output = output.join(country_information, ["Country"], ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @Mohammad Saber, yes, you can try this approach. Create the snapshot with a timestamp: snapshot_time = "2022-10-01 00:00:00" spark.sql(f"CREATE TABLE snapshot_table_at_time AS SELECT * FROM streaming_table TIMESTAMP AS OF '{snapshot_time}'") Then, yo...
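
A hedged sketch of the snapshot-then-join approach (Delta time travel uses TIMESTAMP AS OF for timestamp strings and VERSION AS OF for version numbers); streaming_table and country_information follow the thread and stand in for the real names:

    # Hedged sketch: read a point-in-time snapshot of the Delta table behind the
    # stream, then join it to the static lookup table as a normal batch DataFrame.
    snapshot_time = "2022-10-01 00:00:00"
    snapshot_df = spark.sql(
        f"SELECT * FROM streaming_table TIMESTAMP AS OF '{snapshot_time}'")

    output = snapshot_df.join(country_information, ["Country"], "left")
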

Carlton
by New Contributor III
  • 3360 Views
  • 5 replies
  • 14 kudos

I would like to know why CROSS JOIN fails recognize columns

Whenever I apply a CROSS JOIN to my Databricks SQL query, I get a message letting me know that a column does not exist, but I'm not sure if the issue is with the CROSS JOIN. For example, the code should identify characters such as http, https, ://, / ...

Latest Reply
Shalabh007
Honored Contributor
  • 14 kudos

@CARLTON PATTERSON Since you have given the alias "tt" to your table "basecrmcbreport.organizations", you have to access the corresponding columns in the format tt.<column_name>. In your code on line #4, try accessing the column 'homepage_u...

4 More Replies
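
To illustrate the aliasing point in the reply above, a minimal sketch; homepage_url is a hypothetical column name (the excerpt is truncated) and the second table in the cross join is a stand-in:

    # Hedged sketch: once a table is aliased as tt, its columns must be qualified
    # with that alias rather than the original table name.
    spark.sql("""
        SELECT tt.homepage_url, p.pattern
        FROM basecrmcbreport.organizations AS tt
        CROSS JOIN (SELECT '://' AS pattern) AS p
        WHERE tt.homepage_url LIKE '%http%'
    """)
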
Anonymous
by Not applicable
  • 16699 Views
  • 4 replies
  • 4 kudos

Resolved! Spark is not able to resolve the columns correctly when joins data frames

Hello all, I'm using PySpark (Python 3.8) over Spark 3.0 on Databricks. When running this DataFrame join: next_df = days_currencies_matrix.alias('a').join( data_to_merge.alias('b'), [ days_currencies_matrix.dt == data_to_merge.RATE_DATE, days...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Alessio Palma - Howdy! My name is Piper, and I'm a moderator for the community. Would you be happy to mark whichever answer solved your issue so other members may find the solution more quickly?

3 More Replies
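
A hedged sketch of how the resolution issue in that thread is usually avoided: once both DataFrames are aliased, build the join condition and later projections through the aliases with col("a.…")/col("b.…") style references so Spark knows which side each column belongs to. The second join key and the selected columns are placeholders:

    from pyspark.sql.functions import col

    # Hedged sketch: reference columns through the aliases instead of the original
    # DataFrame objects to avoid ambiguous-column errors after the join.
    next_df = (
        days_currencies_matrix.alias("a")
        .join(data_to_merge.alias("b"),
              [col("a.dt") == col("b.RATE_DATE"),
               col("a.currency") == col("b.CURRENCY")],   # second key is a placeholder
              "left")
        .select(col("a.dt"), col("a.currency"), col("b.RATE"))   # placeholder projection
    )
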
kruhly
by New Contributor II
  • 29511 Views
  • 12 replies
  • 0 kudos

Resolved! Is there a better method to join two dataframes and not have a duplicated column?

I would like to keep only one of the columns used to join the dataframes. Using select() after the join does not seem straightforward because the real data may have many columns or the column names may not be known. A simple example is below: llist = [(...

Latest Reply
TejuNC
New Contributor II
  • 0 kudos

This is expected behavior. The DataFrame.join method is equivalent to a SQL join like this: SELECT * FROM a JOIN b ON joinExprs. If you want to ignore duplicate columns, just drop them or select the columns of interest afterwards. If you want to disambiguate, you c...

11 More Replies
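
A short sketch of the two options mentioned in that thread (DataFrame and column names are placeholders):

    # Hedged sketch: joining on a list of column names keeps a single copy of the
    # join key, so no duplicate column appears in the result.
    joined = df_left.join(df_right, on=["name"], how="inner")

    # Alternative: join with an explicit condition, then drop one side's copy.
    joined2 = (df_left.join(df_right, df_left["name"] == df_right["name"])
               .drop(df_right["name"]))
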
ChristianKeller
by New Contributor II
  • 13242 Views
  • 6 replies
  • 0 kudos

Two stage join fails with java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainLongDictionary

Sometimes the error is part of "org.apache.spark.SparkException: Exception thrown in awaitResult:". The error source is the step where we extract, for the second time, the rows where the data is updated. We can count the rows, but we cannot display or w...

Latest Reply
activescott
New Contributor III
  • 0 kudos

Thanks Lleido. I eventually found I had inadvertently changed the schema of a partitioned DataFrame by narrowing a column's type from a long to an integer. While a rather obvious cause of the problem in hindsight, it was terribly di...

5 More Replies
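
A hedged sketch of the kind of fix the last reply points at: keep the column's type consistent across writes (or rewrite the affected partitions) so old and new Parquet files agree. The table path and column name are placeholders:

    from pyspark.sql.functions import col
    from pyspark.sql.types import LongType

    # Hedged sketch: cast the narrowed column back to its original wider type
    # before writing, so every Parquet part file shares one schema.
    df_fixed = df.withColumn("user_id", col("user_id").cast(LongType()))
    df_fixed.write.mode("overwrite").partitionBy("dt").parquet("/mnt/example/events")
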
vida
by Contributor II
  • 10499 Views
  • 8 replies
  • 0 kudos

My Spark SQL join is very slow - what can I do to speed it up?

It's taking 10-12 minutes - can I make it faster?

Latest Reply
vida
Contributor II
  • 0 kudos

ANALYZE is not needed with Parquet tables that use the Databricks Parquet package. That is the default now when you use .saveAsTable(), but if you use a different output format, it's possible that ANALYZE may not work yet.

7 More Replies
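
Beyond ANALYZE, a common lever for slow joins in these threads is broadcasting the smaller side so the join avoids a shuffle. A hedged sketch with placeholder DataFrame names:

    from pyspark.sql.functions import broadcast

    # Hedged sketch: force a broadcast-hash join by broadcasting the small
    # dimension table; the key column name is a placeholder.
    result = large_fact_df.join(broadcast(small_dim_df), on="key", how="inner")

    # SQL equivalent via a hint:
    # SELECT /*+ BROADCAST(d) */ * FROM fact f JOIN dim d ON f.key = d.key
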