Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hey, I'm trying to perform time window aggregation in two different streams followed by a stream-stream window join, as described here. I'm running Databricks Runtime 13.1, exactly as advised. However, when I'm reproducing the following code: clicksWindow = c...
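For reference, a self-contained sketch of that pattern (a windowed aggregation on each stream, then a join on the window column). It uses the rate source purely so it runs anywhere; the original post uses real click/impression streams, so the names and intervals here are illustrative only:

from pyspark.sql.functions import window, count, col

# Two toy streams from the rate source, standing in for clicks and impressions.
clicks = (spark.readStream.format("rate").option("rowsPerSecond", 5).load()
          .withColumnRenamed("timestamp", "clickTime")
          .withWatermark("clickTime", "10 minutes"))
impressions = (spark.readStream.format("rate").option("rowsPerSecond", 5).load()
               .withColumnRenamed("timestamp", "impressionTime")
               .withWatermark("impressionTime", "10 minutes"))

# Time window aggregation on each stream.
clicksWindow = clicks.groupBy(window(col("clickTime"), "1 hour")).agg(count("*").alias("numClicks"))
impressionsWindow = impressions.groupBy(window(col("impressionTime"), "1 hour")).agg(count("*").alias("numImpressions"))

# Stream-stream join on the window column (the pattern the post refers to, available on DBR 13.1).
joined = clicksWindow.join(impressionsWindow, "window", "inner")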
I am trying to read a stream from Azure:

(spark.readStream
.format("cloudFiles")
.option("cloudFiles.clientId", CLIENT_ID)
.option("cloudFiles.clientSecret", CLIENT_SECRET)
.option("cloudFiles.tenantId", TENANT_ID)
.option("header", "true")
.opti...
@Hanan Shteingart: It looks like you're using the Azure Blob Storage connector for Spark to read data from Azure. The error message suggests that the credentials you provided are not being used by the connector. To specify the credentials, you can se...
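A minimal sketch of one common way to pass a service principal's credentials to the ABFS driver via Spark configuration; the storage account name is a placeholder, and CLIENT_ID, CLIENT_SECRET, and TENANT_ID are the same variables as in the question:

# Sketch: configure OAuth for an ADLS Gen2 account so the underlying filesystem can authenticate.
# "<storage-account>" is a placeholder; CLIENT_ID, CLIENT_SECRET, TENANT_ID come from the post above.
account = "<storage-account>"
spark.conf.set(f"fs.azure.account.auth.type.{account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{account}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{account}.dfs.core.windows.net", CLIENT_ID)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{account}.dfs.core.windows.net", CLIENT_SECRET)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{account}.dfs.core.windows.net",
               f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/token")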
I'm trying to build a gold-level streaming live table based on two streaming silver live tables with a left join. This attempt fails with the following error: "Append mode error: Stream-stream LeftOuter join between two streaming DataFrame/Datasets is not suppo...
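Outside of DLT, the usual way to make a streaming left outer join work is to define watermarks on both inputs and add a time-range condition to the join so Spark can bound its state. A sketch under those assumptions; the table, column, and interval names below are illustrative, not from the post:

from pyspark.sql.functions import expr

# Sketch: stream-stream left outer join. Both sides carry a watermark and the join
# condition includes a time range, which is what append mode requires.
impressions = (spark.readStream.table("silver_impressions")
               .withWatermark("impressionTime", "2 hours"))
clicks = (spark.readStream.table("silver_clicks")
          .withWatermark("clickTime", "3 hours"))

joined = impressions.join(
    clicks,
    expr("""
        clickAdId = impressionAdId AND
        clickTime >= impressionTime AND
        clickTime <= impressionTime + interval 1 hour
    """),
    "leftOuter")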
I am trying to stream Kafka events on Databricks, but the query keeps initializing for hours and doesn't give any output. Can someone help me understand what is actually happening and why the data is not being published? I couldn't find anything about this on the community.
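A minimal Kafka read for debugging; the bootstrap servers and topic name below are placeholders. One common reason for seeing no output is that the default startingOffsets is "latest", so a topic with no new messages produces nothing:

df = (spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
  .option("subscribe", "events")                        # placeholder topic
  .option("startingOffsets", "earliest")                # default is "latest"; a quiet topic then yields no rows
  .load())

query = (df.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
  .writeStream
  .format("console")            # console sink just to confirm data is flowing
  .option("truncate", "false")
  .start())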
Hi, I am a little confused about when I should use STREAM() when defining a table based on a DLT table. There is a pattern explained in the documentation:

CREATE OR REFRESH STREAMING LIVE TABLE streaming_bronze
AS SELECT * FROM cloud_files(
"s3://p...
Thanks @Landan George. Since "streaming_silver" is a streaming live table, I expected the last line of the code to be: AS SELECT count(*) FROM STREAM(LIVE.streaming_silver) GROUP BY user_id. But, as you can see, "live_gold" is defined by: AS SELECT c...
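For what it's worth, a sketch of the same distinction using the Python DLT API (table and column names follow the thread; the alias is illustrative). Reading with dlt.read_stream processes new rows incrementally, while dlt.read recomputes over the full table, which is what a grouped count needs:

import dlt
from pyspark.sql.functions import count

@dlt.table
def streaming_silver():
    # Incremental: only new rows from the bronze streaming table are processed.
    return dlt.read_stream("streaming_bronze")

@dlt.table
def live_gold():
    # Complete recomputation: no STREAM()/read_stream here, because an aggregate like
    # count(*) per user must be recalculated over the whole silver table on each update.
    return dlt.read("streaming_silver").groupBy("user_id").agg(count("*").alias("num_events"))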
I am having trouble efficiently reading and parsing a large number of stream files in PySpark!
Context
Here is the schema of the stream file that I am reading in JSON. Blank spaces are edits for confidentiality purposes.
root
|-- location_info: ar...
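One thing that usually helps with a large number of JSON stream files is supplying the schema explicitly instead of letting Spark infer it from every file, and bounding how many files each micro-batch picks up. A sketch with a made-up path and a schema reduced to the one field visible above:

from pyspark.sql.types import StructType, StructField, ArrayType, StringType

# Hypothetical, heavily reduced schema: only the location_info array is visible in the post.
schema = StructType([
    StructField("location_info", ArrayType(StructType([
        StructField("id", StringType()),   # illustrative nested field
    ]))),
])

df = (spark.readStream
  .format("json")
  .schema(schema)                          # explicit schema avoids per-file inference
  .option("maxFilesPerTrigger", 1000)      # bound the work done per micro-batch
  .load("/mnt/streams/events/"))           # placeholder path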
I'm interested in seeing what others have come up with. Currently I'm using json_normalize(), then taking any additional nested structures and using a loop to pull them out and re-combine them.
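For comparison, a sketch of doing the flattening natively in PySpark so it stays distributed rather than going through pandas; the function assumes nothing beyond a DataFrame with nested structs and arrays such as location_info:

from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StructType

def flatten(df):
    # Repeatedly promote struct fields to top-level columns and explode arrays
    # until no complex columns remain. Nested names get an underscore-joined prefix.
    while True:
        complex_cols = [(f.name, f.dataType) for f in df.schema.fields
                        if isinstance(f.dataType, (StructType, ArrayType))]
        if not complex_cols:
            return df
        name, dtype = complex_cols[0]
        if isinstance(dtype, StructType):
            expanded = [F.col(f"{name}.{sub.name}").alias(f"{name}_{sub.name}")
                        for sub in dtype.fields]
            df = df.select([F.col(c) for c in df.columns if c != name] + expanded)
        else:
            # explode_outer keeps rows whose array is null or empty
            df = df.withColumn(name, F.explode_outer(name))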
Hi Team, I am setting up a Kafka stream on Databricks to ingest data into Delta, but the cluster has been running for the last 2 hours, the stream still hasn't started, and I am not seeing any failures either.
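When a stream sits initializing for hours without failing, the running query objects usually show whether it is stuck fetching Kafka offsets or just waiting for data. A small sketch that inspects whatever streaming queries are already active on the cluster:

# Inspect the queries that are already running instead of starting a new one.
for q in spark.streams.active:
    print(q.name, q.status)    # e.g. {"message": "Initializing sources", "isDataAvailable": False, ...}
    print(q.lastProgress)      # per-batch metrics: numInputRows, Kafka offsets, durations
    print(q.exception())       # None unless the query has actually failed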