Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Data_Engineer3
by Contributor III
  • 2534 Views
  • 5 replies
  • 0 kudos

Default maximum Spark streaming chunk size for Delta files in each batch?

Working with Delta files in Spark Structured Streaming, what is the default maximum chunk size in each batch? How do I identify this type of Spark configuration in Databricks? #[Databricks SQL] #[Spark streaming] #[Spark structured streaming] #Spark

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Docs: https://docs.databricks.com/en/structured-streaming/delta-lake.html. Also, what is the challenge you are seeing while using foreachBatch?

4 More Replies
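For context, a minimal PySpark sketch of the options covered on that docs page: the Delta streaming source reads up to 1000 files per micro-batch by default (maxFilesPerTrigger), and maxBytesPerTrigger sets a soft size cap instead. Paths below are placeholders.

```
# Minimal sketch: cap how much the Delta streaming source pulls per micro-batch.
stream = (
    spark.readStream.format("delta")
    .option("maxFilesPerTrigger", 500)     # files per micro-batch (default 1000)
    .option("maxBytesPerTrigger", "1g")    # soft cap on data per micro-batch
    .load("/path/to/source_delta_table")   # placeholder path
)

query = (
    stream.writeStream.format("delta")
    .option("checkpointLocation", "/path/to/checkpoint")  # placeholder path
    .start("/path/to/target_delta_table")                 # placeholder path
)
```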
CarterM
by New Contributor III
  • 5595 Views
  • 4 replies
  • 2 kudos

Resolved! Why is Spark Streaming from S3 returning thousands of files when there are only 9?

I am attempting to stream JSON endpoint responses from an S3 bucket into a Spark DLT. I have been very successful with this previously, but the difference this time is that I am storing the responses from multiple endpoints in the same S3 buck...

[Attached screenshots: endpoint response structure; 9 endpoint responses in the same S3 bucket]
Latest Reply
williamyoung
New Contributor II
  • 2 kudos

Hello everyone, it seems like the issue you're encountering could be related to how Spark Streaming interprets the S3 file structure, especially when dealing with multiple sources. When files from multiple endpoints are stored in the same bucket, Spar...

3 More Replies
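A sketch of the fix hinted at in the reply, assuming each endpoint's responses live under their own S3 prefix: point the DLT/Auto Loader stream at that prefix (optionally with a glob filter) rather than the bucket root. The bucket, prefix, and table name are placeholders.

```
import dlt

@dlt.table(name="soccer_endpoint_raw")  # placeholder table name
def soccer_endpoint_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("pathGlobFilter", "*.json")          # optional: only pick up JSON files
        .load("s3://my-bucket/endpoints/soccer/")    # placeholder: one endpoint's prefix
    )
```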
YFL
by New Contributor III
  • 6220 Views
  • 11 replies
  • 6 kudos

Resolved! When delta is a streaming source, how can we get the consumer lag?

Hi, I want to keep track of the streaming lag from the source table, which is a Delta table. I see that the query progress logs contain some information about the last version and the last file in the version for the end offset, but this doesn't give ...

Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hey @Yerachmiel Feltzman, I hope all is well. Just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

10 More Replies
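One way to approximate the lag, sketched below under the assumption that comparing Delta table versions is good enough: read the reservoirVersion from the query's endOffset and compare it with the source table's latest version. The path and the running `query` handle are placeholders.

```
import json

progress = query.lastProgress                       # a running StreamingQuery
end_offset = progress["sources"][0]["endOffset"]
if isinstance(end_offset, str):                     # may arrive as a JSON string
    end_offset = json.loads(end_offset)
consumed_version = end_offset["reservoirVersion"]   # version the stream has reached

latest_version = (
    spark.sql("DESCRIBE HISTORY delta.`/path/to/source_table` LIMIT 1")  # placeholder path
    .collect()[0]["version"]
)

print(f"Versions behind: {latest_version - consumed_version}")
```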
SamarthJain
by New Contributor II
  • 5574 Views
  • 4 replies
  • 2 kudos

Hi All, I'm facing an issue with my Spark Streaming job. It gets stuck in the "Stream Initializing" phase for more than 3 hours. Need your...

Hi All, I'm facing an issue with my Spark Streaming job. It gets stuck in the "Stream Initializing" phase for more than 3 hours. Need your help here to understand what happens internally in the "Stream Initializing" phase of a Spark Streaming job tha...

Latest Reply
MohsenJ
Contributor
  • 2 kudos

I'm facing the same issue when I try to run this example: Create a monitor using the API | Databricks on AWS (Inference Lakehouse Monitor regression example notebook). Any idea?

3 More Replies
avnish26
by New Contributor III
  • 10514 Views
  • 4 replies
  • 8 kudos

Spark 3.3.0 connect kafka problem

I am trying to connect to my Kafka from Spark but am getting an error. Kafka version: 2.4.1. Spark version: 3.3.0. I am using a Jupyter notebook to execute the PySpark code below: ```from pyspark.sql.functions import * from pyspark.sql.types import * #import libr...

Latest Reply
jose_gonzalez
Databricks Employee
  • 8 kudos

Hi @avnish26, did you add the JAR files to the cluster? Do you still have issues? Please let us know.

3 More Replies
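A sketch of the usual fix, assuming the error is a missing Kafka connector: outside Databricks you have to pull in the spark-sql-kafka package matching your Spark/Scala build (3.3.0 / Scala 2.12 here), whereas on a Databricks cluster the connector is already bundled. Broker and topic names are placeholders.

```
from pyspark.sql import SparkSession

# Pull in the Kafka connector that matches Spark 3.3.0 / Scala 2.12.
spark = (
    SparkSession.builder
    .appName("kafka-stream")
    .config("spark.jars.packages",
            "org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0")
    .getOrCreate()
)

df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
    .option("subscribe", "my_topic")                   # placeholder
    .option("startingOffsets", "latest")
    .load()
)
```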
lnights
by New Contributor II
  • 4505 Views
  • 5 replies
  • 2 kudos

High cost of storage when using structured streaming

Hi there, I read data from Azure Event Hub and, after manipulating the data, I write the dataframe back to Event Hub (I use this connector for that): #read data df = (spark.readStream .format("eventhubs") .options(**ehConf) ...

[Attached chart: transactions in Azure storage]
Latest Reply
PetePP
New Contributor II
  • 2 kudos

I had the same problem when starting with Databricks. As outlined above, it is the shuffle partitions setting that results in a number of files equal to the number of partitions. Thus, you are writing a low data volume but get taxed on the amount of write (a...

4 More Replies
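A sketch of the tuning the reply describes, assuming a Delta sink and placeholder paths and values: each micro-batch writes roughly one file per shuffle partition, so lowering the partition count (or coalescing before the sink) cuts the number of small writes billed as storage transactions.

```
# Fewer shuffle partitions -> fewer files (and storage transactions) per micro-batch.
spark.conf.set("spark.sql.shuffle.partitions", "8")   # placeholder value

query = (
    transformed_df                         # placeholder: the aggregated streaming DataFrame
    .coalesce(1)                           # optional: collapse to one file per batch
    .writeStream.format("delta")
    .option("checkpointLocation", "/path/to/checkpoint")  # placeholder
    .start("/path/to/output")                              # placeholder
)
```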
sanjay
by Valued Contributor II
  • 8528 Views
  • 8 replies
  • 3 kudos

How to stop a continuously running streaming job over the weekend

I have a continuously running streaming job that I would like to stop over the weekend and start again on Monday. Here is my streaming job code: (spark.readStream.format("delta").load(input_path).writeStream.option("checkpointLocation", input_checkpoint_p...

Latest Reply
NDK
New Contributor II
  • 3 kudos

@sanjay Any luck with that? I am also looking for a solution to the same issue.

7 More Replies
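One common pattern (a sketch, not the only option): switch the stream to an availableNow trigger and start it from a Databricks job scheduled only on weekdays; the query drains whatever has arrived and then stops on its own, so nothing runs over the weekend. Variable names follow the snippet in the question; output_path is a placeholder.

```
query = (
    spark.readStream.format("delta")
    .load(input_path)                                     # from the question
    .writeStream.format("delta")
    .option("checkpointLocation", input_checkpoint_path)  # from the question
    .trigger(availableNow=True)                           # process the backlog, then stop
    .start(output_path)                                   # placeholder
)
query.awaitTermination()
```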
sparkstreaming
by New Contributor III
  • 5402 Views
  • 5 replies
  • 4 kudos

Resolved! Missing rows while processing records using foreachBatch in Spark Structured Streaming from Azure Event Hub

I am new to real-time scenarios and I need to create Spark Structured Streaming jobs in Databricks. I am trying to apply some rule-based validations from backend configurations on each incoming JSON message. I need to do the following actions on th...

Latest Reply
Rishi045
New Contributor III
  • 4 kudos

Were you able to find a solution? If yes, could you please share it?

4 More Replies
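One frequent cause worth checking, sketched below as an assumption rather than a confirmed diagnosis: if the micro-batch DataFrame is written to several destinations inside foreachBatch without persisting it, it gets recomputed per write and rows can appear to go missing. The table names, validation rule, and events_df handle are placeholders.

```
def process_batch(batch_df, batch_id):
    batch_df.persist()                                          # compute the batch once
    valid = batch_df.filter("validation_status = 'ok'")         # placeholder rule
    invalid = batch_df.exceptAll(valid)
    valid.write.mode("append").saveAsTable("bronze_valid")      # placeholder table
    invalid.write.mode("append").saveAsTable("bronze_rejected") # placeholder table
    batch_df.unpersist()

query = (
    events_df.writeStream                                       # placeholder Event Hub stream
    .foreachBatch(process_batch)
    .option("checkpointLocation", "/path/to/checkpoint")        # placeholder
    .start()
)
```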
Ryan_Chynoweth
by Esteemed Contributor
  • 2342 Views
  • 2 replies
  • 2 kudos

medium.com

Hi All, I recently published a streaming data comparison between Snowflake and Databricks. Hope you enjoy! Please let me know what you think! https://medium.com/@24chynoweth/data-streaming-at-scale-databricks-and-snowflake-ca65a2401649

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Nicely done. 

1 More Replies
gg_047320_gg_94
by New Contributor II
  • 7948 Views
  • 1 replies
  • 1 kudos

DLT Spark readstream fails on the source table which is overwritten

I am reading the source table which gets updated every day. It is usually append/merge with updates and is occasionally overwritten for other reasons. df = spark.readStream.schema(schema).format("delta").option("ignoreChanges", True).option('starting...

Latest Reply
Debayan
Databricks Employee
  • 1 kudos

Hi, could you please confirm the DLT and DBR versions? Also, please tag @Debayan in your next response, which will notify me. Thank you!

  • 1 kudos
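A sketch of one way to handle the overwrites, assuming a recent DBR/DLT runtime: skipChangeCommits is the newer replacement for ignoreChanges and tells the stream to skip commits that rewrite existing data instead of failing (at the cost of not propagating those changes downstream). The path is a placeholder.

```
df = (
    spark.readStream.format("delta")
    .option("skipChangeCommits", "true")   # skip update/overwrite commits instead of failing
    .load("/path/to/source_table")         # placeholder
)
```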
RateVan
by New Contributor II
  • 2338 Views
  • 3 replies
  • 0 kudos

Spark last window doesn't flush in append mode

The problem is very simple: when you use a tumbling window with append mode, the window is closed only when the next message arrives (plus watermark logic). In the current implementation, if you stop the incoming streaming data, the last window will NEVER...

Latest Reply
RateVan
New Contributor II
  • 0 kudos

No, the problem remains the same; nothing changes just because you increased the timeout a little. The window did not close, and it will not close until a new message arrives.

2 More Replies
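For reference, a minimal sketch of the setup being discussed: in append mode a tumbling window is emitted only after the watermark passes its end, and the watermark only advances when newer events arrive, which is why the final window stays open once the input stops. Column names, paths, and the events_df handle are placeholders.

```
from pyspark.sql.functions import window, count

agg = (
    events_df                                        # placeholder streaming DataFrame
    .withWatermark("event_time", "10 minutes")       # watermark advances only with new events
    .groupBy(window("event_time", "5 minutes"))      # tumbling window
    .agg(count("*").alias("events"))
)

query = (
    agg.writeStream
    .outputMode("append")                            # emits a window only after it closes
    .format("delta")
    .option("checkpointLocation", "/path/to/checkpoint")  # placeholder
    .start("/path/to/output")                             # placeholder
)
```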
tech2cloud
by New Contributor II
  • 2590 Views
  • 3 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Ravi Vishwakarma, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

2 More Replies
chanansh
by Contributor
  • 1131 Views
  • 1 replies
  • 0 kudos

stream from azure credentials

I am trying to read a stream from Azure: (spark.readStream .format("cloudFiles") .option('cloudFiles.clientId', CLIENT_ID) .option('cloudFiles.clientSecret', CLIENT_SECRET) .option('cloudFiles.tenantId', TENTANT_ID) .option("header", "true") .opti...

  • 1131 Views
  • 1 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Hanan Shteingart: It looks like you're using the Azure Blob Storage connector for Spark to read data from Azure. The error message suggests that the credentials you provided are not being used by the connector. To specify the credentials, you can se...

  • 0 kudos
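A sketch of the usual split, assuming ADLS Gen2 with a service principal: the cloudFiles.clientId/clientSecret/tenantId options only feed Auto Loader's file-notification setup, while the actual storage reads need the fs.azure.account OAuth settings (or an equivalent cluster/secret-scope configuration). The account, container, and path are placeholders; CLIENT_ID, CLIENT_SECRET, and TENANT_ID stand in for the service principal values from the question.

```
account = "mystorageaccount"  # placeholder storage account

# Service-principal OAuth settings for the storage account itself.
spark.conf.set(f"fs.azure.account.auth.type.{account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{account}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{account}.dfs.core.windows.net", CLIENT_ID)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{account}.dfs.core.windows.net", CLIENT_SECRET)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{account}.dfs.core.windows.net",
               f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/token")

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("header", "true")
    .load(f"abfss://mycontainer@{account}.dfs.core.windows.net/input/")  # placeholder
)
```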
Sandesh87
by New Contributor III
  • 1628 Views
  • 3 replies
  • 0 kudos

parse and combine multiple datasets within a single file

An application receives messages from Event Hub. Below is a message received from Event Hub and loaded into a dataframe with one column:
name,gender,id
sam,m,001
-----
time,x,y,z,long,lat
160,22,45,51,83,56
230,82,95,48,18,26
-----
event,a,b,c
034,1,5,6
073,4,2...

Latest Reply
Vartika
Databricks Employee
  • 0 kudos

Hi @Sandesh Puligundla, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you...

2 More Replies
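One possible approach, sketched under the assumption that each message packs several CSV sections separated by "-----" lines: split the raw string into sections, then parse each section with its own header row. The single column is assumed to be called "value", and the resulting names just follow the example above.

```
# Split one multi-section message into separate DataFrames.
raw = df.first()["value"]                        # assumed column name for the message body
sections = [s.strip() for s in raw.split("-----") if s.strip()]

parsed = []
for section in sections:
    lines = section.splitlines()
    header = lines[0].split(",")                 # first line of each section is its header
    rows = [line.split(",") for line in lines[1:]]
    parsed.append(spark.createDataFrame(rows, header))

people_df, telemetry_df, event_df = parsed       # name/gender/id, time/x/y/z/long/lat, event/a/b/c
```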
RengarLee
by Contributor
  • 7485 Views
  • 10 replies
  • 3 kudos

Resolved! Databricks writes to Azure Data Explorer suddenly become slower

I write to Azure Data Explorer using Spark Streaming. One day, writes suddenly became slower, and a restart has no effect. I have a question about Spark Streaming to Azure Data Explorer. Q1: What should I do to get performance to recover? Figure 1 shows th...

Latest Reply
RengarLee
Contributor
  • 3 kudos

I'm so sorry, I just thought the issue wasn't resolved. Solution: set maxFilesPerTrigger and maxBytesPerTrigger, and enable auto optimize. Reason: on the first day it processes larger files and then eventually processes smaller files. Detailed reason: B...

9 More Replies
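A sketch of the accepted fix described above, with placeholder limits and paths: bound how much each micro-batch reads so writes to Azure Data Explorer stay evenly sized, and turn on optimized writes/auto compaction on the Delta side.

```
# Keep Delta output files compact (auto optimize).
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")

# Bound each micro-batch so per-batch write volume stays predictable.
source = (
    spark.readStream.format("delta")
    .option("maxFilesPerTrigger", 200)       # placeholder limit
    .option("maxBytesPerTrigger", "512m")    # placeholder limit
    .load("/path/to/source_delta_table")     # placeholder
)
```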