Topics with Label: Spark structured streaming

Forum Posts

by RateVan • New Contributor II

04-01-2023 4:31:49 AM

4621 Views
4 replies
0 kudos

Spark last window doesn't flush in append mode

The problem is very simple: when you use a TUMBLING window with append mode, the window is closed only when the next message arrives (+watermark logic). In the current implementation, if you stop incoming streaming data, the last window will NEVER...
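The behaviour described in this post can be reproduced with a toy model. The sketch below is not Spark code: the class `TumblingWindowSim` and its parameters are hypothetical, and it assumes a simplified rule where the watermark is the largest event time seen minus a fixed delay, and a window is emitted once its end is at or below the watermark.

```python
from collections import defaultdict


class TumblingWindowSim:
    """Toy model of append-mode output for tumbling windows (not Spark code)."""

    def __init__(self, window_size, watermark_delay):
        self.window_size = window_size
        self.watermark_delay = watermark_delay
        self.max_event_time = None            # largest event time seen so far
        self.open_windows = defaultdict(int)  # window start -> event count

    def watermark(self):
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.watermark_delay

    def process(self, event_time):
        """Ingest one event and return the windows this event allows us to emit."""
        start = (event_time // self.window_size) * self.window_size
        wm = self.watermark()
        if wm is None or start + self.window_size > wm:
            self.open_windows[start] += 1     # on time: count it
        # otherwise the event is behind the watermark and is dropped
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        wm = self.watermark()
        closed = sorted(s for s in self.open_windows if s + self.window_size <= wm)
        return [(s, s + self.window_size, self.open_windows.pop(s)) for s in closed]


sim = TumblingWindowSim(window_size=10, watermark_delay=5)
emitted = [sim.process(t) for t in (1, 4, 12, 17)]
pending = dict(sim.open_windows)  # windows still waiting for a later event
```

In this model the window [10, 20) stays in `pending` after the input stops, because only a later event (here, one with event time 25 or more) can move the watermark past its end.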

Data Engineering

Latest Reply

Dtank
New Contributor II

12-05-2024 1:00:52 AM

0 kudos

Do you have any solution for this?

3 More Replies

by swetha • New Contributor III

08-30-2022 10:24:49 AM

4469 Views
4 replies
1 kudos

Error: "no streaming listener attached to the spark app" is the error we observe after accessing the streaming statistics API. Please help us with this issue ASAP. Thanks.

Issue: Spark structured streaming application. After adding the listener jar file in the cluster init script, the listener is working (from what I see in the stdout/log4j logs). But when I try to hit the 'Content-Type: application/json' http://host:port/...

Data Engineering

Latest Reply

INJUSTIC
New Contributor II

11-20-2024 7:07:25 AM

1 kudos

Have you found the solution? Thanks

3 More Replies

by swetha • New Contributor III

08-30-2022 4:42:29 AM

4500 Views
3 replies
1 kudos

I am unable to attach a streaming listener to a Spark streaming job. Error: "no streaming listener attached to the spark application" is the error we observe after accessing the streaming statistics API. Please help us with this issue ASAP. Thanks.

Issue: After adding the listener jar file in the cluster init script, the listener is working (from what I see in the stdout/log4j logs). But when I try to hit the 'Content-Type: application/json' http://host:port/api/v1/applications/app-id/streaming/st...

Data Engineering

Latest Reply

INJUSTIC
New Contributor II

11-20-2024 7:05:18 AM

1 kudos

Have you found the solution? Thanks

2 More Replies

by Data_Engineer3 • Contributor III

04-02-2023 9:20:18 AM

4993 Views
5 replies
0 kudos

Default maximum spark streaming chunk size in delta files in each batch?

When working with Delta files in Spark Structured Streaming, what is the default maximum chunk size in each batch? How do I identify this type of Spark configuration in Databricks? #[Databricks SQL] #[Spark streaming] #[Spark structured streaming] #Spark

Data Engineering

Latest Reply

NandiniN
Databricks Employee

10-31-2024 3:02:59 AM

0 kudos

Doc: https://docs.databricks.com/en/structured-streaming/delta-lake.html Also, what is the challenge with using foreachBatch?

4 More Replies

by MarsSu • New Contributor II

06-22-2023 6:46:06 PM

10658 Views
3 replies
0 kudos

How to merge multiple rows into a single row with an array without causing OOM?

Hi, everyone. Currently I am trying to implement Spark Structured Streaming with PySpark. I would like to merge multiple rows into a single row with an array and sink it to a downstream message queue for another service to use. A related example follows: * Befor...

Data Engineering

Latest Reply

917074
New Contributor II

01-19-2024 12:05:15 PM

0 kudos

Is there any solution to this? @MarsSu, were you able to solve this? Kindly shed some light on it if you resolved it.

2 More Replies

by lnights • New Contributor II

02-08-2023 2:12:28 PM

7239 Views
5 replies
2 kudos

High cost of storage when using structured streaming

Hi there, I read data from Azure Event Hub and, after manipulating the data, I write the dataframe back to Event Hub (I use this connector for that): #read data df = (spark.readStream .format("eventhubs") .options(**ehConf) ...

Data Engineering

Latest Reply

PetePP
New Contributor II

08-31-2023 7:02:55 AM

2 kudos

I had the same problem when starting with Databricks. As outlined above, it is the shuffle partitions setting that results in a number of files equal to the number of partitions. Thus, you are writing a low data volume but get taxed on the amount of write (a...
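To put rough numbers on this reply, here is a back-of-the-envelope sketch. It takes the reply's claim (one file per shuffle partition per micro-batch) at face value and assumes a 60-second trigger interval, which is not stated in the thread. The value 200 is Spark's documented default for `spark.sql.shuffle.partitions`; the function name and the tuned value of 8 are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def files_written_per_day(shuffle_partitions, trigger_interval_seconds):
    """Estimate daily file writes, assuming one file per partition per micro-batch."""
    batches_per_day = SECONDS_PER_DAY // trigger_interval_seconds
    return shuffle_partitions * batches_per_day


# Spark's default of 200 shuffle partitions with an assumed 60 s trigger interval.
default_files = files_written_per_day(200, 60)   # 288000
# A hypothetical tuned setting for a low-volume stream.
tuned_files = files_written_per_day(8, 60)       # 11520
```

Under these assumptions the write count scales linearly with the partition count, so the volume of data matters far less than the number of partitions for transaction-based storage charges.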

4 More Replies

by MarsSu • New Contributor II

04-20-2023 7:36:38 PM

10878 Views
5 replies
1 kudos

Resolved! Zero-downtime deployment of a Spark Structured Streaming Databricks job with Terraform.

I would like to ask how to implement zero-downtime deployment of Spark Structured Streaming on Databricks job compute with Terraform, because we will upgrade the Spark application code version. Currently we found that every deployment will cancel the original...

Data Engineering

Latest Reply

Anonymous
Not applicable

04-25-2023 10:22:00 PM

1 kudos

@Mars Su: Yes, you can implement zero downtime deployment of Spark Structured Streaming in Databricks job compute using Terraform. One way to achieve this is by using Databricks' "job clusters" feature, which allows you to create a cluster specifica...

4 More Replies

by pranathisg97 • New Contributor III

02-24-2023 2:38:14 AM

4569 Views
2 replies
1 kudos

readStream query throws exception if there's no data in delta location.

Hi, I have a scenario where a writeStream query writes the stream data to a bronze location, and I have to read from bronze, do some processing and finally write it to silver. I use an S3 location for Delta tables. But for the very first execution, readStream ...

Data Engineering

Latest Reply

Vartika
Databricks Employee

04-25-2023 4:37:05 AM

1 kudos

Hi @Pranathi Girish, Hope all is well! Checking in. If @Suteja Kanuri's answer helped, would you let us know and mark the answer as best? If not, would you be happy to give us more information? We'd love to hear from you. Thanks!

1 More Replies

by adrianlwn • New Contributor III

10-06-2022 10:35:20 AM

19315 Views
14 replies
16 kudos

How to activate ignoreChanges in Delta Live Table read_stream?

Hello everyone, I'm using DLT (Delta Live Tables) and I've implemented some Change Data Capture for deduplication purposes. Now I am creating a downstream table that will read the DLT as a stream (dlt.read_stream("<tablename>")). I keep receiving thi...

Data Engineering

Latest Reply

gopínath
Databricks Employee

02-27-2023 7:03:18 PM

16 kudos

In DLT read_stream, we can't use ignoreChanges / ignoreDeletes. These configs help to avoid the failures, but they actually ignore the operations done on the upstream. So you need to manually perform the deletes or updates in the downstrea...

13 More Replies

by pranathisg97 • New Contributor III

03-02-2023 10:24:23 PM

2134 Views
2 replies
0 kudos

KinesisSource generates empty microbatches when there is no new data.

Is it normal for KinesisSource to generate empty microbatches when there is no new data in Kinesis? Batch 1 finished as there were records in Kinesis and BatchId 2 started. BatchId 2 was running but then BatchId 3 started. Even though there was no m...

Data Engineering

Latest Reply

Anonymous
Not applicable

03-30-2023 1:59:24 AM

0 kudos

Hi @Pranathi Girish, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...

1 More Replies

by pranathisg97 • New Contributor III

02-15-2023 4:59:57 AM

4885 Views
7 replies
0 kudos

Resolved! Fetch new data from Kinesis every minute.

I want to fetch new data from a Kinesis source every minute. I'm using the "minFetchPeriod" option and specified 60s, but this doesn't seem to be working. Streaming query: spark \ .readStream \ .format("kinesis") \ .option("streamName", kinesis_stream_...

Data Engineering

Latest Reply

Anonymous
Not applicable

03-10-2023 6:04:19 PM

0 kudos

Hi @Pranathi Girish, Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedb...

6 More Replies

by Lulka • New Contributor II

02-20-2023 11:55:17 PM

5847 Views
2 replies
2 kudos

Resolved! How to limit the input rate when reading a Delta table as a stream?

Hello everyone! I am trying to read a Delta table as a streaming source using Spark, but my microbatches are unbalanced: one is very small and the others are huge. How can I limit this? I used different configurations with maxBytesPerTrigger and m...

Data Engineering

Latest Reply

-werners-
Esteemed Contributor III

02-21-2023 4:12:45 AM

2 kudos

Besides the parameters you mention, I don't know of any other that controls the batch size. Did you check whether the Delta table is horribly skewed?
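For readers landing on this thread, a minimal sketch of how the rate-limit options are usually passed to a Delta streaming read. `maxFilesPerTrigger` and `maxBytesPerTrigger` are options of the Delta streaming source; the helper name `read_delta_with_rate_limit` and its arguments are hypothetical, and `spark` is assumed to be an existing SparkSession supplied by the caller.

```python
def read_delta_with_rate_limit(spark, path, max_files=None, max_bytes=None):
    """Build a streaming read of a Delta table with optional per-batch limits.

    `spark` is assumed to be an active SparkSession; `path` is the table location.
    """
    reader = spark.readStream.format("delta")
    if max_files is not None:
        # Upper bound on the number of new files considered in each micro-batch.
        reader = reader.option("maxFilesPerTrigger", max_files)
    if max_bytes is not None:
        # Approximate cap on the amount of data processed in each micro-batch.
        reader = reader.option("maxBytesPerTrigger", max_bytes)
    return reader.load(path)
```

Both limits count whole files, so a table with a few very large files and many tiny ones can still produce uneven batches, which is consistent with the skew question raised in the reply above.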

1 More Replies

by chanansh • Contributor

01-30-2023 1:01:57 AM

2014 Views
1 reply
0 kudos

QueryExecutionListener cannot be found in pyspark

According to the documentation, you can monitor a Spark Structured Streaming job using QueryExecutionListener. However, I cannot find it. https://docs.databricks.com/structured-streaming/stream-monitoring.html#language-python

Data Engineering

Latest Reply

jose_gonzalez
Databricks Employee

02-23-2023 9:58:54 AM

0 kudos

Which DBR version are you using? Also, can you share a code snippet showing how you are using the QueryExecutionListener?

by chanansh • Contributor

02-08-2023 5:32:30 AM

1950 Views
1 reply
0 kudos

Running stateful spark streaming example fails https://www.databricks.com/blog/2022/10/18/python-arbitrary-stateful-processing-structured-streaming.html

ERROR:py4j.clientserver:There was an exception while executing the Python Proxy on the Python Side. Traceback (most recent call last): File "/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/clientserver.py", line 617, in _call_proxy retu...

Data Engineering

Latest Reply

Debayan
Databricks Employee

02-09-2023 9:36:49 PM

0 kudos

Hi, The error looks like the failure was fetched from the PY configuration? Could you please provide the whole snippet of the error?

by Leszek • Contributor

09-09-2022 6:47:13 AM

1640 Views
1 reply
3 kudos

How to handle schema changes in streaming Delta Tables?

I'm using Structured Streaming to move data from one Delta table to another. How do I handle schema changes in those tables (e.g. adding a new column)?

Data Engineering

Latest Reply

Murthy1
Contributor II

02-08-2023 1:02:53 PM

3 kudos

Hello, I think the only way of handling this is to specify the schema within the job through a schema file. The other way is to restart the job so that it infers the new schema automatically.
