Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Anonymous
by Not applicable
  • 7590 Views
  • 9 replies
  • 2 kudos

Resolved! Issue in creating workspace - Custom AWS Configuration

We tried to create a new workspace using "Custom AWS Configuration", supplying our own VPC (customer-managed VPC), but the workspace failed to launch. We are getting the error below and can't work out where the issue is. Workspace...

Latest Reply
Briggsrr
New Contributor II
  • 2 kudos

Experiencing workspace launch failures with custom AWS configuration is frustrating. The "MALFORMED_REQUEST" error and failed network validation checks suggest a VPC configuration issue. It feels like playing Infinite Craft, endlessly combining eleme...

8 More Replies
Maksym
by New Contributor III
  • 11765 Views
  • 5 replies
  • 7 kudos

Resolved! Databricks Autoloader is getting stuck and does not pass to the next batch

I have a simple job scheduled every 5 min. Basically it listens to cloudFiles on a storage account and writes them into a Delta table, extremely simple. The code is something like this: df = (spark .readStream .format("cloudFiles") .option('cloudFil...
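
For reference, a minimal sketch of the pattern described in this post, assuming a JSON source; the paths, schema location, and table name are hypothetical placeholders, and the trigger/scheduling is left to the job:

    # Sketch of the Autoloader job described above (all paths/names are illustrative).
    df = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")                             # assumed source format
          .option("cloudFiles.schemaLocation", "/mnt/checkpoints/schema")  # hypothetical schema store
          .load("/mnt/landing/events"))                                    # hypothetical source folder

    (df.writeStream
       .format("delta")
       .option("checkpointLocation", "/mnt/checkpoints/events")            # hypothetical checkpoint path
       .table("bronze.events"))                                            # hypothetical target table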

Latest Reply
lassebe
New Contributor II
  • 7 kudos

I had the same issue: files would randomly not be loaded. Setting `.option("cloudFiles.useIncrementalListing", False)` seemed to do the trick!
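
If it helps anyone else, a minimal sketch of where that workaround sits on the Autoloader reader (the value is passed as a string here; the path and format are hypothetical):

    df = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")                    # assumed source format
          .option("cloudFiles.useIncrementalListing", "false")    # fall back to full directory listing
          .load("/mnt/landing/events"))                           # hypothetical path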

4 More Replies
herry
by New Contributor III
  • 5175 Views
  • 4 replies
  • 4 kudos

Resolved! Get the list of loaded files from Autoloader

Hello, we can use Autoloader to track which files have or have not been loaded from an S3 bucket. My question about Autoloader: is there a way to read the Autoloader database to get the list of files that have been loaded? I can easily do this in AWS Glue j...
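
One way to inspect what Autoloader has tracked, assuming a recent enough Databricks Runtime, is the cloud_files_state table-valued function, which reads the file-tracking state stored in the stream's checkpoint; the checkpoint path below is a hypothetical placeholder:

    # Hedged sketch: list the files Autoloader has recorded in a stream's checkpoint state.
    loaded = spark.sql("SELECT * FROM cloud_files_state('/mnt/checkpoints/events')")
    loaded.select("path").show(truncate=False)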

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Herry Ramli - Would you be happy to mark Hubert's answer as best so that other members can find the solution more easily? Thanks!

3 More Replies
Bhawna_bedi
by New Contributor II
  • 7350 Views
  • 5 replies
  • 5 kudos
Latest Reply
merca
Valued Contributor II
  • 5 kudos

If you are streaming to Delta, not much: the micro-batch will fail, and the next time the stream runs it will pick up from the last successful write (due to ACID guarantees). I don't know what happens with other formats if the stream is aborted mid micro-batch.

4 More Replies
hari
by Contributor
  • 6311 Views
  • 3 replies
  • 3 kudos

Multiple streaming sources to the same delta table

Is it possible to have two streaming sources doing a MERGE into the same Delta table, with each source setting a different set of fields? We are trying to create a single table which will be used by the service layer for queries. The table can be populat...

Latest Reply
hari
Contributor
  • 3 kudos

Hi @Zachary Higgins, thanks for the reply. Currently, we are also using Trigger.once so that we can handle the merge stream dependencies properly. But was wondering whether we can scale our pipeline to be streaming by changing the Trigger duration in t...
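
As a rough illustration of the approach being discussed, each source could run its own foreachBatch MERGE against the shared target and update only the columns it owns; the table, key, and column names below are hypothetical, and the second source would run an analogous query:

    from delta.tables import DeltaTable

    def merge_source_a(batch_df, batch_id):
        # Source A updates only its own columns on the shared serving table.
        target = DeltaTable.forName(spark, "serving.customer_profile")
        (target.alias("t")
               .merge(batch_df.alias("s"), "t.customer_id = s.customer_id")
               .whenMatchedUpdate(set={"email": "s.email", "phone": "s.phone"})
               .whenNotMatchedInsertAll()
               .execute())

    (source_a_df.writeStream                     # source_a_df: hypothetical streaming DataFrame
        .foreachBatch(merge_source_a)
        .option("checkpointLocation", "/mnt/checkpoints/source_a")
        .trigger(once=True)                      # Trigger.once, as mentioned in the reply
        .start())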

2 More Replies
Himanshi
by New Contributor III
  • 2438 Views
  • 1 reply
  • 6 kudos

How to exclude existing files when moving a streaming job from one Databricks workspace to another, where the existing checkpoint state may not be compatible and cannot be reused to resume stream processing?

We do not want to process all the old files, we only want to process the latest files. Whenever we use a new checkpoint path in another Databricks workspace, the streaming job processes all the old files as well. Without the Autoloader feature, is there ...

Latest Reply
Shalabh007
Honored Contributor
  • 6 kudos

@Himanshi Patle In Spark Structured Streaming there is an option, maxFileAge, with which you can control which files to process based on their timestamp.
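
A minimal sketch of that option on a plain file streaming source, with the format, schema, and path as hypothetical placeholders; the exact behaviour of maxFileAge (particularly on the very first batch) is worth verifying against the Spark docs for your runtime:

    # maxFileAge tells the file source to ignore files older than the given age.
    df = (spark.readStream
          .format("json")                 # assumed source format
          .schema(input_schema)           # hypothetical schema defined elsewhere
          .option("maxFileAge", "1d")     # only consider files newer than one day
          .load("/mnt/landing/events"))   # hypothetical path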

User16783853906
by Contributor III
  • 4499 Views
  • 5 replies
  • 5 kudos

Resolved! Update code for a streaming job in Production

How to update a streaming job in production with minimal/no downtime when there are significant code changes that may not be compatible with the existing checkpoint state to resume the stream processing?
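
One commonly discussed pattern, sketched here under the assumption that the source is a Delta table (this is not necessarily the answer given in the thread), is to deploy the new code with a fresh checkpoint and pin the source to the point where the old query stopped, so nothing is reprocessed; the version, table names, and paths are hypothetical:

    # New code, new checkpoint, pinned to where the previous query left off.
    df = (spark.readStream
          .format("delta")
          .option("startingVersion", 1234)       # hypothetical last commit processed by the old query
          .table("bronze.events"))               # hypothetical source table

    (df.writeStream
       .format("delta")
       .option("checkpointLocation", "/mnt/checkpoints/events_v2")   # fresh checkpoint for the new code
       .table("silver.events"))                                      # hypothetical target table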

Latest Reply
Anonymous
Not applicable
  • 5 kudos

Thanks for the information, I will try to figure it out further. Keep sharing such informative posts. MA Health Connector

4 More Replies
Confused
by New Contributor III
  • 11277 Views
  • 7 replies
  • 2 kudos

Schema evolution issue

Hi all, I am loading some data using Auto Loader but am having trouble with schema evolution. A new column has been added to the data I am loading and I am getting the following error: StreamingQueryException: Encountered unknown field(s) during parsing:...

Latest Reply
rgrosskopf
New Contributor II
  • 2 kudos

I agree that hints are the way to go if you have the schema available, but the whole point of schema evolution is that you might not always know the schema in advance. I received a similar error with a similar streaming query configuration. The issue w...
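
For context, a rough sketch of the Autoloader options usually involved in this situation (paths and column hints are hypothetical): schemaHints pins the columns you already know, while schemaEvolutionMode addNewColumns makes the stream fail once on an unknown field and pick the new column up after a restart:

    df = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")                              # assumed source format
          .option("cloudFiles.schemaLocation", "/mnt/checkpoints/schema")   # where inferred schemas are stored
          .option("cloudFiles.schemaEvolutionMode", "addNewColumns")        # evolve on restart when new fields appear
          .option("cloudFiles.schemaHints", "id bigint, event_time timestamp")  # hypothetical known columns
          .load("/mnt/landing/events"))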

6 More Replies
Confused
by New Contributor III
  • 6918 Views
  • 3 replies
  • 3 kudos

Resolved! Dealing with updates to a delta table being used as a streaming source

Hi all, I have a requirement to perform updates on a Delta table that is the source for a streaming query. I would like to be able to update the table and have the stream continue to work while also not ending up with duplicates. From my research it se...
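
For reference, a hedged sketch of the Delta streaming-source option usually brought up for this case; ignoreChanges keeps the stream running when the source table is updated, but files rewritten by the update may re-emit unchanged rows, so duplicates still need to be handled downstream (newer runtimes also offer skipChangeCommits):

    # Keep streaming from a Delta table that receives in-place updates.
    df = (spark.readStream
          .format("delta")
          .option("ignoreChanges", "true")   # don't fail on update/delete commits; rewritten rows may reappear
          .table("bronze.events"))           # hypothetical source table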

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hey @Mathew Walters, hope you are doing great. Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution? Else please let us know if you need more help. We'd love to hear from you. Thanks!

2 More Replies
dataslicer
by Contributor
  • 7469 Views
  • 6 replies
  • 2 kudos

Resolved! Exploring additional cost saving options for structured streaming 24x7x365 uptime workloads

I currently have multiple jobs (each running its own job cluster) for my Spark Structured Streaming pipelines that are long-running 24x7x365 on DBR 9.x/10.x LTS. My SLAs are 24x7x365 with 1 minute latency. I have already accomplished the following co...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Autoscaling doesn't work with structured streaming, so that's not really an option. Autoscaling is based on jobs sitting in the jobs queue for a long time, but that's not the case with streaming. Streaming is more like many frequent small jobs. Spot in...

5 More Replies
GMO
by New Contributor III
  • 3425 Views
  • 4 replies
  • 1 kudos

Resolved! Trigger.AvailableOnce in Pyspark?

There’s a new Trigger.AvailableOnce option in runtime 10.1 that we need in order to process a large folder bit by bit using Autoloader. But I don’t see how to engage this from PySpark. Is this accessible from Scala only, or is it available in PySpark? Thanks...

Latest Reply
pottsork
New Contributor II
  • 1 kudos

Any update on this issue? I can see that one can use .trigger(availableNow=True) in DBR 10.3 (on Azure Databricks)... Unfortunately I can't get it to work with Autoloader. Is this supported? Additionally, I can't find any answers when skimming through ...
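
For anyone searching later, the PySpark form mentioned in this reply would look roughly like the sketch below (paths and table name are placeholders; whether availableNow works with Autoloader on a given DBR version is worth checking in the release notes):

    (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")                             # assumed source format
          .option("cloudFiles.schemaLocation", "/mnt/checkpoints/schema")  # hypothetical schema store
          .load("/mnt/landing/big_folder")                                 # hypothetical large folder
          .writeStream
          .format("delta")
          .option("checkpointLocation", "/mnt/checkpoints/big_folder")
          .trigger(availableNow=True)        # process everything currently available, in batches, then stop
          .table("bronze.big_folder"))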

3 More Replies
BorislavBlagoev
by Valued Contributor III
  • 6941 Views
  • 4 replies
  • 4 kudos

Resolved! Databricks writeStream checkpoint

I'm trying to execute this writeStream: data_frame.writeStream.format("delta") \ .option("checkpointLocation", checkpoint_path) \ .trigger(processingTime="1 second") \ .option("mergeSchema", "true") \ .o...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

You can remove that folder so it will be recreated automatically. Additionally, every new job run should have a new (or just empty) checkpoint location. You can add this in your code before running the stream: dbutils.fs.rm(checkpoint_path, True). Additionally you...
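
A small sketch of the reset described above, with hypothetical paths; clearing the checkpoint makes the stream start from scratch, so only do this when reprocessing the source data is acceptable:

    checkpoint_path = "/mnt/checkpoints/my_stream"   # hypothetical path
    dbutils.fs.rm(checkpoint_path, True)             # recursive delete; the folder is recreated on the next run

    (data_frame.writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint_path)
        .option("mergeSchema", "true")
        .trigger(processingTime="1 second")
        .start("/mnt/delta/my_table"))               # hypothetical target path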

3 More Replies