Data Engineering

Forum Posts

fhmessas
by New Contributor II
  • 915 Views
  • 2 replies
  • 2 kudos

Trigger.AvailableNow getting stuck when there is no event

Hi, I have several streaming jobs; one of them uses Trigger.AvailableNow. The issue is that it gets stuck when there are no events or when it has finished ingesting all events. The expected behavior would be for the job to shut down. I've already checked...

Stuck streaming
Latest Reply
fhmessas
New Contributor II
  • 2 kudos

Hi, the source is an S3 bucket using file notification with SQS. There are no errors or warnings in the logs; the AvailableNow trigger just gets stuck.

1 More Replies
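For reference, a minimal sketch (untested, not from the thread) of an Auto Loader stream in file-notification mode with Trigger.AvailableNow, the setup described above. All paths, the bucket, and the checkpoint location are hypothetical; with availableNow the query is expected to drain the backlog and terminate on its own, even when no new events exist.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Auto Loader source in file-notification mode (SQS-backed on AWS).
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.useNotifications", "true")   # file notifications via SQS
      .load("s3://my-bucket/events/"))                 # hypothetical source path

query = (df.writeStream
         .format("delta")
         .option("checkpointLocation", "s3://my-bucket/_checkpoints/events")
         .trigger(availableNow=True)   # process available data, then stop
         .start("s3://my-bucket/bronze/events"))

# Expected behavior: returns once the backlog (possibly empty) is processed.
query.awaitTermination()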
Sas
by New Contributor II
  • 854 Views
  • 1 reply
  • 0 kudos

A streaming job going into an infinite loop

Hi, below I am trying to read data from Kafka, determine whether each record is fraud or not, and then write it back to MongoDB. Below is my code (read_kafka.py): from pyspark.sql import SparkSession from pyspark.sql.functions import * from pyspark.sql.types i...

Latest Reply
swethaNandan
New Contributor III
  • 0 kudos

Hi Saswata, can you remove the filter and see if it prints output to the console? kafka_df5 = kafka_df4.filter(kafka_df4.status == "FRAUD") Thanks and regards, Swetha Nandajan

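A sketch of the debugging step suggested in the reply: write the stream to the console sink before the FRAUD filter to confirm rows arrive at all. If the raw stream prints but the filtered one never does, the status column never equals "FRAUD" and the job only looks like an infinite loop while it waits for matching data. Broker and topic names below are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

kafka_df = (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical
            .option("subscribe", "transactions")               # hypothetical
            .load())

# Inspect the raw stream first, without the status == "FRAUD" filter.
debug_query = (kafka_df.selectExpr("CAST(value AS STRING)")
               .writeStream
               .format("console")
               .outputMode("append")
               .start())
debug_query.awaitTermination()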
Kearon
by New Contributor III
  • 2968 Views
  • 11 replies
  • 0 kudos

Process batches in a streaming pipeline - identifying deletes

OK, so I think I'm probably missing the obvious and tying myself in knots here. Here is the scenario: batch datasets arrive in JSON format in an Azure data lake; each batch is a complete set of "current" records (the complete table); these are processed us...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Kearon McNicol, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answer...

10 More Replies
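One common pattern for this scenario (a sketch under stated assumptions, not from the thread): since each batch is a complete snapshot, a row that exists in the target but not in the batch has been deleted upstream. A MERGE with WHEN NOT MATCHED BY SOURCE (Delta Lake 2.3+ / DBR 12.1+) inside foreachBatch can express that. Table names, the key column, and the schema are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, LongType, StringType
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Hypothetical schema for the landed JSON snapshots.
snapshot_schema = StructType([
    StructField("id", LongType()),
    StructField("value", StringType()),
])

def apply_snapshot(batch_df, batch_id):
    target = DeltaTable.forName(batch_df.sparkSession, "silver.current_records")
    (target.alias("t")
     .merge(batch_df.alias("s"), "t.id = s.id")
     .whenMatchedUpdateAll()            # changed rows
     .whenNotMatchedInsertAll()         # new rows
     .whenNotMatchedBySourceDelete()    # rows missing from this snapshot = deletes
     .execute())

(spark.readStream
 .format("json")                        # or cloudFiles, as in the thread
 .schema(snapshot_schema)
 .load("abfss://landing@account.dfs.core.windows.net/batches/")  # hypothetical
 .writeStream
 .foreachBatch(apply_snapshot)
 .option("checkpointLocation", "/checkpoints/current_records")
 .start())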
Ria
by New Contributor
  • 672 Views
  • 1 reply
  • 1 kudos

py4j.security.Py4JSecurityException

I am getting this error while loading data with Auto Loader. Although table access control is already disabled, I am still getting this error: "py4j.security.Py4JSecurityException: Method public org.apache.spark.sql.streaming.DataStreamReader org.apache.spark.sql...

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Hi, are you using a High Concurrency cluster? Which DBR version are you running?

Leszek
by Contributor
  • 578 Views
  • 1 reply
  • 3 kudos

How to handle schema changes in streaming Delta Tables?

I'm using Structured Streaming to move data from one Delta table to another. How do I handle schema changes in those tables (e.g., adding a new column)?

Latest Reply
Murthy1
Contributor II
  • 3 kudos

Hello, I think the only way to handle this is to supply the schema to the job through a schema file. The other way is to restart the job so that it infers the new schema automatically.

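To illustrate the restart approach on the sink side, a sketch (paths hypothetical): the Delta sink can accept an added column via the mergeSchema write option, while a schema change on the streaming source still stops the query; on restart it picks up the new schema, as the reply describes.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.readStream.format("delta").load("/delta/source_table")  # hypothetical

(df.writeStream
 .format("delta")
 .option("mergeSchema", "true")  # let new columns be added to the sink table
 .option("checkpointLocation", "/checkpoints/source_to_target")
 .start("/delta/target_table"))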
lawrence009
by Contributor
  • 652 Views
  • 2 replies
  • 3 kudos

Advice on efficiently cleansing and transforming delta table

I have a Delta table that is updated nightly using Auto Loader. After the merge, the job kicks off a second notebook to clean and rewrite certain values using a series of UPDATE statements, e.g., UPDATE TABLE foo SET field1 = some_value WHER...

Latest Reply
Jfoxyyc
Valued Contributor
  • 3 kudos

I would partition the table by some sort of date that Auto Loader can use. You could then filter your UPDATE further, and it will automatically use partition pruning and only scan the related files.

1 More Replies
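A sketch of that partition-pruning suggestion (table and column names hypothetical): partition the Delta table by an ingest date, then constrain the nightly UPDATE on that column so only the newest partitions are scanned.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
  CREATE TABLE IF NOT EXISTS foo (
    id BIGINT, field1 STRING, field2 STRING, ingest_date DATE
  ) USING DELTA
  PARTITIONED BY (ingest_date)
""")

# The cleanup touches only the partitions loaded recently, so partition
# pruning skips the rest of the table's files.
spark.sql("""
  UPDATE foo
  SET field1 = 'some_value'
  WHERE ingest_date >= current_date() - INTERVAL 1 DAY
    AND field2 IS NULL
""")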
Aj2
by New Contributor III
  • 8948 Views
  • 1 reply
  • 4 kudos
Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 4 kudos

A live table or view always reflects the results of the query that defines it, including when the query defining the table or view is updated, or an input data source is updated. Like a traditional materialized view, a live table or view may be entir...

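Assuming the thread is about Delta Live Tables, a minimal sketch of a live table definition, where the table always reflects the result of its defining query (source path and filter are hypothetical; `dlt` and `spark` are provided inside a DLT pipeline):

```python
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Always reflects the defining query over the raw input")
def cleaned_events():
    # Re-evaluated as the pipeline updates, so query or input changes flow through.
    return (spark.read.format("json")
            .load("/raw/events")                      # hypothetical source
            .where(col("event_type").isNotNull()))    # hypothetical filter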
mattjones
by New Contributor II
  • 266 Views
  • 0 replies
  • 0 kudos

www.meetup.com

DEC 13 MEETUP: Arbitrary Stateful Stream Processing in PySpark. For folks in the Bay Area: Dr. Karthik Ramasamy, Databricks' Head of Streaming, will be joined by engineering experts on the streaming and PySpark teams at Databricks for this in-person me...

clant
by New Contributor II
  • 768 Views
  • 1 reply
  • 4 kudos

Structured Streaming from SFTP

Hello, is it possible to use an SFTP location as the load source for Structured Streaming? At the moment we are going from SFTP -> S3 -> Databricks via Structured Streaming. I would like to cut out the S3 part. Cheers, Chris

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Chris Lant, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first; otherwise, Bricksters will get back to you soon. Thanks.

Mado
by Valued Contributor II
  • 648 Views
  • 2 replies
  • 3 kudos

When should I use ".start()" with writeStream?

Hi, I am practicing with Databricks. In the sample notebooks, I have seen writeStream used both with and without the ".start()" method. Samples are below. Without .start(): spark.readStream .format("cloudFiles") .option("cloudFiles.f...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Mohammad Saber, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first; otherwise, Bricksters will get back to you soon. Thanks.

1 More Replies
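To make the distinction concrete, a short sketch (paths hypothetical): the writeStream builder only defines the query; .start() launches it. In Databricks notebooks, display() on a streaming DataFrame starts a stream under the hood, which is why some samples appear to omit .start().

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Defining the query: nothing runs yet.
stream_writer = (spark.readStream
                 .format("cloudFiles")
                 .option("cloudFiles.format", "json")
                 .load("/raw/input")                  # hypothetical path
                 .writeStream
                 .format("delta")
                 .option("checkpointLocation", "/checkpoints/demo"))

# This call actually launches the streaming query.
query = stream_writer.start("/delta/demo")
query.awaitTermination()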
Mado
by Valued Contributor II
  • 1488 Views
  • 2 replies
  • 3 kudos

Question about "foreachBatch" to remove duplicate records when streaming data

Hi, I am practicing with the Databricks sample notebooks published here: https://github.com/databricks-academy/advanced-data-engineering-with-databricks. In one of the notebooks (ADE 3.1 - Streaming Deduplication) (URL), there is sample code to remove dupli...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Mohammad Saber, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first; otherwise, Bricksters will get back to you soon. Thanks.

1 More Replies
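A hedged sketch of the streaming-deduplication pattern that notebook covers (table and column names hypothetical, not the notebook's own code): dropDuplicates bounded by a watermark removes duplicates within the stream, and an insert-only MERGE inside foreachBatch keeps replayed micro-batches from re-inserting the same keys.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

deduped = (spark.readStream.table("bronze.events")          # hypothetical source
           .withWatermark("event_time", "30 seconds")       # bound state growth
           .dropDuplicates(["user_id", "event_time"]))      # in-stream dedup

def upsert_batch(batch_df, batch_id):
    # Insert-only merge: keys already in the target are skipped, so a
    # replayed micro-batch cannot create duplicates.
    batch_df.createOrReplaceTempView("updates")
    batch_df.sparkSession.sql("""
        MERGE INTO silver.events t
        USING updates s
        ON t.user_id = s.user_id AND t.event_time = s.event_time
        WHEN NOT MATCHED THEN INSERT *
    """)

(deduped.writeStream
 .foreachBatch(upsert_batch)
 .option("checkpointLocation", "/checkpoints/silver_events")
 .start())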
patojo94
by New Contributor II
  • 1798 Views
  • 5 replies
  • 4 kudos

Resolved! PySpark streaming failed for no reason

Hi everyone, I have a PySpark streaming job reading from AWS Kinesis that suddenly failed for no reason (I mean, we have not made any changes recently). It is giving the following error: ERROR MicroBatchExecution: Query kinesis_events_prod_bronz...

Latest Reply
jcasanella
New Contributor III
  • 4 kudos

@patricio tojo I have the same problem, although in my case it appeared after migrating to Unity Catalog. I need to investigate a little more, but after adding this to my Spark job it works: spark.conf.set("spark.databricks.delta.state.corruptionIsFatal", False)

4 More Replies
Bency
by New Contributor III
  • 8078 Views
  • 1 reply
  • 2 kudos

Queries with streaming sources must be executed with writeStream.start();

When I try to perform some transformations on streaming data, I get a "Queries with streaming sources must be executed with writeStream.start()" error. My aim is to do a lookup for every column in each row of the streaming data. steaming_table=spark...

Latest Reply
Noopur_Nigam
Valued Contributor II
  • 2 kudos

Hi @Bency Mathew, you can use foreachBatch to perform custom logic on each micro-batch. Please refer to the document below: https://docs.databricks.com/structured-streaming/foreach.html#perform-streaming-writes-to-arbitrary-data-sinks-with-structured-s...

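A sketch of that foreachBatch approach (table names and the join key are hypothetical): inside the function each micro-batch is a static DataFrame, so batch-only operations like joins against a lookup table are allowed without hitting the writeStream.start() error.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

lookup_df = spark.read.table("ref.column_lookup")  # static lookup table

def enrich_and_write(batch_df, batch_id):
    # batch_df is a plain DataFrame here, so a normal join works.
    enriched = batch_df.join(lookup_df, on="code", how="left")
    enriched.write.mode("append").saveAsTable("silver.enriched_stream")

(spark.readStream.table("bronze.stream_source")
 .writeStream
 .foreachBatch(enrich_and_write)
 .option("checkpointLocation", "/checkpoints/enriched_stream")
 .start())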
Soma
by Valued Contributor
  • 1280 Views
  • 5 replies
  • 0 kudos

Streaming Queries Failing Frequently in DBR 10.4 LTS for the Last Week

DBR 10.4 LTS is failing frequently, about once every half hour, due to GC overhead. Can anyone from the Databricks team let me know if there are existing tickets or bugs for this? Note: we have used the same configuration and the same DBR for almost the last 3 months. When checking ...

Latest Reply
Soma
Valued Contributor
  • 0 kudos

Hi @Vidula Khanna, we have raised a support ticket with ADB from the client side. We can close this; however, it seems DBR 11.2 and above includes some fixes for the RocksDB memory leak, based on communication with the Databricks developer team.

4 More Replies
jwilliam
by Contributor
  • 1404 Views
  • 4 replies
  • 4 kudos

Resolved! What is the maximum number of concurrent streaming jobs for a cluster?

What is the maximum number of concurrent streaming jobs for a cluster? How can I choose the right number of concurrent streaming jobs for different cluster configurations? Should I use multiple clusters for different jobs, or combine them into one big cluster to hand...

Latest Reply
Kaniz
Community Manager
  • 4 kudos

Hi @John William, we haven't heard from you since the last response from @Prabakar, and I was checking back to see if his suggestions helped you. If you have a solution, please share it with the community, as it can be helpful to others. Al...

3 More Replies