Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Sandesh87
by New Contributor III
  • 4972 Views
  • 3 replies
  • 2 kudos

Task not serializable: java.io.NotSerializableException: org.apache.spark.sql.streaming.DataStreamWriter

I have a getS3Objects function to get (JSON) objects located in AWS S3: object client_connect extends Serializable { val s3_get_path = "/dbfs/mnt/s3response" def getS3Objects(s3ObjectName: String, s3Client: AmazonS3): String = { val...
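The usual culprit here is a closure that captures a non-serializable object (the S3 client, or the DataStreamWriter itself) and ships it to the executors. The standard fix is to construct the client inside the function that runs on the workers instead of capturing it from the outer scope; in Scala, a @transient lazy val achieves the same. A minimal PySpark sketch of that pattern, using a hypothetical boto3 client, bucket, column name, and paths:

```python
import boto3  # assumed dependency; any non-picklable client behaves the same way

def process_batch(batch_df, batch_id):
    # Runs once per micro-batch on the driver; the distributed work happens below.
    def fetch_objects(rows):
        # The client is built here, on the executor, so Spark never serializes it.
        s3 = boto3.client("s3")
        for row in rows:
            obj = s3.get_object(Bucket="my-bucket", Key=row.s3_object_name)  # hypothetical names
            yield (row.s3_object_name, obj["Body"].read().decode("utf-8"))

    responses = batch_df.rdd.mapPartitions(fetch_objects).toDF(["key", "body"])
    responses.write.format("delta").mode("append").save("/mnt/s3response")

(spark.readStream.table("events")  # hypothetical streaming source
    .writeStream
    .foreachBatch(process_batch)
    .option("checkpointLocation", "/mnt/checkpoints/s3-fetch")
    .start())
```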

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hey there @Sandesh Puligundla! Hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear f...

2 More Replies
lnights
by New Contributor II
  • 5028 Views
  • 5 replies
  • 2 kudos

High cost of storage when using structured streaming

Hi there, I read data from Azure Event Hub and, after manipulating the data, I write the dataframe back to Event Hub (I use this connector for that): #read data df = (spark.readStream .format("eventhubs") .options(**ehConf) ...

[Chart attachment: transactions in Azure Storage]
Latest Reply
PetePP
New Contributor II
  • 2 kudos

I had the same problem when starting with Databricks. As outlined above, it is the shuffle partitions setting that results in a number of files equal to the number of partitions. Thus, you are writing a low data volume but get taxed on the amount of write (a...
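For readers hitting the same bill: the two knobs that usually matter are the shuffle partition count (the default of 200 yields 200 files, and 200 storage writes, per micro-batch) and the trigger interval. A sketch of both, assuming a Delta sink and placeholder paths:

```python
# Fewer shuffle partitions -> fewer, larger files and far fewer storage transactions.
spark.conf.set("spark.sql.shuffle.partitions", "8")  # tune to your actual data volume

query = (df.coalesce(1)                   # or a small explicit file count per batch
    .writeStream
    .format("delta")                      # assumed sink; the effect is similar elsewhere
    .option("checkpointLocation", "/mnt/checkpoints/eventhub-out")  # placeholder
    .trigger(processingTime="5 minutes")  # batching less often also cuts transactions
    .start("/mnt/bronze/events"))         # placeholder
```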

4 More Replies
UmaMahesh1
by Honored Contributor III
  • 7609 Views
  • 7 replies
  • 17 kudos

Spark Structured Streaming: data write to ADLS is too slow.

I'm a bit new to Spark Structured Streaming, so do ask all the relevant questions if I missed any. I have a notebook which consumes the events from a Kafka topic and writes those records into ADLS. The topic is JSON serialized, so I'm just writing...
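Without more detail it's hard to say where the time goes, but the knobs people usually reach for first are bounding the micro-batch size and writing Delta rather than many small JSON files. A sketch under those assumptions, with placeholder brokers, topic, and paths:

```python
# Kafka -> ADLS sketch; brokers, topic, and abfss paths are placeholders.
raw = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "events")
    .option("maxOffsetsPerTrigger", "100000")  # bound each micro-batch so it finishes quickly
    .load())

(raw.selectExpr("CAST(value AS STRING) AS json")
    .writeStream
    .format("delta")  # Delta absorbs frequent small writes better than raw JSON files
    .option("checkpointLocation", "abfss://bronze@myaccount.dfs.core.windows.net/_chk/events")
    .start("abfss://bronze@myaccount.dfs.core.windows.net/events"))
```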

Latest Reply
Miletto
New Contributor II
  • 17 kudos

 

6 More Replies
sparkstreaming
by New Contributor III
  • 6084 Views
  • 5 replies
  • 4 kudos

Resolved! Missing rows while processing records using foreachbatch in spark structured streaming from Azure Event Hub

I am new to real-time scenarios and I need to create Spark Structured Streaming jobs in Databricks. I am trying to apply some rule-based validations from backend configurations on each incoming JSON message. I need to do the following actions on th...
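One frequent cause of rows going missing in this setup is reusing the micro-batch DataFrame for several sinks without pinning it first, so each sink re-evaluates the source and may see different data. A defensive foreachBatch sketch, with a hypothetical validation rule and placeholder paths:

```python
from pyspark.sql import functions as F

def validate_and_route(batch_df, batch_id):
    batch_df.persist()  # pin the micro-batch so every sink sees the same rows
    try:
        valid = batch_df.filter(F.col("amount") > 0)  # hypothetical rule-based validation
        invalid = batch_df.exceptAll(valid)
        valid.write.format("delta").mode("append").save("/mnt/silver/valid")        # placeholder
        invalid.write.format("delta").mode("append").save("/mnt/silver/quarantine") # placeholder
    finally:
        batch_df.unpersist()

(stream_df.writeStream  # stream_df: the Event Hub stream from the question
    .foreachBatch(validate_and_route)
    .option("checkpointLocation", "/mnt/checkpoints/validation")
    .start())
```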

Latest Reply
Rishi045
New Contributor III
  • 4 kudos

Were you able to find a solution? If yes, could you please share it?

4 More Replies
tlecomte
by New Contributor III
  • 5406 Views
  • 6 replies
  • 3 kudos

Resolved! Enabling Adaptive Query Execution and Cost-Based Optimizer in Structured Streaming foreachBatch

Dear Databricks community, I am using Spark Structured Streaming to move data from silver to gold in an ETL fashion. The source stream is the change data feed of a Delta table in silver. The streaming dataframe is transformed and joined with a couple ...

Latest Reply
Lingesh
Databricks Employee
  • 3 kudos

It's not recommended to enable AQE on a streaming query, for the same reason you shared in the description. It has been documented here.
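For anyone who wants to experiment anyway, the relevant session flags look like this; whether AQE actually applies to the batch plans inside foreachBatch depends on the runtime version, so treat this as a sketch, not a recommendation:

```python
# Inspect and toggle the flags per session; names are standard Spark SQL confs.
print(spark.conf.get("spark.sql.adaptive.enabled"))
spark.conf.set("spark.sql.adaptive.enabled", "false")  # the conservative choice for streaming
spark.conf.set("spark.sql.cbo.enabled", "true")        # CBO is only useful with statistics

# CBO needs statistics on the dimension tables being joined (hypothetical table name):
spark.sql("ANALYZE TABLE silver.dim_customer COMPUTE STATISTICS FOR ALL COLUMNS")
```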

5 More Replies
scalasparkdev
by New Contributor
  • 2640 Views
  • 2 replies
  • 0 kudos

PySpark Structured Streaming Avro integration with Azure Schema Registry and Kafka/Event Hubs in a Databricks environment.

I am looking for a simple way to have a Structured Streaming pipeline that would automatically register a schema with Azure Schema Registry when converting a DataFrame column into Avro, and that would be able to deserialize an Avro column based on a schema registry ur...
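The open-source Avro functions have no built-in Azure Schema Registry binding, so a common approach is to fetch the schema string separately (for example with the azure-schemaregistry SDK) and pass it to from_avro/to_avro. A sketch of the Spark side only, with the schema inlined as a stand-in for whatever the registry returns:

```python
from pyspark.sql.avro.functions import from_avro, to_avro

# Stand-in for a schema string fetched from Azure Schema Registry via its SDK.
schema_json = """
{"type": "record", "name": "Event",
 "fields": [{"name": "id", "type": "string"},
            {"name": "value", "type": "double"}]}
"""

# df: stream assumed to carry the Avro payload in a binary "body" column.
decoded = df.select(from_avro("body", schema_json).alias("event"))       # Avro bytes -> struct
reencoded = decoded.select(to_avro("event", schema_json).alias("body"))  # struct -> Avro bytes
```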

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Tomas Sedlon, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...

1 More Replies
pranathisg97
by New Contributor III
  • 3420 Views
  • 2 replies
  • 1 kudos

readStream query throws an exception if there's no data in the Delta location.

Hi, I have a scenario where a writeStream query writes the stream data to a bronze location, and I have to read from bronze, do some processing, and finally write it to silver. I use an S3 location for the Delta tables. But for the very first execution, readStream ...
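A common workaround for the first run is to create the bronze Delta table up front (even empty, with the expected schema) so readStream always finds a valid table. A sketch with hypothetical paths and schema:

```python
from delta.tables import DeltaTable
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

bronze_path = "s3://my-bucket/bronze/events"  # placeholder

if not DeltaTable.isDeltaTable(spark, bronze_path):
    # First run: create an empty Delta table with the expected schema
    # so the downstream readStream finds a valid table immediately.
    schema = StructType([
        StructField("id", StringType()),
        StructField("ts", TimestampType()),
    ])
    spark.createDataFrame([], schema).write.format("delta").save(bronze_path)

silver_input = spark.readStream.format("delta").load(bronze_path)
```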

Latest Reply
Vartika
Databricks Employee
  • 1 kudos

Hi @Pranathi Girish, hope all is well! Checking in: if @Suteja Kanuri's answer helped, would you let us know and mark the answer as best? If not, would you be happy to give us more information? We'd love to hear from you. Thanks!

1 More Replies
Starki
by New Contributor III
  • 1093 Views
  • 1 reply
  • 0 kudos

Maintaining Custom State in Structured Streaming

I am consuming an IoT stream with thousands of different signals using Structured Streaming. During processing of the stream, I need to know the previous timestamp and value for each signal in the micro batch. The signal stream is eventually written ...

Latest Reply
Soma
Valued Contributor
  • 0 kudos

@Suteja Kanuri I tried the above on a streaming DataFrame but I'm facing the error below: AttributeError: 'DataFrame' object has no attribute 'groupByKey'. Can you please let me know the DBR runtime?
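For context: groupByKey/mapGroupsWithState belong to the Scala Dataset API, which is why the attribute is missing on a PySpark DataFrame. The PySpark counterpart is applyInPandasWithState, available from Spark 3.4 (recent DBR runtimes). A sketch that carries the previous timestamp and value per signal across micro-batches, with hypothetical column names:

```python
from pyspark.sql.streaming.state import GroupStateTimeout

def track_previous(key, pdf_iter, state):
    # State holds (prev_ts, prev_value) for this signal across micro-batches.
    prev_ts, prev_value = state.get if state.exists else (None, None)
    for pdf in pdf_iter:
        pdf = pdf.sort_values("ts")
        # Simplistic: tags every row with the value carried in from the last batch.
        pdf["prev_ts"], pdf["prev_value"] = prev_ts, prev_value
        prev_ts, prev_value = pdf["ts"].iloc[-1], pdf["value"].iloc[-1]
        yield pdf
    state.update((prev_ts, prev_value))

out = (df.groupBy("signal")  # df: the streaming DataFrame of IoT signals
    .applyInPandasWithState(
        track_previous,
        outputStructType="signal string, ts timestamp, value double, "
                         "prev_ts timestamp, prev_value double",
        stateStructType="prev_ts timestamp, prev_value double",
        outputMode="append",
        timeoutConf=GroupStateTimeout.NoTimeout))
```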

Ossian
by New Contributor
  • 2008 Views
  • 1 reply
  • 0 kudos

Driver restarts and job dies after 10-20 hours (Structured Streaming)

I am running a Java/JAR Structured Streaming job on a single-node cluster (Databricks Runtime 8.3). The job contains a single query which reads records from multiple Azure Event Hubs using the Spark Kafka functionality and outputs results to an MSSQL dat...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 0 kudos

It seems that when your nodes scale up, the cluster looks for the init script and fails, so you could use reserved instances for this activity instead of spot instances; it will increase your overall cost. Or alternatively, you can use dependent librar...

Ashok1
by New Contributor II
  • 1424 Views
  • 2 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hey there @Ashok ch, hope everything is going great. Does @Ivan Tang's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Otherwise, please let us know if you need more hel...

1 More Replies
Imran_Anwar
by New Contributor II
  • 879 Views
  • 0 replies
  • 1 kudos

Structured Streaming vs Confluent KStream

For an ultra-low-latency customer-facing app, I am curious about the cost efficiency of Structured Streaming versus KStreams: which works better in terms of cost, while still achieving the ultra-low latency and quality outcome? Appreciate any thoughts from p...

drewster
by New Contributor III
  • 13321 Views
  • 13 replies
  • 13 kudos

Resolved! Spark streaming autoloader slow second batch - checkpoint issues?

I am running a massive history of about 250 GB (~6 million phone call transcriptions, JSON read in as raw text) through a raw -> bronze pipeline in Azure Databricks using PySpark. The source is mounted storage and is continuously having files added, and we do n...
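When the second batch stalls like this, it is often the source re-listing that hurts rather than the checkpoint itself. Two Auto Loader options usually help: file-notification mode, which avoids repeated directory listings (it needs extra cloud permissions to set up), and a cap on files per trigger during the backfill. A sketch with placeholder paths:

```python
stream = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "text")              # raw JSON ingested as text, as in the post
    .option("cloudFiles.useNotifications", "true")    # avoid re-listing millions of files
    .option("cloudFiles.maxFilesPerTrigger", "5000")  # keep backfill batches bounded
    .load("/mnt/raw/transcriptions"))                 # placeholder mount

(stream.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/raw-to-bronze")  # placeholder
    .start("/mnt/bronze/transcriptions"))             # placeholder
```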

Latest Reply
Brooksjit
New Contributor III
  • 13 kudos

Thank you for the explanation.

12 More Replies
Constantine
by Contributor III
  • 3076 Views
  • 1 reply
  • 1 kudos

Resolved! Can we reuse checkpoints in Spark Streaming?

I am reading data from a Kafka topic, say topic_a. I have an application, app_one, which uses Spark Streaming to read data from topic_a. I have a checkpoint location, loc_a, to store the checkpoint. Now, app_one has read data till offset 90. Can I creat...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @John Constantine, it is not recommended to share a checkpoint across queries; every streaming query should have its own checkpoint. If you want to start at offset 90 in another query, you can define it when starting your job. You can ...
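Concretely, the Kafka source accepts per-partition starting offsets as JSON; note this is honored only when the new query's checkpoint does not exist yet. A sketch mirroring the example in the question, with hypothetical brokers:

```python
# New query, own checkpoint, explicit starting point (partition 0 -> offset 90).
df = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder brokers
    .option("subscribe", "topic_a")
    .option("startingOffsets", """{"topic_a": {"0": 90}}""")
    .load())
```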

dataslicer
by Contributor
  • 5567 Views
  • 7 replies
  • 2 kudos

Resolved! Exploring additional cost saving options for structured streaming 24x7x365 uptime workloads

I currently have multiple jobs (each running its own job cluster) for my Spark Structured Streaming pipelines that are long running 24x7x365 on DBR 9.x/10.x LTS. My SLAs are 24x7x365 with 1-minute latency. I have already accomplished the following co...
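One further lever that fits a 1-minute SLA: run a processingTime trigger at the SLA interval instead of the default back-to-back micro-batches, so the cluster does fewer, larger batches. A sketch with a placeholder sink:

```python
# One micro-batch per minute instead of back-to-back batches; still inside a
# 1-minute SLA, but the cluster spends less time on overhead per tiny batch.
(df.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/pipeline")  # placeholder
    .trigger(processingTime="1 minute")
    .start("/mnt/gold/output"))  # placeholder
```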

Latest Reply
Anonymous
Not applicable
  • 2 kudos


6 More Replies