Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

RateVan
by New Contributor II
  • 2754 Views
  • 4 replies
  • 0 kudos

Spark's last window doesn't flush in append mode

The problem is simple: when you use a TUMBLING window with append mode, the window is closed only when the next message arrives (plus watermark logic). In the current implementation, if you stop the incoming streaming data, the last window will NEVER...

Latest Reply
Dtank
New Contributor II
  • 0 kudos

Do you have any solution for this?

3 More Replies
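For context, a minimal sketch of the behavior being described (the Delta source path and column names are hypothetical). In append mode a window's result is emitted only once the watermark passes the window's end, and the watermark only advances when newer events arrive, so if the input goes quiet the final window is never flushed:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, window

spark = SparkSession.builder.getOrCreate()

# Hypothetical Delta source with an 'eventTime' timestamp column.
events = spark.readStream.format("delta").load("/tmp/events")

agg = (events
       .withWatermark("eventTime", "5 minutes")          # watermark trails max event time
       .groupBy(window(col("eventTime"), "10 minutes"))  # tumbling window
       .agg(count("*").alias("n")))

# In append mode a window is emitted only after the watermark crosses
# its end time; with no further input, the last window stays open.
query = (agg.writeStream
         .outputMode("append")
         .format("console")
         .start())
```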
avnish26
by New Contributor III
  • 11563 Views
  • 5 replies
  • 8 kudos

Spark 3.3.0 Kafka connection problem

I am trying to connect to Kafka from Spark but am getting an error. Kafka version: 2.4.1. Spark version: 3.3.0. I am using a Jupyter notebook to execute the PySpark code below: ```from pyspark.sql.functions import *; from pyspark.sql.types import *; #import libr...

Latest Reply
jose_gonzalez
Databricks Employee
  • 8 kudos

Hi @avnish26, did you add the JAR files to the cluster? Do you still have issues? Please let us know.

4 More Replies
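Since the error in the post is truncated, this is only the usual suspect: the Kafka connector package matching the Spark and Scala versions has to be on the cluster. A hedged sketch (broker and topic are placeholders):

```python
from pyspark.sql import SparkSession

# For Spark 3.3.0 built against Scala 2.12, the matching connector
# coordinate is org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0.
spark = (SparkSession.builder
         .config("spark.jars.packages",
                 "org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0")
         .getOrCreate())

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
      .option("subscribe", "my_topic")                      # placeholder topic
      .load())
```

On Databricks the same JAR can instead be attached as a cluster library, which is what the reply above is pointing at.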
swetha
by New Contributor III
  • 3184 Views
  • 4 replies
  • 1 kudos

Error: 'no streaming listener attached to the Spark app' is the error we are observing after accessing the streaming statistics API. Please help us with this issue ASAP. Thanks.

Issue: Spark Structured Streaming application. After adding the listener JAR file in the cluster init script, the listener is working (from what I see in the stdout/log4j logs). But when I try to hit, with 'Content-Type: application/json', http://host:port/...

Latest Reply
INJUSTIC
New Contributor II
  • 1 kudos

Have you found the solution? Thanks

3 More Replies
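For readers landing here: on recent runtimes (Spark 3.4+), a streaming listener can also be registered directly from Python, which avoids the init-script JAR route the poster used. A minimal sketch:

```python
from pyspark.sql import SparkSession
from pyspark.sql.streaming import StreamingQueryListener

spark = SparkSession.builder.getOrCreate()

class LogListener(StreamingQueryListener):
    def onQueryStarted(self, event):
        print(f"started: {event.id}")

    def onQueryProgress(self, event):
        # event.progress carries the statistics the REST API exposes
        print(f"rows/s: {event.progress.processedRowsPerSecond}")

    def onQueryTerminated(self, event):
        print(f"terminated: {event.id}")

# Register on the same SparkSession that runs the streaming query;
# otherwise the API reports that no streaming listener is attached.
spark.streams.addListener(LogListener())
```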
swetha
by New Contributor III
  • 2777 Views
  • 3 replies
  • 1 kudos

I am unable to attach a streaming listener to a Spark streaming job. Error: 'no streaming listener attached to the Spark application' is the error we are observing after accessing the streaming statistics API. Please help us with this issue ASAP. Thanks.

Issue: After adding the listener JAR file in the cluster init script, the listener is working (from what I see in the stdout/log4j logs). But when I try to hit, with 'Content-Type: application/json', http://host:port/api/v1/applications/app-id/streaming/st...

Latest Reply
INJUSTIC
New Contributor II
  • 1 kudos

Have you found the solution? Thanks

2 More Replies
Data_Engineer3
by Contributor III
  • 3122 Views
  • 5 replies
  • 0 kudos

Default maximum Spark streaming chunk size for Delta files in each batch?

When working with Delta files in Spark Structured Streaming, what is the default maximum chunk size in each batch? How do I identify this type of Spark configuration in Databricks? #[Databricks SQL] #[Spark streaming] #[Spark structured streaming] #Spark

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

See the doc: https://docs.databricks.com/en/structured-streaming/delta-lake.html. Also, what is the challenge while using foreachBatch?

4 More Replies
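Summarizing the linked doc: a Delta streaming source caps each micro-batch at maxFilesPerTrigger files (1000 by default); maxBytesPerTrigger is unset unless you configure it. A sketch with a placeholder table path:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (spark.readStream
      .format("delta")
      .option("maxFilesPerTrigger", 500)    # default is 1000 files per batch
      .option("maxBytesPerTrigger", "1g")   # soft size cap, unset by default
      .load("/mnt/delta/events"))           # placeholder path
```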
CarterM
by New Contributor III
  • 6449 Views
  • 4 replies
  • 2 kudos

Resolved! Why is Spark Streaming from S3 returning thousands of files when there are only 9?

I am attempting to stream JSON endpoint responses from an S3 bucket into a Spark DLT. I have been very successful with this previously, but the difference this time is that I am storing the responses from multiple endpoints in the same S3 buck...

[Attachments: endpoint response structure; 9 endpoint responses in the same S3 bucket]
Latest Reply
williamyoung
New Contributor II
  • 2 kudos

Hello everyone, it seems like the issue you're encountering could be related to how Spark Streaming interprets the S3 file structure, especially when dealing with multiple sources. When files from multiple endpoints are stored in the same bucket, Spar...

3 More Replies
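The accepted reply is truncated, but the common remedies are to point each stream at its own prefix or to filter file names with pathGlobFilter; a hedged sketch with hypothetical bucket, prefix, and pattern names:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Scope the stream to one endpoint's prefix instead of the bucket root,
# and keep only matching file names. All names here are placeholders.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("pathGlobFilter", "*_soccer.json")
      .load("s3://my-bucket/soccer/"))
```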
YFL
by New Contributor III
  • 6892 Views
  • 11 replies
  • 6 kudos

Resolved! When delta is a streaming source, how can we get the consumer lag?

Hi, I want to keep track of the streaming lag from the source table, which is a Delta table. I see that in the query progress logs there is some information about the last version and the last file in the version for the end offset, but this doesn't give ...

Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hey @Yerachmiel Feltzman, I hope all is well. Just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

10 More Replies
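One hedged way to approximate the lag, building on the progress log the poster mentions: compare the table's latest version with the reservoirVersion recorded in the query's endOffset. The offset layout is an internal detail of the Delta source and may change between releases:

```python
import json
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def delta_version_lag(query, source_path):
    """Approximate consumer lag in Delta versions (None before the first batch)."""
    progress = query.lastProgress
    if not progress or not progress["sources"]:
        return None
    end_offset = progress["sources"][0]["endOffset"]
    if isinstance(end_offset, str):            # offset may arrive as a JSON string
        end_offset = json.loads(end_offset)
    consumed = end_offset["reservoirVersion"]  # internal field name
    latest = (DeltaTable.forPath(spark, source_path)
              .history(1).select("version").first()[0])
    return latest - consumed
```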
SamarthJain
by New Contributor II
  • 6348 Views
  • 4 replies
  • 2 kudos

Hi all, I'm facing an issue with my Spark Streaming job. It gets stuck in the "Stream Initializing" phase for more than 3 hours. Need your...

Hi all, I'm facing an issue with my Spark Streaming job. It gets stuck in the "Stream Initializing" phase for more than 3 hours. Need your help here to understand what happens internally in the "Stream Initializing" phase of a Spark Streaming job tha...

Latest Reply
MohsenJ
Contributor
  • 2 kudos

I'm facing the same issue when I try to run this example: Create a monitor using the API | Databricks on AWS (Inference Lakehouse Monitor regression example notebook). Any idea?

3 More Replies
lnights
by New Contributor II
  • 4989 Views
  • 5 replies
  • 2 kudos

High cost of storage when using structured streaming

Hi there, I read data from Azure Event Hub and, after manipulating the data, I write the dataframe back to Event Hub (I use this connector for that): #read data df = (spark.readStream .format("eventhubs") .options(**ehConf) ...

[Attachment: transactions in Azure Storage]
Latest Reply
PetePP
New Contributor II
  • 2 kudos

I had the same problem when starting with Databricks. As outlined above, it is the shuffle partitions setting that results in a number of files equal to the number of partitions. Thus, you are writing a low data volume but get taxed on the amount of write (a...

4 More Replies
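A sketch of the mitigation PetePP describes: each micro-batch writes roughly one file per shuffle partition, and the default of 200 partitions turns small batches into hundreds of tiny files, each billed as storage transactions. Lowering the setting (or coalescing before the sink) cuts the file count:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Fewer shuffle partitions -> fewer (larger) files per micro-batch.
# The right value depends on batch volume; 8 here is just illustrative.
spark.conf.set("spark.sql.shuffle.partitions", "8")
```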
sanjay
by Valued Contributor II
  • 9527 Views
  • 8 replies
  • 3 kudos

How to stop a continuously running streaming job over the weekend

I have a continuously running streaming job that I would like to stop over the weekend and start again on Monday. Here is my streaming job code: (spark.readStream.format("delta").load(input_path).writeStream.option("checkpointLocation", input_checkpoint_p...

Latest Reply
NDK
New Contributor II
  • 3 kudos

@sanjay Any luck on that? I am also looking for a solution to the same issue.

7 More Replies
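One common pattern (a sketch, not necessarily the answer sanjay settled on): schedule the job instead of running it continuously, and use an availableNow trigger so each run drains the backlog and exits; the checkpoint lets Monday's run resume where Friday's stopped. Paths and the target table name are placeholders taken from the post's variable names:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

input_path = "/mnt/delta/source"                # placeholder, per the post
input_checkpoint_path = "/mnt/checkpoints/job"  # placeholder, per the post

query = (spark.readStream
         .format("delta")
         .load(input_path)
         .writeStream
         .option("checkpointLocation", input_checkpoint_path)
         .trigger(availableNow=True)   # process what's pending, then stop
         .toTable("target_table"))     # placeholder target

query.awaitTermination()  # returns once the backlog is drained
```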
sparkstreaming
by New Contributor III
  • 6035 Views
  • 5 replies
  • 4 kudos

Resolved! Missing rows while processing records using foreachbatch in spark structured streaming from Azure Event Hub

I am new to real-time scenarios, and I need to create Spark Structured Streaming jobs in Databricks. I am trying to apply rule-based validations from backend configurations to each incoming JSON message. I need to do the following actions on th...

Latest Reply
Rishi045
New Contributor III
  • 4 kudos

Were you able to find a solution? If yes, could you please share it?

4 More Replies
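For context, a minimal foreachBatch skeleton; one frequent cause of apparently missing or duplicated rows is that a batch can be re-delivered after a failure, so the batch body needs to be idempotent. The connection config, paths, and the validation rule below are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
eh_conf = {}  # placeholder: Event Hubs connection settings

def process_batch(batch_df, batch_id):
    # foreachBatch can re-run a batch after a restart, so writes should
    # be idempotent (e.g. MERGE on a business key, or dedupe on batch_id).
    validated = batch_df.filter("body IS NOT NULL")  # hypothetical rule
    validated.write.format("delta").mode("append").save("/mnt/delta/validated")

query = (spark.readStream
         .format("eventhubs")          # Azure Event Hubs connector
         .options(**eh_conf)
         .load()
         .writeStream
         .foreachBatch(process_batch)
         .option("checkpointLocation", "/mnt/checkpoints/validate")
         .start())
```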
Ryan_Chynoweth
by Esteemed Contributor
  • 2548 Views
  • 2 replies
  • 2 kudos

medium.com

Hi All, I recently published a streaming data comparison between Snowflake and Databricks. Hope you enjoy! Please let me know what you think! https://medium.com/@24chynoweth/data-streaming-at-scale-databricks-and-snowflake-ca65a2401649

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Nicely done. 

1 More Replies
gg_047320_gg_94
by New Contributor II
  • 8180 Views
  • 1 replies
  • 1 kudos

DLT Spark readStream fails on a source table that is overwritten

I am reading a source table that gets updated every day. It is usually appended/merged with updates and is occasionally overwritten for other reasons. df = spark.readStream.schema(schema).format("delta").option("ignoreChanges", True).option('starting...

Latest Reply
Debayan
Databricks Employee
  • 1 kudos

Hi, could you please confirm the DLT and DBR versions? Also, please tag @Debayan in your next response, which will notify me. Thank you!

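For reference, a hedged sketch of the options involved: ignoreChanges (used in the post) re-emits rows from rewritten files, while on newer runtimes skipChangeCommits skips data-rewriting commits, such as the occasional overwrite, entirely; a full overwrite may otherwise require restarting the stream from a fresh startingVersion:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# skipChangeCommits (newer DBR/Delta versions) ignores commits that
# rewrite existing data instead of failing the stream. Placeholder path.
df = (spark.readStream
      .format("delta")
      .option("skipChangeCommits", "true")
      .load("/mnt/delta/source_table"))
```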
tech2cloud
by New Contributor II
  • 2960 Views
  • 3 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Ravi Vishwakarma, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

2 More Replies
chanansh
by Contributor
  • 1255 Views
  • 1 replies
  • 0 kudos

Streaming from Azure: credentials not being used

I am trying to read a stream from Azure: (spark.readStream .format("cloudFiles") .option('cloudFiles.clientId', CLIENT_ID) .option('cloudFiles.clientSecret', CLIENT_SECRET) .option('cloudFiles.tenantId', TENTANT_ID) .option("header", "true") .opti...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Hanan Shteingart: It looks like you're using the Azure Blob Storage connector for Spark to read data from Azure. The error message suggests that the credentials you provided are not being used by the connector. To specify the credentials, you can se...

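The reply is truncated, but the usual resolution is worth sketching: the cloudFiles.clientId/clientSecret/tenantId options only authenticate Auto Loader's notification-service setup, while access to the files themselves goes through the ADLS Gen2 OAuth Spark confs. All account, secret, and path values below are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

CLIENT_ID = "..."      # placeholder service principal id
CLIENT_SECRET = "..."  # placeholder secret (use a secret scope in practice)
TENANT_ID = "..."      # placeholder tenant id
account = "mystorageaccount"

spark.conf.set(f"fs.azure.account.auth.type.{account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{account}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{account}.dfs.core.windows.net",
               CLIENT_ID)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{account}.dfs.core.windows.net",
               CLIENT_SECRET)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{account}.dfs.core.windows.net",
               f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/token")

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("header", "true")
      .load(f"abfss://container@{account}.dfs.core.windows.net/path"))
```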