Data Engineering

Forum Posts

fhmessas
by New Contributor II
  • 915 Views
  • 2 replies
  • 2 kudos

Trigger.AvailableNow getting stuck when there is no event

Hi, I have several streaming jobs; one of them uses Trigger.AvailableNow. The issue is that it gets stuck when there are no events or when it has finished ingesting all events. The expected behavior would be for the job to shut down. I've already checked...

Stuck streaming
Latest Reply
fhmessas
New Contributor II
  • 2 kudos

Hi, the source is an S3 bucket using file notification with SQS. There are no errors or warnings in the logs; the AvailableNow trigger just gets stuck.

1 More Replies
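For reference, a minimal sketch (untested, not from the thread) of an Auto Loader stream in file-notification mode with Trigger.AvailableNow, the setup described above. All paths, the bucket, and the checkpoint location are hypothetical; with availableNow the query is expected to drain the backlog and terminate on its own, even when no new events exist.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Auto Loader source in file-notification mode (SQS-backed on AWS).
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.useNotifications", "true")   # file notifications via SQS
      .load("s3://my-bucket/events/"))                 # hypothetical source path

query = (df.writeStream
         .format("delta")
         .option("checkpointLocation", "s3://my-bucket/_checkpoints/events")
         .trigger(availableNow=True)   # process available data, then stop
         .start("s3://my-bucket/bronze/events"))

# Expected behavior: returns once the backlog (possibly empty) is processed.
query.awaitTermination()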
Sas
by New Contributor II
  • 854 Views
  • 1 reply
  • 0 kudos

A streaming job going into an infinite loop

Hi, below I am trying to read data from Kafka, determine whether each record is fraud or not, and then write it back to MongoDB. Below is my code (read_kafka.py): from pyspark.sql import SparkSession from pyspark.sql.functions import * from pyspark.sql.types i...

Latest Reply
swethaNandan
New Contributor III
  • 0 kudos

Hi Saswata, can you remove the filter and see if it prints output to the console? kafka_df5 = kafka_df4.filter(kafka_df4.status == "FRAUD") Thanks and regards, Swetha Nandajan

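A sketch of the debugging step suggested in the reply: write the stream to the console sink before the FRAUD filter to confirm rows arrive at all. If the raw stream prints but the filtered one never does, the status column never equals "FRAUD" and the job only looks like an infinite loop while it waits for matching data. Broker and topic names below are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

kafka_df = (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical
            .option("subscribe", "transactions")               # hypothetical
            .load())

# Inspect the raw stream first, without the status == "FRAUD" filter.
debug_query = (kafka_df.selectExpr("CAST(value AS STRING)")
               .writeStream
               .format("console")
               .outputMode("append")
               .start())
debug_query.awaitTermination()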
Kearon
by New Contributor III
  • 2968 Views
  • 11 replies
  • 0 kudos

Process batches in a streaming pipeline - identifying deletes

OK, so I think I'm probably missing the obvious and tying myself in knots here. Here is the scenario: batch datasets arrive in JSON format in an Azure data lake; each batch is a complete set of "current" records (the complete table); these are processed us...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Kearon McNicol, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answer...

10 More Replies
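One common pattern for this scenario (a sketch under stated assumptions, not from the thread): since each batch is a complete snapshot, a row that exists in the target but not in the batch has been deleted upstream. A MERGE with WHEN NOT MATCHED BY SOURCE (Delta Lake 2.3+ / DBR 12.1+) inside foreachBatch can express that. Table names, the key column, and the schema are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, LongType, StringType
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Hypothetical schema for the landed JSON snapshots.
snapshot_schema = StructType([
    StructField("id", LongType()),
    StructField("value", StringType()),
])

def apply_snapshot(batch_df, batch_id):
    target = DeltaTable.forName(batch_df.sparkSession, "silver.current_records")
    (target.alias("t")
     .merge(batch_df.alias("s"), "t.id = s.id")
     .whenMatchedUpdateAll()            # changed rows
     .whenNotMatchedInsertAll()         # new rows
     .whenNotMatchedBySourceDelete()    # rows missing from this snapshot = deletes
     .execute())

(spark.readStream
 .format("json")                        # or cloudFiles, as in the thread
 .schema(snapshot_schema)
 .load("abfss://landing@account.dfs.core.windows.net/batches/")  # hypothetical
 .writeStream
 .foreachBatch(apply_snapshot)
 .option("checkpointLocation", "/checkpoints/current_records")
 .start())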
Ria
by New Contributor
  • 672 Views
  • 1 reply
  • 1 kudos

py4j.security.Py4JSecurityException

I am getting this error while loading data with Auto Loader. Although table access control is already disabled, I am still getting this error: "py4j.security.Py4JSecurityException: Method public org.apache.spark.sql.streaming.DataStreamReader org.apache.spark.sql...

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Hi, are you using a High Concurrency cluster? Which DBR version are you running?

Leszek
by Contributor
  • 578 Views
  • 1 reply
  • 3 kudos

How to handle schema changes in streaming Delta Tables?

I'm using Structured Streaming to move data from one Delta table to another. How do I handle schema changes in those tables (e.g., adding a new column)?

Latest Reply
Murthy1
Contributor II
  • 3 kudos

Hello, I think the only way to handle this is to supply the schema to the job through a schema file. The other way is to restart the job so that it infers the new schema automatically.

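To illustrate the restart approach on the sink side, a sketch (paths hypothetical): the Delta sink can accept an added column via the mergeSchema write option, while a schema change on the streaming source still stops the query; on restart it picks up the new schema, as the reply describes.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.readStream.format("delta").load("/delta/source_table")  # hypothetical

(df.writeStream
 .format("delta")
 .option("mergeSchema", "true")  # let new columns be added to the sink table
 .option("checkpointLocation", "/checkpoints/source_to_target")
 .start("/delta/target_table"))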
lawrence009
by Contributor
  • 652 Views
  • 2 replies
  • 3 kudos

Advice on efficiently cleansing and transforming delta table

I have a Delta table that is updated nightly using Auto Loader. After the merge, the job kicks off a second notebook to clean and rewrite certain values using a series of UPDATE statements, e.g., UPDATE TABLE foo SET field1 = some_value WHER...

Latest Reply
Jfoxyyc
Valued Contributor
  • 3 kudos

I would partition the table by some sort of date that Auto Loader can use. You could then filter your UPDATE further, and it will automatically use partition pruning and only scan the related files.

1 More Replies
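A sketch of that partition-pruning suggestion (table and column names hypothetical): partition the Delta table by an ingest date, then constrain the nightly UPDATE on that column so only the newest partitions are scanned.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
  CREATE TABLE IF NOT EXISTS foo (
    id BIGINT, field1 STRING, field2 STRING, ingest_date DATE
  ) USING DELTA
  PARTITIONED BY (ingest_date)
""")

# The cleanup touches only the partitions loaded recently, so partition
# pruning skips the rest of the table's files.
spark.sql("""
  UPDATE foo
  SET field1 = 'some_value'
  WHERE ingest_date >= current_date() - INTERVAL 1 DAY
    AND field2 IS NULL
""")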
Aj2
by New Contributor III
  • 8948 Views
  • 1 reply
  • 4 kudos
Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 4 kudos

A live table or view always reflects the results of the query that defines it, including when the query defining the table or view is updated, or an input data source is updated. Like a traditional materialized view, a live table or view may be entir...

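Assuming the thread is about Delta Live Tables, a minimal sketch of a live table definition, where the table always reflects the result of its defining query (source path and filter are hypothetical; `dlt` and `spark` are provided inside a DLT pipeline):

```python
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Always reflects the defining query over the raw input")
def cleaned_events():
    # Re-evaluated as the pipeline updates, so query or input changes flow through.
    return (spark.read.format("json")
            .load("/raw/events")                      # hypothetical source
            .where(col("event_type").isNotNull()))    # hypothetical filter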
mattjones
by New Contributor II
  • 266 Views
  • 0 replies
  • 0 kudos

www.meetup.com

DEC 13 MEETUP: Arbitrary Stateful Stream Processing in PySpark. For folks in the Bay Area: Dr. Karthik Ramasamy, Databricks' Head of Streaming, will be joined by engineering experts on the streaming and PySpark teams at Databricks for this in-person me...

clant
by New Contributor II
  • 768 Views
  • 1 reply
  • 4 kudos

Structured Streaming from SFTP

Hello, is it possible to use an SFTP location as the load source for Structured Streaming? At the moment we are going from SFTP -> S3 -> Databricks via Structured Streaming. I would like to cut out the S3 part. Cheers, Chris

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Chris Lant, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first; otherwise, Bricksters will get back to you soon. Thanks.

Mado
by Valued Contributor II
  • 648 Views
  • 2 replies
  • 3 kudos

When should I use ".start()" with writeStream?

Hi, I am practicing with Databricks. In the sample notebooks, I have seen writeStream used both with and without the ".start()" method. Samples are below. Without .start(): spark.readStream .format("cloudFiles") .option("cloudFiles.f...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Mohammad Saber, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first; otherwise, Bricksters will get back to you soon. Thanks.

1 More Replies
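To make the distinction concrete, a short sketch (paths hypothetical): the writeStream builder only defines the query; .start() launches it. In Databricks notebooks, display() on a streaming DataFrame starts a stream under the hood, which is why some samples appear to omit .start().

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Defining the query: nothing runs yet.
stream_writer = (spark.readStream
                 .format("cloudFiles")
                 .option("cloudFiles.format", "json")
                 .load("/raw/input")                  # hypothetical path
                 .writeStream
                 .format("delta")
                 .option("checkpointLocation", "/checkpoints/demo"))

# This call actually launches the streaming query.
query = stream_writer.start("/delta/demo")
query.awaitTermination()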
Mado
by Valued Contributor II
  • 1488 Views
  • 2 replies
  • 3 kudos

Question about "foreachBatch" to remove duplicate records when streaming data

Hi, I am practicing with the Databricks sample notebooks published here: https://github.com/databricks-academy/advanced-data-engineering-with-databricks. In one of the notebooks (ADE 3.1 - Streaming Deduplication) (URL), there is sample code to remove dupli...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Mohammad Saber, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first; otherwise, Bricksters will get back to you soon. Thanks.

1 More Replies
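A hedged sketch of the streaming-deduplication pattern that notebook covers (table and column names hypothetical, not the notebook's own code): dropDuplicates bounded by a watermark removes duplicates within the stream, and an insert-only MERGE inside foreachBatch keeps replayed micro-batches from re-inserting the same keys.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

deduped = (spark.readStream.table("bronze.events")          # hypothetical source
           .withWatermark("event_time", "30 seconds")       # bound state growth
           .dropDuplicates(["user_id", "event_time"]))      # in-stream dedup

def upsert_batch(batch_df, batch_id):
    # Insert-only merge: keys already in the target are skipped, so a
    # replayed micro-batch cannot create duplicates.
    batch_df.createOrReplaceTempView("updates")
    batch_df.sparkSession.sql("""
        MERGE INTO silver.events t
        USING updates s
        ON t.user_id = s.user_id AND t.event_time = s.event_time
        WHEN NOT MATCHED THEN INSERT *
    """)

(deduped.writeStream
 .foreachBatch(upsert_batch)
 .option("checkpointLocation", "/checkpoints/silver_events")
 .start())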
patojo94
by New Contributor II
  • 1798 Views
  • 5 replies
  • 4 kudos

Resolved! PySpark streaming failed for no reason

Hi everyone, I have a PySpark streaming job reading from AWS Kinesis that suddenly failed for no reason (I mean, we have not made any changes recently). It is giving the following error: ERROR MicroBatchExecution: Query kinesis_events_prod_bronz...

Latest Reply
jcasanella
New Contributor III
  • 4 kudos

@patricio tojo I have the same problem, although in my case it appeared after migrating to Unity Catalog. I need to investigate a little more, but after adding this to my Spark job it works: spark.conf.set("spark.databricks.delta.state.corruptionIsFatal", False)

4 More Replies
Bency
by New Contributor III
  • 8078 Views
  • 1 reply
  • 2 kudos

Queries with streaming sources must be executed with writeStream.start();

When I try to perform some transformations on streaming data, I get a "Queries with streaming sources must be executed with writeStream.start()" error. My aim is to do a lookup for every column in each row of the streaming data. steaming_table=spark...

Latest Reply
Noopur_Nigam
Valued Contributor II
  • 2 kudos

Hi @Bency Mathew, you can use foreachBatch to perform custom logic on each micro-batch. Please refer to the document below: https://docs.databricks.com/structured-streaming/foreach.html#perform-streaming-writes-to-arbitrary-data-sinks-with-structured-s...

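A sketch of that foreachBatch approach (table names and the join key are hypothetical): inside the function each micro-batch is a static DataFrame, so batch-only operations like joins against a lookup table are allowed without hitting the writeStream.start() error.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

lookup_df = spark.read.table("ref.column_lookup")  # static lookup table

def enrich_and_write(batch_df, batch_id):
    # batch_df is a plain DataFrame here, so a normal join works.
    enriched = batch_df.join(lookup_df, on="code", how="left")
    enriched.write.mode("append").saveAsTable("silver.enriched_stream")

(spark.readStream.table("bronze.stream_source")
 .writeStream
 .foreachBatch(enrich_and_write)
 .option("checkpointLocation", "/checkpoints/enriched_stream")
 .start())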
Soma
by Valued Contributor
  • 1280 Views
  • 5 replies
  • 0 kudos

Streaming Queries Failing Frequently in DBR 10.4 LTS for the Last Week

DBR 10.4 LTS is failing frequently, about once every half hour, due to GC overhead. Can anyone from the Databricks team let me know if there are existing tickets or bugs for this? Note: we have used the same configuration and the same DBR for almost the last 3 months. When checking ...

Latest Reply
Soma
Valued Contributor
  • 0 kudos

Hi @Vidula Khanna, we have raised a support ticket with ADB from the client side. We can close this; however, it seems DBR 11.2 and above includes some fixes for the RocksDB memory leak, based on communication with the Databricks developer team.

4 More Replies
jwilliam
by Contributor
  • 1404 Views
  • 4 replies
  • 4 kudos

Resolved! What is the maximum number of concurrent streaming jobs for a cluster?

What is the maximum number of concurrent streaming jobs for a cluster? How can I choose the right number of concurrent streaming jobs for different cluster configurations? Should I use multiple clusters for different jobs, or combine them into one big cluster to hand...

Latest Reply
Kaniz
Community Manager
  • 4 kudos

Hi @John William, we haven't heard from you since the last response from @Prabakar, and I was checking back to see if his suggestions helped you. If you have a solution, please share it with the community, as it can be helpful to others. Al...

3 More Replies