Hi, I have several streaming jobs; one of them uses Trigger.AvailableNow. The issue is that it gets stuck when there are no events, or when it finishes ingesting all events. The expected behavior would be for the job to shut down. I've already checked...
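For reference, this is a minimal sketch of how Trigger.AvailableNow is usually wired up with a file source, which should drain the backlog and then stop on its own; the schema, paths, and checkpoint location are illustrative assumptions, not the poster's actual job:

# Minimal Trigger.AvailableNow sketch; paths and schema are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("available-now-demo").getOrCreate()

schema = StructType([
    StructField("id", LongType()),
    StructField("event", StringType()),
])

df = (spark.readStream
      .schema(schema)
      .json("/tmp/events/in"))              # hypothetical landing directory

query = (df.writeStream
         .format("delta")
         .option("checkpointLocation", "/tmp/events/_checkpoint")
         .trigger(availableNow=True)        # process what is available, then stop
         .start("/tmp/events/bronze"))

query.awaitTermination()                    # returns once the backlog is drained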
Hi, below I am trying to read data from Kafka, determine whether it's fraud or not, and then write it back to MongoDB. Below is my code (read_kafka.py):
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types i...
Hi Saswata, can you remove the filter and see if it is printing output to the console?
kafka_df5 = kafka_df4.filter(kafka_df4.status == "FRAUD")
Thanks and regards,
Swetha Nandajan
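To make that debugging step concrete, here is a hedged sketch of writing the unfiltered frame to the console to confirm records are flowing before re-applying the FRAUD filter (kafka_df4 is the DataFrame from the question):

# Console sink for debugging; stop the query once you have seen output.
debug_query = (kafka_df4.writeStream
               .format("console")
               .option("truncate", False)
               .outputMode("append")
               .start())

debug_query.awaitTermination(60)   # watch the console output for a minute
debug_query.stop()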
OK. So I think I'm probably missing the obvious and tying myself in knots here. Here is the scenario:
- batch datasets arrive in JSON format in an Azure data lake
- each batch is a complete set of "current" records (the complete table)
- these are processed us...
Hi @Kearon McNicol, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answer...
Getting this error while loading data with Auto Loader. Although table access control is already disabled, I am still getting this error: "py4j.security.Py4JSecurityException: Method public org.apache.spark.sql.streaming.DataStreamReader org.apache.spark.sql...
Hello, I think the only way to handle this is to specify the schema within the job through a schema file. The other way is to restart the job so that it infers the new schema automatically.
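A minimal sketch of the first option, pinning the schema explicitly so Auto Loader never needs to re-infer it; the path and column names are assumptions:

# Explicit schema for Auto Loader; no inference, so no restart on schema change.
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

schema = StructType([
    StructField("id", StringType()),
    StructField("payload", StringType()),
    StructField("ingested_at", TimestampType()),
])

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .schema(schema)                       # explicit schema from code or a schema file
      .load("/mnt/landing/events"))         # hypothetical landing path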
I have a Delta table that is being updated nightly using Auto Loader. After the merge, the job kicks off a second notebook to clean and rewrite certain values using a series of UPDATE statements, e.g.,
UPDATE foo
SET field1 = some_value
WHER...
I would partition the table by some sort of date that Auto Loader can use. You could then filter your UPDATE further, and it will automatically use partition pruning and only scan the related files.
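As a hedged sketch of that suggestion (table, column, and value names are assumptions): write the table partitioned by an ingest date, then constrain the UPDATE to recent partitions so Delta can skip untouched files.

# Partition the Delta table by date, then prune in the UPDATE's WHERE clause.
from pyspark.sql.functions import to_date, col

(df.withColumn("ingest_date", to_date(col("ingest_ts")))   # df: the merged data
   .write.format("delta")
   .partitionBy("ingest_date")
   .saveAsTable("foo"))

spark.sql("""
    UPDATE foo
    SET field1 = 'some_value'
    WHERE ingest_date >= date_sub(current_date(), 7)       -- partition pruning
      AND field2 = 'needs_cleaning'
""")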
A live table or view always reflects the results of the query that defines it, including when the query defining the table or view is updated, or an input data source is updated. Like a traditional materialized view, a live table or view may be entir...
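As a hedged illustration of that behavior, a live table in Delta Live Tables is just a named function returning the query that defines it; the DLT runtime keeps the table in sync with that query. The source path below is an assumption:

# A minimal Delta Live Tables table definition; the runtime materializes
# and refreshes it from this query.
import dlt

@dlt.table(comment="Complete set of current records from the landing zone")
def current_records():
    return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/landing/current"))    # hypothetical input location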
DEC 13 MEETUP: Arbitrary Stateful Stream Processing in PySpark. For folks in the Bay Area: Dr. Karthik Ramasamy, Databricks' Head of Streaming, will be joined by engineering experts on the streaming and PySpark teams at Databricks for this in-person me...
Hello, is it possible to use an SFTP location as a source for Structured Streaming? At the moment we are going from SFTP -> S3 -> Databricks via Structured Streaming. I would like to cut out the S3 part. Cheers, Chris
Hi @Chris Lant, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Otherwise, bricksters will get back to you soon. Thanks.
Hi, I am practicing with Databricks. In the sample notebooks, I have seen writeStream used both with and without the ".start()" method. Samples are below.
Without .start():
spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.f...
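For context, a hedged sketch of the difference: readStream only defines the stream, a Databricks notebook's display() starts it implicitly, and writeStream launches nothing until .start() is called. Paths and options below are assumptions:

# Defining a stream does not run it.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", "/tmp/schema")
      .load("/mnt/landing"))

display(df)    # in a Databricks notebook, this implicitly starts a streaming query

query = (df.writeStream
         .format("delta")
         .option("checkpointLocation", "/tmp/chk")
         .start("/tmp/out"))    # outside display(), nothing runs until .start()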
Hi @Mohammad Saber, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Otherwise, bricksters will get back to you soon. Thanks.
Hi, I am practicing with the Databricks sample notebooks published here: https://github.com/databricks-academy/advanced-data-engineering-with-databricks. In one of the notebooks (ADE 3.1 - Streaming Deduplication) (URL), there is sample code to remove dupli...
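For orientation, streaming deduplication in that style typically combines a watermark with dropDuplicates; this is a hedged sketch, and the table and column names are assumptions rather than the notebook's actual code:

# The watermark bounds the state kept for deduplication; duplicates on
# (user_id, event_time) within the window are dropped.
deduped = (spark.readStream.table("bronze_events")
           .withWatermark("event_time", "30 seconds")
           .dropDuplicates(["user_id", "event_time"]))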
Hi @Mohammad Saber, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Otherwise, bricksters will get back to you soon. Thanks.
Hi everyone, I have a PySpark streaming job reading from AWS Kinesis that suddenly failed for no reason (I mean, we did not make any changes recently). It is giving the following error: ERROR MicroBatchExecution: Query kinesis_events_prod_bronz...
@patricio tojo I have the same problem, though in my case it appeared after migrating to Unity Catalog. I need to investigate a little more, but after adding this to my Spark job, it works:
spark.conf.set("spark.databricks.delta.state.corruptionIsFatal", False)
When I try to perform some transformations on streaming data, I get the error "Queries with streaming sources must be executed with writeStream.start()". My aim is to do a lookup for every column in each row of the streaming data. streaming_table = spark...
Hi @Bency Mathew, you can use foreachBatch to perform custom logic on each micro-batch. Please refer to the document below: https://docs.databricks.com/structured-streaming/foreach.html#perform-streaming-writes-to-arbitrary-data-sinks-with-structured-s...
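A minimal sketch of that pattern; lookup_df, the join key, and the output path are assumptions for illustration:

# Inside foreachBatch the micro-batch is a static DataFrame, so lookups and
# joins that are rejected on a streaming frame are allowed.
def process_batch(batch_df, batch_id):
    enriched = batch_df.join(lookup_df, "key", "left")    # per-row lookup
    enriched.write.format("delta").mode("append").save("/tmp/out/enriched")

(streaming_table.writeStream
 .foreachBatch(process_batch)
 .option("checkpointLocation", "/tmp/chk-foreachbatch")
 .start())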
DBR 10.4 LTS is failing frequently due to GC overhead, roughly once every half hour. Can anyone from the Databricks team let me know if there are existing tickets or bugs for this? Note: we have used the same configuration and the same DBR for almost the last 3 months. When checking ...
Hi @Vidula Khanna, we have raised a support ticket with ADB from the client side. We can close this; however, it seems DBR version 11.2 and above has some fixes for the RocksDB memory leak, based on communication with the Databricks developer team.
What is the maximum number of concurrent streaming jobs for a cluster? How can I determine the right number of concurrent streaming jobs for a given cluster configuration? Should I use multiple clusters for different jobs, or combine them into one big cluster to hand...
Hi @John William, we haven't heard from you since the last response from @Prabakar, and I was checking back to see if his suggestions helped you. Otherwise, if you have a solution, please share it with the community, as it can be helpful to others. Al...
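On running several streams concurrently: one common pattern is to start multiple queries on the same cluster, isolate them in fair-scheduler pools, and block on all of them. A hedged sketch; pool names, sources, and sinks are assumptions, not sizing guidance:

# Two streaming queries sharing one cluster, each in its own scheduler pool.
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "ingest")
q1 = (spark.readStream.format("rate").load()
      .writeStream.format("console")
      .option("checkpointLocation", "/tmp/chk1")
      .start())

spark.sparkContext.setLocalProperty("spark.scheduler.pool", "enrich")
q2 = (spark.readStream.format("rate").load()
      .writeStream.format("console")
      .option("checkpointLocation", "/tmp/chk2")
      .start())

spark.streams.awaitAnyTermination()   # block until any query stops or fails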