Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

lokeshr
by New Contributor
  • 1347 Views
  • 2 replies
  • 1 kudos

Clarity on usage STREAM while defining DLT tables

Hi, I am currently trying to learn Databricks and going through tutorials and learning materials. I came across this link: https://databricks.com/discover/pages/getting-started-with-delta-live-tables While I get most of what is described on the page, I fin...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @Lokesh Raju, just a friendly follow-up. Did Tomasz's response help you resolve your question? If it did, please mark it as best.

1 More Replies
Antoine_De_A
by New Contributor III
  • 2969 Views
  • 1 reply
  • 3 kudos

Resolved! Streaming data to CosmosDB

Hello everyone, here is the problem I am facing. I'm currently working on streaming data to Databricks. My goal is to create a data stream in a first notebook, and then in a second notebook to read this data stream, add all the new rows to a DataFrame...

Latest Reply
Antoine_De_A
New Contributor III
  • 3 kudos

Problem solved! Instead of trying to do everything directly with the .writeStream options, I used the .foreachBatch() function, which lets me call a function outside of .writeStream(). In this function I receive a DataFrame as a parameter, which is my str...
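
For readers landing on this thread, here is a minimal PySpark sketch of the per-batch approach this reply describes. It is not the poster's actual code: the helper names (make_batch_writer, start_stream), the sink format string, and the sink options are all hypothetical placeholders, and the real values depend on the connector you use.

```python
def make_batch_writer(sink_format, sink_options):
    """Build a function suitable for DataStreamWriter.foreachBatch.

    sink_format and sink_options are placeholders (assumptions); use the
    format name and options documented for your own connector.
    """
    def write_batch(batch_df, batch_id):
        # batch_df is a regular (non-streaming) DataFrame holding one
        # micro-batch, so the ordinary batch writer API is available here.
        (batch_df.write
            .format(sink_format)
            .options(**sink_options)
            .mode("append")
            .save())
    return write_batch


def start_stream(stream_df, batch_writer, checkpoint_path):
    """Start a streaming query that hands every micro-batch to batch_writer."""
    return (stream_df.writeStream
        .foreachBatch(batch_writer)
        .option("checkpointLocation", checkpoint_path)
        .start())
```

Usage, assuming stream_df is a streaming DataFrame you already have: `start_stream(stream_df, make_batch_writer("your-connector-format", {"someOption": "value"}), "/path/to/checkpoint")`. Keeping the write logic in its own function is what lets you use batch-only writers from a stream.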

tom_shaffner
by New Contributor III
  • 9943 Views
  • 1 reply
  • 2 kudos

"Detected a data update", what changed?

In streaming flows I periodically get a "Detected a data update" error. This error generally seems to indicate that something has changed in the source table schema, but it's not immediately apparent what. In one case yesterday I pulled the source tab...

Latest Reply
tom_shaffner
New Contributor III
  • 2 kudos

@Kaniz Fatma, thanks, that helps. I was assuming this warning indicated a schema evolution. Based on what you say, it likely didn't, and I just have to turn on ignoreChanges any time I have a stream from a table that receives updates/upserts. To b...
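
As a rough illustration of what turning this on looks like, here is a small PySpark sketch of a Delta streaming read with the ignoreChanges reader option. The function name and the table name in the usage note are hypothetical. Keep in mind that with this option, files rewritten by updates are re-emitted downstream, so the consumer may see duplicates and should be able to tolerate them.

```python
def read_ignoring_changes(spark, table_name):
    """Return a streaming DataFrame over a Delta table that tolerates
    updates and upserts in the source instead of failing the stream.

    spark is an existing SparkSession; table_name is whatever Delta table
    you stream from (both supplied by the caller).
    """
    return (spark.readStream
        .option("ignoreChanges", "true")  # Delta reader option
        .table(table_name))
```

Usage, assuming a Delta table called `events` (a made-up name): `df = read_ignoring_changes(spark, "events")`.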

itay
by New Contributor II
  • 2027 Views
  • 2 replies
  • 1 kudos

Streaming with runOnce and groupBy window queries

I have a streaming job running a groupBy query with a window of 3 days. The query searches for different types of events. The stream is configured with runOnce, and a job is scheduled to run every hour. Now, I'm not sure what data is processed ea...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @itay k, you will need to take a look at the Progress Reporter, which shows the micro-batch JSON metrics. For example, the metric called "numInputRows" displays the number of input rows processed in the micro-batch. You will...
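
To make the metric concrete, here is a small Python sketch that sums "numInputRows" over a list of progress reports. In PySpark the reports of a running query are available as query.recentProgress (and the latest one as query.lastProgress). The helper name and the sample values below are made up for illustration and are not output from a real job.

```python
from typing import Iterable, Mapping


def total_input_rows(progress_events: Iterable[Mapping]) -> int:
    """Sum numInputRows across micro-batch progress reports."""
    return sum(int(p.get("numInputRows", 0)) for p in progress_events)


# Illustrative progress reports (invented values, same key names as the
# micro-batch JSON metrics mentioned above).
sample = [
    {"batchId": 0, "numInputRows": 120},
    {"batchId": 1, "numInputRows": 0},
    {"batchId": 2, "numInputRows": 45},
]
print(total_input_rows(sample))  # prints 165
```

With a live query, the call would be `total_input_rows(query.recentProgress)`, assuming `query` is the StreamingQuery returned by `.start()`.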

1 More Replies
Jreco
by Contributor
  • 13361 Views
  • 13 replies
  • 3 kudos

Event hub streaming improve processing rate

Hi all, I'm working with Event Hubs and Databricks to process and enrich data in real time. Doing a "simple" test, I'm getting some weird values (input rate vs. processing rate) and I think I'm losing data. As you can see, there is a peak with 5k record...

Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

Hi @Jhonatan Reyes, how many Event Hubs partitions are you reading from? Your micro-batch takes a few milliseconds to complete, which I think is a good time, but I would like to better understand what you are trying to improve here. Also, in this case ...

12 More Replies
RajaLakshmanan
by New Contributor
  • 3512 Views
  • 2 replies
  • 1 kudos

Resolved! Spark StreamingQuery not processing all data from source directory

Hi, I have set up a streaming process that consumes files from an HDFS staging directory and writes them to a target location. The input directory continuously receives files from another process. Let's say the file producer produces 5 million records and sends them to the hdfs sta...

Latest Reply
User16763506586
Contributor
  • 1 kudos

If it helps, you can try running a left anti join between the source and the sink to identify the missing records, and then check whether those records match the provided schema.
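
A minimal PySpark sketch of that check, assuming both the source and the sink can be loaded as batch DataFrames and share one or more key columns. The function name and the `id` key in the usage note are hypothetical.

```python
def find_missing_records(source_df, sink_df, key_columns):
    """Return the rows of source_df that have no match in sink_df.

    A left anti join keeps only the left-side rows for which no row with
    the same key values exists on the right side.
    """
    return source_df.join(sink_df, on=key_columns, how="left_anti")
```

Usage: `missing = find_missing_records(source_df, sink_df, ["id"])`, then inspect `missing` to see whether those rows deviate from the schema given to the stream.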

1 More Replies
User16826994223
by Honored Contributor III
  • 1888 Views
  • 1 reply
  • 0 kudos

Streaming with Kafka with the same groupid

A Kafka topic has 300 partitions, and I see that two clusters are running with the same group ID. Will the data be duplicated in my Delta bronze layer?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

By default, each streaming query generates a unique group ID for reading data (ensuring it has its own consumer group). In scenarios where you'd want to specify it (authz, etc.), it is not recommended to have two streaming applications specify ...

User16826992666
by Valued Contributor
  • 5656 Views
  • 2 replies
  • 0 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

If the read stream definition has something similar to val df = spark.read.format("kafka").option("kafka.bootstrap.servers", "host1:port1,host2:port2").option("subscribePattern", "topic.*").option("startingOffsets", "earliest") resettin...

1 More Replies
User16826992666
by Valued Contributor
  • 4922 Views
  • 2 replies
  • 0 kudos

Resolved! Can multiple streams write to a Delta table at the same time?

Wondering if there are any dangers to doing this, and whether it's a best practice. I'm concerned there could be conflicts, but I'm not sure how Delta would handle them.

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

> Can multiple streams write to a Delta table at the same time?
Yes. Delta uses optimistic concurrency control and configurable isolation levels.
> I'm concerned there could be conflicts but I'm not sure how Delta would handle it.
Write operations can resul...
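
As a rough sketch of the setup being asked about, here is a PySpark helper that starts one append-only stream into a Delta table. Calling it once per stream with the same table name but a different checkpoint path gives you several streams writing to one table. The helper name, table name, and paths are hypothetical. Each stream needs its own checkpoint location, and this sketch covers plain appends only.

```python
def append_stream_to_table(stream_df, table_name, checkpoint_path):
    """Start a streaming query that appends stream_df to a Delta table.

    checkpoint_path must be unique per stream: two queries must never
    share a checkpoint directory.
    """
    return (stream_df.writeStream
        .format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_path)
        .toTable(table_name))
```

Usage, assuming two streaming DataFrames `stream_a` and `stream_b`: `append_stream_to_table(stream_a, "bronze_events", "/chk/a")` and `append_stream_to_table(stream_b, "bronze_events", "/chk/b")`.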

1 More Replies