cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

AndriusVitkausk
by New Contributor III
  • 1444 Views
  • 1 replies
  • 1 kudos

Autoloader event vs directory ingestion

For a production work load containing around 15k gzip compressed json files per hour all in a YYYY/MM/DD/HH/id/timestamp.json.gz directoryWhat would be the better approach on ingesting this into a delta table in terms of not only the incremental load...

  • 1444 Views
  • 1 replies
  • 1 kudos
Latest Reply
AndriusVitkausk
New Contributor III
  • 1 kudos

@Kaniz Fatma​ So i've not found a fix for the small file problem using autoloader, seems to struggle really badly against large directories, had a cluster running for 8h stuck on "listing directory" part with no end, cluster seemed completely idle to...

  • 1 kudos
Jreco
by Contributor
  • 4559 Views
  • 6 replies
  • 4 kudos

Resolved! messages from event hub does not flow after a time

Hi Team,I'm trying to build a Real-time solution using Databricks and Event hubs.Something weird happens after a time that the process start.At the begining the messages flow through the process as expected with this rate: please, note that the last ...

image image image
  • 4559 Views
  • 6 replies
  • 4 kudos
Latest Reply
Jreco
Contributor
  • 4 kudos

Thanks for your answer @Hubert Dudek​ , Is already specifiedWhat do youn mean with this? This is the weird part of this, bucause the data is flowing good, but at any time is like the Job stop the reading or somethign like that and if I restart the ...

  • 4 kudos
5 More Replies
Jreco
by Contributor
  • 8924 Views
  • 13 replies
  • 3 kudos

Event hub streaming improve processing rate

Hi all,I'm working with event hubs and data bricks to process and enrich data in real-time.Doing a "simple" test, I'm getting some weird values (input rate vs processing rate) and I think I'm losing data:If you can see, there is a peak with 5k record...

image image
  • 8924 Views
  • 13 replies
  • 3 kudos
Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

hi @Jhonatan Reyes​ ,How many Event hubs partitions are you readying from? your micro-batch takes a few milliseconds to complete, which I think is good time, but I would like to undertand better what are you trying to improve here.Also, in this case ...

  • 3 kudos
12 More Replies
Labels