03-14-2023 06:39 AM
Databricks now supports event-driven workloads, in particular for loading cloud files from external locations. This means you can save costs and resources by triggering your Databricks jobs only when new files arrive in your cloud storage, instead of mounting it as DBFS and polling it periodically. To use this feature, you need to follow these steps:
1. Configure a Unity Catalog external location for the cloud storage path you want to monitor.
2. Create a job (or edit an existing one) in Databricks Workflows.
3. Add a trigger of type "File arrival" to the job and point it at the storage location URL.
With these steps, you can easily create and run event-driven workloads on Databricks.
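If you prefer to script this, here is a minimal sketch using the Jobs 2.1 REST API; the workspace host, token, cluster ID, notebook path, and storage URL below are all placeholders, and the exact trigger fields are worth verifying against the current API reference.

```python
# A minimal sketch of creating a job with a file arrival trigger through the
# Jobs 2.1 REST API. Host, token, cluster ID, notebook path, and storage URL
# are placeholders; check the payload fields against the current API docs.
import requests

HOST = "https://<workspace>.cloud.databricks.com"   # your workspace URL
TOKEN = "<personal-access-token>"                   # your PAT

payload = {
    "name": "file-arrival-demo",
    "trigger": {
        "pause_status": "UNPAUSED",
        "file_arrival": {
            # Must point at a path governed by a Unity Catalog external location.
            "url": "abfss://landing@mystorage.dfs.core.windows.net/incoming/",
            "min_time_between_triggers_seconds": 60,
        },
    },
    "tasks": [
        {
            "task_key": "ingest",
            "existing_cluster_id": "<cluster-id>",
            "notebook_task": {"notebook_path": "/Workspace/Users/me/ingest"},
        }
    ],
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # {"job_id": ...}
```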
03-19-2024 08:04 PM
Hey,
We have a use case where Salesforce generates Change Data Capture (CDC) platform events. With this new event-driven workload feature, can Databricks directly consume these CDC events from Salesforce?
We are also evaluating middleware such as MuleSoft, as described in this reference article: Subscribe to Change Data Capture Events with the Salesforce Connector. However, we are concerned about MuleSoft's pricing.
03-20-2024 01:03 AM
I think we are talking about file events here.
What you are describing is in fact streaming ingest from a CDC system. That can be done, but not by connecting directly to the CDC source. You can forward the CDC events to an event queue like Kafka and let Spark subscribe to one of those topics (see the sketch below).
MuleSoft probably works too, but honestly, as you already mentioned, it is overpriced.
What is presented here was already possible in many other systems, but has now been added to Databricks as well.
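To illustrate the Kafka route, here is a minimal PySpark sketch, assuming a Databricks notebook (where `spark` is predefined); the broker addresses, topic, checkpoint path, and table name are hypothetical:

```python
from pyspark.sql.functions import col

# Subscribe to the (hypothetical) topic that receives forwarded Salesforce CDC events.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  # assumed brokers
    .option("subscribe", "salesforce-cdc")                           # assumed topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers binary key/value pairs; cast the value to a string payload.
events = raw.select(col("value").cast("string").alias("payload"))

# Land the raw events in a Delta table for downstream processing.
(
    events.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/salesforce_cdc")  # assumed path
    .toTable("raw_salesforce_cdc")                                    # assumed table
)
```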
03-21-2024 11:34 AM
While this works great with new files, is it possible to trigger when an update happens to an existing file?
03-25-2024 08:16 AM
The trigger fires on file events in blob storage, where objects are typically immutable: files cannot be updated in place, only created, deleted, or overwritten.
03-25-2024 11:29 AM
Yes, the file is getting overwritten, but the trigger is not firing. Maybe I am missing something?
03-26-2024 06:06 AM
The event is probably not triggered by an overwrite. Can you test with a delete followed by a create?
04-26-2024 03:39 AM
For reference, the trigger will not contain any information about the event itself (such as file names), so you cannot build a dynamic event-driven architecture with this trigger.
04-26-2024 04:03 AM
@adriennn
That's because it's only one of the trigger types. To load newly arrived files automatically, you can use Auto Loader (see the sketch below).
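For example, a minimal Auto Loader sketch, assuming a Databricks notebook and hypothetical source path, schema location, checkpoint path, and table name; the `_metadata.file_path` column (available in recent Databricks Runtimes) recovers the per-file information that the trigger itself does not expose:

```python
# Auto Loader (cloudFiles) discovers new files incrementally and exposes
# file-level metadata, which the file arrival trigger does not.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/tmp/schemas/landing")      # assumed path
    .load("abfss://landing@mystorage.dfs.core.windows.net/events/")   # assumed source
    .selectExpr("*", "_metadata.file_path AS source_file")            # per-file path
)

(
    df.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/landing")         # assumed path
    .toTable("bronze_events")                                         # assumed table
)
```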
04-26-2024 05:59 AM
@daniel_sahal I get your point, but if for a scheduled trigger you can get all kinds of attributes about the trigger time (arguably, that is available for all trigger types), then why wouldn't the most important attribute of a file event be available through the trigger?
What I'm thinking is something like:
job.trigger.file_arrival.file_path, job.trigger.file_arrival.parent_folder, etc.