Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hubert-Dudek
Esteemed Contributor III

Databricks now supports event-driven workloads, especially for loading cloud files from external locations. This means you can save costs and resources by triggering your Databricks jobs only when new files arrive in your cloud storage, instead of mounting it as DBFS and polling it periodically. To use this feature, you need to follow these steps:

  • Add an external location for your ADLS Gen2 container,
  • Make sure the storage credential you use (such as an Access Connector, service principal, or managed identity) has the Storage Blob Data Contributor role on that container,
  • Make sure the account you use to run your workload has at least READ FILES permission on the external location,
  • Write a notebook that loads cloud files from the external location,
  • Set a file arrival trigger for your workflow and specify the exact external location as the source.

With these steps, you can easily create and run event-driven workloads on Databricks.
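The notebook in step 4 can be sketched with Auto Loader. This is a minimal example that assumes a file-arrival-triggered job; the abfss:// path, schema/checkpoint locations, and table name are placeholders you would replace with your own:

```python
# Minimal Auto Loader notebook sketch -- runs on a Databricks cluster.
# The source path, schema/checkpoint locations, and table name are placeholders.
source_path = "abfss://landing@<storage-account>.dfs.core.windows.net/incoming/"

df = (
    spark.readStream.format("cloudFiles")           # Auto Loader
    .option("cloudFiles.format", "json")            # format of the arriving files
    .option("cloudFiles.schemaLocation", "/Volumes/main/default/meta/schemas/incoming")
    .load(source_path)
)

(
    df.writeStream
    .option("checkpointLocation", "/Volumes/main/default/meta/checkpoints/incoming")
    .trigger(availableNow=True)  # process everything available, then stop --
                                 # a good fit for a file-arrival-triggered job
    .toTable("main.default.incoming_bronze")
)
```

With `availableNow=True` the stream shuts down after draining the backlog, so the job only runs (and bills) while there is actually something to process.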


9 REPLIES

Salesforce
New Contributor II

Hey,

We have a use case where Salesforce generates Change Data Capture (CDC) platform events. With this new event-driven workload, can Databricks consume these CDC events directly from Salesforce?

We are currently also evaluating middleware like Mulesoft, as described in this reference article: Subscribe to Change Data Capture Events with the Salesforce Connector. However, we are concerned about Mulesoft's pricing.

-werners-
Esteemed Contributor III

I think we are talking about file events here.
What you are describing is in fact streaming ingestion from a CDC system. That can be done, but not by connecting directly to the CDC source. You can forward the CDC events to an event queue like Kafka and let Spark subscribe to one of those topics.
Mulesoft probably works too, but honestly, as you already mentioned, it is overpriced.
What is presented here was already possible in many other systems, but is now also available in Databricks.
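As a sketch of that pattern, the Spark side of the subscription could look like this (the broker address and topic name are placeholders; your Kafka setup will differ):

```python
# Sketch: Spark Structured Streaming subscribing to a Kafka topic that
# carries the forwarded Salesforce CDC events.
# Broker address and topic name below are placeholders.
df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")
    .option("subscribe", "salesforce-cdc-events")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; cast the payload to a string
# before parsing the CDC event JSON.
events = df.selectExpr("CAST(value AS STRING) AS payload")
```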

Floody
New Contributor II

While this works great with new files, is it possible to trigger when an update happens to an existing file?

-werners-
Esteemed Contributor III

The trigger fires on file events in blob storage, where objects are typically immutable: files cannot be updated in place, only created, overwritten, or deleted.

Floody
New Contributor II

Yes, the file is getting overwritten, but the trigger is not firing. Maybe I am missing something?

-werners-
Esteemed Contributor III

Probably the event is not triggered by an overwrite. Can you test with a delete followed by a create?
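One way to run that test with the Azure CLI (the storage account, container, and blob names below are placeholders):

```shell
# Delete the blob, then upload it again, and check whether the job triggers.
# Account, container, and blob names are placeholders.
az storage blob delete \
  --account-name mystorageaccount \
  --container-name landing \
  --name incoming/data.csv \
  --auth-mode login

az storage blob upload \
  --account-name mystorageaccount \
  --container-name landing \
  --name incoming/data.csv \
  --file ./data.csv \
  --auth-mode login
```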

adriennn
Contributor II

For reference, the trigger will not contain any information on the event itself (like file names etc), so you cannot build a dynamic event-driven architecture with this trigger.

daniel_sahal
Esteemed Contributor

@adriennn 
That's because it's only one of the trigger types. To load newly arrived files automatically, you can use Auto Loader.

adriennn
Contributor II

@daniel_sahal I get your point, but if a scheduled trigger exposes all kinds of attributes about the trigger time (arguably, this is available for all trigger types), then why isn't the most important attribute of a file event available through the trigger?

What I'm thinking is something like:
job.trigger.file_arrival.file_path, job.trigger.file_arrival.parent_folder, etc.
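Those attributes don't exist today; the names above are a wish, not an API. But deriving them from a single file path would be trivial, as this plain-Python illustration shows (the function name and returned keys are invented for the example):

```python
def file_arrival_attributes(file_path: str) -> dict:
    """Derive the proposed attributes from one cloud file path.

    Purely illustrative -- Databricks does not expose such values on the
    file arrival trigger today.
    """
    # rpartition splits at the last "/" -- everything before is the folder.
    parent_folder, _, file_name = file_path.rpartition("/")
    return {
        "file_path": file_path,
        "parent_folder": parent_folder,
        "file_name": file_name,
    }

attrs = file_arrival_attributes(
    "abfss://landing@myaccount.dfs.core.windows.net/sales/2024/orders.csv"
)
# attrs["parent_folder"] == "abfss://landing@myaccount.dfs.core.windows.net/sales/2024"
# attrs["file_name"] == "orders.csv"
```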

