Administration & Architecture

Message queue to directly trigger job

AmandaOhare
New Contributor II

Hi All,

I'm very new to Databricks and trying to understand my use case a bit better.

I have a Databricks script / job that I want to be reactive to events outside of my Databricks environment. The best-case scenario would be if my script / job could automatically react to some cloud-hosted queue like Pub/Sub or SNS (or, better yet, some native Databricks solution), i.e. the compute / serverless resources would only start up in reaction to an outside event like a message placed on a queue.

I've read a bit about Streaming (https://docs.databricks.com/en/connect/streaming/index.html), but this seems to just be a convenient way to read off of queues once my job is already running, rather than a way to trigger the start of my job. Importantly, I can't afford to keep a cluster running 24/7 when most of the time no outside events will be occurring.

I've looked a bit into serverless (https://docs.databricks.com/en/compute/serverless/index.html), but that seems to be more about ease of management and configuration than a solution to my use case.

The two best solutions I can think of are:
1) Having an external application trigger the start of my job / script / notebook via the Databricks REST API
2) Having an external application trigger the start of my job / script / notebook via the Databricks SDK (rough sketch below)
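For concreteness, a minimal sketch of option 2, assuming the databricks-sdk Python package, authentication picked up from the DATABRICKS_HOST / DATABRICKS_TOKEN environment variables, and a placeholder job ID of 123:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # reads workspace URL and token from the environment

# Trigger an existing job by its ID (123 is a placeholder) and block
# until the run finishes; drop .result() to fire-and-forget the run.
run = w.jobs.run_now(job_id=123).result()
print(f"Run finished with state: {run.state.result_state}")
```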

If there is a better practice solution for my use case please do let me know.

Best Regards

1 ACCEPTED SOLUTION

szymon_dybczak
Contributor III

Hi @AmandaOhare ,

You can use AWS Lambda to achieve that. You can set up a queue trigger that will activate an AWS Lambda function. In that function you can call the Databricks REST API to launch the workflow/job.
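A rough sketch of such a handler, assuming an SQS trigger and the Jobs 2.1 run-now endpoint; the workspace URL, token, and job ID are placeholders supplied via Lambda environment variables:

```python
# Hypothetical AWS Lambda handler: each batch of SQS messages launches a
# Databricks job via POST /api/2.1/jobs/run-now. Host, token, and job ID
# are placeholders read from the Lambda environment.
import json
import os
import urllib.request

DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
DATABRICKS_TOKEN = os.environ["DATABRICKS_TOKEN"]  # a PAT, ideally sourced from Secrets Manager
JOB_ID = int(os.environ["JOB_ID"])

def lambda_handler(event, context):
    # Forward the SQS message bodies to the job as a notebook parameter
    payload = {
        "job_id": JOB_ID,
        "notebook_params": {
            "messages": json.dumps([r["body"] for r in event.get("Records", [])])
        },
    }
    req = urllib.request.Request(
        url=f"{DATABRICKS_HOST}/api/2.1/jobs/run-now",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {DATABRICKS_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return {"run_id": body["run_id"]}
```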


3 REPLIES


Hi @szymon_dybczak ,

Is that to say there is no native solution in the Databricks ecosystem to respond to outside events (messages, HTTP, etc.), and that it would be mandatory to have some outside infrastructure trigger the workflow / job via this external architecture (e.g. AWS Lambda using the API or SDK)?

Hi @AmandaOhare ,

Yes, exactly. To trigger a workflow based on some kind of event you have to use a cloud-native solution like Azure Functions in the case of Azure, or AWS Lambda.
Another approach is to use Structured Streaming in continuous mode. Then, if something is put on the queue, Spark will automatically consume it. But that has a major drawback: your cluster needs to run constantly.
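For illustration, the always-on alternative might look roughly like this; Kafka is used here as a stand-in for whatever queue is involved, and the broker, topic, and table names are placeholders:

```python
# Sketch of a continuously running Structured Streaming consumer.
# The cluster stays up 24/7, which is exactly the drawback noted above.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("queue-consumer").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "incoming-events")            # placeholder topic
    .load()
)

# Persist each message as it arrives into a Delta table
query = (
    events.selectExpr("CAST(value AS STRING) AS message")
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/queue-consumer")
    .toTable("incoming_messages")
)
query.awaitTermination()
```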
