Databricks job trigger at specific times
09-18-2024 06:53 AM
Hello,
I have a Databricks notebook that processes data and generates a list of JSON objects called "list_json". Each JSON object contains a field called "time_to_send" (a UTC datetime). I want to find the best way to send each of these JSON messages in a POST request within the hour before its "time_to_send". What is the best approach to achieve this?
Thank you.
09-18-2024 08:13 AM
Hi @dbx_deltaSharin,
You can write a Python function that takes this list_json as an argument and sends a POST request for each object in the list. Since you need to send the requests within an hour, you can use Python's multiprocessing or asyncio library to speed things up. But it depends on how many objects are in your list, etc.
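For the asyncio route, here is a minimal sketch of what such a function could look like, assuming the aiohttp package is installed (e.g. via %pip install aiohttp) and a hypothetical API_URL endpoint; retries and error handling are omitted:

```python
import asyncio
import aiohttp

# Hypothetical endpoint; replace with your real API URL.
API_URL = "https://example.com/api/messages"

async def send_one(session: aiohttp.ClientSession, payload: dict) -> int:
    # POST a single JSON object and return the HTTP status code.
    async with session.post(API_URL, json=payload) as resp:
        resp.raise_for_status()
        return resp.status

async def send_all(list_json: list[dict]):
    # Fan out all POSTs concurrently; gather collects results and exceptions.
    async with aiohttp.ClientSession() as session:
        tasks = [send_one(session, obj) for obj in list_json]
        return await asyncio.gather(*tasks, return_exceptions=True)

# In a plain Python script this is fine; inside a notebook that already
# has a running event loop, use `await send_all(list_json)` instead.
results = asyncio.run(send_all(list_json))
```

Because the requests run concurrently rather than one after another, even a few thousand objects should comfortably fit inside the one-hour window.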
09-18-2024 01:03 PM
Hi @dbx_deltaSharin,
In addition to @szymon_dybczak's suggestion, if you're using Azure you might consider an architecture where, instead of sending the request directly to your API, you send a message to an Azure Queue or Service Bus. An Azure Function with a Queue Trigger can then pick up the message and send it to the API. This approach improves scalability and reliability because Azure Functions can process multiple requests concurrently and scale automatically based on demand. The same pattern works with other cloud providers, as they offer similar services.
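As a rough illustration of that pattern, the notebook-side half might look like the sketch below, which enqueues each JSON object into an Azure Storage Queue using the azure-storage-queue package. The connection string and queue name are placeholders, and the Queue-Triggered Function that performs the actual POST lives outside Databricks. One detail worth noting: send_message accepts a visibility_timeout, which can hide each message until its one-hour window opens.

```python
import json
from datetime import datetime, timedelta, timezone
from azure.storage.queue import QueueClient

# Placeholder values; in practice, read the connection string from a secret scope.
CONNECTION_STRING = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=..."
QUEUE_NAME = "time-to-send-messages"

queue_client = QueueClient.from_connection_string(CONNECTION_STRING, QUEUE_NAME)

now = datetime.now(timezone.utc)
for obj in list_json:
    # Assumes time_to_send is an ISO-8601 UTC string, e.g. "2024-09-19T10:00:00+00:00".
    window_opens = datetime.fromisoformat(obj["time_to_send"]) - timedelta(hours=1)
    delay = max(0, int((window_opens - now).total_seconds()))
    # visibility_timeout keeps the message invisible until the window opens
    # (capped at 7 days), so the Function only dequeues it once it is due.
    queue_client.send_message(json.dumps(obj), visibility_timeout=delay)
```

With a daily producer run, delays never exceed 24 hours, so the 7-day visibility cap is not a constraint here.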
09-18-2024 11:30 PM
Hi everyone,
Thank you for your responses to my question.
@szymon_dybczak, if I understood correctly, your suggestion assumes running the Databricks job in continuous mode. However, this might incur significant costs if the cluster runs every hour.
@filipniziol, your proposal seems like a viable solution. I would just like a clearer idea of the associated costs so I can compare the two options.
For clarification, the initial notebook is designed to run once a day to update and compute the JSON list. A second notebook is needed to process this JSON data and handle the post-processing, starting one hour before each "time_to_send".
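To compare against the continuous-mode option, one alternative for that second notebook is an hourly scheduled job that filters the daily list down to the messages due in the next hour. A minimal sketch, assuming time_to_send is an ISO-8601 UTC string and a hypothetical API_URL endpoint:

```python
from datetime import datetime, timedelta, timezone

import requests

# Hypothetical endpoint for the POST requests.
API_URL = "https://example.com/api/messages"

def send_due_messages(list_json: list[dict]) -> None:
    # Run this once per hour: any message whose time_to_send falls within
    # the next hour is posted now, i.e. within 1 hour before time_to_send.
    now = datetime.now(timezone.utc)
    for obj in list_json:
        send_time = datetime.fromisoformat(obj["time_to_send"])
        if now < send_time <= now + timedelta(hours=1):
            requests.post(API_URL, json=obj, timeout=30).raise_for_status()
```

This keeps everything inside Databricks at the cost of one short job-cluster run per hour, which is the figure to weigh against the queue-based architecture.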

