
Databricks job trigger at specific times

dbx_deltaSharin
New Contributor II

Hello,

I have a Databricks notebook that processes data and generates a list of JSON objects called "list_json". Each JSON object contains an item called "time_to_send" (a UTC datetime). I want to find the best way to send each of these JSON messages in a POST request within one hour before its "time_to_send". What is the best approach to achieve this?

Thank you.

3 REPLIES

szymon_dybczak
Contributor III

Hi @dbx_deltaSharin ,

You can write a Python function that takes this list_json as an argument and sends a POST request for each object in the list. Since you need to send the requests within an hour, you can use Python's multiprocessing or asyncio library to make it faster.

But it depends on how many objects you have in your list, etc.
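For illustration, here's a minimal sketch of that approach using asyncio with aiohttp (the endpoint URL is hypothetical, and aiohttp would need to be installed in the notebook, e.g. via %pip install aiohttp):

```python
import asyncio
import aiohttp

API_URL = "https://example.com/api/messages"  # hypothetical endpoint

async def send_one(session: aiohttp.ClientSession, payload: dict) -> int:
    # POST a single JSON object and return the HTTP status code.
    async with session.post(API_URL, json=payload) as resp:
        return resp.status

async def send_all(list_json: list[dict]) -> list[int]:
    # Fire all requests concurrently over one shared connection pool.
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(send_one(session, obj) for obj in list_json))

# In a notebook, where an event loop is already running:
#     statuses = await send_all(list_json)
# In a plain script:
#     statuses = asyncio.run(send_all(list_json))
```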

filipniziol
Contributor III

Hi @dbx_deltaSharin ,

In addition to @szymon_dybczak's suggestion: if you're using Azure, you might consider an architecture where, instead of sending the request directly to your API, you send a message to an Azure Queue or Service Bus. An Azure Function with a Queue Trigger can then pick up the message and send it to the API. This approach improves scalability and reliability because Azure Functions can process multiple requests concurrently and scale automatically with demand. The same pattern works with other cloud providers, as they offer similar services.
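For illustration, a minimal sketch of the enqueue side using azure-storage-queue, where a visibility timeout keeps each message hidden until one hour before its "time_to_send" (the connection string, queue name, and ISO-8601 timestamp format are assumptions):

```python
import json
from datetime import datetime, timedelta, timezone
from azure.storage.queue import QueueClient

queue = QueueClient.from_connection_string(
    conn_str="<storage-connection-string>",  # hypothetical
    queue_name="outbound-messages",          # hypothetical
)

def enqueue(list_json: list[dict]) -> None:
    now = datetime.now(timezone.utc)
    for obj in list_json:
        # Assumes "time_to_send" is an ISO-8601 UTC string, e.g. "2024-10-01T15:00:00Z".
        send_at = datetime.fromisoformat(obj["time_to_send"].replace("Z", "+00:00"))
        delay = max(0, int((send_at - timedelta(hours=1) - now).total_seconds()))
        # The message stays invisible to the Queue-triggered Function until
        # one hour before time_to_send (visibility_timeout is capped at 7 days).
        queue.send_message(json.dumps(obj), visibility_timeout=delay)
```

A Queue-triggered Azure Function then receives each message as it becomes visible and forwards it to the API.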

dbx_deltaSharin
New Contributor II

Hi everyone,

Thank you for your responses to my question.

@szymon_dybczak, if I understood correctly, your suggestion is based on running the Databricks job in continuous mode. However, this might incur significant costs if the cluster is running every hour.

@filipniziol, your proposal seems like a viable solution. I would just like to get a clearer idea of the associated costs to be able to compare the two options.

For clarification, the initial notebook is designed to run once a day to update and compute the JSON list. A second notebook is needed to process this JSON data and send the POST requests, starting one hour before each "time_to_send".
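If it helps, one cost-conscious pattern (an assumption on my part, not something confirmed above) is to schedule that second notebook as an hourly job on a jobs cluster and have each run send only the messages due in the coming hour. A minimal sketch, assuming list_json has been loaded from wherever the daily notebook persists it, and with a hypothetical endpoint:

```python
from datetime import datetime, timedelta, timezone
import requests

API_URL = "https://example.com/api/messages"  # hypothetical endpoint

def due_at(obj: dict) -> datetime:
    # Assumes "time_to_send" is an ISO-8601 UTC string, e.g. "2024-10-01T15:00:00Z".
    return datetime.fromisoformat(obj["time_to_send"].replace("Z", "+00:00"))

now = datetime.now(timezone.utc)
window_end = now + timedelta(hours=1)

# Send only the messages whose time_to_send falls within the next hour,
# i.e. each POST goes out within one hour before its time_to_send.
for obj in list_json:
    if now <= due_at(obj) < window_end:
        requests.post(API_URL, json=obj).raise_for_status()
```

The cluster only runs for the few minutes each hour that the job needs, which avoids the cost of keeping it up continuously.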
