Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Define time interval for when a cluster can be active

jar
Contributor

Hullo good Databricks people.

I have a small dedicated cluster being used for Direct Query (PBI) which has a long termination period. I'd like for it to only be active during business hours though, and to set a restriction so that it's not possible to spin it up outside those hours. Does anyone know whether this is possible? Or whether it's possible to create a sort of hybrid solution so that outside business hours, if someone triggers the activation of the cluster by using the report, it then utilises a serverless option instead?

Thank you in advance,

Johan.


4 REPLIES

lingareddy_Alva
Honored Contributor II

Hi @jar 

Your scenario is quite common - managing costs while ensuring availability during business hours for Power BI Direct Query. Let me share some options based on my experience:

Scheduled Cluster Policies

The most straightforward approach would be to use Databricks Jobs to control your cluster lifecycle:

  1. Create termination/startup jobs:
    • Set up a job to start your cluster shortly before business hours (e.g., 8:45 AM)
    • Configure another job to terminate the cluster after hours (e.g., 6:15 PM)
    • These can be scheduled to run only on weekdays
  2. Cluster policies:
    • You can define a cluster policy that prevents cluster creation or restart during specific hours
    • This would prevent users from manually starting the cluster outside your defined window
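As a minimal sketch of the scheduled start/stop jobs, the cluster lifecycle can be driven with the Clusters REST API (`clusters/start` and `clusters/delete`); the workspace URL, token, and cluster ID below are placeholders you'd supply yourself:

```python
# Sketch: start/stop a dedicated cluster on a schedule via the Databricks
# Clusters REST API. HOST, TOKEN and CLUSTER_ID are placeholders.
import json
import urllib.request

HOST = "https://<databricks-instance>"   # your workspace URL
TOKEN = "<personal-access-token>"        # token with cluster permissions
CLUSTER_ID = "<cluster-id>"

def cluster_request(action: str, cluster_id: str) -> urllib.request.Request:
    """Build a POST request for clusters/start or clusters/delete."""
    body = json.dumps({"cluster_id": cluster_id}).encode()
    return urllib.request.Request(
        f"{HOST}/api/2.0/clusters/{action}",
        data=body,
        headers={"Authorization": f"Bearer {TOKEN}"},
        method="POST",
    )

# Two scheduled jobs (Databricks Jobs, or any cron runner) would then call:
#   urllib.request.urlopen(cluster_request("start", CLUSTER_ID))   # ~08:45 weekdays
#   urllib.request.urlopen(cluster_request("delete", CLUSTER_ID))  # ~18:15 weekdays
```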

Hybrid Solution with Serverless Fallback

For your hybrid approach idea (which is clever!), you'd need to implement something like:

  1. Create two separate datasets in Power BI:
    • One configured to use your dedicated cluster (for business hours)
    • Another configured to use Serverless SQL (for after hours)
  2. Implement routing logic:
    • You could use Power BI parameters or a custom Power BI gateway configuration
    • Create a simple date/time check that routes queries to the appropriate endpoint based on time
  3. Automated switching:
    • Use the Databricks API with a scheduled Azure Function to modify connection settings in your Power BI dataset based on time of day
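The routing logic in step 2 boils down to a pure time check. Here's a sketch of what an Azure Function (or any scheduler) could use to decide which endpoint a dataset should point at; the hostnames and HTTP paths are hypothetical, and business hours are assumed to be 08:00-18:00 Monday-Friday:

```python
# Sketch: choose the SQL endpoint based on time of day.
# Endpoint values below are placeholders, not real hosts.
from datetime import datetime

DEDICATED = {"host": "adb-dedicated.example.net", "http_path": "/sql/1.0/warehouses/aaa"}
SERVERLESS = {"host": "adb-serverless.example.net", "http_path": "/sql/1.0/warehouses/bbb"}

def pick_endpoint(now: datetime) -> dict:
    """Dedicated cluster during business hours (Mon-Fri, 08:00-17:59),
    serverless endpoint otherwise."""
    in_hours = now.weekday() < 5 and 8 <= now.hour < 18
    return DEDICATED if in_hours else SERVERLESS
```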

Alternative: Optimize for Availability and Cost

Rather than preventing usage, another approach I've seen work well:

  1. Auto-scaling cluster configuration:
    • Set minimum workers to 0 during non-business hours
    • Use a higher min/max worker count during business hours
    • This allows the cluster to remain available but scale down aggressively when not in use
  2. Implement idle shutdown:
    • Configure a much shorter auto-termination window outside business hours (e.g., 10 minutes)
    • Use a longer termination window during business hours

This optimizes for both cost and availability without requiring users to switch connection types.
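For the idle-shutdown idea, a scheduled job could resubmit the cluster spec with a different `autotermination_minutes` via the `clusters/edit` endpoint (note that `clusters/edit` replaces the cluster configuration, so the full spec must be sent back). The minute values and helper below are illustrative, not prescriptive:

```python
# Sketch: compute the auto-termination window by time of day, and build the
# full spec that clusters/edit expects. Values are illustrative.
def autotermination_minutes(hour: int, weekday: int) -> int:
    """Long idle window in business hours (Mon-Fri 08-18), short otherwise."""
    in_hours = weekday < 5 and 8 <= hour < 18
    return 120 if in_hours else 10

def edit_payload(cluster_spec: dict, hour: int, weekday: int) -> dict:
    """clusters/edit replaces the whole config, so copy the existing spec
    and only swap the auto-termination setting."""
    spec = dict(cluster_spec)
    spec["autotermination_minutes"] = autotermination_minutes(hour, weekday)
    return spec
```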

LR

Thank you, @lingareddy_Alva, for taking the time for this thorough response, it is much appreciated.

I have already defined a scheduled job which terminates the cluster after office hours, but that's no guarantee a user won't open the report afterwards and reactivate it. I like the idea of a policy that prevents restarts outside certain hours, and your idle-shutdown suggestion, but how do you do that? Looking at the policies in the documentation, I can't see that either is possible?

lingareddy_Alva
Honored Contributor II

@jar 

You're right to be concerned about users reactivating the cluster after your scheduled termination job. Looking more closely at Databricks cluster policies, there's no direct way to restrict cluster activation based on time of day - this is a limitation in the current Databricks platform.

While you can't directly set time restrictions in policies, you can implement a workaround:

  1. Create a cluster policy with an init script requirement (note: policy definitions use flattened attribute paths, so list entries such as init scripts are addressed by index):

```json
{
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "fixed",
    "value": "singleNode"
  },
  "custom_tags.BusinessHours": {
    "type": "fixed",
    "value": "true"
  },
  "init_scripts.0.dbfs.destination": {
    "type": "fixed",
    "value": "dbfs:/scripts/business-hours-check.sh"
  }
}
```

  2. Create an init script that checks the current time and terminates the cluster if it is being started outside business hours:

```bash
#!/bin/bash
# Runs as a cluster init script. DB_CLUSTER_ID is set by Databricks for
# init scripts; the API token is NOT available automatically and must be
# supplied (e.g. via a secret-backed environment variable).

# Define business hours (24-hour clock).
# Note: cluster clocks are typically UTC; set TZ if your hours are local.
START_HOUR=8
END_HOUR=18

# Force base 10 so zero-padded hours like "08" are not read as octal
CURRENT_HOUR=$((10#$(date +%H)))
CURRENT_DAY=$(date +%u)   # 1-5 is Monday-Friday

# Check if outside business hours (or weekend)
if [ "$CURRENT_DAY" -gt 5 ] || [ "$CURRENT_HOUR" -lt "$START_HOUR" ] || [ "$CURRENT_HOUR" -ge "$END_HOUR" ]; then
  echo "Outside business hours. Terminating cluster."
  # Request termination through the Clusters API
  curl -X POST -H "Authorization: Bearer $DATABRICKS_TOKEN" \
    https://<databricks-instance>/api/2.0/clusters/delete \
    -d "{\"cluster_id\": \"$DB_CLUSTER_ID\"}"
  exit 1   # fail the init script so the cluster does not finish starting
fi
```

LR

jar
Contributor

So the hybrid solution turned out to be the smartest, at least for my case, with a simple time check that defines the connection information according to whether the query is sent outside business hours or not.
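For readers landing here later: the accepted time check can be expressed directly in the dataset's Power Query. This is only a sketch under assumptions; the hostnames and HTTP paths are placeholders, `Databricks.Catalogs` is the Power BI Databricks connector's source function, and business hours are assumed to be 08:00-18:00 Monday-Friday.

```m
let
    Now = DateTime.LocalNow(),
    InBusinessHours =
        Date.DayOfWeek(DateTime.Date(Now), Day.Monday) <= 4
        and Time.Hour(DateTime.Time(Now)) >= 8
        and Time.Hour(DateTime.Time(Now)) < 18,
    // Placeholder connection details -- substitute your own
    Host = if InBusinessHours then "adb-dedicated.example.net" else "adb-serverless.example.net",
    HttpPath = if InBusinessHours then "/sql/1.0/warehouses/dedicated" else "/sql/1.0/warehouses/serverless",
    Source = Databricks.Catalogs(Host, HttpPath, [])
in
    Source
```

One caveat worth testing in your tenant: the Power BI service can be strict about dynamic data sources, so the switch may need to go through dataset parameters updated by an external scheduler rather than a purely in-query expression.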
