05-05-2025 05:42 PM
Hullo good Databricks people.
I have a small dedicated cluster being used for Direct Query (PBI) which has a long auto-termination period. I'd like it to be active only during business hours, though, and to set a restriction so that it can't be spun up outside those hours. Does anyone know whether this is possible? Or whether it's possible to create a sort of hybrid solution, so that if someone triggers activation of the cluster outside business hours by opening the report, it utilises a serverless option instead?
Thank you in advance,
Johan.
05-05-2025 06:19 PM - edited 05-05-2025 06:20 PM
Hi @jar
Your scenario is quite common - managing costs while ensuring availability during business hours for Power BI Direct Query. Let me share some options based on my experience:
Scheduled Cluster Policies
The most straightforward approach would be to use Databricks Jobs to control your cluster lifecycle:
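For example (a minimal sketch, not official tooling): a small Python script run by two scheduled jobs, one at the start of business hours and one at the end, that starts and terminates the cluster through the Clusters REST API. The host, token and cluster ID below are placeholders you'd replace with your own:

import os
import sys
import requests

# Placeholders - supply your own workspace URL, token and cluster ID.
HOST = os.environ["DATABRICKS_HOST"]    # e.g. https://adb-xxxx.azuredatabricks.net
TOKEN = os.environ["DATABRICKS_TOKEN"]
CLUSTER_ID = "0505-174200-abc123"       # hypothetical cluster ID

def post(endpoint):
    # Both endpoints take just the cluster_id in the request body.
    resp = requests.post(
        f"{HOST}/api/2.0/clusters/{endpoint}",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"cluster_id": CLUSTER_ID},
    )
    resp.raise_for_status()

if __name__ == "__main__":
    # Schedule "start" at 08:00 Mon-Fri and "delete" at 18:00.
    # Note: clusters/delete terminates the cluster; it does not remove it.
    post("start" if sys.argv[1] == "start" else "delete")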
Hybrid Solution with Serverless Fallback
For your hybrid approach idea (which is clever!), you'd need to implement something like:
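As a minimal sketch (assuming you parameterise the connection in Power BI, e.g. a Power Query parameter for the HTTP path that a scheduled process keeps updated - that wiring is not shown here): a time check that returns the dedicated cluster's HTTP path during business hours and a serverless SQL warehouse's path outside them. Both paths below are made-up placeholders:

from datetime import datetime

# Hypothetical HTTP paths: the dedicated cluster's and a serverless SQL warehouse's.
CLUSTER_HTTP_PATH = "sql/protocolv1/o/1234567890123456/0505-174200-abc123"
SERVERLESS_HTTP_PATH = "/sql/1.0/warehouses/abcdef1234567890"

def connection_http_path(now=None):
    """Return the endpoint Power BI should query: the dedicated cluster
    inside business hours, the serverless warehouse otherwise."""
    now = now or datetime.now()
    in_business_hours = now.weekday() < 5 and 8 <= now.hour < 18
    return CLUSTER_HTTP_PATH if in_business_hours else SERVERLESS_HTTP_PATH

print(connection_http_path())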
Alternative: Optimize for Availability and Cost
Rather than preventing usage, another approach I've seen work well:
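Roughly: keep the cluster small and autoscaling, and give it an aggressive idle auto-termination so it shuts itself down shortly after the last query. A sketch of the relevant settings applied via the clusters/edit endpoint (the runtime version, node type, IDs and thresholds are illustrative assumptions, and edit requires the full cluster spec):

import os
import requests

cluster_spec = {
    "cluster_id": "0505-174200-abc123",       # hypothetical
    "cluster_name": "pbi-direct-query",
    "spark_version": "15.4.x-scala2.12",      # assumed LTS runtime
    "node_type_id": "Standard_DS3_v2",        # assumed Azure node type
    "autoscale": {"min_workers": 1, "max_workers": 4},
    "autotermination_minutes": 15,            # aggressive idle shutdown
}

resp = requests.post(
    f"{os.environ['DATABRICKS_HOST']}/api/2.0/clusters/edit",
    headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
    json=cluster_spec,
)
resp.raise_for_status()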
This optimizes for both cost and availability without requiring users to switch connection types.
05-06-2025 10:05 PM
Thank you, @lingareddy_Alva, for taking the time for this thorough response; it is much appreciated.
I have already defined a scheduled job which terminates the cluster after office hours, but that's no guarantee a user won't open the report and reactivate the cluster after the scheduled termination has run. I like the idea of a policy which prevents restarts outside certain hours, or your idle shutdown suggestion, but how do you do that? Looking at the policies in the documentation, I can't see that such things are possible?
05-07-2025 10:37 AM
You're right to be concerned about users reactivating the cluster after your scheduled termination job. Looking more closely at Databricks cluster policies, there's no direct way to restrict cluster activation based on time of day - this is a limitation in the current Databricks platform.
While you can't directly set time restrictions in policies, you can implement a workaround:
{ "spark_conf.spark.databricks.cluster.profile": { "type": "fixed", "value": "singleNode" }, "custom_tags.BusinessHours": { "type": "fixed", "value": "true" }, "init_scripts": { "type": "fixed", "value": [{"dbfs": {"destination": "dbfs:/scripts/business-hours-check.sh"}}] } }
2. Create an init script that checks the current time and terminates if outside business hours:
#!/bin/bash
# Define business hours (24-hour format)
START_HOUR=8
END_HOUR=18

# Get current hour and day of week (1-5 is Monday-Friday)
CURRENT_HOUR=$(date +%H)
CURRENT_DAY=$(date +%u)

# Check if outside business hours (or on a weekend)
if [ "$CURRENT_DAY" -gt 5 ] || [ "$CURRENT_HOUR" -lt "$START_HOUR" ] || [ "$CURRENT_HOUR" -ge "$END_HOUR" ]; then
  echo "Outside business hours. Terminating cluster."
  # Signal termination through the Clusters API.
  # DB_CLUSTER_ID is set by Databricks in the init script environment;
  # DATABRICKS_TOKEN must be provided, e.g. as a cluster environment variable.
  curl -X POST -H "Authorization: Bearer $DATABRICKS_TOKEN" \
    "https://<databricks-instance>/api/2.0/clusters/delete" \
    -d "{\"cluster_id\": \"$DB_CLUSTER_ID\"}"
fi
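Two caveats with this workaround: the init script only runs once the cluster is already starting, so an out-of-hours restart will still briefly spin up nodes before the script terminates them; and the token should come from a secret (e.g. a cluster environment variable backed by a secret scope) rather than being hard-coded in the script.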
05-18-2025 10:44 PM
So the hybrid solution turned out to be the smartest, at least for my case: a simple time check that sets the connection information according to whether the query is sent inside or outside business hours.