- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-05-2025 05:42 PM
Hullo good Databricks people.
I have a small dedicated cluster being used for Direct Query (PBI) which has a long termination period. I'd like for it to only be active during business hours though, and to set a restriction so that it's not possible to spin it up outside those hours. Does anyone know whether this is possible? Or whether it's possible to create a sort of hybrid solution so that outside business hours, if someone triggers the activation of the cluster by using the report, it then utilises a serverless option instead?
Thank you in advance,
Johan.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-05-2025 06:19 PM - edited 05-05-2025 06:20 PM
Hi @jar
Your scenario is quite common - managing costs while ensuring availability during business hours for Power BI Direct Query. Let me share some options based on my experience:
Scheduled Cluster Policies
The most straightforward approach would be to use Databricks Jobs to control your cluster lifecycle:
- Create termination/startup jobs:
- Set up a job to start your cluster shortly before business hours (e.g., 8:45 AM)
- Configure another job to terminate the cluster after hours (e.g., 6:15 PM)
- These can be scheduled to run only on weekdays
- Cluster policies:
- You can define a cluster policy that prevents cluster creation or restart during specific hours
- This would prevent users from manually starting the cluster outside your defined window
Hybrid Solution with Serverless Fallback
For your hybrid approach idea (which is clever!), you'd need to implement something like:
- Create two separate datasets in Power BI:
- One configured to use your dedicated cluster (for business hours)
- Another configured to use Serverless SQL (for after hours)
- Implement routing logic:
- You could use Power BI parameters or a custom Power BI gateway configuration
- Create a simple date/time check that routes queries to the appropriate endpoint based on time
- Automated switching:
- Use the Databricks API with a scheduled Azure Function to modify connection settings in your Power BI dataset based on time of day
Alternative: Optimize for Availability and Cost
Rather than preventing usage, another approach I've seen work well:
- Auto-scaling cluster configuration:
- Set minimum workers to 0 during non-business hours
- Use a higher min/max worker count during business hours
- This allows the cluster to remain available but scale down aggressively when not in use
- Implement idle shutdown:
- Configure a much shorter auto-termination window outside business hours (e.g., 10 minutes)
- Use a longer termination window during business hours
This optimizes for both cost and availability without requiring users to switch connection types.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-06-2025 10:05 PM
Thank you, @lingareddy_Alva, for taking the time for this thorough response, it is much appreciated.
I have already defined a scheduled job which terminates the cluster post office hours, but this is no guarantee that a user won't have a look at the report thus activating it again after that (those) scheduled termination(s). I like the idea of creating a policy which prevents restarts outside certain hours, or the idle shutdown suggestion of yours, but how do you do that? Looking at the policies in the documentation, I can't see that such things are possible?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-07-2025 10:37 AM
You're right to be concerned about users reactivating the cluster after your scheduled termination job. Looking more closely at Databricks cluster policies, there's no direct way to restrict cluster activation based on time of day - this is a limitation in the current Databricks platform.
While you can't directly set time restrictions in policies, you can implement a workaround:
- Create a cluster policy with an init script requirement:
{ "spark_conf.spark.databricks.cluster.profile": { "type": "fixed", "value": "singleNode" }, "custom_tags.BusinessHours": { "type": "fixed", "value": "true" }, "init_scripts": { "type": "fixed", "value": [{"dbfs": {"destination": "dbfs:/scripts/business-hours-check.sh"}}] } }
2. Create an init script that checks the current time and terminates if outside business hours:
#!/bin/bash # Define business hours (in 24hr format) START_HOUR=8 END_HOUR=18 # Get current hour CURRENT_HOUR=$(date +%H) CURRENT_DAY=$(date +%u) # 1-5 is Monday-Friday # Check if outside business hours (or weekend) if [ $CURRENT_DAY -gt 5 ] || [ $CURRENT_HOUR -lt $START_HOUR ] || [ $CURRENT_HOUR -ge $END_HOUR ]; then echo "Outside business hours. Terminating cluster." # Signal termination through the API curl -X POST -H "Authorization: Bearer $DATABRICKS_TOKEN" \ https://<databricks-instance>/api/2.0/clusters/delete \ -d "{\"cluster_id\": \"$DATABRICKS_CLUSTER_ID\"}" fi
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-18-2025 10:44 PM
So the hybrid solution turned out to be the smartest, at least for my case, with a simple time check that defines the connection information according to whether the query is sent outside business hours or not.