ETL pipeline
04-18-2025 03:43 AM
I have an ETL pipeline in Workflows that I use to create a materialized view. I want to schedule the pipeline to run for only 10 hours, starting at 10 AM. How can I schedule that? I can only see an hourly schedule or cron syntax. I want the compute to be up for 10 hours and then terminate.
Thanks
Yogesh
04-18-2025 04:03 AM
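Databricks doesn't support duration-based schedules directly, but you can simulate one using cron syntax.
Use this cron expression: 0 10-19 * * * (in the Quartz syntax that Databricks job schedules use: 0 0 10-19 * * ?). It triggers a run at the top of every hour from 10:00 through 19:00.
To ensure compute is not running outside these hours, set auto-termination to a low value, such as 15 minutes.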
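A minimal sketch of how the hourly window above could be turned into a schedule block, assuming the Jobs API `schedule` fields `quartz_cron_expression`, `timezone_id`, and `pause_status`. The helper name `hourly_window_schedule` and the default timezone are hypothetical, chosen only for illustration.

```python
def hourly_window_schedule(start_hour: int, runs: int, timezone_id: str = "UTC") -> dict:
    """Build a schedule block that fires once per hour for `runs` consecutive hours.

    Hypothetical helper: the dictionary keys follow the Jobs API `schedule`
    object, but the function itself is not part of any Databricks library.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    end_hour = start_hour + runs - 1
    if start_hour < 0 or end_hour > 23:
        raise ValueError("the window must fit within a single day")
    # Quartz fields: seconds, minutes, hours, day-of-month, month, day-of-week
    hours = str(start_hour) if runs == 1 else f"{start_hour}-{end_hour}"
    return {
        "quartz_cron_expression": f"0 0 {hours} * * ?",
        "timezone_id": timezone_id,  # assumption: set this to your own timezone
        "pause_status": "UNPAUSED",
    }


# Ten hourly runs starting at 10:00
schedule = hourly_window_schedule(start_hour=10, runs=10)
print(schedule["quartz_cron_expression"])
```

Note that this produces ten separate runs rather than one 10-hour run, so each run only keeps compute up for as long as the refresh takes.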
04-18-2025 05:37 AM
Hey @Yunky007
You should use the cron expression 0 10 * * * (Quartz form: 0 0 10 * * ?) to start the process at 10 AM.
Then, inside your script, implement a loop or similar mechanism that keeps the logic running for 10 hours. That's the trick.
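import time
from datetime import datetime, timedelta

# Keep the job alive for 10 hours from the moment it starts
start_time = datetime.now()
end_time = start_time + timedelta(hours=10)

while datetime.now() < end_time:
    # Logic: refresh the materialized view
    spark.sql("REFRESH MATERIALIZED VIEW my_catalog.my_schema.my_mv")
    # Wait up to 1 hour between executions, but never sleep past end_time
    remaining = (end_time - datetime.now()).total_seconds()
    time.sleep(max(0, min(60 * 60, remaining)))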
Hope this helps.
Isi
05-05-2025 04:12 AM
Use cron syntax to start the job, and add a stop condition so it ends after 10 hours of runtime.
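One way the stop condition could look is sketched below. It assumes the job is started once by cron and then loops until a deadline. The function name `run_for_duration` and its injectable `now` and `sleep` parameters are hypothetical, added so the loop can be checked without waiting 10 hours.

```python
import time
from datetime import datetime, timedelta
from typing import Callable


def run_for_duration(
    task: Callable[[], None],
    duration: timedelta,
    interval: timedelta,
    now: Callable[[], datetime] = datetime.now,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `task` repeatedly until `duration` has elapsed; return the call count."""
    deadline = now() + duration
    runs = 0
    while now() < deadline:
        task()
        runs += 1
        remaining = (deadline - now()).total_seconds()
        if remaining <= 0:
            break  # stop condition: the time budget is used up
        # Never sleep past the deadline, so the job ends on time
        sleep(min(interval.total_seconds(), remaining))
    return runs


# In a Databricks notebook the call might look like this (assumes `spark` exists):
# run_for_duration(
#     lambda: spark.sql("REFRESH MATERIALIZED VIEW my_catalog.my_schema.my_mv"),
#     duration=timedelta(hours=10),
#     interval=timedelta(hours=1),
# )
```

Capping the sleep at the remaining time means the run, and the compute behind it, ends at the 10-hour mark instead of up to one interval later.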