Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

ETL pipeline

Yunky007
New Contributor

I have an ETL pipeline in Workflows that I use to create a materialized view. I want to schedule the pipeline to run for 10 hours only, starting at 10 AM. How can I schedule that? I can only see an hourly schedule option or cron syntax. I want the compute to stay up for 10 hours and then terminate.

Thanks

Yogesh

3 REPLIES

tltharani
New Contributor II

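Databricks doesn't support duration-based schedules directly, but you can simulate one using cron syntax.
Use this cron expression: 0 10-19 * * * (this is the standard five-field form; Databricks job schedules use Quartz cron syntax, where the equivalent should be 0 0 10-19 * * ?). It triggers a run at the top of every hour from 10:00 through 19:00.
To ensure compute is not running outside these hours, set auto-termination to a low value such as 15 minutes.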
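
To see which start times the hour range in that expression produces, here is a minimal Python sketch that expands the hour field by hand. `expand_hour_field` is a hypothetical helper written only for illustration; it is not part of Databricks or of any cron library, and it handles only a single hour or a simple `start-end` range.

```python
def expand_hour_field(field: str) -> list[int]:
    """Expand a cron hour field such as "10-19" or "10" into a list of hours."""
    if "-" in field:
        start, end = (int(part) for part in field.split("-"))
        # Cron ranges are inclusive on both ends.
        return list(range(start, end + 1))
    return [int(field)]


hours = expand_hour_field("10-19")
print(hours)       # the hours at which a run is triggered
print(len(hours))  # number of triggers per day
```

Because the range is inclusive, this gives ten triggers per day, the first at 10:00 and the last at 19:00. The last run starts at 19:00 and finishes some time after that, so how close this comes to a 10-hour window depends on how long each refresh takes.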

 

Isi
Contributor III

Hey @Yunky007 

You should use the cron expression 0 10 * * * (0 0 10 * * ? in the Quartz syntax that Databricks job schedules use) to start the process at 10 AM.
Then, inside your script, implement a loop or similar mechanism that keeps the logic running for 10 hours. That's the trick.

 

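import time
from datetime import datetime, timedelta

# `spark` is the SparkSession that Databricks notebooks provide automatically.
refresh_interval = timedelta(hours=1)
end_time = datetime.now() + timedelta(hours=10)

while datetime.now() < end_time:
    # Refresh the materialized view
    spark.sql("REFRESH MATERIALIZED VIEW my_catalog.my_schema.my_mv")

    # Wait for the next refresh, but never sleep past the 10-hour deadline
    remaining = end_time - datetime.now()
    time.sleep(max(0.0, min(refresh_interval, remaining).total_seconds()))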

 

Hope this helps 🙂

Isi
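
The loop approach above can be checked without waiting 10 hours by injecting the clock and the sleep function. The following is a minimal sketch under that assumption; `run_periodically` and `FakeClock` are hypothetical names introduced for illustration, and in a real notebook the `action` would be the `spark.sql(...)` refresh call.

```python
import time
from datetime import datetime, timedelta


def run_periodically(action, total, interval, now=datetime.now, sleep=time.sleep):
    """Call `action` every `interval` until `total` has elapsed; return the run count."""
    deadline = now() + total
    runs = 0
    while now() < deadline:
        action()
        runs += 1
        # Sleep for one interval, clamped so we never sleep past the deadline.
        remaining = deadline - now()
        sleep(max(0.0, min(interval, remaining).total_seconds()))
    return runs


class FakeClock:
    """Simulated clock: sleeping advances the time instantly."""

    def __init__(self, start):
        self.current = start

    def now(self):
        return self.current

    def sleep(self, seconds):
        self.current += timedelta(seconds=seconds)


clock = FakeClock(datetime(2024, 1, 1, 10, 0))
count = run_periodically(
    action=lambda: None,  # stand-in for the materialized view refresh
    total=timedelta(hours=10),
    interval=timedelta(hours=1),
    now=clock.now,
    sleep=clock.sleep,
)
print(count, clock.current)
```

With the simulated clock, the action runs once per hour from 10:00 and the loop exits at 20:00. In a real run each refresh also takes time, so the number of refreshes inside the 10-hour window can be lower.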

KaelaniBraster
New Contributor II

Use cron syntax to start the job, and add a stop condition that ends the run after 10 hours of runtime.
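
One way to express such a stop condition is a job-level timeout. The sketch below builds a job settings fragment as a plain Python dictionary. It assumes the `schedule` and `timeout_seconds` fields of the Databricks Jobs API job settings, so please verify the field names against the current API reference before relying on it; the `UTC` time zone is a placeholder.

```python
import json

TEN_HOURS_IN_SECONDS = 10 * 60 * 60

# Assumed shape of a Jobs API settings fragment; check the API reference.
job_settings_fragment = {
    "schedule": {
        # Quartz syntax: second, minute, hour, day-of-month, month, day-of-week
        "quartz_cron_expression": "0 0 10 * * ?",
        "timezone_id": "UTC",  # placeholder, use your own time zone
        "pause_status": "UNPAUSED",
    },
    # Stop the run if it is still going after 10 hours.
    "timeout_seconds": TEN_HOURS_IN_SECONDS,
}

print(json.dumps(job_settings_fragment, indent=2))
```

A run that is stopped by a timeout may be reported as timed out rather than as succeeded, so a loop that exits on its own after 10 hours can be the cleaner option when the run status matters.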