ETL pipeline

Yunky007
New Contributor

I have an ETL pipeline in Workflows that I use to create a materialized view. I want to schedule the pipeline to run for only 10 hours, starting at 10 AM. How can I schedule that? I can only see an hourly schedule or cron syntax. I want the compute to stay up for 10 hours and then terminate.

Thanks

Yogesh

tltharani
Databricks Partner

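Databricks doesn't support duration-based schedules directly, but you can simulate one using cron syntax.
Use this cron expression (Databricks job schedules use Quartz syntax, which has a leading seconds field): 0 0 10-19 * * ?
This triggers a run on the hour from 10:00 through 19:00. To ensure compute is not running outside these hours, set auto-termination to a low value such as 15 minutes.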
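
For anyone who creates the job through the Jobs API rather than the UI, here is a minimal sketch of how the schedule block for that hourly window could be built. The helper name hourly_window_schedule and the UTC default are assumptions for illustration; the three keys are the schedule fields the Jobs API accepts, and the expression is in Quartz syntax (seconds, minutes, hours, day of month, month, day of week).

```python
def hourly_window_schedule(start_hour: int, runs: int, timezone_id: str = "UTC") -> dict:
    """Build a schedule block that fires on the hour, `runs` times per day.

    Hypothetical helper for illustration; only the returned keys are
    Jobs API field names.
    """
    last_hour = start_hour + runs - 1
    if runs < 1 or start_hour < 0 or last_hour > 23:
        raise ValueError("the window must fit inside a single day")

    # A single run needs just one hour value; otherwise use an hour range.
    hours = str(start_hour) if runs == 1 else f"{start_hour}-{last_hour}"

    return {
        # Quartz fields: seconds minutes hours day-of-month month day-of-week
        "quartz_cron_expression": f"0 0 {hours} * * ?",
        "timezone_id": timezone_id,
        "pause_status": "UNPAUSED",
    }


schedule = hourly_window_schedule(start_hour=10, runs=10)
print(schedule["quartz_cron_expression"])  # 0 0 10-19 * * ?
```

Set timezone_id to your own region so that "10" means 10 AM local time rather than 10 AM UTC.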

 

Isi
Honored Contributor III

Hey @Yunky007 

You should use the cron expression 0 0 10 * * ? (Quartz syntax, which Databricks schedules use) to start the process at 10 AM.
Then, inside your script, implement a loop or similar mechanism that keeps the logic running for 10 hours. That's the trick.

 

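import time
from datetime import datetime, timedelta

# `spark` is the SparkSession that Databricks notebooks provide by default
end_time = datetime.now() + timedelta(hours=10)

while datetime.now() < end_time:
    # Logic
    spark.sql("REFRESH MATERIALIZED VIEW my_catalog.my_schema.my_mv")

    # Wait up to 1 hour between executions, but never sleep past the 10-hour deadline
    remaining = (end_time - datetime.now()).total_seconds()
    time.sleep(max(0, min(60 * 60, remaining)))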

 

Hope this helps 🙂

Isi

KaelaniBraster
New Contributor II

Use cron syntax to start the job, and add a stop condition that ends it after 10 hours of runtime.
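
One way such a stop condition might look is sketched below, assuming the window is defined by wall-clock time (10 AM plus 10 hours) instead of by elapsed runtime. The function name within_window and its defaults are hypothetical. A run, or each loop iteration, could call it and exit as soon as it returns False.

```python
import datetime as dt


def within_window(now: dt.datetime,
                  start: dt.time = dt.time(10, 0),
                  hours: int = 10) -> bool:
    """Return True while `now` is inside today's window [start, start + hours)."""
    # Reuse the tzinfo of `now` so naive and aware datetimes both compare cleanly.
    window_start = dt.datetime.combine(now.date(), start, tzinfo=now.tzinfo)
    window_end = window_start + dt.timedelta(hours=hours)
    return window_start <= now < window_end


# Fixed timestamps to show the boundaries of the default 10:00-20:00 window.
print(within_window(dt.datetime(2024, 1, 1, 9, 59)))   # False
print(within_window(dt.datetime(2024, 1, 1, 10, 0)))   # True
print(within_window(dt.datetime(2024, 1, 1, 19, 59)))  # True
print(within_window(dt.datetime(2024, 1, 1, 20, 0)))   # False
```

In a real job you would pass dt.datetime.now() with the time zone your schedule uses.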