Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to access the job scheduling date from within the notebook?

karolinalbinsso
New Contributor II

I have created a job that contains a notebook that reads a file from Azure Storage.

The file name contains the date on which the file was transferred to storage. A new file arrives every Monday, and the read job is scheduled to run every Monday.

In my notebook, I want to use the job's schedule date to read the file from Azure Storage whose filename contains that same date, something like this:

file_location = file_name + "_" + job_date + "_" + country_id + ".csv"

I have tried passing a date as a parameter, and I can access it from the notebook, but if the job fails and I want to re-run it the next day, I have to manually enter yesterday's date as the input parameter. I want to avoid this and just use the job's actual scheduling date.

How do I access the job scheduling date from within the notebook?

Thanks in advance

Karolin
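One possible workaround (a sketch, not a Databricks-specific API): keep the date as a job parameter, but make the notebook fall back to the most recent Monday when the parameter is empty, so a next-day re-run still resolves to the intended date. The `most_recent_monday` helper and the filename pattern below are hypothetical, based only on the description in the question.

```python
from datetime import date, timedelta

def most_recent_monday(today: date) -> date:
    """Return today if it is a Monday, otherwise the preceding Monday."""
    return today - timedelta(days=today.weekday())

def build_file_location(file_name: str, country_id: str, run_date: date) -> str:
    """Assemble the blob name the way the question describes (hypothetical pattern)."""
    return f"{file_name}_{run_date.isoformat()}_{country_id}.csv"

# In a Databricks notebook, the date could come from a widget and fall back
# to the computed Monday when the parameter is left empty, e.g.:
#   raw = dbutils.widgets.get("job_date")
#   run_date = date.fromisoformat(raw) if raw else most_recent_monday(date.today())
run_date = most_recent_monday(date(2024, 1, 10))  # 2024-01-10 is a Wednesday
print(build_file_location("sales", "SE", run_date))
```

This way a re-run on Tuesday still computes Monday's date instead of requiring a manual parameter.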

1 ACCEPTED SOLUTION

Accepted Solutions

Hubert-Dudek
Esteemed Contributor III

Hi, I guess the files are in the same directory structure, so you can use the cloud files Auto Loader. It will incrementally read only new files: https://docs.microsoft.com/en-us/azure/databricks/spark/latest/structured-streaming/auto-loader

So it works the other way around: instead of passing the date in, you can take the date from the input file path using:

.withColumn("filePath", input_file_name())
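Extracting the embedded date from that path could then look like the following pure-Python sketch; the `sales_2024-01-08_SE.csv` naming pattern is an assumption based on the question, not something the thread confirms.

```python
import re
from datetime import date

def date_from_path(file_path: str) -> date:
    """Pull an ISO date (YYYY-MM-DD) out of a file path; hypothetical pattern."""
    match = re.search(r"(\d{4}-\d{2}-\d{2})", file_path)
    if match is None:
        raise ValueError(f"no date found in {file_path!r}")
    return date.fromisoformat(match.group(1))

# On Databricks this logic would be applied to the "filePath" column
# produced by input_file_name(); here we just call it directly:
print(date_from_path("abfss://container@account.dfs.core.windows.net/sales_2024-01-08_SE.csv"))
```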


2 REPLIES


@Kani Yes, I have a similar use case where I run a SQL query filtered by start_date and end_date, and the job has to run every 10 days.

Current run > SELECT * FROM table WHERE start_date > '2024-01-01' AND end_date < '2024-01-10'

Now, if the job is successful, the next run should pick > SELECT * FROM table WHERE start_date > '2024-01-10' AND end_date < '2024-01-20'

The workflow should automatically take these dates on execution.
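A minimal sketch of that rolling window, assuming each new window simply starts where the previous one ended; the 10-day step and the table/column names are placeholders taken from the reply above, not a confirmed schema.

```python
from datetime import date, timedelta

WINDOW_DAYS = 10  # assumed cadence from the reply above

def next_window(prev_end: date) -> tuple:
    """Advance the filter window: new start = previous end, new end = start + 10 days."""
    return prev_end, prev_end + timedelta(days=WINDOW_DAYS)

def build_query(start: date, end: date) -> str:
    """Render the SQL filter; 'table' is a placeholder name."""
    return (f"SELECT * FROM table "
            f"WHERE start_date > '{start.isoformat()}' AND end_date < '{end.isoformat()}'")

start, end = next_window(date(2024, 1, 10))
print(build_query(start, end))
```

One way to persist the window between runs is task values or a small state table, so each run reads the previous end_date and advances it automatically.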

 
