05-03-2022 02:18 AM
I have created a job that contains a notebook that reads a file from Azure Storage.
The file-name contains the date of when the file was transferred to the storage. A new file arrives every Monday, and the read-job is scheduled to run every Monday.
In my notebook, I want to use the schedule-date of the job to read the file from Azure Storage with the same date in the filename, something like this:
file_location = file_name + "_" + job_date + "_" + country_id + ".csv"
I have tried to pass a date as a parameter, and I am able to access it from the notebook. But if the job fails and I want to re-run it the next day, I'd have to manually enter yesterday's date as the input parameter. I want to avoid this and just use the actual scheduling date of the job.
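For reference, the path construction above can be sketched as a small helper that falls back to today's date when no explicit date is passed (the names `build_file_location`, `sales`, and `SE` are hypothetical, and the `YYYY-MM-DD` format is an assumption about the filename convention):

```python
from datetime import date

def build_file_location(file_name, country_id, job_date=None):
    # Fall back to today's date when no explicit date parameter is passed
    job_date = job_date or date.today().strftime("%Y-%m-%d")
    return f"{file_name}_{job_date}_{country_id}.csv"

print(build_file_location("sales", "SE", "2022-05-02"))
# sales_2022-05-02_SE.csv
```

This keeps the manual-override option (pass a date for re-runs) while defaulting to the run date.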
How do I access the job scheduling date from within the notebook?
Thanks in advance
Karolin
- Labels:
  - Access
  - Azure Storage
  - Date
  - Job
  - Job scheduling
  - Notebook
Accepted Solutions
05-03-2022 04:58 AM
Hi, I guess the files are in the same directory structure, so you can use the cloud files Auto Loader. It will incrementally read only new files: https://docs.microsoft.com/en-us/azure/databricks/spark/latest/structured-streaming/auto-loader
So it works the other way around: instead of passing the date in, you can take the date from the input file path using:
.withColumn("filePath", input_file_name())
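To illustrate the parsing step after `input_file_name()` gives you the path: in PySpark you would typically follow it with `regexp_extract`, but the extraction logic itself can be shown as plain Python (the path and the `YYYY-MM-DD` filename convention here are assumptions):

```python
import re

def date_from_path(file_path):
    # Pull a YYYY-MM-DD date embedded in the file name,
    # e.g. .../sales_2022-05-02_SE.csv -> "2022-05-02"
    m = re.search(r"(\d{4}-\d{2}-\d{2})", file_path)
    return m.group(1) if m else None

print(date_from_path("abfss://in/sales_2022-05-02_SE.csv"))
# 2022-05-02
```

The same regex can be dropped into `regexp_extract` on the column produced by `input_file_name()`.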
09-06-2024 03:38 AM
@Kani Yes... I have a similar use case, where I run a SQL query filtered on start_date and end_date, and the job has to run every 10 days.
Current run: select * from table where start_date > 01-01-2024 and end_date < 01-10-2024
If the job is successful, the next run should pick: select * from table where start_date > 01-10-2024 and end_date < 01-20-2024
The workflow should take these dates automatically on execution.
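One way to derive the next run's window from the previous one is to roll it forward by the interval length; a minimal plain-Python sketch (the function name and window size are assumptions, and the resulting dates would still need to be injected into the SQL filter, e.g. via job parameters):

```python
from datetime import date, timedelta

def next_window(prev_end, days=10):
    # The new window starts where the previous one ended
    # and extends `days` days forward
    start = prev_end
    end = start + timedelta(days=days)
    return start, end

s, e = next_window(date(2024, 1, 10))
print(s, e)  # 2024-01-10 2024-01-20
```

Persisting `prev_end` (for example in a small checkpoint table) lets a failed run be retried with the same window instead of skipping ahead.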

