
Add the creation date of a parquet file into a DataFrame

wyzer
Contributor II

Currently I load multiple Parquet files with this code:

df = spark.read.parquet("/mnt/dev/bronze/Voucher/*/*")

(Inside the Voucher folder there is one folder per date, each containing one Parquet file.)

How can I add a column to this DataFrame that contains the creation date of each Parquet file?

Thanks

1 ACCEPTED SOLUTION


MichailKaramano
Contributor

Hi,

You can use the file metadata column: https://docs.databricks.com/ingestion/file-metadata-column.html

Through the hidden `_metadata` column you can access the file_path, file_name, file_size and file_modification_time of the source data file for each DataFrame row. No need to do it manually!
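For the read in the question, a minimal sketch could look like the following. It assumes a runtime where the `_metadata` column is available for file-based sources; the helper name `add_file_metadata` is only illustrative. Note that `file_modification_time` is the file's last modification time rather than a true creation time, so it only matches the creation date if the files are written once and never rewritten.

```python
def add_file_metadata(df):
    """Return df with two extra columns taken from the hidden _metadata struct.

    file_path: full path of the file the row was read from.
    file_modification_time: last modification timestamp of that file
    (not a true creation time).
    """
    return df.select(
        "*",
        "_metadata.file_path",
        "_metadata.file_modification_time",
    )


# Usage in a Databricks notebook, where `spark` is the existing SparkSession:
# df = add_file_metadata(spark.read.parquet("/mnt/dev/bronze/Voucher/*/*"))
```

Selecting the fields explicitly is needed because `_metadata` is hidden and is not returned by a plain `select("*")`.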

I found it useful 🙂


2 REPLIES

wyzer
Contributor II

Thanks @Michail Karamanos