11-18-2022 08:25 AM
Currently I load multiple parquet files with this code:
df = spark.read.parquet("/mnt/dev/bronze/Voucher/*/*")
(Inside the Voucher folder there is one folder per date, each containing one parquet file.)
How can I add a column to this DataFrame that contains the creation date of each parquet file?
Thanks
- Labels:
  - Dataframe
  - Parquet File
  - Pyspark Dataframe
  - Python
Accepted Solutions
11-18-2022 08:43 AM
Hi,
You can use the file metadata column: https://docs.databricks.com/ingestion/file-metadata-column.html
This way you can access the file_path, file_name, file_size, and file_modification_time of the underlying data file from each DataFrame row. No need to do it manually!
I found it useful 🙂
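For illustration, a minimal sketch of how this could look, based on the linked docs. It assumes a Databricks notebook (so `spark` is already defined) and a runtime where the hidden `_metadata` column is available for file-based sources; note there is no true "creation date" field, so `file_modification_time` is used as the closest proxy:

```python
from pyspark.sql.functions import col

# Read all date folders under Voucher, then surface fields from the
# hidden _metadata struct as regular columns
df = (
    spark.read.parquet("/mnt/dev/bronze/Voucher/*/*")
    .select(
        "*",
        col("_metadata.file_name").alias("source_file"),
        # Timestamp of the file's last modification; closest proxy
        # for the parquet file's creation date in this layout
        col("_metadata.file_modification_time").alias("file_modification_time"),
    )
)
```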
11-18-2022 12:46 PM
Thanks @Michail Karamanos