Add the creation date of a parquet file into a DataFrame

wyzer
Contributor II

Currently I load multiple Parquet files with this code:

df = spark.read.parquet("/mnt/dev/bronze/Voucher/*/*")

(Inside the Voucher folder there is one folder per date, each containing one Parquet file.)

How can I add a column to this DataFrame that contains the creation date of each Parquet file?

Thanks

Accepted Solutions

MichailKaramano
Contributor

Hi,

You can use the file metadata column: https://docs.databricks.com/ingestion/file-metadata-column.html

This way you can read the file_path, file_name, file_size and file_modification_time of the source file for each DataFrame row through the hidden _metadata column. No need to do it manually!

I found it useful 🙂


wyzer
Contributor II

Thanks @Michail Karamanos
