Hi,
I have a daily scheduled job that processes data and writes it as Parquet files in a specific folder structure: root_folder/{CountryCode}/parquetfiles. Each day, the job writes the new data for each country code into that country code's folder.
I am trying to achieve this with:
dataframe.write.partitionBy("countryCode").parquet(root_Folder)
This creates a folder structure like:
root_folder/countryCode=x/part1-snappy.parquet
root_folder/countryCode=x/part2-snappy.parquet
root_folder/countryCode=y/part1-snappy.parquet
However, the countryCode column is removed from the Parquet files themselves.
In my case, the Parquet files are read by external consumers, and they expect the countryCode column to be present in the files.
Is there an option to keep the column in the files and also have it in the folder path?
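One possible workaround, sketched under assumptions: the job is written in PySpark (the post doesn't say which Spark language API is used), and country codes are a small set. Instead of write.partitionBy, the sketch filters the rows for each country and writes them with a plain write in append mode to root_folder/{code}. Because the column is not used for partitioning, it stays inside the Parquet files, and the folder layout matches root_folder/{CountryCode}/. The helper names country_output_path and write_per_country are hypothetical.

```python
def country_output_path(root_folder: str, country_code: str) -> str:
    """Build a per-country output folder such as root_folder/x (hypothetical helper)."""
    return f"{root_folder.rstrip('/')}/{country_code}"


def write_per_country(df, root_folder: str, column: str = "countryCode") -> None:
    """Write each country's rows to its own folder, keeping `column` in the files.

    Assumes `df` is a pyspark.sql.DataFrame. pyspark is imported lazily so the
    path helper above can be used without a Spark installation.
    """
    from pyspark.sql import functions as F

    # Collect distinct country codes to the driver; rows with a null code are skipped.
    codes = [
        row[column]
        for row in df.select(column).distinct().collect()
        if row[column] is not None
    ]
    for code in codes:
        # A plain filtered write (no partitionBy) keeps `column` in the Parquet files.
        (
            df.filter(F.col(column) == code)
            .write.mode("append")  # each daily run adds new files to the folder
            .parquet(country_output_path(root_folder, code))
        )
```

Usage would be write_per_country(dataframe, root_Folder). The trade-off is one Spark write per country, each scanning the DataFrame, so caching it first (dataframe.cache()) may help. Since the folders are no longer named countryCode=x, Spark will not infer the column from the path when reading root_folder, which is fine here because the column is stored in the files.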