Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Upload to Volume

ruoyuqian
New Contributor II

How do I programmatically upload Parquet files from Azure Data Lake to a Unity Catalog Volume?

source_path = "abfss://datalake-raw-dev@xxx.dfs.core.windows.net/xxxxx/saxxles/xx/source/ETL/transformed_data/parquet/"

# Define the path to your Unity Catalog Volume
destination_path = "dbfs:/Volumes/xxx/xxx/transformed_parquet"

# Read the Parquet files from the source into a DataFrame
df = spark.read.parquet(source_path)
print('so far okay')
# Write the DataFrame to the Unity Catalog Volume
df.write.mode("overwrite").parquet(destination_path)

print(f"Data successfully copied to {destination_path}")

 

I tried the method above, but it says I cannot access the Volume this way. How can I do this programmatically, without using the UI?

2 REPLIES

Ajay-Pandey
Esteemed Contributor III

Hi @ruoyuqian 

Please use dbutils.fs.cp(source_path, destination_path); that will be able to load the data into the volume.

If you are still having issues, please check the access permissions of the identity this runs under when executed as a job.
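
For reference, a minimal sketch of that copy (the catalog/schema segments of the volume path are placeholders, and recurse=True is assumed because the source is a directory of Parquet files, not a single file):

# Source directory in ADLS Gen2 (same masked path as in the question)
source_path = "abfss://datalake-raw-dev@xxx.dfs.core.windows.net/xxxxx/saxxles/xx/source/ETL/transformed_data/parquet/"

# Unity Catalog volumes are addressed directly as /Volumes/<catalog>/<schema>/<volume>
destination_path = "/Volumes/xxx/xxx/transformed_parquet"

# Copy the whole directory tree from ADLS into the volume
dbutils.fs.cp(source_path, destination_path, recurse=True)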

Ajay Kumar Pandey

Witold
Contributor III

Besides, when accessing volumes, you don't need to provide the dbfs protocol: `/Volumes/xxx/xxx/transformed_parquet`
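
For example, the original write should work once the prefix is dropped (a minimal sketch, reusing the source_path and placeholder catalog/schema names from the question):

# Read the Parquet files from ADLS as before
df = spark.read.parquet(source_path)

# Write straight to the volume path; no "dbfs:" prefix is needed for Unity Catalog volumes
df.write.mode("overwrite").parquet("/Volumes/xxx/xxx/transformed_parquet")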
