Data Engineering

What do you think about continuing to use an instance profile for S3 multipart upload?

Yuki
New Contributor III

My team is currently using an instance profile to upload data to S3, since we only have the Hive metastore.

I like Unity Catalog a lot, but my code uses multipart upload to S3 for efficiency.

https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html

 

I want to continue using it, but I'm unsure of the best practice now because instance profiles are not recommended anymore.

https://docs.databricks.com/aws/en/admin/workspace-settings/manage-instance-profiles

 

Is it okay to use it in our case?

Or is there any way to do multipart uploads to a Unity Catalog Volume?

 

Thank you.

2 REPLIES

LRALVA
Honored Contributor

Hi @Yuki 

Not currently: Unity Catalog Volumes do not natively support multipart upload via the AWS SDK.
Unity Catalog Volumes are Databricks-managed paths in S3 (or ADLS) that are accessed through Unity Catalog governance.
You can't use low-level AWS SDK multipart APIs (e.g., boto3.client('s3').upload_part(...)) directly against Volumes because:
They don't expose raw bucket paths.
They are addressed through workspace paths like /Volumes/<catalog>/<schema>/<volume>/<path>.
They are intended for managed data access via Spark and file I/O, not for direct S3 multipart SDK operations.
Unity Catalog enforces fine-grained access control, so raw multipart access that bypasses Databricks governance isn't supported. You can still write to a Volume through its path with ordinary file APIs, as in the sketch below.
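
For illustration only, here is a minimal sketch of a governed write to a Volume without any AWS SDK calls. The catalog/schema/volume names and file paths are hypothetical, and it assumes a Unity Catalog-enabled cluster where Volumes are exposed under /Volumes:

import shutil

# Hypothetical names: a file produced locally on the driver is copied into a
# Unity Catalog Volume through its /Volumes path. Unity Catalog handles the
# underlying cloud storage access, so no boto3 or instance profile is involved.
local_file = "/tmp/report.csv"
volume_path = "/Volumes/my_catalog/my_schema/my_volume/landing/report.csv"
shutil.copy(local_file, volume_path)

# In a notebook, dbutils.fs.cp("file:/tmp/report.csv", volume_path) works as well.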

Can You Still Use Instance Profiles for Multipart Uploads?
✔️ Yes, with caution
While Databricks recommends moving away from instance profiles, they are still supported for use cases like yours where low-level AWS SDK access is required (e.g., multipart upload, boto3-based apps); see the boto3 sketch below.
Just be sure to follow least-privilege IAM practices and scope access to only the S3 buckets involved.
Databricks' official docs confirm:
“Instance profiles are still supported but should be used for specific, advanced access cases.”
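
As an illustration only (the bucket, key, and file names are made up), the boto3 transfer manager picks up the cluster's instance-profile credentials automatically and switches to the S3 multipart API above a configurable size threshold:

import boto3
from boto3.s3.transfer import TransferConfig

# Credentials come from the cluster's instance profile; no keys in code.
s3 = boto3.client("s3")

# Files larger than multipart_threshold are uploaded with the S3 multipart API,
# split into multipart_chunksize parts and uploaded concurrently.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,   # 64 MB
    multipart_chunksize=64 * 1024 * 1024,
    max_concurrency=8,
)

s3.upload_file(
    Filename="/tmp/big_export.parquet",     # hypothetical local file
    Bucket="my-raw-bucket",                 # hypothetical bucket allowed by the IAM role
    Key="exports/big_export.parquet",
    Config=config,
)

If you need the low-level calls (create_multipart_upload / upload_part / complete_multipart_upload), they work the same way with instance-profile credentials.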

Better Practice (If Unity Catalog Adoption is Your Goal)
If you want to align more closely with Unity Catalog governance, here's a hybrid approach:

1. Use Spark or file APIs for most Volume-based data writes
Writing a DataFrame to a Unity Catalog Volume path:
df.write.csv("/Volumes/my_catalog/my_schema/my_volume/my_table")

2. For multipart upload, use the instance profile + boto3 in a secured job
Keep a specific job or notebook that:
Uses boto3 with credentials provided by the instance profile
Uploads directly to raw S3 (outside UC)
Registers the output in Unity Catalog afterwards as an external table via CREATE TABLE ... LOCATION (see the sketch below)
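
A minimal sketch of that registration step, assuming an external location (and storage credential) covering the bucket has already been configured in Unity Catalog; every name here is hypothetical:

# Register the boto3-uploaded CSV files as a Unity Catalog external table.
# Requires CREATE EXTERNAL TABLE on an external location covering this path.
# "spark" is the SparkSession predefined in Databricks notebooks/jobs.
spark.sql("""
    CREATE TABLE IF NOT EXISTS my_catalog.my_schema.my_table
    USING CSV
    OPTIONS (header = 'true')
    LOCATION 's3://my-raw-bucket/exports/my_table/'
""")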

LR

Yuki
New Contributor III

Hi @LRALVA ,

Thank you for your excellent response. I really appreciate it.

I couldn't find the statement "Instance profiles are still supported but should be used for specific, advanced access cases" in the docs, but I will keep using the instance profile for now, recognizing that my case is a special one.

But I also want to migrate to UC completely, so your best-practice suggestion is helpful for me.

I understand: where we can use Spark and the data format allows it, I will use that approach fully.

I was deeply moved by how thoughtfully you responded.
