<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How do you think continuing to use instance profile to S3 multi part upload? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-do-you-think-continuing-to-use-instance-profile-to-s3-multi/m-p/116419#M45308</link>
    <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/93088"&gt;@Yuki&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-unicode-emoji" title=":cross_mark:"&gt;❌&lt;/span&gt;&lt;STRONG&gt; Not currently: Unity Catalog Volumes do not natively support multipart upload via the AWS SDK&lt;/STRONG&gt;&lt;BR /&gt;Unity Catalog Volumes are Databricks-managed paths in S3 (or ADLS) accessed through Unity Catalog governance.&lt;BR /&gt;You can’t use low-level AWS SDK multipart APIs directly with Volumes (e.g., boto3.client('s3').upload_part(...)) because:&lt;BR /&gt;They don’t expose raw bucket paths.&lt;BR /&gt;They wrap access via paths like /Volumes/catalog/schema/volume/path/.&lt;BR /&gt;Unity Catalog volumes are intended for managed data access via Spark and file I/O, not for direct S3 multipart SDK operations.&lt;BR /&gt;Unity Catalog enforces fine-grained access control, so raw multipart access bypassing Databricks governance isn't supported.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-unicode-emoji" title=":white_heavy_check_mark:"&gt;✅&lt;/span&gt; &lt;STRONG&gt;Can You Still Use Instance Profiles for Multipart Uploads?&lt;/STRONG&gt;&lt;BR /&gt;&lt;span class="lia-unicode-emoji" title=":heavy_check_mark:"&gt;✔️&lt;/span&gt; Yes, with caution&lt;BR /&gt;While Databricks recommends moving away from instance profiles, they are still supported for use cases like yours where low-level AWS SDK access is required (e.g., multipart upload, boto3-based apps).&lt;BR /&gt;Just be sure to follow least-privilege IAM practices and restrict access to only the S3 buckets involved.&lt;BR /&gt;&lt;span class="lia-unicode-emoji" title=":white_heavy_check_mark:"&gt;✅&lt;/span&gt; Databricks' official docs confirm:&lt;BR /&gt;“Instance profiles are still supported but should be used for specific, advanced access cases.”&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Better Practice (If Unity Catalog Adoption is Your Goal)&lt;/STRONG&gt;&lt;BR
/&gt;If you want to align more with Unity Catalog + credential passthrough, here's a hybrid approach:&lt;/P&gt;&lt;P&gt;1. &lt;span class="lia-unicode-emoji" title=":white_heavy_check_mark:"&gt;✅&lt;/span&gt; Use Spark or DBFS for most volume-based data writes&lt;BR /&gt;Unity Catalog Volumes:&lt;BR /&gt;df.write.csv("/Volumes/my_catalog/my_schema/my_volume/my_table")&lt;/P&gt;&lt;P&gt;2. For multipart upload, use instance profile + boto3 in a secured job&lt;BR /&gt;Keep a specific job or notebook that:&lt;BR /&gt;Uses boto3 with credentials injected via the instance profile&lt;BR /&gt;Uploads directly to raw S3 (outside UC)&lt;BR /&gt;Flags the output to be registered in Unity Catalog later via a CREATE TABLE statement with a LOCATION clause&lt;/P&gt;</description>
    <pubDate>Thu, 24 Apr 2025 03:04:06 GMT</pubDate>
    <dc:creator>lingareddy_Alva</dc:creator>
    <dc:date>2025-04-24T03:04:06Z</dc:date>
    <item>
      <title>How do you think continuing to use instance profile to S3 multi part upload?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-you-think-continuing-to-use-instance-profile-to-s3-multi/m-p/116414#M45305</link>
      <description>&lt;P&gt;My team is currently using an instance profile to upload data to S3 since we only have Hive Metastore.&lt;/P&gt;&lt;P&gt;I like Unity Catalog a lot, but my code uses multipart upload to S3 for efficiency.&lt;/P&gt;&lt;P&gt;&lt;A title="https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html" href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html" target="_blank" rel="noreferrer noopener"&gt;https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to continue using it, but I'm unsure of the best practice now because instance profiles are no longer recommended.&lt;/P&gt;&lt;P&gt;&lt;A title="https://docs.databricks.com/aws/en/admin/workspace-settings/manage-instance-profiles" href="https://docs.databricks.com/aws/en/admin/workspace-settings/manage-instance-profiles" target="_blank" rel="noreferrer noopener"&gt;https://docs.databricks.com/aws/en/admin/workspace-settings/manage-instance-profiles&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is it okay to keep using an instance profile in our case?&lt;/P&gt;&lt;P&gt;Or is there a way to do a multipart upload to a Volume?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Thu, 24 Apr 2025 01:09:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-you-think-continuing-to-use-instance-profile-to-s3-multi/m-p/116414#M45305</guid>
      <dc:creator>Yuki</dc:creator>
      <dc:date>2025-04-24T01:09:36Z</dc:date>
    </item>
    <item>
      <title>Re: How do you think continuing to use instance profile to S3 multi part upload?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-you-think-continuing-to-use-instance-profile-to-s3-multi/m-p/116419#M45308</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/93088"&gt;@Yuki&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-unicode-emoji" title=":cross_mark:"&gt;❌&lt;/span&gt;&lt;STRONG&gt; Not currently: Unity Catalog Volumes do not natively support multipart upload via the AWS SDK&lt;/STRONG&gt;&lt;BR /&gt;Unity Catalog Volumes are Databricks-managed paths in S3 (or ADLS) accessed through Unity Catalog governance.&lt;BR /&gt;You can’t use low-level AWS SDK multipart APIs directly with Volumes (e.g., boto3.client('s3').upload_part(...)) because:&lt;BR /&gt;They don’t expose raw bucket paths.&lt;BR /&gt;They wrap access via paths like /Volumes/catalog/schema/volume/path/.&lt;BR /&gt;Unity Catalog volumes are intended for managed data access via Spark and file I/O, not for direct S3 multipart SDK operations.&lt;BR /&gt;Unity Catalog enforces fine-grained access control, so raw multipart access bypassing Databricks governance isn't supported.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-unicode-emoji" title=":white_heavy_check_mark:"&gt;✅&lt;/span&gt; &lt;STRONG&gt;Can You Still Use Instance Profiles for Multipart Uploads?&lt;/STRONG&gt;&lt;BR /&gt;&lt;span class="lia-unicode-emoji" title=":heavy_check_mark:"&gt;✔️&lt;/span&gt; Yes, with caution&lt;BR /&gt;While Databricks recommends moving away from instance profiles, they are still supported for use cases like yours where low-level AWS SDK access is required (e.g., multipart upload, boto3-based apps).&lt;BR /&gt;Just be sure to follow least-privilege IAM practices and restrict access to only the S3 buckets involved.&lt;BR /&gt;&lt;span class="lia-unicode-emoji" title=":white_heavy_check_mark:"&gt;✅&lt;/span&gt; Databricks' official docs confirm:&lt;BR /&gt;“Instance profiles are still supported but should be used for specific, advanced access cases.”&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Better Practice (If Unity Catalog Adoption is Your
Goal)&lt;/STRONG&gt;&lt;BR /&gt;If you want to align more with Unity Catalog + credential passthrough, here's a hybrid approach:&lt;/P&gt;&lt;P&gt;1. &lt;span class="lia-unicode-emoji" title=":white_heavy_check_mark:"&gt;✅&lt;/span&gt; Use Spark or DBFS for most volume-based data writes&lt;BR /&gt;Unity Catalog Volumes:&lt;BR /&gt;df.write.csv("/Volumes/my_catalog/my_schema/my_volume/my_table")&lt;/P&gt;&lt;P&gt;2. For multipart upload, use instance profile + boto3 in a secured job&lt;BR /&gt;Keep a specific job or notebook that:&lt;BR /&gt;Uses boto3 with credentials injected via the instance profile&lt;BR /&gt;Uploads directly to raw S3 (outside UC)&lt;BR /&gt;Flags the output to be registered in Unity Catalog later via a CREATE TABLE statement with a LOCATION clause&lt;/P&gt;</description>
      <pubDate>Thu, 24 Apr 2025 03:04:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-you-think-continuing-to-use-instance-profile-to-s3-multi/m-p/116419#M45308</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2025-04-24T03:04:06Z</dc:date>
    </item>
    <item>
      <title>Re: How do you think continuing to use instance profile to S3 multi part upload?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-you-think-continuing-to-use-instance-profile-to-s3-multi/m-p/116441#M45315</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/24053"&gt;@lingareddy_Alva&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Thank you for your excellent response. I really appreciate it.&lt;/P&gt;&lt;P&gt;I couldn't find the passage that says "Instance profiles are still supported but should be used for specific, advanced access cases." I will use an instance profile for now, recognizing that my case is special.&lt;/P&gt;&lt;P&gt;But I also want to migrate to UC completely. Your best practice is helpful to me.&lt;/P&gt;&lt;P&gt;I understand that wherever we can use Spark and the data format allows it, I should use it fully.&lt;/P&gt;&lt;P&gt;I was deeply moved by how thoughtfully you responded.&lt;/P&gt;</description>
      <pubDate>Thu, 24 Apr 2025 08:28:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-you-think-continuing-to-use-instance-profile-to-s3-multi/m-p/116441#M45315</guid>
      <dc:creator>Yuki</dc:creator>
      <dc:date>2025-04-24T08:28:06Z</dc:date>
    </item>
  </channel>
</rss>

