AWS Instance Profiles and DLT Pipelines

youngchef
New Contributor

Hey everyone!

I'm building a DLT pipeline that reads files from S3 (or tries to) and then writes them into different directories in my S3 bucket. The problem is that I usually access S3 through an instance profile attached to a cluster, but DLT does not give me the option to set an instance profile for the job cluster it creates.

What is the solution here? Do I somehow have to pass my AWS keys in the DLT notebook?

Prabakar
Databricks Employee

Hi @Quinn Harty, if you need an instance profile or other configuration to access your storage location, specify it for both the default cluster and the maintenance cluster in the pipeline settings.

https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-configuration.html#...

Hubert-Dudek
Databricks MVP
{
  "clusters": [
    {
      "label": "default",
      "aws_attributes": {
        "instance_profile_arn": "arn:aws:..."
      }
    },
    {
      "label": "maintenance",
      "aws_attributes": {
        "instance_profile_arn": "arn:aws:..."
      }
    }
  ]
}
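
If you would rather patch existing settings than edit the JSON by hand, here is a minimal Python sketch of the same idea. It only manipulates a plain dictionary shaped like the settings above and does not call any Databricks API. The helper name `with_instance_profile` and the sample `num_workers` value are hypothetical, and the ARN stays a placeholder that you would replace with your own.

```python
import copy
import json

# Cluster labels that should both carry the instance profile, per the settings above.
REQUIRED_LABELS = ("default", "maintenance")


def with_instance_profile(settings: dict, instance_profile_arn: str) -> dict:
    """Return a copy of the settings with the ARN set on both clusters.

    Hypothetical helper: it edits a plain dict and does not talk to Databricks.
    """
    updated = copy.deepcopy(settings)  # leave the caller's dict untouched
    clusters = updated.setdefault("clusters", [])
    by_label = {cluster.get("label"): cluster for cluster in clusters}

    for label in REQUIRED_LABELS:
        cluster = by_label.get(label)
        if cluster is None:
            # Add the cluster entry if the settings do not define it yet.
            cluster = {"label": label}
            clusters.append(cluster)
        aws_attributes = cluster.setdefault("aws_attributes", {})
        aws_attributes["instance_profile_arn"] = instance_profile_arn

    return updated


# Sample settings that define only the default cluster (assumed for illustration).
settings = {"clusters": [{"label": "default", "num_workers": 2}]}

# "arn:aws:..." is a placeholder, not a real ARN.
patched = with_instance_profile(settings, "arn:aws:...")
print(json.dumps(patched, indent=2))
```

The printed JSON can then be pasted into the pipeline's settings. Existing cluster options are kept, and a missing maintenance entry is added so both clusters end up with the same profile.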


My blog: https://databrickster.medium.com/

Hi @Quinn Harty,

Just a friendly follow-up. Did any of the responses help resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.