3 weeks ago
I am providing a custom Docker image to my Databricks/Spark job. I've created the image and uploaded it to our private ECR registry (the URL is `472542229217.dkr.ecr.us-west-2.amazonaws.com/tectonai/mrstevegross-testing:latest`). Based on the docs (https://docs.databricks.com/en/compute/custom-containers.html#launch-your-compute-using-the-api), however, it is unclear to me if my string format is correct, since I'm getting this error at runtime:
Cluster '0121-203731-epwa9mar' was terminated. Reason: INVALID_ARGUMENT (CLIENT_ERROR). Parameters: databricks_error_message:Container setup failed because of an invalid request: Exception when verifying docker container image: Image doesn't exist or invalid credential to pull image from 472542229217.dkr.ecr.us-west-2.amazonaws.com/tectonai/mrstevegross-testing:latest. Stdout: Stderr: time="2025-01-21T20:43:05Z" level=fatal msg="Error parsing image name \"docker://472542229217.dkr.ecr.us-west-2.amazonaws.com/tectonai/mrstevegross-testing:latest\": reading manifest latest in 472542229217.dkr.ecr.us-west-2.amazonaws.com/tectonai/mrstevegross-testing: authentication required"
Note in particular that the error claims "parsing image name 'docker://..."; I'm wondering whether the "docker://" prefix indicates that I mis-specified the URL. Can anyone advise on how to properly format "472542229217.dkr.ecr.us-west-2.amazonaws.com/tectonai/mrstevegross-testing:latest" for use in the API call?
2 weeks ago
Hey!
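It seems like your Instance Profile might not have enough privileges to pull from this ECR repository (the `authentication required` at the end of the error points that way). I would recommend updating the policy on the IAM role you are using so that it includes the following permissions. Note that this policy grants read and write access, which is broader than a pull strictly needs: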
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GrantECRGeneralAccess",
      "Effect": "Allow",
      "Action": [
        "ecr:GetRegistryPolicy",
        "ecr:DescribeRegistry",
        "ecr:GetAuthorizationToken"
      ],
      "Resource": "<resource>"
    },
    {
      "Sid": "GrantECRReadWriteAccess",
      "Effect": "Allow",
      "Action": [
        "ecr:DescribeImageScanFindings",
        "ecr:GetLifecyclePolicyPreview",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "ecr:DescribeImageReplicationStatus",
        "ecr:DescribeImages",
        "ecr:DescribeRepositories",
        "ecr:ListTagsForResource",
        "ecr:ListImages",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetRepositoryPolicy",
        "ecr:GetLifecyclePolicy",
        "ecr:InitiateLayerUpload",
        "ecr:SetRepositoryPolicy",
        "ecr:PutImageTagMutability",
        "ecr:StartImageScan",
        "ecr:UploadLayerPart",
        "ecr:BatchDeleteImage",
        "ecr:CompleteLayerUpload",
        "ecr:TagResource",
        "ecr:ReplicateImage",
        "ecr:PutLifecyclePolicy",
        "ecr:PutImageScanningConfiguration",
        "ecr:PutImage",
        "ecr:UntagResource",
        "ecr:StartLifecyclePolicyPreview"
      ],
      "Resource": "<resource>"
    }
  ]
}
After updating the IAM role with the above permissions, make sure the Instance Profile ARN is correctly assigned to your Databricks cluster.
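If the cluster only ever pulls the image, a narrower policy is probably enough. The Python sketch below builds one. It assumes that the standard ECR pull actions are all Databricks needs, and the helper name `build_pull_only_policy` is hypothetical. One detail worth knowing: `ecr:GetAuthorizationToken` cannot be scoped to a single repository, so its `Resource` has to be `"*"`.

```python
import json


def build_pull_only_policy(account_id: str, region: str, repository: str) -> dict:
    """Return an IAM policy document that allows pulling from one ECR repository.

    Assumption: the cluster only pulls images, so no write actions are granted.
    """
    repo_arn = f"arn:aws:ecr:{region}:{account_id}:repository/{repository}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # GetAuthorizationToken is account-wide and cannot be
                # restricted to a single repository.
                "Sid": "GrantECRAuthToken",
                "Effect": "Allow",
                "Action": ["ecr:GetAuthorizationToken"],
                "Resource": "*",
            },
            {
                # The three actions a client uses to fetch a manifest and layers.
                "Sid": "GrantECRPull",
                "Effect": "Allow",
                "Action": [
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchGetImage",
                ],
                "Resource": repo_arn,
            },
        ],
    }


# Values taken from the image URL in the original question.
policy = build_pull_only_policy(
    "472542229217", "us-west-2", "tectonai/mrstevegross-testing"
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be attached to the IAM role behind the Instance Profile. If pulls still fail with it, widening the action list step by step is safer than starting from full read/write access.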
🙂
3 weeks ago
Note: I found additional docs about formatting for AWS ECR here, so I'm trying that now.
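For anyone landing here with the same question: the image string above already appears to match the `<account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>` shape. The `docker://` in the error looks like a transport prefix added by whatever tool inspects the image, not something to put in the API call. Below is a minimal Python sketch that checks the string and builds a cluster spec. It assumes the `docker_image.url` and `aws_attributes.instance_profile_arn` fields of the Clusters API. The helper names, the regular expression, the cluster name, and the `<...>` placeholders are all illustrative, not taken from the docs.

```python
import json
import re

# Assumed shape of a private ECR image reference:
#   <12-digit account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
ECR_IMAGE_RE = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[a-z0-9._/-]+):(?P<tag>[A-Za-z0-9._-]+)$"
)


def parse_ecr_image(url: str) -> dict:
    """Split an ECR image reference into its parts, or raise ValueError."""
    match = ECR_IMAGE_RE.match(url)
    if match is None:
        raise ValueError(f"not a plain ECR image reference: {url!r}")
    return match.groupdict()


def build_cluster_spec(image_url: str, instance_profile_arn: str) -> dict:
    """Build a create-cluster request body that pulls a custom image from ECR."""
    parse_ecr_image(image_url)  # fail early on a malformed reference
    return {
        "cluster_name": "custom-container-test",  # hypothetical name
        "spark_version": "<spark-version>",       # placeholder
        "node_type_id": "<node-type-id>",         # placeholder
        "num_workers": 1,
        # No scheme prefix such as docker:// or https:// on the URL.
        "docker_image": {"url": image_url},
        # ECR authentication comes from the instance profile, not basic_auth.
        "aws_attributes": {"instance_profile_arn": instance_profile_arn},
    }


spec = build_cluster_spec(
    "472542229217.dkr.ecr.us-west-2.amazonaws.com/tectonai/mrstevegross-testing:latest",
    "arn:aws:iam::<aws-account-id>:instance-profile/<role-name>",  # placeholder
)
print(json.dumps(spec, indent=2))
```

With the URL in that form, a remaining `authentication required` error points at the instance profile's permissions, not the string format.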
2 weeks ago
Thanks, that's pretty much what I did. It took a lot of Terraform configuration to get the AWS account set up properly, but now I'm able to tell DBR to load the container. (FWIW, I'm encountering *new* access issues; I started a thread here (https://community.databricks.com/t5/community-platform-discussions/how-to-grant-custom-container-aws...) to deal with them.)