02-06-2024 06:19 AM
I configured Auto Loader in file notification mode to read files from S3 on AWS.
spark.readStream\
.format("cloudFiles")\
.option("cloudFiles.format", "json")\
.option("cloudFiles.inferColumnTypes", "true")\
.option("cloudFiles.schemaLocation", "dbfs:/auto-loader/schemas/")\
.option("cloudFiles.useNotifications", "true")\
.option("cloudFiles.includeExistingFiles", "true")\
.option("multiLine", "true")\
.option("inferSchema", "true")\
.load("s3://orcus-rave-bucket/temp/cludcad_incident3")\
.writeStream\
.option("checkpointLocation", "dbfs:/auto-loader/checkpoint04/")\
.trigger(availableNow=True)\
.table("al_table3")
I configured the IAM role based on the documentation: https://learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/file-notification-mode#--re...
02-06-2024 06:21 AM
I get the following error:
com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [BasicAWSCredentialsProvider: Access key or secret key is null, com.amazonaws.auth.InstanceProfileCredentialsProvider@5ab84d23: The requested metadata is not found at http://169.254.169.254/latest/meta-data/iam/security-credentials/]
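The second provider in that chain failed because the instance metadata endpoint returned no role, which usually suggests that no instance profile is visible from the node. A minimal sketch for checking this from Python follows. The function name `attached_instance_profile_roles` and the injectable `fetch` parameter are hypothetical helpers, not part of any Databricks or AWS library, and the sketch assumes the metadata endpoint is reachable without a session token (IMDSv1 style), which may not hold on every cluster.

```python
import urllib.error
import urllib.request

# Same endpoint that appears in the error message above.
IMDS_ROLE_URL = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"


def attached_instance_profile_roles(fetch=None, timeout=2.0):
    """Return the IAM role names listed by the instance metadata endpoint.

    An empty list means no role was visible (or the endpoint was unreachable).
    `fetch` is a hypothetical hook so the lookup can be replaced in tests.
    """
    def default_fetch(url):
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")

    fetch = fetch or default_fetch
    try:
        body = fetch(IMDS_ROLE_URL)
    except (urllib.error.URLError, OSError):
        # 404, timeout, or no route: treat all of these as "no role visible".
        return []
    # The endpoint lists one role name per line.
    return [line.strip() for line in body.splitlines() if line.strip()]
```

Calling `attached_instance_profile_roles()` with no arguments on a cluster node performs the real lookup; an empty result would be consistent with the error above.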
03-08-2024 06:20 AM
I have the same error after adding the IAM permissions noted in the file notification mode documentation. Were you able to find a solution?
03-20-2024 08:17 AM
In case anyone else stumbles across this, I was able to fix my issue by setting up an instance profile with the file notification permissions and attaching the instance profile to the job cluster. It wasn't clear from the documentation that the file notification permissions can't be set up with a role and job using storage credentials. This article helped: https://medium.com/@mattwinmill88/deploying-a-databricks-aws-end-to-end-pipeline-using-terraform-921...
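For reference, attaching an instance profile to a job cluster comes down to setting `aws_attributes.instance_profile_arn` in the cluster definition. Below is a rough sketch of such a block built as a Python dict. The helper name `job_cluster_spec` is hypothetical, and the ARN, Spark version, and node type are placeholders to fill in, not values from this thread.

```python
import json


def job_cluster_spec(instance_profile_arn, spark_version, node_type_id, num_workers=1):
    """Build a job cluster definition that attaches an instance profile.

    Hypothetical helper: it only assembles the dict, it does not call any API.
    """
    return {
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # The instance profile carrying the file notification permissions.
        "aws_attributes": {"instance_profile_arn": instance_profile_arn},
    }


# Placeholder values; replace with the ones from your own workspace.
spec = job_cluster_spec(
    "arn:aws:iam::<account_id>:instance-profile/<profile-name>",
    spark_version="<spark-version>",
    node_type_id="<node-type>",
)
print(json.dumps(spec, indent=2))
```

The same field can be set through the job's compute configuration in the UI or through Terraform, as in the linked article.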
11-20-2024 11:05 PM
Hi @Selz,
I currently have the same error when running Auto Loader in file notification mode. I have done the following steps:
1. Set up an instance profile with the file notification permissions.
2. Added the instance profile to the Databricks workspace under Settings -> Security -> Instance profiles.
3. Configured the job compute policy to add the following config:
"aws_attributes.instance_profile_arn": {
  "type": "allowlist",
  "values": [
    "arn:aws:iam::<account_id>:instance-profile/<my instance profile role>"
  ],
  "isOptional": true
},
However, I'm still getting the same error. I'm wondering whether I did something wrong or missed a step. I appreciate your guidance on this.
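One thing that may be worth checking: as far as I understand compute policies, an `allowlist` entry only restricts which values are accepted, and `"isOptional": true` lets the attribute be left out, so the policy on its own may not attach the profile to the job cluster. The sketch below compares a policy with a cluster definition to show that distinction. The function `instance_profile_status` and its return strings are hypothetical, written for illustration only.

```python
POLICY_KEY = "aws_attributes.instance_profile_arn"


def instance_profile_status(policy, cluster_spec):
    """Report whether a cluster definition actually sets an instance profile.

    Hypothetical helper: `policy` is the policy definition as a dict and
    `cluster_spec` is the job cluster definition as a dict.
    """
    arn = cluster_spec.get("aws_attributes", {}).get("instance_profile_arn")
    rule = policy.get(POLICY_KEY, {})
    if arn is None:
        # The policy permits the profile but nothing sets it on the cluster.
        if rule.get("isOptional"):
            return "missing (policy marks it optional)"
        return "missing"
    if rule.get("type") == "allowlist" and arn not in rule.get("values", []):
        return "not allowed by policy"
    return "attached"


# Policy fragment as posted above, with the same placeholder ARN.
policy = {
    POLICY_KEY: {
        "type": "allowlist",
        "values": ["arn:aws:iam::<account_id>:instance-profile/<my instance profile role>"],
        "isOptional": True,
    }
}

# A cluster definition that never sets the profile passes this policy anyway.
print(instance_profile_status(policy, {"num_workers": 1}))
```

If the job cluster definition turns out not to set the ARN, selecting the instance profile explicitly in the job's compute settings would be the next thing to try.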