Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Autoloader file notification mode error using UC

rimaissa
New Contributor II

We have a DLT pipeline that uses Auto Loader in file notification mode. The pipeline ran fine before we moved it to UC. Now that we're using UC, we get an AWS permissions error whenever cloudFiles.useNotifications is set to true. We've checked all of our AWS permissions and everything appears to be configured correctly. We do not want to parameterize our access key or secret key. Have there been any updates on this?

 

import dlt

@dlt.table(
    table_properties={
        "quality": "bronze",
        "pipelines.autoOptimize.managed": "true",
    }
)
def bronze():
    # avro_schema and data_source are assumed to be defined elsewhere in the pipeline
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "avro")
        .option("cloudFiles.useNotifications", "true")
        .option("cloudFiles.backfillInterval", "1 day")
        .option("pathGlobFilter", "*.avro")
        .option("avroSchema", avro_schema)
        .load(data_source)
    )

Kaniz_Fatma
Community Manager

Hi @rimaissa

  1. Ensure that the user or service principal running the DLT pipeline has the necessary permissions to access the S3 bucket and set up the required cloud resources (SNS, SQS) in the Unity Catalog context. This may require additional permissions beyond what was needed when not using UC.
  2. Instead of using hardcoded access keys and secret keys, consider using managed identities provided by your cloud provider (e.g., AWS IAM roles) to authenticate and authorize access to the required resources. This can simplify permissions management and avoid the need to parameterize the access keys.
  3. Unity Catalog has its own permissions model that may need to be configured to grant the necessary access. Ensure that the appropriate privileges are granted at the catalog, schema, or table level, and on any external location that covers the source path, so that the DLT pipeline can access the required resources.
  4. Double-check the Autoloader configuration options, especially the cloudFiles.useNotifications setting, to ensure that it is set correctly and that the necessary permissions are in place for the cloud resources to be created and accessed.
  5. As a workaround, you can try manually configuring the cloud notification and queue services (e.g., AWS SNS and SQS) and providing the queue identifiers to Autoloader, instead of relying on the automatic configuration. This may help bypass any issues with the automatic configuration in the context of Unity Catalog.
  6. If the issues with Auto Loader's file notification mode persist, you may want to consider alternative ingestion approaches, such as Auto Loader in directory listing mode (cloudFiles.useNotifications not set to true), a plain Spark Structured Streaming file source, or other data integration tools that work with Unity Catalog.
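
The workaround in item 5 could look roughly like the sketch below. It assumes the SNS topic and SQS queue already exist and that the role running the pipeline can read from the queue. The helper name notification_options, the queue URL, and the region are made up for illustration. cloudFiles.queueUrl and cloudFiles.region are documented Auto Loader options, but whether this avoids the permissions error under Unity Catalog has not been confirmed in this thread.

```python
from typing import Dict


def notification_options(queue_url: str, region: str) -> Dict[str, str]:
    """Build Auto Loader options that consume from an existing SQS queue.

    Hypothetical helper: when cloudFiles.queueUrl is set, Auto Loader reads
    events from that queue instead of creating its own SNS/SQS resources.
    No access key or secret key options are included.
    """
    return {
        "cloudFiles.format": "avro",
        "cloudFiles.useNotifications": "true",
        "cloudFiles.queueUrl": queue_url,
        "cloudFiles.region": region,
    }


# Placeholder values; substitute the queue created for your bucket.
opts = notification_options(
    "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue",
    "us-east-1",
)

# Inside the DLT table function the options would be applied like this:
# spark.readStream.format("cloudFiles").options(**opts).load(data_source)
```

Passing the options as a dictionary keeps credentials out of the pipeline code, so authentication is left to the IAM role attached to the compute.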

By addressing the permissions management in the context of Unity Catalog and exploring alternative Autoloader configuration options, you should be able to resolve the AWS permissions issue and get your DLT pipeline running smoothly again.

Hi @Kaniz_Fatma, thank you. We are trying to understand whether this is an actual limitation of Auto Loader file notification mode under UC, since file notification mode is only supported on single user clusters. Does that mean it won't work in our DLT pipeline?
