Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Serverless compute for file notification mode

Isa1
New Contributor II

I am creating a table that ingests data from AWS S3 using Auto Loader's file notification mode. With a single-user cluster, it works. I would like to use Serverless compute, but I get an authentication error. Is it possible to do this, or are there alternatives that automatically scale up compute based on demand?

1 REPLY

Alberto_Umana
Databricks Employee

Hi @Isa1,

Using Serverless compute with Auto Loader in file notification mode can indeed present authentication challenges. Here are some insights and alternatives:

 

  1. Authentication Issues with Serverless Compute:
    • Serverless compute may face authentication errors due to the specific permissions required for setting up and accessing AWS services like S3, SNS, and SQS. These permissions need to be correctly configured to allow Serverless compute to interact with these services.
  2. Permissions and Policies:
    • Ensure that the IAM role or user associated with your Serverless compute has the necessary permissions. The required permissions include actions like sns:CreateTopic, sns:Publish, sqs:CreateQueue, sqs:ReceiveMessage, and s3:GetObject. The full list is in the Auto Loader file notification mode documentation. Could you share how you are authenticating to access the S3 bucket?
  3. Alternatives to Serverless Compute:
    • If Serverless compute continues to present issues, consider using a single-user or shared cluster, or a job cluster. These clusters can be configured with the necessary instance profiles and permissions to interact with AWS services without the same authentication hurdles.
    • Another alternative is to enable autoscaling on an all-purpose cluster, so that it adds workers automatically based on demand. This can help manage compute resources efficiently while avoiding some of the authentication complexities associated with Serverless compute.
  4. Manual Configuration:
    • If you prefer to stick with Serverless compute, you might need to manually configure the necessary AWS resources (SNS topics, SQS queues) and ensure that the Serverless compute has the correct permissions to access these resources.
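
To illustrate the manual configuration in point 4, here is a minimal sketch of the Auto Loader options for reading from an SQS queue that you have already created yourself. The helper name `notification_options`, the queue URL, and the region are hypothetical placeholders, and whether this resolves the authentication error on Serverless compute depends on your workspace setup, so treat it as a starting point rather than a confirmed fix.

```python
from typing import Dict


def notification_options(queue_url: str, region: str,
                         file_format: str = "json") -> Dict[str, str]:
    """Build Auto Loader options for file notification mode.

    Assumes the SQS queue (and the SNS topic feeding it) already exists,
    so Auto Loader consumes events from it instead of creating resources.
    """
    if not queue_url or not region:
        raise ValueError("queue_url and region must both be provided")
    return {
        "cloudFiles.format": file_format,
        # Switch from directory listing to file notification mode.
        "cloudFiles.useNotifications": "true",
        # Point Auto Loader at the pre-created queue.
        "cloudFiles.queueUrl": queue_url,
        "cloudFiles.region": region,
    }


# Placeholder values; replace with your own queue and region.
opts = notification_options(
    "https://sqs.us-west-2.amazonaws.com/123456789012/example-queue",
    "us-west-2",
)

# In a Databricks notebook, where `spark` is predefined, the options
# would be passed to the stream reader roughly like this:
# df = (spark.readStream.format("cloudFiles")
#       .options(**opts)
#       .load("s3://<your-bucket>/<your-prefix>/"))
```

With a pre-created queue, the compute identity mainly needs to read from the queue and the bucket rather than create SNS and SQS resources; check the Auto Loader documentation for the exact set.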

https://docs.databricks.com/en/compute/configure.html#autoscaling
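
As a rough sketch of the autoscaling alternative in point 3, the following builds a cluster specification of the kind accepted by the Databricks Clusters and Jobs APIs, with an autoscale range and an instance profile attached. The helper name `autoscaling_cluster_spec` is hypothetical, and the runtime version, node type, and instance profile ARN are placeholders to replace with values valid in your workspace.

```python
from typing import Any, Dict


def autoscaling_cluster_spec(instance_profile_arn: str,
                             min_workers: int,
                             max_workers: int,
                             spark_version: str,
                             node_type_id: str) -> Dict[str, Any]:
    """Return a cluster spec dict with autoscaling and an instance profile."""
    if min_workers < 0 or max_workers < min_workers:
        raise ValueError("require 0 <= min_workers <= max_workers")
    return {
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # Databricks adds or removes workers within this range based on load.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # The instance profile carries the S3/SNS/SQS permissions.
        "aws_attributes": {"instance_profile_arn": instance_profile_arn},
    }


# Placeholder values only; none of these come from the original thread.
spec = autoscaling_cluster_spec(
    instance_profile_arn="arn:aws:iam::123456789012:instance-profile/example-profile",
    min_workers=1,
    max_workers=4,
    spark_version="15.4.x-scala2.12",
    node_type_id="i3.xlarge",
)
```

The resulting dict can be used as the `new_cluster` block of a job definition or as the body of a cluster create request.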
