Serverless compute for file notification mode
12-12-2024 02:31 AM
I am creating a table that ingests data from AWS S3 using file notification mode. It works on a single-user cluster, but when I switch to Serverless compute I get an authentication error. Is this setup possible, or is there an alternative that automatically scales compute up based on demand?
12-12-2024 05:11 AM
Hi @Isa1,
Using Serverless compute with Auto Loader in file notification mode can indeed present authentication challenges. Based on the context provided, here are some insights and alternatives:
- Authentication Issues with Serverless Compute:
- Serverless compute may face authentication errors due to the specific permissions required for setting up and accessing AWS services like S3, SNS, and SQS. These permissions need to be correctly configured to allow Serverless compute to interact with these services.
- Permissions and Policies:
- Ensure that the IAM role or user associated with your Serverless compute has the necessary permissions. These include actions such as sns:CreateTopic, sns:Publish, sqs:CreateQueue, sqs:ReceiveMessage, and s3:GetObject. The full list is in the Auto Loader file notification mode documentation. Could you tell us how you are authenticating to the S3 bucket?
- Alternatives to Serverless Compute:
- If Serverless compute continues to cause issues, consider using a single-user or shared cluster, or a job cluster. These clusters can be configured with the necessary instance profiles and permissions to interact with AWS services without the same authentication hurdles.
- Another alternative is to enable autoscaling on your all-purpose cluster, so that it automatically adds workers as demand increases. This lets you manage compute resources efficiently while avoiding some of the authentication complexities associated with Serverless compute.
- Manual Configuration:
- If you prefer to stick with Serverless compute, you might need to manually configure the necessary AWS resources (SNS topics, SQS queues) and ensure that the Serverless compute has the correct permissions to access these resources.
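To make the permissions point more concrete, here is a minimal sketch, built in Python, of an IAM policy document that grants only the actions named in this thread. It is deliberately partial: the Auto Loader file notification documentation lists additional actions you should copy from there. The bucket name `my-ingest-bucket` and the helper `build_policy` are hypothetical, and the `"*"` resource for SNS/SQS is a placeholder you would narrow to specific ARNs.

```python
import json

# Hypothetical bucket name; replace with your own.
BUCKET = "my-ingest-bucket"

# Only the actions mentioned in this thread. The Auto Loader file
# notification documentation lists the complete set; add the rest from there.
ACTIONS = {
    "s3": ["s3:GetObject"],
    "sns": ["sns:CreateTopic", "sns:Publish"],
    "sqs": ["sqs:CreateQueue", "sqs:ReceiveMessage"],
}


def build_policy(bucket: str) -> dict:
    """Return an IAM policy document granting the listed actions (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadIngestBucket",
                "Effect": "Allow",
                "Action": ACTIONS["s3"],
                # s3:GetObject applies to objects, hence the /* suffix.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Sid": "NotificationResources",
                "Effect": "Allow",
                "Action": ACTIONS["sns"] + ACTIONS["sqs"],
                # Placeholder: narrow to specific topic/queue ARNs in practice.
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy(BUCKET), indent=2))
```

The printed JSON can be pasted into the IAM console as the policy attached to whichever role or user your compute authenticates with.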
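For the manual-configuration route, a minimal sketch of the Auto Loader options follows. It assumes the SQS queue, already subscribed to the bucket's notifications, was created outside Databricks. `cloudFiles.useNotifications`, `cloudFiles.queueUrl`, and `cloudFiles.roleArn` are Auto Loader options; when `cloudFiles.queueUrl` is set, Auto Loader consumes events from that queue instead of creating its own SNS/SQS resources. The queue URL, role ARN, S3 path, and the helper `notification_options` are hypothetical placeholders, and whether Serverless accepts this credential setup depends on your workspace, so treat it as a starting point.

```python
# Hypothetical placeholders; substitute your own values.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-ingest-queue"
ROLE_ARN = "arn:aws:iam::123456789012:role/my-autoloader-role"


def notification_options(queue_url: str, role_arn: str, fmt: str = "json") -> dict:
    """Build Auto Loader options that consume a pre-created SQS queue.

    cloudFiles.queueUrl points Auto Loader at an existing queue, so it does
    not try to create SNS/SQS resources itself. cloudFiles.roleArn asks it
    to assume the given IAM role when accessing the notification services.
    """
    return {
        "cloudFiles.format": fmt,
        "cloudFiles.useNotifications": "true",
        "cloudFiles.queueUrl": queue_url,
        "cloudFiles.roleArn": role_arn,
    }


opts = notification_options(QUEUE_URL, ROLE_ARN)

# On Databricks (requires a Spark session, so not runnable standalone),
# the options would be passed to the stream reader roughly like this:
#
#   df = (spark.readStream.format("cloudFiles")
#         .options(**opts)
#         .load("s3://my-ingest-bucket/landing/"))
```

Pre-creating the queue means the compute only needs to read from and delete messages in that queue, rather than needing permission to create AWS resources.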
https://docs.databricks.com/en/compute/configure.html#autoscaling
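Following the autoscaling link above, here is a minimal sketch of a Clusters API payload for an autoscaling single-user cluster that carries an instance profile. `autoscale.min_workers`, `autoscale.max_workers`, `aws_attributes.instance_profile_arn`, and `data_security_mode` are Clusters API fields. The runtime version, node type, user name, instance profile ARN, and the helper `autoscaling_cluster_spec` are hypothetical placeholders.

```python
import json

# Hypothetical instance profile; use one registered in your workspace.
INSTANCE_PROFILE_ARN = "arn:aws:iam::123456789012:instance-profile/my-autoloader-profile"


def autoscaling_cluster_spec(min_workers: int, max_workers: int) -> dict:
    """Return a Clusters API payload for an autoscaling cluster (hypothetical helper)."""
    if min_workers < 0 or max_workers < min_workers:
        raise ValueError("need 0 <= min_workers <= max_workers")
    return {
        "cluster_name": "autoloader-ingest",
        # Placeholder runtime and node type; pick ones available to you.
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "i3.xlarge",
        # Databricks adds or removes workers within this range based on load.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # The instance profile supplies AWS credentials for S3, SNS, and SQS.
        "aws_attributes": {"instance_profile_arn": INSTANCE_PROFILE_ARN},
        "data_security_mode": "SINGLE_USER",
        "single_user_name": "someone@example.com",
    }


if __name__ == "__main__":
    print(json.dumps(autoscaling_cluster_spec(2, 8), indent=2))
```

The same dictionary can be used as the `new_cluster` block of a job definition, which gives you a job cluster that scales within the same bounds.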