Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Autoloader stream with EventBridge message

fhmessas
New Contributor II

Hi All,

I have a few streaming jobs running, but we have been facing an issue related to messaging. We have multiple feeds within the same root folder, i.e. logs/{accountId}/CloudWatch|CloudTrail|vpcflow/yyyy-mm-dd/logs. Hence, SQS lets us set up only one notification instance for the folder, and we can't use wildcards.

We've tried setting up EventBridge to filter the messages so Autoloader can consume them, but with this approach Autoloader is neither consuming the messages correctly nor deleting them after reading.

Is there any way to set up Autoloader with EventBridge between SQS and the stream?

Thanks

1 ACCEPTED SOLUTION

Anonymous
Not applicable

@Fernando Messas:

Yes, you can configure Autoloader to consume messages from an SQS queue using EventBridge. Here are the steps you can follow:

  1. Create an EventBridge rule to filter messages from the SQS queue based on specific criteria (such as the feed type or account ID).
  2. Configure the rule to trigger a Lambda function when a message matching the criteria is received.
  3. Write the Lambda function to read the message from the SQS queue, extract the relevant data (such as the S3 object key), and pass it on so Autoloader can load the data into your streaming job (see the sketch after this list).
  4. Configure the Lambda function to delete the message from the SQS queue after processing.
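
As a rough illustration of steps 2-4, here is a minimal Lambda handler sketch in Python. It assumes a standard SQS trigger (event-source mapping) that delivers the original S3 event notifications; the per-feed queue URLs and key layout are hypothetical placeholders, and if you wire the queue through EventBridge Pipes instead, the payload shape differs slightly.

```python
import json
import boto3

sqs = boto3.client("sqs")

# Hypothetical per-feed queues that the Autoloader streams will consume.
FEED_QUEUES = {
    "CloudWatch": "https://sqs.us-east-1.amazonaws.com/123456789012/cloudwatch-feed",
    "CloudTrail": "https://sqs.us-east-1.amazonaws.com/123456789012/cloudtrail-feed",
    "vpcflow":    "https://sqs.us-east-1.amazonaws.com/123456789012/vpcflow-feed",
}

def handler(event, context):
    """Route each S3 event notification to the queue for its feed type."""
    for record in event["Records"]:
        body = json.loads(record["body"])  # the original S3 event notification
        for s3_record in body.get("Records", []):
            # Keys look like logs/{accountId}/vpcflow/yyyy-mm-dd/logs,
            # so the third path segment identifies the feed.
            key = s3_record["s3"]["object"]["key"]
            feed = key.split("/")[2]
            target = FEED_QUEUES.get(feed)
            if target:
                # Forward the notification unchanged so the consumer can parse it.
                sqs.send_message(QueueUrl=target, MessageBody=json.dumps(body))
    # Returning normally lets the SQS event-source mapping delete the
    # processed messages from the source queue (step 4).
```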

This approach allows you to use EventBridge to filter and route messages to different Lambda functions based on specific criteria, and then use the Autoloader API to load the data into your streaming job.
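
On the Databricks side, here is a minimal sketch of what the consuming stream could look like, assuming you run Autoloader in file-notification mode against a queue you manage yourself via cloudFiles.queueUrl (the queue URL, bucket, and paths below are hypothetical):

```python
# Runs in a Databricks notebook, where `spark` is already defined.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    # Existing queue that already receives the filtered S3 notifications:
    .option("cloudFiles.queueUrl",
            "https://sqs.us-east-1.amazonaws.com/123456789012/vpcflow-feed")
    .load("s3://my-log-bucket/logs/")
)

(df.writeStream
   .option("checkpointLocation", "s3://my-log-bucket/checkpoints/vpcflow/")
   .start("s3://my-log-bucket/tables/vpcflow/"))
```

Because each feed gets its own queue, you can run one such stream per feed type instead of competing for a single notification configuration on the root folder.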

Keep in mind that you will need to handle any errors that occur during the processing of the messages, such as failed API calls or data validation errors. Additionally, you should monitor the health of your streaming job to ensure that it is processing messages correctly and not falling behind.

