12-30-2023 05:50 PM - edited 12-30-2023 05:52 PM
Hey Everyone,
I've built a very simple pipeline with a single DLT using auto ingest, and it works, provided I don't specify the output location. When I build the same pipeline but set Unity Catalog (UC) as the output location, it fails when setting up S3 notifications, which is entirely bizarre. I've looked at the logs on the Databricks side and the request logs in AWS, and it looks like Databricks isn't using the instance profile I've set, for some reason. Further details below; any help would be greatly appreciated!
01-01-2024 11:59 PM
Hey @Retired_mod ,
Thanks for the response!
I tried the above with no luck, unfortunately:
- I don't have an apply_merge function in my pipeline definition; please find the pipeline definition below.
- I'm running DBR 14.2.
- I don't think Databricks Connect applies here, as this was all set up in the Databricks UI.
- Thanks for the link to that one; I read it a couple of times and have implemented all its recommendations, with no luck.
DLT definition:
CREATE OR REFRESH STREAMING LIVE TABLE raw_testing
AS SELECT *
FROM cloud_files(
  "s3://bucket-path",
  "csv",
  map(
    "header", "true",
    "sep", "|",
    "cloudFiles.useNotifications", "true",
    "inferSchema", "true"
  )
);
This pipeline works as expected when using the Hive metastore (HMS) as the output location but doesn't work with UC.
Any other thoughts? Is there some way I can escalate this? At this point it feels like a bug.
01-02-2024 03:06 PM
Thanks @Retired_mod. UC can connect to the S3 bucket and read the data, but it fails when trying to set up the bucket notifications.
I'll raise a ticket with support and post back here if I find a resolution.
05-02-2024 10:33 AM
@Red1 Were you able to resolve this issue? If yes, what was the fix?
05-02-2024 04:08 PM
Hey @Babu_Krishnan, I was! I had to reach out to my Databricks support engineer directly, and the resolution was to add "cloudFiles.awsAccessKey" and "cloudFiles.awsSecretKey" to the cloud_files options, as in the screenshot below (apologies, I don't know why the screenshot is so grainy). He also mentioned using the Databricks secret store for the credentials themselves.
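In case the screenshot is unreadable, this is roughly what the fixed definition looks like. A sketch only: the bucket path is the placeholder from my earlier post, and the key values here are dummies; see the note about secrets before putting real credentials anywhere.
CREATE OR REFRESH STREAMING LIVE TABLE raw_testing
AS SELECT *
FROM cloud_files(
  "s3://bucket-path",
  "csv",
  map(
    "header", "true",
    "sep", "|",
    "cloudFiles.useNotifications", "true",
    -- credentials Auto Loader uses when creating the notification resources;
    -- dummy placeholders here, never hard-code real keys
    "cloudFiles.awsAccessKey", "<access-key-id>",
    "cloudFiles.awsSecretKey", "<secret-access-key>",
    "inferSchema", "true"
  )
);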
05-02-2024 06:13 PM
Thanks a lot @Red1. Let me try that.
But I'm curious to know what the purpose of roleArn is. I'm also interested in how we can use the secret manager to avoid passing credentials as plain text in a notebook. Thanks in advance.
05-02-2024 08:05 PM
@Red1, it worked! Thanks for the details. I used Databricks secrets to store the credentials.
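For anyone landing here later: on roleArn, per the Auto Loader docs, cloudFiles.roleArn is the ARN of an IAM role to assume when setting up the notification services, i.e. an alternative to passing access keys directly. The secrets wiring I used was roughly the following (a sketch only: the scope and key names are placeholders, and I'm assuming DLT's ${...} configuration substitution resolves secret-backed Spark confs, which is worth verifying in your own workspace). In the pipeline settings, add configuration entries such as "spark.aws_access_key": "{{secrets/<scope>/<key>}}" and "spark.aws_secret_key": "{{secrets/<scope>/<key>}}", then reference them in the SQL:
CREATE OR REFRESH STREAMING LIVE TABLE raw_testing
AS SELECT *
FROM cloud_files(
  "s3://bucket-path",
  "csv",
  map(
    "header", "true",
    "sep", "|",
    "cloudFiles.useNotifications", "true",
    -- resolved from secret-backed pipeline configuration entries,
    -- so no plain-text credentials appear in the notebook
    "cloudFiles.awsAccessKey", "${spark.aws_access_key}",
    "cloudFiles.awsSecretKey", "${spark.aws_secret_key}",
    "inferSchema", "true"
  )
);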