12-30-2023 05:50 PM - edited 12-30-2023 05:52 PM
Hey Everyone,
I've built a very simple pipeline with a single DLT table using auto ingest (Auto Loader), and it works, provided I don't specify the output location. When I build the same pipeline but set Unity Catalog (UC) as the output location, it fails while setting up S3 notifications, which is entirely bizarre. I've looked at the logs on the Databricks side and the request logs in AWS, and it looks like Databricks isn't using the instance profile I've set for some reason. Further details below; any help would be greatly appreciated!
Context
Things I've done
Things I've tried
01-01-2024 10:56 PM
Hi @Red1,
I hope this is helpful for you. Please let me know if you have any other questions or feedback. 😊
01-01-2024 11:59 PM
Hey @Kaniz ,
Thanks for the response!
I tried the above with no luck, unfortunately:
- I don't have an apply_merge function in my pipeline definition; please find the pipeline definition below.
- I'm running DBR 14.2.
- I don't think Databricks Connect applies here, as this was all set up in the Databricks UI.
- Thanks for the link to that one; I read it a couple of times and implemented all the recommendations, with no luck.
DLT definition:
CREATE OR REFRESH STREAMING LIVE TABLE raw_testing
AS SELECT *
FROM cloud_files(
  "s3://bucket-path",
  "csv",
  map(
    "header", "true",
    "sep", "|",
    "cloudFiles.useNotifications", "true",
    "inferSchema", "true"
  )
);
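For anyone building the same pipeline in Python rather than SQL, the option map from the cloud_files() call can be sketched as a plain dictionary (a sketch only; in a notebook it would be consumed by spark.readStream, shown in the comment, and "s3://bucket-path" is the placeholder from the post):

```python
def autoloader_options(use_notifications=True):
    """Build the option map from the cloud_files() call above."""
    options = {
        "header": "true",
        "sep": "|",
        "inferSchema": "true",
    }
    if use_notifications:
        # File-notification mode: the pipeline must be able to create
        # S3 bucket notifications, which is where the failure occurred.
        options["cloudFiles.useNotifications"] = "true"
    return options


# In a Databricks notebook this would be used roughly as:
# (spark.readStream.format("cloudFiles")
#      .option("cloudFiles.format", "csv")
#      .options(**autoloader_options())
#      .load("s3://bucket-path"))
```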
This pipeline works as expected when using the Hive metastore (HMS) as the output location but doesn't work with UC.
Any other thoughts? Is there some way I can escalate this? At this point it feels like a bug.
01-02-2024 01:28 AM - edited 01-02-2024 02:11 AM
Hi @Red1,
I hope these suggestions will help you fix your issue. If none works, you can contact the Databricks support team for further assistance.
Please let me know if you have any other questions or feedback. I’m always happy to help 😊
01-02-2024 03:06 PM
Thanks @Kaniz, UC can connect to the S3 bucket and read the data but it fails when trying to set up the bucket notifications.
I'll raise a ticket with support and post back here if I find a resolution.
10 hours ago
@Red1 Were you able to resolve this issue? If yes, what was the fix?
4 hours ago
Hey @Babu_Krishnan, I was! I had to reach out to my Databricks support engineer directly, and the resolution was to add "cloudFiles.awsAccessKey" and "cloudFiles.awsSecretKey" to the params as in the screenshot below (apologies, I don't know why the screenshot is so grainy). He also mentioned using the Databricks secret store for the credentials themselves.
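A minimal sketch of what that fix looks like as an option map (option names as given above; the secret-store lookup appears only as a comment because dbutils exists only inside a Databricks notebook, and the scope/key names are hypothetical):

```python
def with_notification_credentials(options, access_key, secret_key):
    """Return a copy of the Auto Loader options with the explicit AWS
    credentials used to set up the S3 bucket notifications."""
    merged = dict(options)
    merged["cloudFiles.awsAccessKey"] = access_key
    merged["cloudFiles.awsSecretKey"] = secret_key
    return merged


# In a notebook, pull the values from the secret store rather than
# hard-coding them, e.g.:
# access_key = dbutils.secrets.get(scope="aws", key="access-key")
# secret_key = dbutils.secrets.get(scope="aws", key="secret-key")
```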
2 hours ago
Thanks a lot @Red1. Let me try that.
But I'm curious what the purpose of roleARN is. I'm also interested in learning how we can use the secret manager to avoid passing credentials as plain text in a notebook. Thanks in advance.
18m ago
@Red1, it worked! Thanks for the details. I used Databricks secrets to store the credentials.
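For reference, Databricks also supports `{{secrets/<scope>/<key>}}` references in Spark configuration properties, which avoids putting plain-text credentials in the pipeline definition. The sketch below imitates that resolution locally, with a plain dict standing in for the real secret store (scope and key names are hypothetical):

```python
import re

# Matches the '{{secrets/<scope>/<key>}}' placeholder syntax.
SECRET_REF = re.compile(r"\{\{secrets/([^/]+)/([^}]+)\}\}")


def resolve_secret_refs(options, secret_store):
    """Replace secret placeholders in an option map with values from
    secret_store, a dict keyed by (scope, key) standing in for the real
    Databricks secret store backed by dbutils.secrets.get()."""
    resolved = {}
    for name, value in options.items():
        match = SECRET_REF.fullmatch(value)
        if match:
            resolved[name] = secret_store[(match.group(1), match.group(2))]
        else:
            resolved[name] = value
    return resolved
```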