
Integrating Azure Log Analytics with Delta Live Tables Pipelines and Job Clusters

mkEngineer
New Contributor III

Hi,

I'm setting up a Delta Live Tables (DLT) pipeline for my medallion architecture. I'm interested in tracking, ingesting, and analyzing the log files in Azure Log Analytics. However, I haven't found much information on how to configure this setup.

Specifically, I have the following questions:

  1. Is it possible to connect Azure Log Analytics via an Azure Key Vault for secure access?
  2. Since DLT pipelines run on job clusters instead of regular clusters (as described in earlier documentation), how should I handle this configuration? On the current job cluster used for my DLT pipeline, log file destinations are not enabled.

Additionally, does the logging process involve the _delta_log files, or is there another recommended way to ingest and configure log files for Azure Log Analytics?

Any guidance or best practices on this integration would be greatly appreciated!

Thank you!

2 REPLIES

Walter_C
Databricks Employee

To address your questions about setting up a Delta Live Tables (DLT) pipeline for your medallion architecture and integrating it with Azure Log Analytics, here are the detailed steps and best practices:

  1. Connecting Azure Log Analytics via Azure Key Vault for Secure Access:

    Yes, it is possible to connect Azure Log Analytics via Azure Key Vault for secure access. Azure Key Vault can securely store and manage access to secrets, such as connection strings and API keys, which can be used by your Databricks environment. You can configure Azure Key Vault to store the necessary credentials and then access these secrets from your Databricks notebooks or jobs using the Databricks Secrets API.
  2. Handling Configuration on Job Clusters for DLT Pipelines:

    Since DLT pipelines run on job clusters, you need to ensure that the job clusters have the necessary configurations to access Azure Log Analytics. Here are the steps:
    • Create and Configure Azure Key Vault: Store your Azure Log Analytics workspace ID and primary key in Azure Key Vault.
    • Set Up Databricks Secrets: Use the Databricks CLI or UI to create a secret scope and add the secrets from Azure Key Vault.
    • Access Secrets in Your DLT Pipeline: In your DLT pipeline notebooks, use the dbutils.secrets.get function to retrieve the secrets and configure the logging (see the sketch after this list).
  3. Ingesting and Configuring Log Files for Azure Log Analytics:

    The _delta_log files are the Delta Lake transaction log for each table, not the pipeline's operational logs, so they are not what you ship to Log Analytics. For the integration, capture the DLT event log and cluster logs instead, using the following approach:
    • Enable Cluster Logging: Ensure that cluster logging is enabled to capture logs and metrics.
    • Use Azure Monitor: Configure Azure Monitor to collect logs from your Databricks clusters and send them to Azure Log Analytics. This can be done by setting up diagnostic settings in Azure Monitor to route logs to your Log Analytics workspace.
  4. Best Practices:

    • Secure Access: Always use Azure Key Vault to manage and access secrets securely.
    • Monitor and Audit: Use Azure Monitor and Log Analytics to continuously monitor and audit your DLT pipelines.
    • Data Quality and Lineage: Utilize the event log schema provided by Databricks to track data quality metrics and lineage information for your DLT pipelines.
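
To make steps 1-3 concrete, below is a minimal sketch of how a notebook running on the pipeline's job cluster could read the two secrets from a Key Vault-backed scope and push a test record to Azure Log Analytics. It uses the legacy HTTP Data Collector API for illustration (Azure Monitor's newer Logs Ingestion API or diagnostic settings are alternatives); the scope name, key names, and log type are placeholders you would replace with your own.

# Minimal sketch: read the Log Analytics credentials from a Key Vault-backed
# secret scope and post a test record via the (legacy) HTTP Data Collector API.
# The scope/key names ("log-analytics", "workspace-id", "primary-key") and the
# custom log type are placeholders -- substitute your own.
import base64
import datetime
import hashlib
import hmac
import json

import requests

# 1) Secrets created in the Key Vault-backed scope (see the steps above).
workspace_id = dbutils.secrets.get(scope="log-analytics", key="workspace-id")
primary_key = dbutils.secrets.get(scope="log-analytics", key="primary-key")

LOG_TYPE = "DLTPipelineLogs"  # appears in Log Analytics as the custom table DLTPipelineLogs_CL


def build_signature(date: str, content_length: int) -> str:
    # SharedKey authorization header required by the Data Collector API.
    string_to_sign = f"POST\n{content_length}\napplication/json\nx-ms-date:{date}\n/api/logs"
    digest = hmac.new(base64.b64decode(primary_key),
                      string_to_sign.encode("utf-8"),
                      digestmod=hashlib.sha256).digest()
    return f"SharedKey {workspace_id}:{base64.b64encode(digest).decode('utf-8')}"


def post_to_log_analytics(records: list) -> int:
    # POST a batch of JSON records to the workspace; HTTP 200 means accepted.
    body = json.dumps(records).encode("utf-8")
    rfc1123_date = datetime.datetime.utcnow().strftime("%a, %d %b %Y %H:%M:%S GMT")
    headers = {
        "Content-Type": "application/json",
        "Authorization": build_signature(rfc1123_date, len(body)),
        "Log-Type": LOG_TYPE,
        "x-ms-date": rfc1123_date,
    }
    uri = f"https://{workspace_id}.ods.opinsights.azure.com/api/logs?api-version=2016-04-01"
    return requests.post(uri, data=body, headers=headers).status_code


# Smoke test with one hand-built event; in practice you would forward rows from
# the DLT event log or parsed cluster log files instead.
print(post_to_log_analytics([{"pipeline": "my_dlt_pipeline", "event": "test", "level": "INFO"}]))

The same pattern works for forwarding DLT event log rows or cluster log entries instead of the test record above.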

mkEngineer
New Contributor III
 
"message": " File <command-68719476741>, line 10\n log_analytics_pkey = dbutils.secrets.get(scope=\"ScopeLogAnalyticsPKey\", key=\"LogAnalyticsPKey\")\n ^\nSyntaxError: invalid syntax\n", "error_class": "_UNCLASSIFIED_PYTHON_COMMAND_ERROR"
 

It seems odd that this configuration has to be handled at the command-line level. Could you guide me further on how to set up this configuration, given that it doesn't work in the notebook? Specifically, is there a way to configure the secrets directly in the JSON settings or the DLT UI Advanced Configuration?

"message": " File <command-68719476741>, line 10\n log_analytics_pkey = dbutils.secrets.get(scope=\"ScopeLogAnalyticsPKey\", key=\"LogAnalyticsPKey\")\n ^\nSyntaxError: invalid syntax\n", "error_class": "_UNCLASSIFIED_PYTHON_COMMAND_ERROR"

For example, could I pass my two secrets (Log Analytics Workspace ID and Log Analytics Primary Key, stored in Key Vault) as key-value pairs under Advanced Configuration? How does the scope I just created come into play here? Or is that section only for secrets created in a CLI-defined secret scope?

Simply put, can I use the Advanced Configuration (Key-Value pairs) to set these secrets and avoid reliance on code-based retrieval?
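
To be explicit about what I am asking, something like the sketch below is what I had in mind. The key names are made up, and I am assuming (but have not confirmed) that the {{secrets/<scope>/<key>}} reference syntax is honored in the pipeline's configuration block:

# What I am hoping is possible -- the key names are illustrative, and the
# {{secrets/...}} substitution inside the pipeline JSON is an assumption I'd
# like confirmed, not something I've verified:
#
#   "configuration": {
#     "mypipeline.log_analytics_workspace_id": "{{secrets/ScopeLogAnalyticsWSID/LogAnalyticsWorkspaceId}}",
#     "mypipeline.log_analytics_primary_key":  "{{secrets/ScopeLogAnalyticsPKey/LogAnalyticsPKey}}"
#   }
#
# Then, inside the DLT notebook, the values would come from the Spark conf
# instead of a dbutils.secrets.get call:
workspace_id = spark.conf.get("mypipeline.log_analytics_workspace_id")
primary_key = spark.conf.get("mypipeline.log_analytics_primary_key")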

On another note, how can I verify that cluster logging is enabled? Besides checking the Logs and Metrics sections in the DLT Pipeline UI under Compute/Clusters, is there another way to ensure that logging and metrics are correctly captured? Those tabs are enabled, but that alone isn't enough to confirm logs are actually being delivered.
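
For what it's worth, the only programmatic check I could think of is to pull the job cluster's spec from the Clusters API and look for a cluster_log_conf entry. A rough sketch (host, token, and cluster id below are placeholders):

import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://adb-<workspace-id>.azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]  # personal access token
cluster_id = "<job-cluster-id-from-the-pipeline-ui>"

# The Clusters API returns the full cluster spec; a cluster_log_conf block means
# log delivery is configured for that cluster.
resp = requests.get(
    f"{host}/api/2.0/clusters/get",
    headers={"Authorization": f"Bearer {token}"},
    params={"cluster_id": cluster_id},
)
resp.raise_for_status()
print(resp.json().get("cluster_log_conf", "no cluster_log_conf set -- log delivery not configured"))

Is that a reasonable way to confirm it, or is there a supported check in the UI or API that I'm missing?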

Also, I noticed in the Compute page under Advanced Options the setting:
"When a user runs a command on a cluster with Credential Passthrough enabled, that user's Azure Active Directory credentials will be automatically passed through to Spark, allowing them to access data in Azure Data Lake Storage Gen1 and Gen2 without having to manually specify their credentials."

However, I'm unable to change the destination from "None." Could this be related to Unity Catalog being enabled? If so, does Unity Catalog impose restrictions on credential passthrough or how secrets are managed with Azure Key Vault?

Lastly, when running the notebooks for DLT, I noticed that a fourth tab briefly appears at the bottom of the page (next to DLT Graph, DLT Event Log, and DLT Query History) called Pipeline Logs, but it disappears after about a second.

I suspect my Azure Monitor setup is mostly correct, but it seems like the logs are not being routed to the Log Analytics Workspace. Can you confirm if the route for logs should be explicitly set elsewhere, or if there's an issue with the configuration itself?

Thanks again for your help!
