Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best prac...
We need to import a large amount of Jira data into Databricks and should import only the delta changes. What's the best approach: using the Fivetran Jira connector, or developing our own Python scripts/pipeline code? Thanks.
Hi @greengil, Have you considered Lakeflow Connect? Databricks now has a native Jira connector in Lakeflow Connect that can achieve what you are looking for. It's in beta, but something you may want to consider. It ingests Jira into Delta with incr...
Hi everyone, I am currently working on a migration project from Azure Databricks to GCP Databricks, and I need some guidance from the community on best practices around registering external Delta tables into Unity Catalog. Currently I am doing this but...
Hi @muaaz, On GCP Databricks, the SQL pattern you are using is fine, but the recommended best practice is to back it with a Unity Catalog external location instead of pointing tables directly at arbitrary gs:// paths. In practice, that means first cr...
How can I access secrets in the pipeline YAML or directly in a Python script file?
Hi @prakharsachan, in Declarative Automation Bundles YAML (formerly known as Databricks Asset Bundles) you can only define secret scopes: If you want to read secrets from a secret scope, you can use dbutils in a Python script: password = dbutils.secrets.ge...
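The `dbutils.secrets.get` call in the reply above is cut off in this preview. A minimal sketch of reading a secret from a Python script might look like the following. It assumes a hypothetical scope `my_scope` and key `db_password` (neither name comes from the thread). Because `dbutils` only exists inside a Databricks runtime, the helper takes it as a parameter:

```python
def read_secret(dbutils, scope: str, key: str) -> str:
    """Return the secret stored under `key` in the secret scope `scope`.

    `dbutils` is the object Databricks injects into notebooks and jobs;
    it is passed in explicitly so the function can be exercised elsewhere.
    """
    value = dbutils.secrets.get(scope=scope, key=key)
    if not value:
        # Fail early instead of passing an empty credential downstream.
        raise ValueError(f"Secret {scope}/{key} is empty")
    return value


# Inside a Databricks notebook or job (scope and key names are hypothetical):
# password = read_secret(dbutils, "my_scope", "db_password")
```

Passing `dbutils` in, rather than relying on the notebook global, keeps the helper importable from a plain `.py` file.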
Title: Spark Structured Streaming – Airport Counts by Country. This notebook demonstrates how to set up a Spark Structured Streaming job in Databricks Community Edition. It reads new CSV files from a Unity Catalog volume, processes them to count airport...
I'm experiencing an unusual issue following my return from annual leave. I'm unable to connect to any compute from a notebook (both Classic Compute and Serverless), despite having Can Manage permissions on the clusters. The error shown is: "Unk...
I am deploying a DLT pipeline in the dev environment using DABs. The source code is in a Python script file. In the pipeline's YAML file the configuration key is set to true (with all correct indentation), yet the pipeline isn't deploying in continuous mode....
Hi @prakharsachan, continuous must be set inside the pipeline resource definition, not under configuration. The configuration block in an SDP (formerly DLT) pipeline definition is for Spark/pipeline settings (key-value string pairs passed to the runtime)...
I have a batch job that runs thousands of Deep Clone commands; it uses a ForEach task to run multiple Deep Clones in parallel. It was taking a very long time, and I realized that the driver was the main culprit since it was using up all of its memory ...
Hi @tsam, I think your problem might be caused by the fact that each "CREATE OR REPLACE TABLE ... DEEP CLONE" call accumulates state on the driver even though you're not collecting data. The main culprits are: 1. Spark Plan / Query Plan Caching: Every S...
@Ashwin_DSA could you please provide an example?
Hi, I'm wondering if there is an easier way to accomplish this. I can use a dynamic value reference to pull the run_id of Parent 1 into Parent 2; however, what I'm looking for is for Child 1's task run_id to be referenced within Parent 2. Currently I am ...
Hi @ChristianRRL, you're absolutely right, and I apologize for the earlier suggestion. I've verified that task values from child jobs are not propagated back through run_job tasks. Your instinct about the REST API was correct. Here's the fix: Solutio...
Hi there, I'm finding this a bit trickier than originally expected and am hoping someone can help me understand if I'm missing something. I have 3 jobs: one orchestrator job (tasks are type run_job); two "Parent" jobs (tasks are type notebook). parent1 run...
Hi, I ran into the same confusion and did some testing on this. Here's what I found: Task values don't cross the run_job boundary. So even if child1 sets a task value with dbutils.jobs.taskValues.set(), the orchestrator can't read it. But {{tasks.par...
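Since task values stop at the run_job boundary, one workaround is to look up a task's run_id from the parent job run through the Jobs REST API. The sketch below is only an illustration, not the solution posted in the thread. It assumes the Jobs API 2.1 `runs/get` endpoint, whose response lists each task with its `task_key` and `run_id`. The host, token, and the task key `child1` are placeholders:

```python
import json
import urllib.parse
import urllib.request


def get_job_run(host: str, token: str, run_id: int) -> dict:
    """Fetch one job run via GET /api/2.1/jobs/runs/get."""
    query = urllib.parse.urlencode({"run_id": run_id})
    request = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.1/jobs/runs/get?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def find_task_run_id(job_run: dict, task_key: str) -> int:
    """Return the run_id of the task named `task_key` in a job run payload."""
    for task in job_run.get("tasks", []):
        if task.get("task_key") == task_key:
            return task["run_id"]
    raise KeyError(f"No task {task_key!r} in run {job_run.get('run_id')}")


# Hypothetical usage: parent1's run_id arrives through a dynamic value reference.
# parent_run = get_job_run("https://<workspace-host>", "<token>", parent1_run_id)
# child1_run_id = find_task_run_id(parent_run, "child1")
```

Keeping the payload lookup separate from the HTTP call means it can be checked against a saved response without a live workspace.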
Hi, can we create a file-based trigger in Databricks for Excel files in a SharePoint location? I need to copy the Excel files from SharePoint to external volumes in Databricks, so can it be done using a trigger that whenever the file drops in ...
File-based triggers in Databricks are designed to work with data that already resides in cloud storage (such as ADLS, S3, or GCS). In this case, since the source system is SharePoint, expecting a native file-based trigger from Databricks is not feasi...
Hi, we are implementing external dashboard embedding in Azure Databricks and want to avoid using client secrets by leveraging **Azure Managed Identity** with **OAuth token federation** for generating the embedded report token. Following OAuth token fed...
Did you get any information on whether this is on their roadmap? I came across this issue last week, and the documentation doesn't mention this limitation.
I've created a Databricks Model Serving endpoint which serves an MLflow pyfunc model. The model uses LangChain, and I'm using mlflow.langchain.autolog(). At my company we have some production(-like) workspaces where users cannot, e.g., run notebooks and ...
Funnily enough, the problem also disappeared on my end this morning. Previously, I saw a networking issue in my logs, but that also went away. Let's hope it stays that way!
Hi, I am using a medallion architecture on Azure Data Lake Storage Gen2 with Azure Databricks. Currently, I am storing data in Parquet format (not Delta tables), and I am planning to implement Unity Catalog (UC). As part of this setup, I understand tha...
I was going to follow the 3rd approach, but it violates our medallion architecture, and we don't have enough data to justify separating it physically. So I'm going with the 1st approach. Thank you very much @karthickrs, I'll keep this in mind.
I am currently working on a migration project from Power BI to AI/BI dashboards in Databricks. I am using metric views to recreate, in YAML, all the measures and DAX queries from my Power BI report, but the main prob...
Hey @Akshatkumar69, welcome to the community. You're not alone on this one; it is common with folks coming from Power BI. The key thing to understand is that AI/BI charts do expect a single data source, but that source can be a metric view that alrea...