Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best prac...
Join discussions on data engineering best practices, architectures, and optimization strategies with...
Join discussions on data governance practices, compliance, and security within the Databricks Commun...
Explore discussions on generative artificial intelligence techniques and applications within the Dat...
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithm...
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Communi...
My Delta tables are stored in HANA Data Lake Files, and I have an ETL pipeline configured as below: @dp.materialized_view(temporary=True) def source(): return spark.read.format("delta").load("/data/source") @dp.materialized_view def sink(): return spark.re...
Hi @anhnnguyen, you defined the source for your CDF as a temporary view, and temporary views are always fully refreshed on every pipeline run. Try defining it without this option.
Hello, are SQL warehouses managed by Unity Catalog? My understanding is that since a SQL warehouse is part of the compute layer, Unity Catalog doesn't manage it, as it only manages the data layer.
Hi @Senga98, your understanding is correct. Unity Catalog governs: data objects (catalogs, schemas, tables, views, functions); permissions (grants on the above); lineage; governed storage locations and external locations; model serving endpoints (UC Volumes / A...
Hi! We're using Spark Lakeflow Declarative Pipelines for ingesting data from various data sources. To comply with GDPR, we are planning to start using ABAC tagging. However, I don't understand how we are supposed to use th...
Correction: trying this results in the following error: "ABAC policies are not supported on tables defined within a pipeline. Remove the policies or contact Databricks support." So it isn't supported.
How can I use external AI libraries inside my Databricks GenAI projects?
I'm trying to authenticate to Databricks from WSL but am suddenly getting this error: /databricks-asset-bundle$ databricks auth login --host https://<XXXXXXXXX>.12.azuredatabricks.net Databricks Profile Name: <XXXXXXXXX> Error: oidc: fetch .well-known: Get "ht...
Great, these are amazing resources! I'm using them to test my IAM apps and flow.
I'm implementing DABs, jobs, and notebooks. For configuration management, I set parameters in databricks.yml, but I can't read those parameters in the notebook even though the job executes successfully. What I implemented and the steps leading to the issue: Created "dev-catalog" on WEB U...
Hi @tak0519 I think I found the issue! Don't worry - your DABs configuration looks correct. The problem is actually about how you're verifying the results, not the configuration itself. What's happening In your last comment, you mentioned: "Manuall...
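Because the reply turns on how the results are verified, a defensive way of reading a parameter can make that check easier. The sketch below assumes the job passes its parameters to the notebook as widgets, readable through `dbutils.widgets.get`. The helper `read_param`, the stand-in class `FakeWidgets`, and the parameter name `catalog` are hypothetical. Falling back to a default is a design choice of this sketch, not something taken from the thread.

```python
def read_param(widgets, name, default=None):
    """Return a notebook parameter, or default if the widget is not defined.

    `widgets` is expected to behave like `dbutils.widgets` (it must expose
    `get(name)`), so the helper can also be exercised outside Databricks.
    """
    try:
        return widgets.get(name)
    except Exception:
        # The exact exception type raised for an undefined widget is not
        # assumed here, so any failure falls back to the default.
        return default


class FakeWidgets:
    """Stand-in for dbutils.widgets, used only to demonstrate the helper."""

    def __init__(self, values):
        self._values = dict(values)

    def get(self, name):
        if name not in self._values:
            raise KeyError(name)
        return self._values[name]


job_run = FakeWidgets({"catalog": "dev-catalog"})  # parameters were supplied
manual_run = FakeWidgets({})                       # nothing was supplied

print(read_param(job_run, "catalog", default="unset"))
print(read_param(manual_run, "catalog", default="unset"))
```

Inside a notebook the call would be `read_param(dbutils.widgets, "catalog")`. Getting the default back suggests that this particular run did not receive the parameter.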
I'm aware, my workspace/subscription will be converted into a 'pay-as-you-go' model. That's okay - however I wonder why you don't provide a non-restricted plan just for learning. I'm sure there are ways to block commercial use. However, that's not my...
Hi @quakenbush, in the past you had to create a new VNet injected workspace and migrate all workloads from the existing managed workspace to enable VNet injection. This process was necessary because there was no direct way to convert a managed worksp...
Hello guys, does anyone know the best practices for setting up Databricks Connect for PyCharm and VS Code using Docker, a Justfile, and a .env file? Cordially, Seefoods
Hi @seefoods! I’ve worked with Databricks Connect and VSCode in different projects, and although your question mentions Docker, Justfile and .env, the “best practices” really depend on what you’re trying to do. Here’s what has worked best for me: 1.- D...
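For the `.env` part of the question, one common pattern is to keep workspace settings out of the code and load them before a session is created. The sketch below is a minimal, standard-library-only illustration and is not the approach from the reply above, which is truncated here. It assumes the `.env` file holds plain `KEY=VALUE` lines and that authentication uses the `DATABRICKS_HOST`, `DATABRICKS_TOKEN` and `DATABRICKS_CLUSTER_ID` environment variables. The helper names `load_dotenv_file`, `missing_settings` and `apply_env` are hypothetical.

```python
import os
from pathlib import Path

# Settings this sketch expects; adjust to your auth method (assumption).
REQUIRED = ("DATABRICKS_HOST", "DATABRICKS_TOKEN", "DATABRICKS_CLUSTER_ID")


def load_dotenv_file(path):
    """Parse simple KEY=VALUE lines; skip blanks, # comments, and non-pairs."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_settings(env):
    """Return the required names that are absent or empty in env."""
    return [name for name in REQUIRED if not env.get(name)]


def apply_env(values, environ=os.environ):
    """Copy parsed values into environ without overriding existing entries."""
    for key, value in values.items():
        environ.setdefault(key, value)
```

Calling `apply_env(load_dotenv_file(".env"))` and then checking `missing_settings(os.environ)` before creating the session fails fast with a readable list of what is missing. Keep the `.env` file out of version control. With Docker, pass it at run time via `--env-file` rather than baking it into the image.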
Hi helpful experts, I'm writing my first PySpark notebook that makes use of the new `ai_parse_document` function. I am basically following the code example from here: https://learn.microsoft.com/en-gb/azure/databricks/sql/language-manual/functions/ai...
Hello @JN_Bristol, I discovered that ai_parse_document only works when the input is parsed as real Python bytes. The binaryFile format in Spark returns the content as an internal binary type (like a memoryview), and ai_parse_document can’t process that...
Hi everyone, I’ve been working on an updated VM selection decision tree for Azure Databricks, designed to help teams quickly identify the most suitable worker types based on workload behavior. I’m sharing the latest version (In this updated version I’...
Hi saicharandeepb, You can enrich your chart by adding GPU-accelerated VMs. For computationally challenging tasks that demand high performance, like those associated with deep learning, Azure Databricks supports compute resources that are accelerated...
Today, we are announcing the industry's first Generative AI Engineer learning pathway and certification to help ensure that data and AI practitioners have the resources to be successful with generative AI. At Databricks, we recognize that generative ...
This is an exciting step forward from Databricks! Looking forward to diving into the curriculum and exploring what's next in the world of data + AI! Thanks for sharing @Sujitha
Hope you’re doing well! I’m reaching out for some guidance on an issue I’ve encountered while setting up Azure Databricks Cluster Pools to reduce cluster spin-up and scale times for our jobs. Background: To optimize job execution wait times, I’ve create...
Possible reasons: 1. Setting spot_bid_max_price = -1 is not accepted by Azure pools. Azure Databricks only accepts: 0 → on-demand only; positive numbers → max spot price. -1 is allowed in cluster policies, but not inside pools, so validation never completes....
Hello, I have a problem with the autoscaling of a cluster. Every time the autoscaling is activated I get this error. Does anyone have any idea why this could be? "Cluster xxxxxxx was terminated during the run (cluster state message: Lost communication ...
Hello Databricks Community, according to the error message, the driver node was lost, which can happen because of network problems or malfunctioning instances. Here are some potential causes and remedies: Instance Instability: Consider switching t...
I just created a Databricks workspace on GCP with the "Use existing cloud account (Storage & compute)" option. I have already added a few clusters for my task, but when I try to create an ETL pipeline, I always get this error notification. The file is created on the specifi...
Hi @molopocho, the feature needs to be enabled in the workspace. If you don't see the option, reach out to the accounts team or create a ticket with the Databricks support team to get it enabled at the workspace level.
Hi everyone, I'm working on optimizing Databricks costs for a production-grade data pipeline (Spark + Delta Lake) on Azure. I’m looking for practical, field-tested strategies to reduce compute and storage spend without impacting performance. So far, I’...
Hello @Poorva21, below are the answers to your questions: Q1. What are the most impactful cost optimisations for production pipelines? I have worked with multiple customers and, based on my experience, below are the high-level optimisations one must have: The ...
| User | Count |
|---|---|
| | 1814 |
| | 880 |
| | 705 |
| | 470 |
| | 312 |