Databricks Platform Discussions
Dive into comprehensive discussions covering various aspects of the Databricks platform. Join the conversation to deepen your understanding and maximize your usage of the Databricks platform.

Browse the Community

Data Engineering

Join discussions on data engineering best practices, architectures, and optimization strategies with...

11759 Posts

Data Governance

Join discussions on data governance practices, compliance, and security within the Databricks Community.

504 Posts

Generative AI

Explore discussions on generative artificial intelligence techniques and applications within the Dat...

325 Posts

Machine Learning

Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithm...

989 Posts

Warehousing & Analytics

Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community.

654 Posts

Activity in Databricks Platform Discussions

anhnnguyen (New Contributor II)
  • 21 Views
  • 1 reply
  • 0 kudos

Materialized view always load full table instead of incremental

My Delta tables are stored in HANA Data Lake Files and I have an ETL configured like below: @DP.materialized_view(temporary=True) def source(): return spark.read.format("delta").load("/data/source") @dp.materialized_view def sink(): return spark.re...

Latest Reply
GaweL
Visitor
  • 0 kudos

Hi @anhnnguyen, you defined the source for your CDF as a temporary view, and temporary views are always fully refreshed on every pipeline run. Try defining it without this option.

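The suggested fix, sketched as a Lakeflow Declarative Pipelines fragment. The decorator names and source path follow the original post; the import and the sink body are illustrative assumptions, and this only runs inside a Databricks pipeline, not standalone:

```python
# Illustrative Lakeflow Declarative Pipelines fragment (assumes the
# `pipelines` decorator API used in the post; runs only inside a Databricks
# pipeline, where `spark` is provided by the runtime).
from pyspark import pipelines as dp

# Dropping temporary=True persists the materialized view, so per the reply
# it is no longer fully refreshed on every pipeline run.
@dp.materialized_view
def source():
    return spark.read.format("delta").load("/data/source")

@dp.materialized_view
def sink():
    # Hypothetical sink body; the original post is truncated here.
    return spark.read.table("source")
```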
Senga98 (Contributor)
  • 32 Views
  • 1 reply
  • 1 kudos

SQL Warehouse

Hello, are SQL warehouses managed by Unity Catalog? My understanding is that since SQL Warehouse is part of the compute layer, Unity Catalog doesn't manage it, as it only manages data layers.

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Senga98, your understanding is correct. Unity Catalog governs:
  • Data objects (catalogs, schemas, tables, views, functions)
  • Permissions (grants on the above)
  • Lineage
  • Governed storage locations & external locations
  • Model serving endpoints (UC Volumes / A...

excavator-matt (Contributor)
  • 166 Views
  • 3 replies
  • 1 kudos

ABAC tag support for Streaming tables (Spark Lakeflow Declarative Pipelines)?

Hi! We're using Spark Lakeflow Declarative Pipelines for ingesting data from various data sources. However, in order to achieve compliance with GDPR, we are planning to start using ABAC tagging. However, I don't understand how we are supposed to use th...

Data Engineering
abac
LakeFlow
Streaming tables
tags
Latest Reply
excavator-matt
Contributor
  • 1 kudos

Correction: trying this will result in the error "ABAC policies are not supported on tables defined within a pipeline. Remove the policies or contact Databricks support." So it isn't supported.

2 More Replies
Suheb (New Contributor III)
  • 12 Views
  • 0 replies
  • 0 kudos

How do I integrate third-party ML/AI libraries with Databricks GenAI workflows?

How can I use external AI libraries inside my Databricks GenAI projects?

feliximmanuel (New Contributor II)
  • 2695 Views
  • 2 replies
  • 2 kudos

Error: oidc: fetch .well-known: Get "https://%E2%80%93host/oidc/.well-known/oauth-authorization-serv

I'm trying to authenticate Databricks using WSL but suddenly getting this error: /databricks-asset-bundle$ databricks auth login –host https://<XXXXXXXXX>.12.azuredatabricks.net Databricks Profile Name: <XXXXXXXXX> Error: oidc: fetch .well-known: Get "ht...

Latest Reply
guptadeepak
  • 2 kudos

Great, these are amazing resources! I'm using them to test my IAM apps and flow.

1 More Replies
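One detail worth noting in the error above: %E2%80%93 is the URL-encoding of an en dash (U+2013), which suggests the flag was typed or pasted as –host instead of --host, a common smart-punctuation substitution. A quick check:

```python
from urllib.parse import unquote

# The host fragment exactly as it appears in the error message
garbled = "%E2%80%93host"
decoded = unquote(garbled)

print(decoded)                 # –host
print(decoded[0] == "\u2013")  # True: an en dash, not the ASCII hyphens of --host
```

Retyping the command with two plain ASCII hyphens (`--host`) should make the CLI parse the URL correctly.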
tak0519 (New Contributor II)
  • 156 Views
  • 5 replies
  • 4 kudos

How can I pass parameters from DABs to something(like notebooks)?

I'm implementing DABs, Jobs, and Notebooks. For configuration management, I set parameters in databricks.yml, but I can't get the parameters in the notebook after executing a job successfully. What I implemented and steps to the issue: Created "dev-catalog" on WEB U...

Latest Reply
Taka-Yayoi
Databricks Employee
  • 4 kudos

Hi @tak0519, I think I found the issue! Don't worry, your DABs configuration looks correct. The problem is actually about how you're verifying the results, not the configuration itself. What's happening: in your last comment, you mentioned "Manuall...

4 More Replies
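For reference, the usual wiring for passing a DABs value into a notebook looks like the sketch below. All names (the `catalog` variable, the job key, the paths) are illustrative, not taken from the thread:

```yaml
# databricks.yml (illustrative sketch)
bundle:
  name: my_bundle

variables:
  catalog:
    default: dev-catalog

resources:
  jobs:
    my_job:
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ../src/main_notebook.py
            base_parameters:
              catalog: ${var.catalog}
```

Inside the notebook, the parameter then arrives as a widget and is read with `dbutils.widgets.get("catalog")`.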
quakenbush (Contributor)
  • 46 Views
  • 1 reply
  • 1 kudos

My trial is about to expire

I'm aware my workspace/subscription will be converted into a 'pay-as-you-go' model. That's okay; however, I wonder why you don't provide a non-restricted plan just for learning. I'm sure there are ways to block commercial use. However, that's not my...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @quakenbush ,In the past you had to create a new VNet injected workspace and migrate all workloads from the existing managed workspace to enable VNet injection. This process was necessary because there was no direct way to convert a managed worksp...

seefoods (Valued Contributor)
  • 86 Views
  • 1 reply
  • 1 kudos

setup databricks connect on VsCode and PyCharm

Hello guys, does someone know the best practices to set up Databricks Connect for PyCharm and VS Code using Docker, a Justfile, and a .env file? Cordially, Seefoods

Latest Reply
Gecofer
Contributor
  • 1 kudos

Hi @seefoods! I’ve worked with Databricks Connect and VS Code in different projects, and although your question mentions Docker, Justfile and .env, the “best practices” really depend on what you’re trying to do. Here’s what has worked best for me: 1. D...

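Whatever the editor, Databricks Connect (like the Databricks SDK generally) reads the standard unified-authentication environment variables, so a .env file is a reasonable common denominator for PyCharm and VS Code setups. A minimal sketch, with placeholder values:

```
# .env (placeholders; keep this file out of version control)
DATABRICKS_HOST=https://<workspace>.azuredatabricks.net
DATABRICKS_TOKEN=<personal-access-token>
DATABRICKS_CLUSTER_ID=<cluster-id>
```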
JN_Bristol (Contributor)
  • 1836 Views
  • 6 replies
  • 2 kudos

Resolved! ai_parse_document struggling to detect pdf

Hi helpful experts, I'm writing my first PySpark Notebook that makes use of the new `ai_parse_document` function. I am basically following the code example from here: https://learn.microsoft.com/en-gb/azure/databricks/sql/language-manual/functions/ai...

Latest Reply
lucaperes
New Contributor III
  • 2 kudos

Hello @JN_Bristol, I discovered that ai_parse_document only works when the input is passed as real Python bytes. The binaryFile format in Spark returns the content as an internal binary type (like a memoryview), and ai_parse_document can’t process that...

5 More Replies
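The conversion that reply describes reduces to wrapping the binary column value in `bytes()`. Since `ai_parse_document` itself only runs on Databricks, this sketch shows just the pure-Python part; the PDF header bytes stand in for a real file's `content` column:

```python
# A binaryFile row's content may surface as a memoryview/bytearray-like
# object rather than plain bytes.
raw = memoryview(b"%PDF-1.7 example payload")

# bytes() copies it into the plain bytes object downstream parsers expect.
fixed = bytes(raw)

print(type(fixed).__name__)  # bytes
print(fixed[:8])             # b'%PDF-1.7'
```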
saicharandeepb (Contributor)
  • 140 Views
  • 1 reply
  • 2 kudos

Decision Tree for Selecting the Right VM Types in Databricks – Looking for Feedback & Improvements!

Hi everyone, I’ve been working on an updated VM selection decision tree for Azure Databricks, designed to help teams quickly identify the most suitable worker types based on workload behavior. I’m sharing the latest version (in this updated version I’...

saicharandeepb_0-1763118168705.png
Latest Reply
Sahil_Kumar
Databricks Employee
  • 2 kudos

Hi saicharandeepb, You can enrich your chart by adding GPU-accelerated VMs. For computationally challenging tasks that demand high performance, like those associated with deep learning, Azure Databricks supports compute resources that are accelerated...

Sujitha (Databricks Employee)
  • 30632 Views
  • 19 replies
  • 34 kudos

Databricks Announces the Industry’s First Generative AI Engineer Learning Pathway and Certification

Today, we are announcing the industry's first Generative AI Engineer learning pathway and certification to help ensure that data and AI practitioners have the resources to be successful with generative AI. At Databricks, we recognize that generative ...

Screenshot 2024-01-24 at 11.32.01 PM.png
Latest Reply
Poorva21
New Contributor
  • 34 kudos

This is an exciting step forward from Databricks! Looking forward to diving into the curriculum and exploring what's next in the world of data + AI! Thanks for sharing @Sujitha 

18 More Replies
singhanuj2803 (Contributor)
  • 127 Views
  • 4 replies
  • 1 kudos

Troubleshooting Azure Databricks Cluster Pools & spot_bid_max_price Validation Error

Hope you’re doing well! I’m reaching out for some guidance on an issue I’ve encountered while setting up Azure Databricks Cluster Pools to reduce cluster spin-up and scale times for our jobs. Background: to optimize job execution wait times, I’ve create...

Latest Reply
Poorva21
New Contributor
  • 1 kudos

Possible reasons:
1. Setting spot_bid_max_price = -1 is not accepted by Azure pools. Azure Databricks only accepts:
  • 0 → on-demand only
  • positive numbers → max spot price
-1 is allowed in cluster policies, but not inside pools, so validation never completes....

3 More Replies
Eduard (New Contributor II)
  • 118362 Views
  • 6 replies
  • 1 kudos

Cluster xxxxxxx was terminated during the run.

Hello, I have a problem with the autoscaling of a cluster. Every time the autoscaling is activated I get this error. Does anyone have any idea why this could be? "Cluster xxxxxxx was terminated during the run (cluster state message: Lost communication ...

Latest Reply
marykline
New Contributor
  • 1 kudos

Hello Databricks Community, according to the error message, the driver node was lost, which can happen as a result of network problems or malfunctioning instances. Here are some potential causes and remedies: Instance instability: consider switching t...

5 More Replies
molopocho (New Contributor)
  • 146 Views
  • 1 reply
  • 0 kudos

Can't create a new ETL because of compute (?)

I just created a Databricks workspace on GCP with the "Use existing cloud account (Storage & compute)" option. I already added a few clusters for my tasks, but when I try to create an ETL, I always get this error notification. The file is created on the specifi...

molopocho_0-1764086991435.jpeg
Latest Reply
Saritha_S
Databricks Employee
  • 0 kudos

Hi @molopocho, we need to enable the feature in the workspace. If you don't see the option, then you need to reach out to the accounts team or create a ticket with the Databricks support team to get it enabled at the workspace level.

Poorva21 (New Contributor)
  • 128 Views
  • 1 reply
  • 1 kudos

Best Practices for Optimizing Databricks Costs in Production Workloads?

Hi everyone, I'm working on optimizing Databricks costs for a production-grade data pipeline (Spark + Delta Lake) on Azure. I’m looking for practical, field-tested strategies to reduce compute and storage spend without impacting performance. So far, I’...

Latest Reply
K_Anudeep
Databricks Employee
  • 1 kudos

Hello @Poorva21, below are the answers to your questions. Q1. What are the most impactful cost optimisations for production pipelines? I have worked with multiple customers, and based on my knowledge, below are the high-level optimisations one must have: The ...
