Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

venkatesh557
by New Contributor
  • 399 Views
  • 1 reply
  • 0 kudos

Resolved! Is there a supported method to register a custom PySpark DataSource so that it becomes visible in th

I built a custom connector using the PySpark DataSource API (DataSource V2). The connector works programmatically, but it does not appear in the Databricks Ingestion UI (Add Data → Connectors) like the Salesforce connector. Is there a supported method t...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @venkatesh557, unfortunately, the answer is no - there isn’t a supported way for you to “register” an arbitrary PySpark DataSource V2 so that it appears as a tile in the Databricks Add data → Connectors (Ingestion) UI right now.

tak0519
by New Contributor III
  • 1428 Views
  • 6 replies
  • 6 kudos

Resolved! How can I pass parameters from DABs to something (like notebooks)?

I'm implementing DABs, Jobs, and Notebooks. For configuration management, I set parameters in databricks.yml, but I can't read them in the notebook even though the job ran successfully. What I implemented and the steps leading to the issue: Created "dev-catalog" on WEB U...

Latest Reply
Taka-Yayoi
Databricks Employee
  • 6 kudos

Hi @tak0519, I think I found the issue! Don't worry - your DABs configuration looks correct. The problem is actually about how you're verifying the results, not the configuration itself. What's happening: In your last comment, you mentioned: "Manuall...
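
For context, a notebook that runs as a job task usually reads its parameters through `dbutils.widgets.get`. The sketch below wraps that call so a missing parameter falls back to a default instead of failing. `FakeWidgets`, `read_param`, the parameter name `catalog`, and the fallback value are hypothetical and exist only so the example runs outside Databricks; the thread preview does not show the actual notebook code.

```python
class FakeWidgets:
    """Hypothetical local stand-in for dbutils.widgets, so the sketch runs outside Databricks."""

    def __init__(self, values):
        self._values = dict(values)

    def get(self, name):
        # Fail when the parameter was not supplied (the real call raises its own exception type).
        if name not in self._values:
            raise KeyError(f"no widget named {name!r}")
        return self._values[name]


def read_param(widgets, name, default=None):
    """Return the parameter value, or `default` when it was not supplied."""
    try:
        return widgets.get(name)
    except Exception:
        if default is None:
            raise
        return default


# Job run: the task supplies the parameter.
job_run = FakeWidgets({"catalog": "dev-catalog"})
print(read_param(job_run, "catalog"))

# Interactive run: nothing was supplied, so the fallback is used.
manual_run = FakeWidgets({})
print(read_param(manual_run, "catalog", default="local-default"))
```

Inside a notebook, `dbutils.widgets` would be passed in place of `FakeWidgets`. Comparing the value seen in a job run with the value seen in an interactive run is one way to check where a parameter is actually being supplied.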

5 More Replies
anhnnguyen
by New Contributor III
  • 672 Views
  • 6 replies
  • 2 kudos

Materialized view always loads the full table instead of loading incrementally

My Delta tables are stored in HANA data lake files, and I have the ETL configured as below: @dp.materialized_view(temporary=True) def source(): return spark.read.format("delta").load("/data/source") @dp.materialized_view def sink(): return spark.re...

Latest Reply
anhnnguyen
New Contributor III
  • 2 kudos

One more note: I'm not using Unity Catalog here. I'm not sure if that's relevant.

5 More Replies
RJTECHY210
by Databricks Partner
  • 812 Views
  • 3 replies
  • 1 kudos

Resolved! Azure Databricks Streamlit Application - Doubts

Hi Databricks community, I am currently tasked with creating a Streamlit application using the Databricks Apps feature. I have created a Lakebase instance to sync the Delta table located in Unity Catalog, and I have also...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @RJTECHY210, yes, it's possible. You can use the Python SDK to achieve what you want. Here's some sample code for reference: from databricks.sdk import WorkspaceClient from databricks.sdk.service.database import DatabaseInstance # Initialize the Worksp...

2 More Replies
GANAPATI_HEGDE
by New Contributor III
  • 531 Views
  • 3 replies
  • 0 kudos

Unable to configure custom compute for DLT pipeline

I am trying to configure a cluster for a pipeline as shown in the screenshots. However, DLT keeps using the small cluster as usual. How can I resolve this?

GANAPATI_HEGDE_0-1762754316899.png GANAPATI_HEGDE_1-1762754398253.png
Latest Reply
GANAPATI_HEGDE
New Contributor III
  • 0 kudos

I updated my CLI and deployed the job, but I still don't see the cluster updates in the pipeline.

2 More Replies
hgm251
by New Contributor II
  • 1450 Views
  • 3 replies
  • 3 kudos

badrequest: cannot create online table is being deprecated. creating new online table is not allowed

Hello! It seems very sudden that we can no longer create online tables. Is there a workaround that would let us create online tables temporarily, as we need more time to move to synced tables? #online_tables 

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 3 kudos

Yes, the Databricks online tables (legacy) are being deprecated, and after January 15, 2026, you will no longer be able to access or create them. https://docs.databricks.com/aws/en/machine-learning/feature-store/migrate-from-online-tables Here are a few ...

2 More Replies
pooja_bhumandla
by Databricks Partner
  • 795 Views
  • 3 replies
  • 1 kudos

Best Practice for Updating Data Skipping Statistics for Additional Columns

Hi Community, I have a scenario where I’ve already calculated Delta statistics for the first 32 columns after enabling the dataskipping property. Now, I need to include 10 more frequently used columns that were not part of the original 32. Goal: I want ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @pooja_bhumandla, updating either of the two options below does not automatically recompute statistics for existing data. Rather, it affects how statistics are collected in the future, when data is added to or updated in the table. - delta.dataSkippingNumInd...
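
The two-step idea in this reply (change the property, then recompute statistics for existing files) can be sketched as a small helper that only builds the SQL text. The sketch uses `delta.dataSkippingStatsColumns`, a Delta table property for naming statistics columns explicitly; whether that is one of the two options the reply lists is an assumption, because the preview is cut off. `ANALYZE TABLE ... COMPUTE DELTA STATISTICS` is a Databricks SQL command that needs a recent runtime. The helper name, table name, and column names are hypothetical.

```python
def stats_refresh_statements(table: str, stats_columns: list[str]) -> list[str]:
    """Build SQL that (1) names the columns to collect statistics for and
    (2) recomputes statistics for data files already in the table."""
    if not stats_columns:
        raise ValueError("stats_columns must not be empty")
    column_list = ",".join(stats_columns)
    return [
        # Step 1: affects statistics collected by future writes only.
        f"ALTER TABLE {table} SET TBLPROPERTIES "
        f"('delta.dataSkippingStatsColumns' = '{column_list}')",
        # Step 2: recomputes statistics for existing data files.
        f"ANALYZE TABLE {table} COMPUTE DELTA STATISTICS",
    ]


statements = stats_refresh_statements(
    "main.sales.orders",                        # hypothetical table name
    ["order_id", "customer_id", "order_date"],  # hypothetical columns
)
for sql in statements:
    print(sql)  # in a Databricks notebook this would be spark.sql(sql)
```

Keeping the statements as plain strings makes it easy to review them before running anything against a large table, since recomputing statistics rereads existing files.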

2 More Replies
absan
by Contributor
  • 704 Views
  • 4 replies
  • 6 kudos

Resolved! How to integrate a unique PK expectation into the LDP pipeline graph

Hi everyone, I'm working on an LDP and need help ensuring that a downstream table only runs if a primary key uniqueness check passes. In something like dbt this is very easy to configure, but with LDP it seems to require creating a separate view. Addi...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 6 kudos

I know your solution is quite popular (I just don't get the SELECT MAX(load_date)). Another option is to use AUTO CDC even if you don't have CDC, as it has a KEY option. If MAX(load_date) means that the last snapshot is the most essential one for you, please check...

3 More Replies
hidden
by New Contributor II
  • 1077 Views
  • 3 replies
  • 0 kudos

Resolved! replicate the behaviour of DLT create auto cdc flow

I want to write a custom implementation of the behaviour of DLT's create auto cdc flow. How can we do it?

Latest Reply
Hubert-Dudek
Databricks MVP
  • 0 kudos

And you need to handle dozens of exceptions, such as late-arriving data, duplicate data, data in the wrong order, etc.
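
To make those edge cases concrete, here is a minimal pure-Python sketch of "latest event per key wins" semantics (similar in spirit to SCD type 1). It is not how DLT implements AUTO CDC; it only illustrates why a sequencing column and delete markers matter. `Change`, `apply_changes`, `live_rows`, and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Change:
    key: str
    sequence: int                          # ordering column, e.g. an event timestamp
    op: str                                # "upsert" or "delete"
    row: Optional[dict[str, Any]] = None   # payload; unused for deletes


def apply_changes(target: dict, changes: list[Change]) -> None:
    """Apply changes in place. `target` maps key -> (last_sequence, row);
    a row of None is a tombstone left behind by a delete."""
    for change in changes:
        current = target.get(change.key)
        # Skip duplicates and late or out-of-order events.
        if current is not None and change.sequence <= current[0]:
            continue
        row = None if change.op == "delete" else change.row
        target[change.key] = (change.sequence, row)


def live_rows(target: dict) -> dict:
    """Return only keys whose latest event was not a delete."""
    return {key: row for key, (_, row) in target.items() if row is not None}


target: dict = {}
batch = [
    Change("a", 2, "upsert", {"name": "Ann v2"}),
    Change("a", 1, "upsert", {"name": "Ann v1"}),  # arrives late, ignored
    Change("a", 2, "upsert", {"name": "Ann v2"}),  # duplicate, ignored
    Change("b", 1, "upsert", {"name": "Bob"}),
    Change("b", 3, "delete"),
    Change("b", 2, "upsert", {"name": "Bob v2"}),  # older than the delete, ignored
]
apply_changes(target, batch)
print(live_rows(target))
```

Keeping the tombstone with its sequence number is the design choice that stops an older event from resurrecting a deleted key. A real implementation would also need a policy for purging tombstones and for ties in the sequence column.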

2 More Replies
ismaelhenzel
by Contributor III
  • 1105 Views
  • 5 replies
  • 5 kudos

Resolved! delta live tables - collaborative development

I would like to know the best practice for collaborating on a Delta Live Tables pipeline. I was thinking that each developer should have their own DLT pipeline in the development workspace. Currently, each domain has its development catalog, like sal...

Latest Reply
Poorva21
Contributor II
  • 5 kudos

Yes, each developer should have their own DLT pipeline and their own schema. It’s the correct paradigm. It keeps DLT ownership clean and prevents pipeline conflicts. Dev naming doesn’t need to be pretty; QA/Prod are where structure matters.

4 More Replies
excavator-matt
by Contributor III
  • 698 Views
  • 3 replies
  • 1 kudos

ABAC tag support for Streaming tables (Spark Lakeflow Declarative Pipelines)?

Hi! We're using Spark Lakeflow Declarative Pipelines for ingesting data from various data sources. However, in order to achieve compliance with GDPR, we are planning to start using ABAC tagging. However, I don't understand how we are supposed to use th...

Data Engineering
abac
LakeFlow
Streaming tables
tags
Latest Reply
excavator-matt
Contributor III
  • 1 kudos

Correction. Trying this will result in this error: "ABAC policies are not supported on tables defined within a pipeline. Remove the policies or contact Databricks support." So it isn't supported.

2 More Replies
feliximmanuel
by New Contributor II
  • 3044 Views
  • 2 replies
  • 2 kudos

Error: oidc: fetch .well-known: Get "https://%E2%80%93host/oidc/.well-known/oauth-authorization-serv

I'm trying to authenticate to Databricks using WSL but am suddenly getting this error. /databricks-asset-bundle$ databricks auth login –host https://<XXXXXXXXX>.12.azuredatabricks.net Databricks Profile Name: <XXXXXXXXX> Error: oidc: fetch .well-known: Get "ht...
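
The host in the error URL from the thread title, `%E2%80%93host`, is percent-encoded UTF-8. Decoding it is a quick way to see what the CLI actually received as the host. A minimal sketch using only the Python standard library:

```python
import unicodedata
from urllib.parse import unquote

# Host portion copied from the error message in the thread title.
encoded_host = "%E2%80%93host"

decoded = unquote(encoded_host)  # percent-decoding, interpreted as UTF-8
first_char = decoded[0]

print(repr(decoded))
print(unicodedata.name(first_char))  # Unicode name of the leading character
print(first_char == "-")             # False: it is not an ASCII hyphen
```

The leading character decodes to an en dash, which matches the `–host` typed in the command. That suggests the argument was not recognised as the `--host` flag (two ASCII hyphens) and was instead treated as the host value itself. This is an inference from the error text; the replies shown in the preview do not confirm it.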

Latest Reply
guptadeepak
New Contributor II
  • 2 kudos

Great, these are amazing resources! I'm using them to test my IAM apps and flow.

1 More Replies
saicharandeepb
by Contributor
  • 532 Views
  • 1 reply
  • 2 kudos

Decision Tree for Selecting the Right VM Types in Databricks – Looking for Feedback & Improvements!

Hi everyone, I’ve been working on an updated VM selection decision tree for Azure Databricks, designed to help teams quickly identify the most suitable worker types based on workload behavior. I’m sharing the latest version (In this updated version I’...

saicharandeepb_0-1763118168705.png
Latest Reply
Sahil_Kumar
Databricks Employee
  • 2 kudos

Hi saicharandeepb, You can enrich your chart by adding GPU-accelerated VMs. For computationally challenging tasks that demand high performance, like those associated with deep learning, Azure Databricks supports compute resources that are accelerated...

singhanuj2803
by Contributor
  • 849 Views
  • 4 replies
  • 1 kudos

Troubleshooting Azure Databricks Cluster Pools & spot_bid_max_price Validation Error

Hope you’re doing well! I’m reaching out for some guidance on an issue I’ve encountered while setting up Azure Databricks Cluster Pools to reduce cluster spin-up and scale times for our jobs. Background: To optimize job execution wait times, I’ve create...

Latest Reply
Poorva21
Contributor II
  • 1 kudos

Possible reasons: 1. Setting spot_bid_max_price = -1 is not accepted by Azure pools. Azure Databricks only accepts: 0 → on-demand only; positive numbers → max spot price. -1 is allowed in cluster policies, but not inside pools, so validation never completes....

3 More Replies
molopocho
by Databricks Partner
  • 313 Views
  • 1 reply
  • 0 kudos

Can't create a new ETL because of compute (?)

I just created a Databricks workspace on GCP with the "Use existing cloud account (Storage & compute)" option. I have already added a few clusters for my tasks, but when I try to create an ETL, I always get this error notification. The file is created on the specifi...

molopocho_0-1764086991435.jpeg
Latest Reply
Saritha_S
Databricks Employee
  • 0 kudos

Hi @molopocho, the feature needs to be enabled in the workspace. If you don't see the option, you need to reach out to the accounts team or create a ticket with the Databricks support team to get it enabled at the workspace level.
