Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

hnnhhnnh
by New Contributor II
  • 53 Views
  • 1 reply
  • 0 kudos

Title: How to handle type widening (int→bigint) in DLT streaming tables without dropping the table

Setup: Bronze source table (external to DLT, CDF & type widening enabled):
# Source table properties:
# delta.enableChangeDataFeed: "true"
# delta.enableDeletionVectors: "true"
# delta.enableTypeWidening: "true"
# delta.minReaderVersion: "3"
# delta.minWrite...

Latest Reply
mukul1409
New Contributor
  • 0 kudos

Hi @hnnhhnnh, DLT streaming tables that use apply changes do not support widening the data type of key columns, such as changing an integer to a bigint, after the table is created. Even though Delta and Unity Catalog support type widening in general, DL...

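For illustration, a minimal sketch of the usual workaround, assuming the key column is cast to BIGINT upstream of apply_changes so the streaming table is created with the wider type from the start (all table and column names hypothetical):

import dlt
from pyspark.sql import functions as F

@dlt.view
def bronze_source():
    # Cast the key up front so apply_changes never needs to widen it later.
    return (
        spark.readStream
        .option("readChangeFeed", "true")
        .table("main.bronze.events")  # hypothetical CDF-enabled source
        .withColumn("id", F.col("id").cast("bigint"))
    )

dlt.create_streaming_table("silver_events")

dlt.apply_changes(
    target="silver_events",
    source="bronze_source",
    keys=["id"],
    sequence_by=F.col("_commit_version"),  # CDF metadata column
)
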
JothyGanesan
by New Contributor III
  • 159 Views
  • 2 replies
  • 4 kudos

Resolved! Vacuum on DLT

We are currently using DLT tables as our target tables. The tables are loaded by continuous job pipelines, and liquid clustering is enabled on them. Will VACUUM work on these tables while they are being loaded in continuous mode? How to run t...

Latest Reply
iyashk-DB
Databricks Employee
  • 4 kudos

VACUUM works fine on DLT tables running in continuous mode. DLT does automatic maintenance (OPTIMIZE + VACUUM) roughly every 24 hours if the pipeline has a maintenance cluster configured.
Q: The liquid cluster is enabled in the tables. Will Vacuum wo...

1 More Reply
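As a reference point, a minimal sketch of triggering that maintenance manually from a separate notebook or job, assuming the default 7-day retention check (table name hypothetical):

# DLT's automatic maintenance runs roughly every 24 hours; this is the manual equivalent.
spark.sql("OPTIMIZE main.etl.dlt_target")
spark.sql("VACUUM main.etl.dlt_target RETAIN 168 HOURS")  # 168 hours = 7 days
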
ismaelhenzel
by Contributor III
  • 161 Views
  • 1 reply
  • 0 kudos

Declarative Pipelines - Dynamic Overwrite

Regarding the limitations of declarative pipelines—specifically the inability to use replaceWhere—I discovered through testing that materialized views actually support dynamic overwrites. This handles several scenarios where replaceWhere would typica...

Latest Reply
omsingh
New Contributor III
  • 0 kudos

This is a really interesting find, and honestly not something most people expect from materialized views. Under the hood, MVs in Databricks declarative pipelines are still Delta tables. So when you set partitionOverwriteMode=dynamic and partition by a...

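For illustration, a minimal sketch of the dynamic-overwrite pattern being described, on a plain Delta table with hypothetical names:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical frame; with dynamic mode, only the partitions present in df are replaced.
df = spark.range(10).withColumn("region", F.lit("emea"))

(df.write
   .format("delta")
   .mode("overwrite")
   .option("partitionOverwriteMode", "dynamic")
   .partitionBy("region")
   .saveAsTable("main.analytics.sales"))
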
jpassaro
by New Contributor
  • 196 Views
  • 1 reply
  • 1 kudos

does databricks respect parallel vacuum setting?

I am trying to run VACUUM on a Delta table that I know has millions of obsolete files. Out of the box, VACUUM runs the deletes in sequence on the driver. That is bad news for me! According to the OSS Delta docs, the setting spark.databricks.delta.vacuum.pa...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Greetings @jpassaro, thanks for laying out the context and the links. Let me clarify what’s actually happening here and how I’d recommend moving forward. Short answer: No. On Databricks Runtime, the spark.databricks.delta.vacuum.parallelDelete.enabl...

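For context, the OSS Delta setting the thread refers to is used like this (path hypothetical); per the reply above, Databricks Runtime reportedly does not honor it:

# OSS Delta Lake: delete expired files in parallel across the cluster during VACUUM.
spark.conf.set("spark.databricks.delta.vacuum.parallelDelete.enabled", "true")
spark.sql("VACUUM delta.`/mnt/data/events` RETAIN 168 HOURS")
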
GANAPATI_HEGDE
by New Contributor III
  • 294 Views
  • 3 replies
  • 0 kudos

Unable to configure custom compute for DLT pipeline

I am trying to configure a cluster for a pipeline as shown in the attached screenshots. However, DLT keeps using the small cluster as usual. How do I resolve this?

[Screenshots: GANAPATI_HEGDE_0-1762754316899.png, GANAPATI_HEGDE_1-1762754398253.png]
Latest Reply
GANAPATI_HEGDE
New Contributor III
  • 0 kudos

I updated my CLI and deployed the job, but I still don't see the cluster updates in the pipeline.

2 More Replies
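For reference, a sketch of what an explicit cluster override in the pipeline settings typically looks like, shown here as a Python dict mirroring the JSON "clusters" block (node types and sizes are assumptions):

pipeline_clusters = [
    {
        "label": "default",                  # compute used for pipeline updates
        "node_type_id": "Standard_DS4_v2",   # hypothetical Azure node type
        "autoscale": {"min_workers": 2, "max_workers": 8},
    },
    {
        "label": "maintenance",              # compute used for OPTIMIZE/VACUUM
        "node_type_id": "Standard_DS3_v2",
    },
]
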
singhanuj2803
by Contributor
  • 314 Views
  • 4 replies
  • 1 kudos

Troubleshooting Azure Databricks Cluster Pools & spot_bid_max_price Validation Error

Hope you’re doing well! I’m reaching out for some guidance on an issue I’ve encountered while setting up Azure Databricks Cluster Pools to reduce cluster spin-up and scale times for our jobs. Background: To optimize job execution wait times, I’ve create...

Latest Reply
Poorva21
New Contributor II
  • 1 kudos

Possible reasons:
1. Setting spot_bid_max_price = -1 is not accepted by Azure pools. Azure Databricks only accepts:
  • 0 → on-demand only
  • positive numbers → max spot price
-1 is allowed in cluster policies, but not inside pools, so validation never completes....

3 More Replies
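To make the constraint concrete, a sketch of a pool payload with an explicit positive bid, shown as a Python dict mirroring the REST API body (names and values are assumptions):

pool_config = {
    "instance_pool_name": "jobs-spot-pool",
    "node_type_id": "Standard_DS3_v2",
    "azure_attributes": {
        "availability": "SPOT_AZURE",
        "spot_bid_max_price": 100.0,  # positive max price; per the reply, -1 fails pool validation
    },
}
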
molopocho
by New Contributor
  • 216 Views
  • 1 reply
  • 0 kudos

Can't create a new ETL because of compute (?)

I just created a Databricks workspace on GCP with the "Use existing cloud account (Storage & compute)" option. I already added a few clusters for my tasks, but when I try to create an ETL, I always get this error notification. The file is created on the specifi...

[Screenshot: molopocho_0-1764086991435.jpeg]
Latest Reply
Saritha_S
Databricks Employee
  • 0 kudos

Hi @molopocho, we need to enable the feature in the workspace. If you don't see the option, you need to reach out to the accounts team or create a ticket with the Databricks support team to get it enabled at the workspace level.

Poorva21
by New Contributor II
  • 361 Views
  • 1 reply
  • 1 kudos

Best Practices for Optimizing Databricks Costs in Production Workloads?

Hi everyone, I'm working on optimizing Databricks costs for a production-grade data pipeline (Spark + Delta Lake) on Azure. I’m looking for practical, field-tested strategies to reduce compute and storage spend without impacting performance. So far, I’...

Latest Reply
K_Anudeep
Databricks Employee
  • 1 kudos

Hello @Poorva21, below are the answers to your questions. Q1. What are the most impactful cost optimisations for production pipelines? I have worked with multiple customers and, based on my knowledge, below are the high-level optimisations one must have: The ...

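As a small illustration of the storage-side levers usually mentioned in this context (table name hypothetical; these settings are common tuning suggestions, not taken from the truncated reply):

# Reduce small-file overhead on write, then compact and clean up existing data.
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")
spark.sql("OPTIMIZE main.prod.big_table")
spark.sql("VACUUM main.prod.big_table")  # default 7-day retention
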
crami
by New Contributor II
  • 230 Views
  • 2 replies
  • 0 kudos

Declarative Pipeline Re-Deployment and existing managed tables exception

Hi, I am facing an issue with re-deployment of a declarative pipeline using an asset bundle. On first deployment, I am able to run the pipeline successfully. On execution, the pipeline creates tables as expected. However, when I try to re-deploy the pipeli...

Latest Reply
Poorva21
New Contributor II
  • 0 kudos

Managed tables are “owned” by a DLT pipeline. Re-deploying a pipeline that references the same managed tables will fail unless you either:
  • Drop the existing tables first
  • Use external tables that are not owned by DLT
  • Use a separate development schema/pip...

1 More Reply
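For illustration, the first workaround from the reply sketched out (schema and table names hypothetical):

# Drop the pipeline-owned managed tables before re-deploying the bundle.
for t in ["silver_orders", "silver_customers"]:
    spark.sql(f"DROP TABLE IF EXISTS main.etl.{t}")
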
__Aziz__
by New Contributor II
  • 223 Views
  • 1 reply
  • 1 kudos

Resolved! mongodb connector duplicate writes

Hi everyone, has anyone run into this issue? I’m using the MongoDB Spark Connector on Databricks to expose data from Delta Lake to MongoDB. My workflow is:
  • overwrite the collection (very fast),
  • then create the indexes.
Occasionally, I’m seeing duplicates...

Latest Reply
bianca_unifeye
Contributor
  • 1 kudos

Hi Aziz, what you’re seeing is an expected behaviour when combining Spark retries with non-idempotent writes. Spark’s write path is task-based and fault-tolerant. If a task fails part-way through writing to MongoDB, Spark will retry that task. From Spar...

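For illustration, one common way to make such a write idempotent with the MongoDB Spark Connector v10: supply a stable business key as _id and use replace semantics, so a retried task overwrites instead of inserting again (URI, database, and field names hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "Dune"), (2, "Arrival")], ["movie_id", "title"])

(df.write
   .format("mongodb")
   .mode("append")
   .option("connection.uri", "mongodb+srv://user:pass@cluster.example.net")
   .option("database", "serving")
   .option("collection", "movies")
   .option("idFieldList", "movie_id")     # mapped to _id in MongoDB
   .option("operationType", "replace")    # upsert-like behavior on retry
   .save())
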
fly_high_five
by New Contributor III
  • 187 Views
  • 1 reply
  • 3 kudos

Unable to retrieve catalog, schema, tables using JDBC endpoint of SQL Warehouse

Hi, I am connecting to a SQL Warehouse in UC using its JDBC endpoint via DBeaver. However, it doesn't list any catalogs, schemas, or tables. I checked the permissions of the SQL WH by logging into the ADB Workspace and querying the table (attached a dummy table exa...

[Screenshots: fly_high_five_0-1764770250626.png, fly_high_five_1-1764770371607.png, fly_high_five_2-1764770788643.png]
Latest Reply
mitchellg-db
Databricks Employee
  • 3 kudos

Hi there, I'm not familiar with DBeaver specifically, but I have experienced DBSQL Warehouses being much stricter when enforcing permissions than All-Purpose Clusters. Warehouses check explicitly if that identity has access to those assets, where All...

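For reference, the Unity Catalog grants an identity typically needs before a Warehouse will list and read objects, sketched with hypothetical names:

spark.sql("GRANT USE CATALOG ON CATALOG main TO `user@example.com`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.default TO `user@example.com`")
spark.sql("GRANT SELECT ON TABLE main.default.my_table TO `user@example.com`")
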
Brahmareddy
by Esteemed Contributor
  • 578 Views
  • 4 replies
  • 9 kudos

Future of Movie Discovery: How I Built an AI Movie Recommendation Agent on Databricks Free Edition

As a data engineer deeply passionate about how data and AI can come together to create real-world impact, I’m excited to share my project for the Databricks Free Edition Hackathon 2025 — Future of Movie Discovery (FMD). Built entirely on Databricks F...

Latest Reply
AlbertaBode
New Contributor III
  • 9 kudos

Really cool project! The mood-based movie matching and conversational memory make the whole discovery experience feel way more intuitive. It’s interesting because most people still browse platforms manually, like on a streaming app, but your system s...

3 More Replies
dbernstein_tp
by New Contributor III
  • 354 Views
  • 4 replies
  • 2 kudos

Resolved! Naming question about SQL server database schemas

I have an MS SQL Server database that has several schemas we need to ingest data from. Call them "SCHEMA1" tables and "SCHEMA2" tables. Let's call the server S and the database D. In Unity Catalog I have a catalog called "staging" where the staging (...

Latest Reply
dbernstein_tp
New Contributor III
  • 2 kudos

Thanks for the responses! @K_Anudeep's suggestion makes sense in the context of our current lakehouse architecture, so I think I will migrate to that.

3 More Replies
Swathik
by New Contributor III
  • 272 Views
  • 1 reply
  • 0 kudos

Resolved! Best Practices for implementing DLT, Autoloader in Workflows

I am in the process of designing a Medallion architecture where the data sources include REST API calls, JSON files, SQL Server, and Azure Event Hubs. For the Silver and Gold layers, I plan to leverage Delta Live Tables (DLT). However, I am seeking gu...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The optimal approach for implementing the Bronze layer in a Medallion architecture with Delta Live Tables (DLT) involves balancing batch and streaming ingestion patterns, especially when combining DLT and Autoloader. The trigger(availableNow=True) op...

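For illustration, a minimal Bronze-layer sketch combining Auto Loader with trigger(availableNow=True) for batch-style incremental ingestion (paths and table names hypothetical):

(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/main/bronze/_schemas/events")
    .load("/Volumes/main/landing/events")
    .writeStream
    .option("checkpointLocation", "/Volumes/main/bronze/_checkpoints/events")
    .trigger(availableNow=True)   # process all available files, then stop
    .toTable("main.bronze.events"))
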
dbernstein_tp
by New Contributor III
  • 269 Views
  • 3 replies
  • 2 kudos

Lakeflow Connect CDC error, broken links

I get this error, regarding database validation, when setting up a Lakeflow Connect CDC pipeline (see screenshot). The two links mentioned in the message are broken; they give me a "404 - Content Not Found" when I try to open them.

[Screenshot: Screenshot 2025-11-21 at 9.42.20 AM.png]
Latest Reply
dbernstein_tp
New Contributor III
  • 2 kudos

@Advika Thank you. My reason for this post was to alert the SQL server ingestion team to this bug in the interface. I will file a report about this (didn't know I could do that) and a few other issues with the feature that I've found recently.

2 More Replies