Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

GANAPATI_HEGDE
by New Contributor III
  • 266 Views
  • 3 replies
  • 0 kudos

Unable to configure custom compute for DLT pipeline

I am trying to configure the cluster for a pipeline as shown above; however, DLT keeps using the small cluster as usual. How can I resolve this?

[two screenshots attached]
Latest Reply
GANAPATI_HEGDE
New Contributor III
  • 0 kudos

I updated my CLI and deployed the job, but I still don't see the cluster updates in the pipeline.
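For reference, a minimal sketch of overriding a DLT pipeline's compute outside the UI, assuming the Pipelines REST API 2.0; the host, token, pipeline ID, and node type below are placeholders. The same clusters block belongs under the pipeline resource in a bundle's databricks.yml when deploying via the CLI:

```python
import requests

# Placeholders: substitute your workspace URL, token, and pipeline ID.
HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "dapi-REDACTED"
PIPELINE_ID = "0123-456789-abcdef"

headers = {"Authorization": f"Bearer {TOKEN}"}

# Fetch the current spec first: the edit endpoint replaces the whole spec.
spec = requests.get(
    f"{HOST}/api/2.0/pipelines/{PIPELINE_ID}", headers=headers
).json()["spec"]

# The cluster labeled "default" is the one DLT uses for table updates
# ("maintenance" targets the maintenance cluster instead).
spec["clusters"] = [{
    "label": "default",
    "node_type_id": "Standard_DS4_v2",  # assumed node type
    "autoscale": {"min_workers": 2, "max_workers": 8},
}]

resp = requests.put(
    f"{HOST}/api/2.0/pipelines/{PIPELINE_ID}", headers=headers, json=spec
)
resp.raise_for_status()
```

If the UI still shows the old compute after a bundle deploy, comparing the spec returned by the GET call against the bundle's clusters block is a quick way to confirm whether the deployment actually applied the override.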

2 More Replies
singhanuj2803
by Contributor
  • 260 Views
  • 4 replies
  • 1 kudos

Troubleshooting Azure Databricks Cluster Pools & spot_bid_max_price Validation Error

Hope you're doing well! I'm reaching out for some guidance on an issue I've encountered while setting up Azure Databricks Cluster Pools to reduce cluster spin-up and scale times for our jobs. Background: To optimize job execution wait times, I've create...

Latest Reply
Poorva21
New Contributor II
  • 1 kudos

Possible reasons:
1. Setting spot_bid_max_price = -1 is not accepted by Azure pools. Azure Databricks only accepts:
  • 0 → on-demand only
  • positive numbers → max spot price
-1 is allowed in cluster policies, but not inside pools, so validation never completes....
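To make the accepted values concrete, here is a hedged sketch of an Instance Pools create payload that follows the rule described in this reply (a positive max price rather than -1), assuming the /api/2.0/instance-pools/create endpoint; the workspace URL, token, pool name, and node type are placeholders:

```python
import requests

# Placeholders for workspace URL and token.
HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "dapi-REDACTED"

# Per the reply above: inside a pool, use 0 or a positive max spot price
# rather than -1 (which is only accepted in cluster policies).
payload = {
    "instance_pool_name": "jobs-spot-pool",  # hypothetical name
    "node_type_id": "Standard_DS3_v2",       # assumed node type
    "min_idle_instances": 2,
    "azure_attributes": {
        "availability": "SPOT_AZURE",
        "spot_bid_max_price": 0.5,  # positive value = explicit max spot price
    },
}

resp = requests.post(
    f"{HOST}/api/2.0/instance-pools/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json()["instance_pool_id"])
```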

3 More Replies
molopocho
by New Contributor
  • 175 Views
  • 1 reply
  • 0 kudos

Can't create a new ETL because of compute (?)

I just created a Databricks workspace on GCP with the "Use existing cloud account (Storage & compute)" option. I have already added a few clusters for my tasks, but when I try to create an ETL, I always get this error notification. The file is created on the specifi...

[screenshot attached]
Latest Reply
Saritha_S
Databricks Employee
  • 0 kudos

Hi @molopocho, we need to enable the feature in the workspace. If you don't see the option, you need to reach out to the accounts team or create a ticket with the Databricks support team to get it enabled at the workspace level.

Poorva21
by New Contributor II
  • 223 Views
  • 1 reply
  • 1 kudos

Best Practices for Optimizing Databricks Costs in Production Workloads?

Hi everyone, I'm working on optimizing Databricks costs for a production-grade data pipeline (Spark + Delta Lake) on Azure. I'm looking for practical, field-tested strategies to reduce compute and storage spend without impacting performance. So far, I'...

Latest Reply
K_Anudeep
Databricks Employee
  • 1 kudos

Hello @Poorva21, below are the answers to your questions: Q1. What are the most impactful cost optimisations for production pipelines? I have worked with multiple customers, and based on my knowledge, below are the high-level optimisations one must have: The ...
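As one concrete illustration of the levers usually listed here (the full reply is truncated above), a hedged sketch combining an autoscaling spot job cluster spec with routine Delta maintenance; all names, versions, and tables are hypothetical, and the SQL assumes a Databricks notebook where spark is predefined:

```python
# Compute-side levers: right-sized autoscaling, spot with fallback, Photon.
# This dict is the "new_cluster" shape used in a job task definition.
job_cluster = {
    "spark_version": "15.4.x-scala2.12",          # assumed DBR version
    "node_type_id": "Standard_DS3_v2",            # assumed node type
    "autoscale": {"min_workers": 1, "max_workers": 6},
    "azure_attributes": {
        "availability": "SPOT_WITH_FALLBACK_AZURE",  # spot, fall back to on-demand
        "spot_bid_max_price": -1,  # cluster semantics: cap at on-demand price
    },
    "runtime_engine": "PHOTON",
}

# Storage-side levers: compact small files and clear old snapshots on a schedule.
spark.sql("OPTIMIZE prod.sales.transactions")                  # hypothetical table
spark.sql("VACUUM prod.sales.transactions RETAIN 168 HOURS")   # 7-day retention
```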

crami
by New Contributor II
  • 183 Views
  • 2 replies
  • 0 kudos

Declarative Pipeline Re-Deployment and existing managed tables exception

Hi, I am facing an issue with the re-deployment of a declarative pipeline using an asset bundle. On first deployment, I am able to run the pipeline successfully. On execution, the pipeline creates tables as expected. However, when I try to re-deploy the pipeli...

Latest Reply
Poorva21
New Contributor II
  • 0 kudos

Managed tables are "owned" by a DLT pipeline. Re-deploying a pipeline that references the same managed tables will fail unless you either:
  • Drop the existing tables first
  • Use external tables that are not owned by DLT
  • Use a separate development schema/pip...
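A minimal sketch of the first option, dropping the DLT-owned tables before re-deploying; run in a notebook where spark is in scope, and note the catalog, schema, and table names are hypothetical:

```python
# Drop the managed tables the previous pipeline deployment owned,
# so the re-deployed pipeline can recreate them cleanly.
for t in ["orders_bronze", "orders_silver"]:   # placeholder table names
    spark.sql(f"DROP TABLE IF EXISTS dev_catalog.dlt_schema.{t}")
```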

1 More Replies
__Aziz__
by New Contributor II
  • 160 Views
  • 1 reply
  • 1 kudos

Resolved! mongodb connector duplicate writes

Hi everyone, has anyone run into this issue? I'm using the MongoDB Spark Connector on Databricks to expose data from Delta Lake to MongoDB. My workflow is:
  • overwrite the collection (very fast),
  • then create the indexes.
Occasionally, I'm seeing duplicates...

Latest Reply
bianca_unifeye
Contributor
  • 1 kudos

Hi Aziz, what you're seeing is an expected behaviour when combining Spark retries with non-idempotent writes. Spark's write path is task-based and fault-tolerant. If a task fails part-way through writing to MongoDB, Spark will retry that task. From Spar...
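A common mitigation, hedged because the exact options depend on your connector version, is to make the write idempotent so a retried task replaces documents rather than re-inserting them. A sketch assuming MongoDB Spark Connector 10.x with its operationType and idFieldList write options; the URI, database, collection, and key column are placeholders:

```python
# Idempotent replace-by-key write: a retried task overwrites the same
# documents instead of duplicating them.
df = spark.table("main.gold.customers")  # hypothetical source table

(df.write
   .format("mongodb")
   .mode("append")
   .option("connection.uri", "mongodb+srv://user:pass@cluster.example.net")
   .option("database", "analytics")
   .option("collection", "customers")
   .option("operationType", "replace")    # replace the whole doc on _id match
   .option("idFieldList", "customer_id")  # business key used to build _id
   .save())
```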

fly_high_five
by New Contributor III
  • 150 Views
  • 1 reply
  • 3 kudos

Unable to retrieve catalog, schema, tables using JDBC endpoint of SQL Warehouse

Hi, I am connecting to a SQL Warehouse in UC using its JDBC endpoint via DBeaver. However, it doesn't list any catalogs, schemas, or tables. I checked the permissions of the SQL WH by logging in to the ADB Workspace and querying the table (attached a dummy table exa...

[three screenshots attached]
Latest Reply
mitchellg-db
Databricks Employee
  • 3 kudos

Hi there, I'm not familiar with DBeaver specifically, but I have experienced DBSQL Warehouses being much stricter when enforcing permissions than All-Purpose Clusters. Warehouses check explicitly if that identity has access to those assets, where All...
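If it does turn out to be a permissions gap, a minimal sketch of the explicit Unity Catalog grants a JDBC identity typically needs before objects become visible; the principal and object names are placeholders, run by a privileged user in a notebook where spark is defined:

```python
# A warehouse enforces each level of the UC hierarchy explicitly:
# catalog -> schema -> table. All names below are hypothetical.
principal = "`analyst@example.com`"

spark.sql(f"GRANT USE CATALOG ON CATALOG main TO {principal}")
spark.sql(f"GRANT USE SCHEMA ON SCHEMA main.sales TO {principal}")
spark.sql(f"GRANT SELECT ON TABLE main.sales.orders TO {principal}")
```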

Brahmareddy
by Esteemed Contributor
  • 489 Views
  • 4 replies
  • 9 kudos

Future of Movie Discovery: How I Built an AI Movie Recommendation Agent on Databricks Free Edition

As a data engineer deeply passionate about how data and AI can come together to create real-world impact, I’m excited to share my project for the Databricks Free Edition Hackathon 2025 — Future of Movie Discovery (FMD). Built entirely on Databricks F...

Latest Reply
AlbertaBode
New Contributor II
  • 9 kudos

Really cool project! The mood-based movie matching and conversational memory make the whole discovery experience feel way more intuitive. It's interesting because most people still browse platforms manually, like on streaming apps, but your system s...

3 More Replies
dbernstein_tp
by New Contributor III
  • 288 Views
  • 4 replies
  • 2 kudos

Resolved! Naming question about SQL server database schemas

I have an MS SQL Server database that has several schemas we need to ingest data from. Call them "SCHEMA1" tables and "SCHEMA2" tables. Let's call the server S and the database D. In Unity Catalog I have a catalog called "staging" where the staging (...

Latest Reply
dbernstein_tp
New Contributor III
  • 2 kudos

Thanks for the responses! @K_Anudeep's suggestion makes sense in the context of our current lakehouse architecture, so I think I will migrate to that.

3 More Replies
Swathik
by New Contributor III
  • 202 Views
  • 1 reply
  • 0 kudos

Resolved! Best Practices for implementing DLT, Autoloader in Workflows

I am in the process of designing a Medallion architecture where the data sources include REST API calls, JSON files, SQL Server, and Azure Event Hubs. For the Silver and Gold layers, I plan to leverage Delta Live Tables (DLT). However, I am seeking gu...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The optimal approach for implementing the Bronze layer in a Medallion architecture with Delta Live Tables (DLT) involves balancing batch and streaming ingestion patterns, especially when combining DLT and Autoloader. The trigger(availableNow=True) op...
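For context, a minimal sketch of the Autoloader bronze pattern with trigger(availableNow=True), which processes all files pending since the last run and then stops, giving batch-style scheduling on a streaming source; the paths and table name are placeholders:

```python
# Incremental file ingestion with Autoloader; schema inference state and
# stream progress are tracked in the schema/checkpoint locations.
raw = (spark.readStream
       .format("cloudFiles")
       .option("cloudFiles.format", "json")
       .option("cloudFiles.schemaLocation", "/Volumes/main/bronze/_schemas/events")
       .load("/Volumes/main/landing/events"))

(raw.writeStream
    .option("checkpointLocation", "/Volumes/main/bronze/_checkpoints/events")
    .trigger(availableNow=True)   # drain pending files, then stop
    .toTable("main.bronze.events"))
```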

dbernstein_tp
by New Contributor III
  • 219 Views
  • 3 replies
  • 2 kudos

Lakeflow Connect CDC error, broken links

I get this error regarding database validation when setting up a Lakeflow Connect CDC pipeline (see screenshot). The two links mentioned in the message are broken; they give me a "404 - Content Not Found" when I try to open them.

[screenshot attached]
Latest Reply
dbernstein_tp
New Contributor III
  • 2 kudos

@Advika Thank you. My reason for this post was to alert the SQL Server ingestion team to this bug in the interface. I will file a report about this (I didn't know I could do that) and a few other issues with the feature that I've found recently.

2 More Replies
hobrob
by New Contributor
  • 182 Views
  • 2 replies
  • 0 kudos

UDFs for working with date ranges

Hi bricklayers, originally from a Teradata background and relatively new to Databricks, I was in need of brushing up on my Python and GitHub CI/CD skills, so I've spun up a repo for a project I'm calling Terabricks. The aim is to provide a space for mak...
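To make the idea concrete, a hedged sketch of the kind of helper such a project might contain: a Teradata P_INTERSECT-style period-overlap check as a PySpark UDF. The function name and semantics are illustrative, not taken from the Terabricks repo:

```python
import datetime
from pyspark.sql.functions import udf
from pyspark.sql.types import BooleanType

@udf(returnType=BooleanType())
def periods_overlap(start_a: datetime.date, end_a: datetime.date,
                    start_b: datetime.date, end_b: datetime.date) -> bool:
    """Half-open ranges [start, end): overlap iff each starts before the other ends."""
    if None in (start_a, end_a, start_b, end_b):
        return None
    return start_a < end_b and start_b < end_a

# Usage on a DataFrame with hypothetical period columns:
# df.withColumn("overlaps", periods_overlap("a_start", "a_end", "b_start", "b_end"))
```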

Latest Reply
Raman_Unifeye
Contributor III
  • 0 kudos

Fantastic initiative, @hobrob. I used Teradata for a good 5+ years, though pre-2014/15, so I will be closely following this and am very happy to contribute to it. Thanks.

1 More Replies
bruce17
by New Contributor II
  • 428 Views
  • 4 replies
  • 2 kudos

Support Request: Issue Running Multiple Ingestion Gateway Concurrently

Hi, we are ingesting data using the Databricks Lakeflow SQL connector from two different SQL Server databases hosted on separate servers. As part of the setup:
  • We created two separate Ingestion Gateways.
  • We created two separate ingestion pipelines.
Both pi...

Latest Reply
HarishPrasath25
New Contributor II
  • 2 kudos

Hi @Louis_Frolio , I’ve successfully ingested one SQL database using the Lakeflow SQL connector. As part of the setup, I created an ingestion pipeline along with a gateway, and it is working as expected - when I run or re-run the pipeline, it picks u...

3 More Replies
Sainath368
by Contributor
  • 394 Views
  • 4 replies
  • 5 kudos

Resolved! Autoloader Managed File events

Hi all, we are in the process of migrating from directory listing to managed file events in Azure Databricks. Our data is stored in an Azure Data Lake container with the folder structure shown in the attached screenshot. To enable file events in Unity Catalog (UC), I created...

[screenshot attached]
Latest Reply
Raman_Unifeye
Contributor III
  • 5 kudos

Recommended approach to continue your existing pattern:
  • Keep the External Location enabled for file events at the high-level path (/Landing).
  • Run a separate Structured Streaming job for each table, specifying the full sub-path in the .load() function (...
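A sketch of one such per-table stream under that pattern, assuming the Autoloader option cloudFiles.useManagedFileEvents and hypothetical storage paths and table names; run in a notebook where spark is defined:

```python
# The External Location covers /Landing with file events enabled;
# each stream loads only its own table's sub-path.
base = "abfss://data@acct.dfs.core.windows.net/Landing"  # placeholder path

stream = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "parquet")
          .option("cloudFiles.useManagedFileEvents", "true")  # option name per current docs
          .option("cloudFiles.schemaLocation", f"{base}/_schemas/table_a")
          .load(f"{base}/table_a"))

(stream.writeStream
   .option("checkpointLocation", f"{base}/_checkpoints/table_a")
   .toTable("main.bronze.table_a"))
```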

3 More Replies
Techtic_kush
by New Contributor II
  • 302 Views
  • 2 replies
  • 2 kudos

Resolved! Can’t save results to target table – out-of-memory error

Hi team, I’m processing ~5,000 EMR notes with a Databricks notebook. The job reads from `crc_lakehouse.bronze.emr_notes`, runs SciSpaCy UMLS entity extraction plus a fine-tuned BERT sentiment model per partition, and builds a DataFrame (`df_entities`...

Latest Reply
bianca_unifeye
Contributor
  • 2 kudos

You’re right that the behaviour is weird at first glance (“5k rows on a 64 GB cluster and I blow up on write”), but your stack trace is actually very revealing: this isn’t a classic Delta write / shuffle OOM – it’s SciSpaCy/UMLS falling over when loa...
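The usual fix for this failure mode is to load the model once per executor rather than per row or per task, for example with an iterator-style pandas UDF; a sketch assuming the SciSpaCy en_core_sci_sm model and a hypothetical note_text column (the source table name is from the post):

```python
from typing import Iterator
import pandas as pd
from pyspark.sql.functions import pandas_udf

@pandas_udf("string")
def extract_entities(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    import spacy
    # One model load per worker process, reused across all batches,
    # instead of re-loading the UMLS pipeline for every row/task.
    nlp = spacy.load("en_core_sci_sm")
    for batch in batches:
        yield batch.map(lambda text: "; ".join(ent.text for ent in nlp(text).ents))

df = spark.table("crc_lakehouse.bronze.emr_notes")
df_entities = df.withColumn("entities", extract_entities("note_text"))
```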

1 More Replies