Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

SRJDB
by New Contributor II
  • 583 Views
  • 3 replies
  • 5 kudos

Resolved! How to restrict the values permitted in a job or task parameter?

Hi, apologies if this is a daft question - I'm relatively new to Databricks and still finding my feet! I have a notebook with a parameter set within it via a widget, like this: dbutils.widgets.dropdown("My widget", "A", ["A", "B", "C"]) my_variable = d...

Latest Reply
iyashk-DB
Databricks Employee
  • 5 kudos

Job/task parameters are free-form strings (or JSON) that get pushed down into tasks; there’s no built‑in way in Jobs to constrain them to an enum list like A/B/C in the UI or API. You can override them at run time, but they’re not validated against t...
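Since Jobs itself won't enforce an enum, a common workaround is to validate the parameter at the top of the notebook and fail fast. This is a minimal sketch; the allowed values and function name are illustrative, not a documented Databricks feature.

```python
# Validate a job/task parameter inside the notebook itself, since the Jobs
# UI/API does not constrain parameters to an enum. Names are illustrative.
ALLOWED = {"A", "B", "C"}

def validate_choice(value: str, allowed=frozenset(ALLOWED)) -> str:
    """Fail fast if the passed parameter is not one of the permitted values."""
    if value not in allowed:
        raise ValueError(f"Parameter must be one of {sorted(allowed)}, got {value!r}")
    return value

# In a notebook task this would typically wrap the widget read, e.g.:
# my_variable = validate_choice(dbutils.widgets.get("My widget"))
```

Raising early means an invalid override surfaces as a clear task failure rather than propagating a bad value downstream.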

2 More Replies
Swathik
by New Contributor III
  • 1902 Views
  • 5 replies
  • 1 kudos

Resolved! Best practices for a metadata-driven ETL framework

I am designing a metadata‑driven ETL framework to migrate approximately 500 tables from Db2 to PostgreSQL. After reviewing multiple design patterns and blog posts, I am uncertain about the recommended approach for storing ETL metadata such as source s...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 1 kudos

For a migration of that scale, I’d lean toward storing metadata in database tables rather than YAML files. It’s easier to query, update, and integrate with orchestration tools, especially when you have 500 tables. YAML works fine for small projects, ...
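The table-driven approach can be sketched as a loop over metadata rows that derives the extract logic per table. The column names and the incremental watermark below are assumptions for illustration, not a recommended schema.

```python
# Illustrative sketch of a metadata-driven ETL loop. In practice the rows
# would come from a metadata table (e.g. a Delta or PostgreSQL table) rather
# than this hardcoded list; all field names here are assumptions.
etl_metadata = [
    {"source_schema": "SALES", "source_table": "ORDERS",
     "target_table": "orders", "load_type": "full"},
    {"source_schema": "SALES", "source_table": "CUSTOMERS",
     "target_table": "customers", "load_type": "incremental"},
]

def build_extract_query(row: dict) -> str:
    """Derive the Db2 extract query from one metadata row."""
    base = f"SELECT * FROM {row['source_schema']}.{row['source_table']}"
    if row["load_type"] == "incremental":
        base += " WHERE updated_at > :last_watermark"  # watermark column assumed
    return base

for row in etl_metadata:
    query = build_extract_query(row)
    # hand `query` to the actual extract step (e.g. a JDBC read) here
```

Keeping the metadata in a queryable table makes it easy to add columns (watermarks, run status, owners) without touching the loop itself.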

4 More Replies
Brahmareddy
by Esteemed Contributor
  • 2014 Views
  • 4 replies
  • 9 kudos

Future of Movie Discovery: How I Built an AI Movie Recommendation Agent on Databricks Free Edition

As a data engineer deeply passionate about how data and AI can come together to create real-world impact, I’m excited to share my project for the Databricks Free Edition Hackathon 2025 — Future of Movie Discovery (FMD). Built entirely on Databricks F...

Latest Reply
AlbertaBode
New Contributor III
  • 9 kudos

Really cool project! The mood-based movie matching and conversational memory make the whole discovery experience feel way more intuitive. It’s interesting because most people still browse platforms manually — like on a streaming app — but your system s...

3 More Replies
isai-ds
by New Contributor
  • 1009 Views
  • 1 reply
  • 0 kudos

Salesforce LakeFlow Connect - Deletion of Salesforce records

Hello, I am new to Databricks and data engineering. I am running a POC to sync data between a Salesforce sandbox and Databricks using LakeFlow Connect. I have already made the connection and successfully synced data between Salesforce and databr...

Latest Reply
Saritha_S
Databricks Employee
  • 0 kudos

Hi @isai-ds, could you please refer to the documents below?
https://www.databricks.com/blog/introducing-salesforce-connectors-lakehouse-federation-and-lakeflow-connect
https://docs.databricks.com/aws/en/ingestion/lakeflow-connect/salesforce-faq

Dom1
by New Contributor III
  • 631 Views
  • 3 replies
  • 2 kudos

Pull JAR from private Maven repository (Azure Artifactory)

Hi, I currently struggle with the following task: we want to push our code to a private repository (Azure Artifactory) and then pull it from Databricks when the job runs. It currently works only with wheels inside a PyPI repo in the Artifactory. I found ...

Latest Reply
Prajapathy_NKR
Contributor
  • 2 kudos

Hi @Dom1, one solution I implemented is to use the API to connect to the artifact repository and download the latest artifact to the driver's storage (when you use curl to download the file, it is written to the driver's disk), then move it to the req...
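The same download can be done from Python on the driver instead of curl. This is a hedged sketch: the URL layout, token, and destination path are placeholders for your Artifactory setup, not a documented Databricks pattern.

```python
# Pull a JAR from a private Artifactory repo over HTTP and write it to
# driver-local disk. All URLs, paths, and the token are placeholders.
import requests

def build_artifact_url(base_url: str, repo_path: str) -> str:
    """Join base URL and repository path without doubled slashes."""
    return f"{base_url.rstrip('/')}/{repo_path.lstrip('/')}"

def download_artifact(base_url: str, repo_path: str, token: str, dest: str) -> str:
    """Stream the artifact to a local file on the driver and return its path."""
    resp = requests.get(
        build_artifact_url(base_url, repo_path),
        headers={"Authorization": f"Bearer {token}"},
        timeout=60,
        stream=True,
    )
    resp.raise_for_status()
    with open(dest, "wb") as f:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            f.write(chunk)
    return dest
```

From there the file can be copied to a volume or installed as a cluster library, depending on how the job consumes it.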

2 More Replies
Johan_Van_Noten
by New Contributor III
  • 756 Views
  • 3 replies
  • 2 kudos

Long-running Python http POST hangs

As one of the steps in my data engineering pipeline, I need to perform a POST request to an HTTP (not HTTPS) server. This all works fine, except for the situation described below: it then hangs indefinitely. Environment: Azure Databricks Runtime 13.3 LTS, Pyt...

Latest Reply
siva-anantha
Databricks Partner
  • 2 kudos

Hello, IMHO having an HTTP-related task in a Spark cluster is an anti-pattern. This kind of code executes on the driver, is synchronous, and adds overhead. This is one of the reasons DLT (or SDP - Spark Declarative Pipelines) does not have REST...
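Whatever the architecture, the immediate "hangs indefinitely" symptom is usually the missing timeout: by default `requests` waits forever on a stalled socket. A minimal sketch, with a placeholder endpoint:

```python
# Driver-side POST that fails fast instead of hanging on a dead or stalled
# server. The timeout tuple is (connect seconds, read seconds).
import requests

def post_with_timeout(url: str, payload: dict,
                      connect_s: float = 5.0, read_s: float = 30.0):
    """POST with an explicit timeout so a stalled server raises instead of blocking."""
    return requests.post(url, json=payload, timeout=(connect_s, read_s))
```

Wrapping the call in a retry loop (or moving it out of the pipeline entirely, as suggested above) builds on the same fail-fast behavior.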

2 More Replies
adhi_databricks
by Contributor
  • 5049 Views
  • 2 replies
  • 2 kudos

Resolved! How Are You Using Local IDEs (VS Code / Cursor / Whatever) to Develop & Run Code in Databricks?

Hi everyone, I’m trying to set up a smooth local-development workflow for Databricks and would love to hear how others are doing it. My current setup: I do most of my development in Cursor (a VS Code-based editor) because the AI agents make coding much fas...

Latest Reply
siva-anantha
Databricks Partner
  • 2 kudos

@adhi_databricks: I want to add my perspective on pure local development (without Databricks Connect). I wanted to set up a local development environment without connecting to a Databricks workspace/cloud storage and develop PySpark code in VS...

1 More Replies
mdungey
by New Contributor II
  • 671 Views
  • 3 replies
  • 0 kudos

Deleting Lakeflow pipelines: impact on the objects within

I've seen hidden in some forums that Databricks is working on a fix so that when you delete an LDP pipeline it doesn't delete the underlying objects (streaming tables, materialized views, etc.). Can anyone from an official source confirm this and maybe give s...

Latest Reply
Raman_Unifeye
Honored Contributor III
  • 0 kudos

Yes, I would take that with a pinch of salt.

2 More Replies
mafzal669
by New Contributor
  • 251 Views
  • 1 reply
  • 0 kudos

Admin user creation

Hi, I have created an Azure account using my personal email ID. I want to add this email ID as a group ID in the Databricks admin console, but when I add a new user it says a user with this email ID already exists. Could someone please help? I use...

Latest Reply
Raman_Unifeye
Honored Contributor III
  • 0 kudos

As user IDs and group IDs share the same namespace in Databricks, you cannot create a group with the same email address that is already registered as a user in your Databricks account. Better to rename the group.

suchitpathak08
by New Contributor
  • 714 Views
  • 3 replies
  • 0 kudos

Urgent Assistance Needed – Unity Catalog Storage Access Failure & VM SKU Availability (Databricks on

Hi everyone, I’m running into two blocking issues while trying to run a Delta Live Tables (DLT) pipeline on Databricks (Azure). I’m hoping someone can help me understand what’s going wrong. 1. Unity Catalog cannot access underlying ADLS storage. Every DL...

Latest Reply
bianca_unifeye
Databricks MVP
  • 0 kudos

DLT pipelines always spin up job compute, and Azure is strict about SKU availability per region and per subscription. Most common causes: the quota for that VM family is set to 2 vCPUs (Databricks shows “Estimated available: 2” / “QuotaExceeded”); the SKU exists...

2 More Replies
Suheb
by Contributor
  • 308 Views
  • 1 reply
  • 2 kudos

How can I efficiently archive old data in Delta tables without slowing queries?

How can I remove or move older rows from my main Delta table so that queries on recent data are faster, while still keeping access to the historical data if needed?

Latest Reply
Coffee77
Honored Contributor II
  • 2 kudos

Hi Suheb, when using Delta tables with Databricks, as long as you use proper liquid clustering keys or partitions you should get good performance compared to relational engines when dealing with big data volumes. However, you can also separate tab...
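The archive-then-delete pattern the reply hints at can be expressed as two SQL statements run in order. This is a sketch: the table names, cutoff, and timestamp column are assumptions for illustration.

```python
# Build the (INSERT, DELETE) pair that moves rows older than `cutoff` from the
# main Delta table into an archive table. Run the INSERT first, then the
# DELETE, so no rows are lost if the job fails in between.
def archive_statements(table: str, archive_table: str, cutoff: str,
                       ts_col: str = "event_ts"):
    """Return the two SQL statements to execute via spark.sql, in order."""
    predicate = f"{ts_col} < '{cutoff}'"
    insert = f"INSERT INTO {archive_table} SELECT * FROM {table} WHERE {predicate}"
    delete = f"DELETE FROM {table} WHERE {predicate}"
    return insert, delete

# Usage in a notebook:
# for stmt in archive_statements("sales.orders", "sales.orders_archive", "2023-01-01"):
#     spark.sql(stmt)
```

Queries on recent data then scan only the slimmed-down main table, while the archive table stays queryable (or can be unioned back in via a view) when history is needed.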

Suheb
by Contributor
  • 779 Views
  • 4 replies
  • 4 kudos

What are common pitfalls when migrating large on-premise ETL workflows to Databricks and how did you

When moving your big data pipelines from local servers to Databricks, what problems usually happen, and how did you fix them?

Latest Reply
tarunnagar
Contributor
  • 4 kudos

Migrating large on-premise ETL workflows to Databricks often goes wrong when teams try to “lift and shift” legacy logic directly into Spark. Poor data layout, tiny files, and inefficient partitioning can quickly hurt performance, so restructuring dat...

3 More Replies
pooja_bhumandla
by Databricks Partner
  • 852 Views
  • 2 replies
  • 1 kudos

Error: Executor Memory Issue with Broadcast Joins in Structured Streaming – Unable to Store 69–80 MB

Hi Community, I encountered the following error: Failed to store executor broadcast spark_join_relation_1622863 (size = Some(67141632)) in BlockManager with storageLevel=StorageLevel(memory, deserialized, 1 replicas) in a Structured S...

Latest Reply
Yogesh_Verma_
Contributor II
  • 1 kudos

What Spark does during a broadcast join: Spark identifies the smaller table (say 80 MB). The driver collects this small table to a single JVM. The driver serializes the table into a broadcast variable. The broadcast variable is shipped to all executors. Ex...
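Given that mechanism, a common mitigation when the broadcast relation no longer fits in executor memory is to lower (or disable) the automatic broadcast threshold so Spark falls back to a shuffle join. This is a hedged config sketch: the 32 MB value is illustrative, not a tuned recommendation, and `spark` is the active SparkSession.

```python
# Lower the auto-broadcast cutoff so an ~80 MB relation is no longer broadcast
# (Spark will use a shuffle join for anything above the threshold):
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", str(32 * 1024 * 1024))  # 32 MB

# Or disable automatic broadcasting entirely and rely on explicit
# broadcast() hints for the tables you know are small:
# spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
```

The trade-off is shuffle cost versus executor memory pressure, so it is worth re-checking the join plan after changing the setting.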

1 More Replies
mkkao924
by New Contributor II
  • 973 Views
  • 3 replies
  • 1 kudos

Best practice to handle SQL table archives?

Much of our source data is set up in such a way that the main table keeps only a small amount of data, and historical data is moved to another archive table with a very similar schema. My goal is to have one table in Databricks, maybe with a flag to indicate if th...

Latest Reply
Coffee77
Honored Contributor II
  • 1 kudos

I would need to dive deeper into your scenario, but it sounds like a strategy could be: 1) Create a view in your SQL Server database with "current data" UNION "historical data". You can add a boolean field set to true in the first query and false ...
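Step 1 above can be sketched as a small helper that builds the SQL Server view statement; the view and table names are placeholders, and `CREATE OR ALTER VIEW` assumes SQL Server 2016 SP1 or later.

```python
# Build the "current UNION ALL historical with a flag" view described above.
# All identifiers are illustrative placeholders.
def current_plus_history_view(view: str, current: str, history: str) -> str:
    """Return the T-SQL statement combining both tables with an is_current flag."""
    return (
        f"CREATE OR ALTER VIEW {view} AS "
        f"SELECT *, CAST(1 AS BIT) AS is_current FROM {current} "
        f"UNION ALL "
        f"SELECT *, CAST(0 AS BIT) AS is_current FROM {history}"
    )
```

Databricks can then ingest the single view, and the `is_current` flag carries the main-vs-archive distinction into the one target table.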

2 More Replies
Direo
by Contributor II
  • 549 Views
  • 4 replies
  • 0 kudos

[DAB] registered_model aliases not being applied to Unity Catalog despite successful deploy

Hi, I'm experiencing an issue with Databricks Asset Bundles where model aliases defined in the bundle configuration are not being applied to Unity Catalog, even though the deployment succeeds and the Terraform state shows the aliases are set. Environmen...

Latest Reply
iyashk-DB
Databricks Employee
  • 0 kudos

Can you try explicitly running: databricks model-versions get-by-alias <catalog>.<schema>.<model> staging

3 More Replies