Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

SRJDB
by New Contributor II
  • 583 Views
  • 3 replies
  • 5 kudos

Resolved! How to restrict the values permitted in a job or task parameter?

Hi, apologies if this is a daft question - I'm relatively new to Databricks and still finding my feet! I have a notebook with a parameter set within it via a widget, like this: dbutils.widgets.dropdown("My widget", "A", ["A", "B", "C"]) my_variable = d...

Latest Reply
iyashk-DB
Databricks Employee
  • 5 kudos

Job/task parameters are free-form strings (or JSON) that get pushed down into tasks; there’s no built‑in way in Jobs to constrain them to an enum list like A/B/C in the UI or API. You can override them at run time, but they’re not validated against t...
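Since Jobs itself won't enforce an enum, a common workaround is to validate the parameter at the top of the notebook and fail fast. This is a minimal sketch; the allowed values and function name are illustrative, not a documented Databricks feature.

```python
# Validate a job/task parameter inside the notebook itself, since the Jobs
# UI/API does not constrain parameters to an enum. Names are illustrative.
ALLOWED = {"A", "B", "C"}

def validate_choice(value: str, allowed=frozenset(ALLOWED)) -> str:
    """Fail fast if the passed parameter is not one of the permitted values."""
    if value not in allowed:
        raise ValueError(f"Parameter must be one of {sorted(allowed)}, got {value!r}")
    return value

# In a notebook task this would typically wrap the widget read, e.g.:
# my_variable = validate_choice(dbutils.widgets.get("My widget"))
```

Raising early means an invalid override surfaces as a clear task failure rather than propagating a bad value downstream.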

2 More Replies
Swathik
by New Contributor III
  • 1902 Views
  • 5 replies
  • 1 kudos

Resolved! Best practices for a metadata-driven ETL framework

I am designing a metadata‑driven ETL framework to migrate approximately 500 tables from Db2 to PostgreSQL. After reviewing multiple design patterns and blog posts, I am uncertain about the recommended approach for storing ETL metadata such as source s...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 1 kudos

For a migration of that scale, I’d lean toward storing metadata in database tables rather than YAML files. It’s easier to query, update, and integrate with orchestration tools, especially when you have 500 tables. YAML works fine for small projects, ...
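The table-driven approach can be sketched as a loop over metadata rows that derives the extract logic per table. The column names and the incremental watermark below are assumptions for illustration, not a recommended schema.

```python
# Illustrative sketch of a metadata-driven ETL loop. In practice the rows
# would come from a metadata table (e.g. a Delta or PostgreSQL table) rather
# than this hardcoded list; all field names here are assumptions.
etl_metadata = [
    {"source_schema": "SALES", "source_table": "ORDERS",
     "target_table": "orders", "load_type": "full"},
    {"source_schema": "SALES", "source_table": "CUSTOMERS",
     "target_table": "customers", "load_type": "incremental"},
]

def build_extract_query(row: dict) -> str:
    """Derive the Db2 extract query from one metadata row."""
    base = f"SELECT * FROM {row['source_schema']}.{row['source_table']}"
    if row["load_type"] == "incremental":
        base += " WHERE updated_at > :last_watermark"  # watermark column assumed
    return base

for row in etl_metadata:
    query = build_extract_query(row)
    # hand `query` to the actual extract step (e.g. a JDBC read) here
```

Keeping the metadata in a queryable table makes it easy to add columns (watermarks, run status, owners) without touching the loop itself.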

4 More Replies
Brahmareddy
by Esteemed Contributor
  • 2014 Views
  • 4 replies
  • 9 kudos

Future of Movie Discovery: How I Built an AI Movie Recommendation Agent on Databricks Free Edition

As a data engineer deeply passionate about how data and AI can come together to create real-world impact, I’m excited to share my project for the Databricks Free Edition Hackathon 2025 — Future of Movie Discovery (FMD). Built entirely on Databricks F...

Latest Reply
AlbertaBode
New Contributor III
  • 9 kudos

Really cool project! The mood-based movie matching and conversational memory make the whole discovery experience feel way more intuitive. It’s interesting because most people still browse platforms manually — like on a streaming app — but your system s...

3 More Replies
isai-ds
by New Contributor
  • 1009 Views
  • 1 reply
  • 0 kudos

Salesforce LakeFlow Connect - Deletion of Salesforce records

Hello, I am new to Databricks and data engineering. I am running a POC to sync data between a Salesforce sandbox and Databricks using LakeFlow Connect. I have already made the connection and successfully synced data between Salesforce and databr...

Latest Reply
Saritha_S
Databricks Employee
  • 0 kudos

Hi @isai-ds, could you please refer to the documents below?
https://www.databricks.com/blog/introducing-salesforce-connectors-lakehouse-federation-and-lakeflow-connect
https://docs.databricks.com/aws/en/ingestion/lakeflow-connect/salesforce-faq

Dom1
by New Contributor III
  • 631 Views
  • 3 replies
  • 2 kudos

Pull JAR from private Maven repository (Azure Artifactory)

Hi, I currently struggle with the following task: we want to push our code to a private repository (Azure Artifactory) and then pull it from Databricks when the job runs. It currently works only with wheels inside a PyPI repo in the Artifactory. I found ...

Latest Reply
Prajapathy_NKR
Contributor
  • 2 kudos

Hi @Dom1, one solution I implemented is to use the API to connect to the artifact repository and download the latest artifact to the driver's storage (when you use curl to download the file, it is written to the driver's disk), then move it to the req...
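The same download can be done from Python on the driver instead of curl. This is a hedged sketch: the URL layout, token, and destination path are placeholders for your Artifactory setup, not a documented Databricks pattern.

```python
# Pull a JAR from a private Artifactory repo over HTTP and write it to
# driver-local disk. All URLs, paths, and the token are placeholders.
import requests

def build_artifact_url(base_url: str, repo_path: str) -> str:
    """Join base URL and repository path without doubled slashes."""
    return f"{base_url.rstrip('/')}/{repo_path.lstrip('/')}"

def download_artifact(base_url: str, repo_path: str, token: str, dest: str) -> str:
    """Stream the artifact to a local file on the driver and return its path."""
    resp = requests.get(
        build_artifact_url(base_url, repo_path),
        headers={"Authorization": f"Bearer {token}"},
        timeout=60,
        stream=True,
    )
    resp.raise_for_status()
    with open(dest, "wb") as f:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            f.write(chunk)
    return dest
```

From there the file can be copied to a volume or installed as a cluster library, depending on how the job consumes it.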

2 More Replies
Johan_Van_Noten
by New Contributor III
  • 756 Views
  • 3 replies
  • 2 kudos

Long-running Python http POST hangs

As one of the steps in my data engineering pipeline, I need to perform a POST request to an HTTP (not HTTPS) server. This all works fine, except for the situation described below: it then hangs indefinitely. Environment: Azure Databricks Runtime 13.3 LTS, Pyt...

Latest Reply
siva-anantha
Databricks Partner
  • 2 kudos

Hello, IMHO having an HTTP-related task in a Spark cluster is an anti-pattern. This kind of code executes on the driver, is synchronous, and adds overhead. This is one of the reasons DLT (or SDP - Spark Declarative Pipelines) does not have REST...
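Whatever the architecture, the immediate "hangs indefinitely" symptom is usually the missing timeout: by default `requests` waits forever on a stalled socket. A minimal sketch, with a placeholder endpoint:

```python
# Driver-side POST that fails fast instead of hanging on a dead or stalled
# server. The timeout tuple is (connect seconds, read seconds).
import requests

def post_with_timeout(url: str, payload: dict,
                      connect_s: float = 5.0, read_s: float = 30.0):
    """POST with an explicit timeout so a stalled server raises instead of blocking."""
    return requests.post(url, json=payload, timeout=(connect_s, read_s))
```

Wrapping the call in a retry loop (or moving it out of the pipeline entirely, as suggested above) builds on the same fail-fast behavior.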

2 More Replies
adhi_databricks
by Contributor
  • 5049 Views
  • 2 replies
  • 2 kudos

Resolved! How Are You Using Local IDEs (VS Code / Cursor / Whatever) to Develop & Run Code in Databricks?

Hi everyone, I’m trying to set up a smooth local-development workflow for Databricks and would love to hear how others are doing it. My current setup: I do most of my development in Cursor (a VS Code-based editor) because the AI agents make coding much fas...

Latest Reply
siva-anantha
Databricks Partner
  • 2 kudos

@adhi_databricks: I want to add my perspective on pure local development (without Databricks Connect). I wanted to set up a local development environment without connecting to a Databricks workspace/cloud storage and develop PySpark code in VS...

1 More Replies
mdungey
by New Contributor II
  • 671 Views
  • 3 replies
  • 0 kudos

Deleting Lakeflow pipelines: impact on the objects within

I've seen hidden in some forums that Databricks is working on a fix so that when you delete an LDP pipeline it doesn't delete the underlying objects (streaming tables, materialized views, etc.). Can anyone from an official source confirm this and maybe give s...

Latest Reply
Raman_Unifeye
Honored Contributor III
  • 0 kudos

Yes, I would take that with a pinch of salt.

2 More Replies
mafzal669
by New Contributor
  • 251 Views
  • 1 reply
  • 0 kudos

Admin user creation

Hi, I have created an Azure account using my personal email ID. I want to add this email ID as a group ID in the Databricks admin console, but when I add a new user it says a user with this email ID already exists. Could someone please help? I use...

Latest Reply
Raman_Unifeye
Honored Contributor III
  • 0 kudos

As user IDs and group IDs share the same namespace in Databricks, you cannot create a group with the same email address that is already registered as a user in your Databricks account. Better to rename the group.

suchitpathak08
by New Contributor
  • 714 Views
  • 3 replies
  • 0 kudos

Urgent Assistance Needed – Unity Catalog Storage Access Failure & VM SKU Availability (Databricks on

Hi everyone, I’m running into two blocking issues while trying to run a Delta Live Tables (DLT) pipeline on Databricks (Azure). I’m hoping someone can help me understand what’s going wrong. 1. Unity Catalog cannot access underlying ADLS storage. Every DL...

Latest Reply
bianca_unifeye
Databricks MVP
  • 0 kudos

DLT pipelines always spin up job compute, and Azure is strict about SKU availability per region and per subscription. Most common causes: the quota for that VM family is set to 2 vCPUs (Databricks shows “Estimated available: 2” / “QuotaExceeded”); the SKU exists...

2 More Replies
Suheb
by Contributor
  • 308 Views
  • 1 reply
  • 2 kudos

How can I efficiently archive old data in Delta tables without slowing queries?

How can I remove or move older rows from my main Delta table so that queries on recent data are faster, while still keeping access to the historical data if needed?

Latest Reply
Coffee77
Honored Contributor II
  • 2 kudos

Hi Suheb, when using Delta tables with Databricks, as long as you use proper liquid clustering keys or partitions you should get good performance compared to relational engines when dealing with big data volumes. However, you can also separate tab...
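The archive-then-delete pattern the reply hints at can be expressed as two SQL statements run in order. This is a sketch: the table names, cutoff, and timestamp column are assumptions for illustration.

```python
# Build the (INSERT, DELETE) pair that moves rows older than `cutoff` from the
# main Delta table into an archive table. Run the INSERT first, then the
# DELETE, so no rows are lost if the job fails in between.
def archive_statements(table: str, archive_table: str, cutoff: str,
                       ts_col: str = "event_ts"):
    """Return the two SQL statements to execute via spark.sql, in order."""
    predicate = f"{ts_col} < '{cutoff}'"
    insert = f"INSERT INTO {archive_table} SELECT * FROM {table} WHERE {predicate}"
    delete = f"DELETE FROM {table} WHERE {predicate}"
    return insert, delete

# Usage in a notebook:
# for stmt in archive_statements("sales.orders", "sales.orders_archive", "2023-01-01"):
#     spark.sql(stmt)
```

Queries on recent data then scan only the slimmed-down main table, while the archive table stays queryable (or can be unioned back in via a view) when history is needed.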

Suheb
by Contributor
  • 779 Views
  • 4 replies
  • 4 kudos

What are common pitfalls when migrating large on-premise ETL workflows to Databricks and how did you

When moving your big data pipelines from local servers to Databricks, what problems usually happen, and how did you fix them?

Latest Reply
tarunnagar
Contributor
  • 4 kudos

Migrating large on-premise ETL workflows to Databricks often goes wrong when teams try to “lift and shift” legacy logic directly into Spark. Poor data layout, tiny files, and inefficient partitioning can quickly hurt performance, so restructuring dat...

3 More Replies
pooja_bhumandla
by Databricks Partner
  • 852 Views
  • 2 replies
  • 1 kudos

Error: Executor Memory Issue with Broadcast Joins in Structured Streaming – Unable to Store 69–80 MB

Hi Community, I encountered the following error: Failed to store executor broadcast spark_join_relation_1622863 (size = Some(67141632)) in BlockManager with storageLevel=StorageLevel(memory, deserialized, 1 replicas) in a Structured S...

Latest Reply
Yogesh_Verma_
Contributor II
  • 1 kudos

What Spark does during a broadcast join: Spark identifies the smaller table (say 80 MB). The driver collects this small table to a single JVM. The driver serializes the table into a broadcast variable. The broadcast variable is shipped to all executors. Ex...
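Given that mechanism, a common mitigation when the broadcast relation no longer fits in executor memory is to lower (or disable) the automatic broadcast threshold so Spark falls back to a shuffle join. This is a hedged config sketch: the 32 MB value is illustrative, not a tuned recommendation, and `spark` is the active SparkSession.

```python
# Lower the auto-broadcast cutoff so an ~80 MB relation is no longer broadcast
# (Spark will use a shuffle join for anything above the threshold):
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", str(32 * 1024 * 1024))  # 32 MB

# Or disable automatic broadcasting entirely and rely on explicit
# broadcast() hints for the tables you know are small:
# spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
```

The trade-off is shuffle cost versus executor memory pressure, so it is worth re-checking the join plan after changing the setting.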

1 More Replies
mkkao924
by New Contributor II
  • 973 Views
  • 3 replies
  • 1 kudos

Best practice to handle SQL table archives?

Much of our source data is set up in such a way that the main table keeps only a small amount of data, and historical data is moved to another archive table with a very similar schema. My goal is to have one table in Databricks, maybe with a flag to indicate if th...

Latest Reply
Coffee77
Honored Contributor II
  • 1 kudos

I would need to dive deeper into your scenario, but it sounds like a strategy could be: 1) Create a view in your SQL Server database with "current data" UNION "historical data". You can add a boolean field set to true in the first query and false ...
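Step 1 above can be sketched as a small helper that builds the SQL Server view statement; the view and table names are placeholders, and `CREATE OR ALTER VIEW` assumes SQL Server 2016 SP1 or later.

```python
# Build the "current UNION ALL historical with a flag" view described above.
# All identifiers are illustrative placeholders.
def current_plus_history_view(view: str, current: str, history: str) -> str:
    """Return the T-SQL statement combining both tables with an is_current flag."""
    return (
        f"CREATE OR ALTER VIEW {view} AS "
        f"SELECT *, CAST(1 AS BIT) AS is_current FROM {current} "
        f"UNION ALL "
        f"SELECT *, CAST(0 AS BIT) AS is_current FROM {history}"
    )
```

Databricks can then ingest the single view, and the `is_current` flag carries the main-vs-archive distinction into the one target table.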

2 More Replies
Direo
by Contributor II
  • 549 Views
  • 4 replies
  • 0 kudos

[DAB] registered_model aliases not being applied to Unity Catalog despite successful deploy

Hi, I'm experiencing an issue with Databricks Asset Bundles where model aliases defined in the bundle configuration are not being applied to Unity Catalog, even though the deployment succeeds and the Terraform state shows the aliases are set. Environmen...

Latest Reply
iyashk-DB
Databricks Employee
  • 0 kudos

Can you try explicitly running: databricks model-versions get-by-alias <catalog>.<schema>.<model> staging

3 More Replies