Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

Johan_Van_Noten
by New Contributor III
  • 117 Views
  • 3 replies
  • 2 kudos

Long-running Python http POST hangs

As one of the steps in my data engineering pipeline, I need to perform a POST request to an HTTP (not HTTPS) server. This all works fine, except for the situation described below: it then hangs indefinitely. Environment: Azure Databricks Runtime 13.3 LTS, Pyt...

Latest Reply
siva-anantha
New Contributor III
  • 2 kudos

Hello, IMHO, having an HTTP-related task in a Spark cluster is an anti-pattern. This kind of code executes on the driver; it is synchronous and adds overhead. This is one of the reasons DLT (or SDP, Spark Declarative Pipelines) does not have REST...
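
For readers hitting the same hang, a minimal sketch (not from the thread) of the usual client-side mitigation: give requests.post an explicit timeout instead of the default of waiting forever. The URL and payload are placeholders.

import requests

# Hypothetical endpoint; plain HTTP, as in the original question.
url = "http://example.com/ingest"

try:
    # (connect timeout, read timeout) in seconds. Without a timeout,
    # requests will wait indefinitely for a response on the driver.
    resp = requests.post(url, json={"key": "value"}, timeout=(10, 300))
    resp.raise_for_status()
except requests.exceptions.Timeout as exc:
    # Fail fast instead of hanging the whole job.
    raise RuntimeError("POST timed out; check the network path from the cluster") from exc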

2 More Replies
adhi_databricks
by Contributor
  • 264 Views
  • 2 replies
  • 0 kudos

Resolved! How Are You Using Local IDEs (VS Code / Cursor / Whatever) to Develop & Run Code in Databricks?

Hi everyone, I’m trying to set up a smooth local-development workflow for Databricks and would love to hear how others are doing it. My current setup: I do most of my development in Cursor (a VS Code-based editor) because the AI agents make coding much fas...

Latest Reply
siva-anantha
New Contributor III
  • 0 kudos

@adhi_databricks: I want to add my perspective when it comes to pure local development (without Databricks Connect). I wanted to set up a local development environment without connecting to a Databricks workspace/cloud storage, and develop PySpark code in VS...
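
As an illustration of the pure-local setup this reply describes, a minimal sketch of a local-only PySpark session; the app name and fixture path are invented for the example.

from pyspark.sql import SparkSession

# Local-only session: no Databricks workspace or cloud storage involved.
spark = (
    SparkSession.builder
    .master("local[*]")    # use all local cores
    .appName("local-dev")
    .getOrCreate()
)

# Develop and unit-test transformations against local fixture files.
df = spark.read.json("tests/fixtures/sample.json")  # hypothetical path
df.show()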

1 More Reply
a_user12
by New Contributor III
  • 31 Views
  • 2 replies
  • 1 kudos

Declarative Pipelines: set Merge Schema to False

Dear team! I want to prevent the schema of a certain table from being updated automatically. With plain structured streaming I can do the following: silver_df.writeStream \ .format("delta") \ .option("mergeSchema", "false") \ .option("checkpoi...
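
For reference, a plausible completion of the truncated snippet above, assuming silver_df is the streaming DataFrame from the post; the checkpoint path and target table name are placeholders.

(
    silver_df.writeStream
    .format("delta")
    .option("mergeSchema", "false")  # reject automatic schema updates
    .option("checkpointLocation", "/tmp/checkpoints/silver")  # placeholder path
    .outputMode("append")
    .toTable("silver")
)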

Latest Reply
a_user12
New Contributor III
  • 1 kudos

@szymon_dybczak - thank you for your response. I tried: @dlt.table(name="deserialized", comment="Raw messages from Kafka topic as JSON", table_properties={"pipelines.autoOptimize.managed": "true", "pipelines.autoCompact.m...

1 More Reply
Swathik
by New Contributor II
  • 28 Views
  • 1 reply
  • 0 kudos

Best practices for a metadata-driven ETL framework

I am designing a metadata-driven ETL framework to migrate approximately 500 tables from Db2 to PostgreSQL. After reviewing multiple design patterns and blog posts, I am uncertain about the recommended approach for storing ETL metadata such as source s...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Swathik, if we're talking about where to store ETL metadata, in my opinion it's mostly a matter of preference. In my case, I prefer storing my config in YAML files, but I’ve also worked on projects where the config was stored in Delta tables. For ...
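
As an illustration of the YAML approach mentioned in this reply, a minimal sketch; the keys, schemas, and table names are invented for the example.

import yaml  # pip install pyyaml

config_text = """
tables:
  - source_schema: SALES          # Db2 source
    source_table: ORDERS
    target_table: public.orders   # PostgreSQL target
    load_type: incremental
    watermark_column: updated_at
"""

# Drive the ETL loop from the parsed metadata rather than hard-coded tables.
config = yaml.safe_load(config_text)
for tbl in config["tables"]:
    print(tbl["source_table"], "->", tbl["target_table"])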

panshi1225
by Visitor
  • 21 Views
  • 0 replies
  • 0 kudos

Exam got suspended partway through

@Databricks please help with my Databricks engineer associate exam, which got suspended partway through on 30th Nov '25 while I was taking it and following the protocols. I was more than 50% done with my test. Please reschedule my exam. I was following al...

DatabricksUser5
by New Contributor
  • 64 Views
  • 1 reply
  • 0 kudos

Reset committed offset of spark streaming to capture missed data

I have a very straightforward setup between Azure Event Hubs and DLT using the Kafka endpoint through Spark streaming. There were network issues and the stream didn't pick up some events, but it still progressed (and committed) the offset for some reason. As ...

Data Engineering
dlt spark eventhub kafka azure
Latest Reply
K_Anudeep
Databricks Employee
  • 0 kudos

Hello @DatabricksUser5, you can’t override committed offsets in place for a running DLT Kafka/Event Hubs stream. If a pipeline already has a checkpoint created, startingOffsets is ignored. To replay data, you must reset the streaming checkpoints or ...
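
A minimal sketch of one replay path consistent with that advice, assuming a plain structured-streaming read outside the running pipeline with a brand-new checkpoint so startingOffsets is honored; the bootstrap server, topic, paths, and table name are placeholders.

# With a brand-new checkpoint location, startingOffsets is honored again.
df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "mynamespace.servicebus.windows.net:9093")  # placeholder
    .option("subscribe", "my-topic")         # placeholder topic
    .option("startingOffsets", "earliest")   # replay from the beginning
    .load()                                  # Event Hubs SASL/auth options omitted for brevity
)

query = (
    df.writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/replay-v2")  # new checkpoint
    .toTable("replayed_events")
)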

excavator-matt
by Contributor
  • 75 Views
  • 2 replies
  • 0 kudos

ABAC tag support for Streaming tables (Spark Lakeflow Declarative Pipelines)?

Hi! We're using Spark Lakeflow Declarative Pipelines to ingest data from various data sources. In order to achieve compliance with GDPR, we are planning to start using ABAC tagging. However, I don't understand how we are supposed to use th...

Data Engineering
abac
LakeFlow
Streaming tables
tags
Latest Reply
excavator-matt
Contributor
  • 0 kudos

One way to get version control might be to use the Terraform resource entity_tag_assignment. I am not sure if it supports governed_tags, but I'll experiment in the coming weeks. This separates version control for where the tags are defined from wher...

1 More Reply
vr
by Contributor III
  • 248 Views
  • 10 replies
  • 2 kudos

remote_query() is not working

I am trying to experiment with the remote_query() function according to the documentation. The feature is in public preview, so I assume it should be available to everyone now. select * from remote_query( 'my_connection', database => 'mydb', dbtable...

Latest Reply
vr
Contributor III
  • 2 kudos

I have the same error if I query SELECT 1 from remote_query(). From the documentation: "To use the remote_query function, you first need to create a Unity Catalog connection." So I'm not sure why it rebels against its creators.

9 More Replies
mdungey
by New Contributor II
  • 71 Views
  • 3 replies
  • 0 kudos

Deleting Lakeflow pipelines: impact on the objects within

I've seen it mentioned in some forums that Databricks is working on a fix so that when you delete an LDP pipeline it doesn't delete the underlying objects (streaming tables, materialized views, etc.). Can anyone from an official source confirm this and maybe give s...

Latest Reply
Raman_Unifeye
Contributor III
  • 0 kudos

Yes, I would take that with a pinch of salt.

2 More Replies
SRJDB
by New Contributor
  • 53 Views
  • 1 reply
  • 0 kudos

How to restrict the values permitted in a job or task parameter?

Hi, apologies if this is a daft question - I'm relatively new to Databricks and still finding my feet! I have a notebook with a parameter set within it via a widget, like this: dbutils.widgets.dropdown("My widget", "A", ["A", "B", "C"]) my_variable = d...

Latest Reply
Raman_Unifeye
Contributor III
  • 0 kudos

The Databricks job/task parameter interface does not provide a built-in UI feature to restrict the possible values entered by the user. You can add runtime validation code inside the notebook to allow or fail based on the values entered.
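
A minimal sketch of that runtime validation, reusing the widget from the question (dbutils is the notebook-provided helper):

dbutils.widgets.dropdown("My widget", "A", ["A", "B", "C"])

ALLOWED = {"A", "B", "C"}
my_variable = dbutils.widgets.get("My widget")

# Jobs can pass arbitrary values for this parameter, so validate at runtime.
if my_variable not in ALLOWED:
    raise ValueError(f"Invalid value {my_variable!r}; expected one of {sorted(ALLOWED)}")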

mafzal669
by New Contributor
  • 47 Views
  • 1 reply
  • 0 kudos

Admin user creation

Hi, I have created an Azure account using my personal email ID. I want to create this email ID as a group ID in the Databricks admin console, but when I add a new user it says a user with this email ID already exists. Could someone please help? I use...

Latest Reply
Raman_Unifeye
Contributor III
  • 0 kudos

As user IDs and group IDs share the same namespace in Databricks, you cannot create a group with the same email address that is already registered as a user in your Databricks account. You had better rename the group.

suchitpathak08
by New Contributor
  • 68 Views
  • 3 replies
  • 0 kudos

Urgent Assistance Needed – Unity Catalog Storage Access Failure & VM SKU Availability (Databricks on

Hi everyone, I’m running into two blocking issues while trying to run a Delta Live Tables (DLT) pipeline on Databricks (Azure). I’m hoping someone can help me understand what’s going wrong. 1. Unity Catalog cannot access underlying ADLS storage: every DL...

Latest Reply
bianca_unifeye
New Contributor III
  • 0 kudos

DLT pipelines always spin up job compute, and Azure is strict about SKU availability per region and per subscription. The most common causes: the quota for that VM family is set to 2 vCPUs (Databricks shows “Estimated available: 2” or “QuotaExceeded”), or the SKU exists...

2 More Replies
Mathew-Vesely
by New Contributor
  • 67 Views
  • 2 replies
  • 0 kudos

Archiving a legacy system into Databricks with structured and semi-structured data

We are currently exploring using Databricks to store and archive data from a legacy system. The governance features of Unity Catalog will give us the required capabilities to ensure we meet our legal, statutory and policy requirements for data rete...

Latest Reply
Raman_Unifeye
Contributor III
  • 0 kudos

A classic Customer 360 view case, and Databricks is certainly the right platform for it. Structured data: stored in Delta tables. Emails and PDFs: stored in Volumes, with the metadata (the path into the volume) stored in a Delta table against the customer ID. In...
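
As an illustration of the metadata-in-Delta pattern described above, a minimal sketch; the table name, volume path, and customer ID are invented for the example.

from pyspark.sql import Row

# Hypothetical: register an archived PDF stored in a UC Volume against a customer.
doc = Row(
    customer_id="C-1001",
    doc_type="pdf",
    volume_path="/Volumes/archive/legacy/docs/C-1001/contract.pdf",  # illustrative
)
spark.createDataFrame([doc]).write.mode("append").saveAsTable("archive.customer_documents")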

1 More Reply
Suheb
by New Contributor III
  • 55 Views
  • 1 reply
  • 1 kudos

How can I efficiently archive old data in Delta tables without slowing queries?

How can I remove or move older rows from my main Delta table so that queries on recent data are faster, while still keeping access to the historical data if needed?

Latest Reply
Coffee77
Contributor III
  • 1 kudos

Hi Suheb, when using Delta tables with Databricks, as long as you use proper liquid clustering keys or partitions, you should get good performance compared with relational engines when dealing with big data volumes. However, you can also separate tab...
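
A minimal sketch of the archive-then-delete pattern this reply is heading toward; the table names, cutoff, and date column are invented for the example.

cutoff = "2023-01-01"

# 1. Ensure the archive table exists with the same schema (empty CTAS).
spark.sql(f"""
    CREATE TABLE IF NOT EXISTS sales_archive AS
    SELECT * FROM sales WHERE event_date < '{cutoff}' LIMIT 0
""")

# 2. Copy old rows into the archive, then remove them from the hot table
#    so queries on recent data scan far less.
spark.sql(f"INSERT INTO sales_archive SELECT * FROM sales WHERE event_date < '{cutoff}'")
spark.sql(f"DELETE FROM sales WHERE event_date < '{cutoff}'")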

Dom1
by New Contributor III
  • 123 Views
  • 2 replies
  • 2 kudos

Pull JAR from private Maven repository (Azure Artifactory)

Hi, I currently struggle with the following task: we want to push our code to a private repository (Azure Artifactory) and then pull it from Databricks when the job runs. It currently works only with wheels inside a PyPI repo in the Artifactory. I found ...

Latest Reply
iyashk-DB
Databricks Employee
  • 2 kudos

Databricks can install Maven libraries by coordinate and lets you point at a custom repository URL. However, passing credentials for authenticated private Maven repositories directly through the Libraries UI/Jobs is not natively supported today and r...

1 More Reply
