Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

Johan_Van_Noten
by New Contributor III
  • 117 Views
  • 3 replies
  • 2 kudos

Long-running Python http POST hangs

As one of the steps in my data engineering pipeline, I need to perform a POST request to an HTTP (not HTTPS) server. This all works fine, except for the situation described below: it then hangs indefinitely. Environment: Azure Databricks Runtime 13.3 LTS, Pyt...

Latest Reply
siva-anantha
New Contributor III
  • 2 kudos

Hello, IMHO, having an HTTP-related task in a Spark cluster is an anti-pattern. This kind of code executes on the driver; it is synchronous and adds overhead. This is one of the reasons DLT (or SDP, Spark Declarative Pipelines) does not have REST...
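
For readers hitting the same hang, a minimal sketch (not from the thread) of the usual client-side mitigation: give requests.post an explicit timeout instead of the default of waiting forever. The URL and payload are placeholders.

import requests

# Hypothetical endpoint; plain HTTP, as in the original question.
url = "http://example.com/ingest"

try:
    # (connect timeout, read timeout) in seconds. Without a timeout,
    # requests will wait indefinitely for a response on the driver.
    resp = requests.post(url, json={"key": "value"}, timeout=(10, 300))
    resp.raise_for_status()
except requests.exceptions.Timeout as exc:
    # Fail fast instead of hanging the whole job.
    raise RuntimeError("POST timed out; check the network path from the cluster") from exc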

2 More Replies
adhi_databricks
by Contributor
  • 264 Views
  • 2 replies
  • 0 kudos

Resolved! How Are You Using Local IDEs (VS Code / Cursor / Whatever) to Develop & Run Code in Databricks?

Hi everyone, I’m trying to set up a smooth local-development workflow for Databricks and would love to hear how others are doing it. My current setup: I do most of my development in Cursor (a VS Code-based editor) because the AI agents make coding much fas...

Latest Reply
siva-anantha
New Contributor III
  • 0 kudos

@adhi_databricks: I want to add my perspective when it comes to pure local development (without Databricks Connect). I wanted to set up a local development environment without connecting to a Databricks workspace/cloud storage, and develop PySpark code in VS...
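
As an illustration of the pure-local setup this reply describes, a minimal sketch of a local-only PySpark session; the app name and fixture path are invented for the example.

from pyspark.sql import SparkSession

# Local-only session: no Databricks workspace or cloud storage involved.
spark = (
    SparkSession.builder
    .master("local[*]")    # use all local cores
    .appName("local-dev")
    .getOrCreate()
)

# Develop and unit-test transformations against local fixture files.
df = spark.read.json("tests/fixtures/sample.json")  # hypothetical path
df.show()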

1 More Reply
a_user12
by New Contributor III
  • 31 Views
  • 2 replies
  • 1 kudos

Declarative Pipelines: set Merge Schema to False

Dear team! I want to prevent the schema of a certain table from being updated automatically. With plain structured streaming I can do the following: silver_df.writeStream \ .format("delta") \ .option("mergeSchema", "false") \ .option("checkpoi...
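
For reference, a plausible completion of the truncated snippet above, assuming silver_df is the streaming DataFrame from the post; the checkpoint path and target table name are placeholders.

(
    silver_df.writeStream
    .format("delta")
    .option("mergeSchema", "false")  # reject automatic schema updates
    .option("checkpointLocation", "/tmp/checkpoints/silver")  # placeholder path
    .outputMode("append")
    .toTable("silver")
)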

Latest Reply
a_user12
New Contributor III
  • 1 kudos

@szymon_dybczak - thank you for your response. I tried: @dlt.table(name="deserialized", comment="Raw messages from Kafka topic as JSON", table_properties={"pipelines.autoOptimize.managed": "true", "pipelines.autoCompact.m...

1 More Reply
Swathik
by New Contributor II
  • 28 Views
  • 1 reply
  • 0 kudos

Best practices for a metadata-driven ETL framework

I am designing a metadata-driven ETL framework to migrate approximately 500 tables from Db2 to PostgreSQL. After reviewing multiple design patterns and blog posts, I am uncertain about the recommended approach for storing ETL metadata such as source s...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Swathik, if we're talking about where to store ETL metadata, in my opinion it's mostly a matter of preference. In my case, I prefer storing my config in YAML files, but I’ve also worked on projects where the config was stored in Delta tables. For ...
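
As an illustration of the YAML approach mentioned in this reply, a minimal sketch; the keys, schemas, and table names are invented for the example.

import yaml  # pip install pyyaml

config_text = """
tables:
  - source_schema: SALES          # Db2 source
    source_table: ORDERS
    target_table: public.orders   # PostgreSQL target
    load_type: incremental
    watermark_column: updated_at
"""

# Drive the ETL loop from the parsed metadata rather than hard-coded tables.
config = yaml.safe_load(config_text)
for tbl in config["tables"]:
    print(tbl["source_table"], "->", tbl["target_table"])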

panshi1225
by Visitor
  • 21 Views
  • 0 replies
  • 0 kudos

Exam got suspended partway through

@Databricks please help with my Databricks engineer associate exam, which got suspended partway through on 30th Nov '25 while I was taking it and following the protocols. I was more than 50% done with my test. Please reschedule my exam. I was following al...

DatabricksUser5
by New Contributor
  • 64 Views
  • 1 reply
  • 0 kudos

Reset committed offset of spark streaming to capture missed data

I have a very straightforward setup between Azure Event Hubs and DLT using the Kafka endpoint through Spark streaming. There were network issues and the stream didn't pick up some events, but it still progressed (and committed) the offset for some reason. As ...

Data Engineering
dlt spark eventhub kafka azure
Latest Reply
K_Anudeep
Databricks Employee
  • 0 kudos

Hello @DatabricksUser5, you can’t override committed offsets in place for a running DLT Kafka/Event Hubs stream. If a pipeline already has a checkpoint created, startingOffsets is ignored. To replay data, you must reset the streaming checkpoints or ...
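
A minimal sketch of one replay path consistent with that advice, assuming a plain structured-streaming read outside the running pipeline with a brand-new checkpoint so startingOffsets is honored; the bootstrap server, topic, paths, and table name are placeholders.

# With a brand-new checkpoint location, startingOffsets is honored again.
df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "mynamespace.servicebus.windows.net:9093")  # placeholder
    .option("subscribe", "my-topic")         # placeholder topic
    .option("startingOffsets", "earliest")   # replay from the beginning
    .load()                                  # Event Hubs SASL/auth options omitted for brevity
)

query = (
    df.writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/replay-v2")  # new checkpoint
    .toTable("replayed_events")
)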

excavator-matt
by Contributor
  • 75 Views
  • 2 replies
  • 0 kudos

ABAC tag support for Streaming tables (Spark Lakeflow Declarative Pipelines)?

Hi! We're using Spark Lakeflow Declarative Pipelines to ingest data from various data sources. In order to achieve compliance with GDPR, we are planning to start using ABAC tagging. However, I don't understand how we are supposed to use th...

Data Engineering
abac
LakeFlow
Streaming tables
tags
Latest Reply
excavator-matt
Contributor
  • 0 kudos

One way to get version control might be to use the Terraform resource entity_tag_assignment. I am not sure if it supports governed_tags, but I'll experiment in the coming weeks. This separates version control for where the tags are defined from wher...

1 More Reply
vr
by Contributor III
  • 248 Views
  • 10 replies
  • 2 kudos

remote_query() is not working

I am trying to experiment with the remote_query() function according to the documentation. The feature is in public preview, so I assume it should be available to everyone now. select * from remote_query( 'my_connection', database => 'mydb', dbtable...

Latest Reply
vr
Contributor III
  • 2 kudos

I have the same error if I query SELECT 1 from remote_query(). From the documentation: "To use the remote_query function, you first need to create a Unity Catalog connection." So I'm not sure why it rebels against its creators.

9 More Replies
mdungey
by New Contributor II
  • 71 Views
  • 3 replies
  • 0 kudos

Deleting Lakeflow pipelines: impact on the objects within

I've seen it mentioned in some forums that Databricks is working on a fix so that when you delete an LDP pipeline it doesn't delete the underlying objects (streaming tables, materialized views, etc.). Can anyone from an official source confirm this and maybe give s...

Latest Reply
Raman_Unifeye
Contributor III
  • 0 kudos

Yes, I would take that with a pinch of salt.

2 More Replies
SRJDB
by New Contributor
  • 53 Views
  • 1 reply
  • 0 kudos

How to restrict the values permitted in a job or task parameter?

Hi, apologies if this is a daft question - I'm relatively new to Databricks and still finding my feet! I have a notebook with a parameter set within it via a widget, like this: dbutils.widgets.dropdown("My widget", "A", ["A", "B", "C"]) my_variable = d...

Latest Reply
Raman_Unifeye
Contributor III
  • 0 kudos

The Databricks job/task parameter interface does not provide a built-in UI feature to restrict the possible values entered by the user. You can add runtime validation code inside the notebook to allow or fail based on the values entered.
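
A minimal sketch of that runtime validation, reusing the widget from the question (dbutils is the notebook-provided helper):

dbutils.widgets.dropdown("My widget", "A", ["A", "B", "C"])

ALLOWED = {"A", "B", "C"}
my_variable = dbutils.widgets.get("My widget")

# Jobs can pass arbitrary values for this parameter, so validate at runtime.
if my_variable not in ALLOWED:
    raise ValueError(f"Invalid value {my_variable!r}; expected one of {sorted(ALLOWED)}")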

mafzal669
by New Contributor
  • 47 Views
  • 1 reply
  • 0 kudos

Admin user creation

Hi, I have created an Azure account using my personal email ID. I want to create this email ID as a group ID in the Databricks admin console, but when I add a new user it says a user with this email ID already exists. Could someone please help? I use...

Latest Reply
Raman_Unifeye
Contributor III
  • 0 kudos

As user IDs and group IDs share the same namespace in Databricks, you cannot create a group with the same email address that is already registered as a user in your Databricks account. You had better rename the group.

suchitpathak08
by New Contributor
  • 68 Views
  • 3 replies
  • 0 kudos

Urgent Assistance Needed – Unity Catalog Storage Access Failure & VM SKU Availability (Databricks on

Hi everyone, I’m running into two blocking issues while trying to run a Delta Live Tables (DLT) pipeline on Databricks (Azure). I’m hoping someone can help me understand what’s going wrong. 1. Unity Catalog cannot access underlying ADLS storage: every DL...

Latest Reply
bianca_unifeye
New Contributor III
  • 0 kudos

DLT pipelines always spin up job compute, and Azure is strict about SKU availability per region and per subscription. The most common causes: the quota for that VM family is set to 2 vCPUs (Databricks shows “Estimated available: 2” or “QuotaExceeded”), or the SKU exists...

2 More Replies
Mathew-Vesely
by New Contributor
  • 67 Views
  • 2 replies
  • 0 kudos

Archiving a legacy system into Databricks with structured and semi-structured data

We are currently exploring using Databricks to store and archive data from a legacy system. The governance features of Unity Catalog will give us the required capabilities to ensure we meet our legal, statutory and policy requirements for data rete...

Latest Reply
Raman_Unifeye
Contributor III
  • 0 kudos

A classic Customer 360 view case, and Databricks is certainly the right platform for it. Structured data: stored in Delta tables. Emails and PDFs: stored in Volumes, with the metadata (the path into the volume) stored in a Delta table against the customer ID. In...
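
As an illustration of the metadata-in-Delta pattern described above, a minimal sketch; the table name, volume path, and customer ID are invented for the example.

from pyspark.sql import Row

# Hypothetical: register an archived PDF stored in a UC Volume against a customer.
doc = Row(
    customer_id="C-1001",
    doc_type="pdf",
    volume_path="/Volumes/archive/legacy/docs/C-1001/contract.pdf",  # illustrative
)
spark.createDataFrame([doc]).write.mode("append").saveAsTable("archive.customer_documents")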

1 More Reply
Suheb
by New Contributor III
  • 55 Views
  • 1 reply
  • 1 kudos

How can I efficiently archive old data in Delta tables without slowing queries?

How can I remove or move older rows from my main Delta table so that queries on recent data are faster, while still keeping access to the historical data if needed?

Latest Reply
Coffee77
Contributor III
  • 1 kudos

Hi Suheb, when using Delta tables with Databricks, as long as you use proper liquid clustering keys or partitions, you should get good performance compared with relational engines when dealing with big data volumes. However, you can also separate tab...
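
A minimal sketch of the archive-then-delete pattern this reply is heading toward; the table names, cutoff, and date column are invented for the example.

cutoff = "2023-01-01"

# 1. Ensure the archive table exists with the same schema (empty CTAS).
spark.sql(f"""
    CREATE TABLE IF NOT EXISTS sales_archive AS
    SELECT * FROM sales WHERE event_date < '{cutoff}' LIMIT 0
""")

# 2. Copy old rows into the archive, then remove them from the hot table
#    so queries on recent data scan far less.
spark.sql(f"INSERT INTO sales_archive SELECT * FROM sales WHERE event_date < '{cutoff}'")
spark.sql(f"DELETE FROM sales WHERE event_date < '{cutoff}'")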

Dom1
by New Contributor III
  • 123 Views
  • 2 replies
  • 2 kudos

Pull JAR from private Maven repository (Azure Artifactory)

Hi, I currently struggle with the following task: we want to push our code to a private repository (Azure Artifactory) and then pull it from Databricks when the job runs. It currently works only with wheels inside a PyPI repo in the Artifactory. I found ...

Latest Reply
iyashk-DB
Databricks Employee
  • 2 kudos

Databricks can install Maven libraries by coordinate and lets you point at a custom repository URL. However, passing credentials for authenticated private Maven repositories directly through the Libraries UI/Jobs is not natively supported today and r...

1 More Reply
