Data Engineering

Forum Posts

by Shivani_Komma99 • Visitor

11 hours ago

97 Views
4 replies
1 kudos

Unable to see a folder in DBFS

Hi Team, We have a few scripts stored in a folder on a DBFS path. Recently, we've noticed that when we navigate to this path manually, the folder appears to be empty, and we are unable to see the scripts. However, the jobs that reference and access the...

Latest Reply

Ashwin_DSA
Databricks Employee

7 hours ago

1 kudos

Hi @Shivani_Komma99, Thanks for flagging this. Based on the behaviour you described, this appears to be consistent with a DBFS browser UI issue/regression rather than a problem with the underlying files, especially since the files are still accessibl...

by aditi_mokashi • Visitor

yesterday

43 Views
0 replies
0 kudos

Urgent: Installing Lakebridge on Databricks

Hi, I want to install Databricks Lakebridge on my Databricks environment and use the analyze and transpile commands through a Python script. The use case is that we need to create an automated pipeline that will migrate the existing scripts from snowfla...

by AmitDECopilot • New Contributor III

yesterday

219 Views
0 replies
1 kudos

How would you design a Spark pipeline to process billions of records efficiently?

Interview Question: Many people start with the row count. I would start with the architecture. Billions of records are not new in enterprise data engineering. The real challenge is designing a pipeline that runs predictably, efficiently, and within SLA....

by darek554 • New Contributor

yesterday

110 Views
1 reply
0 kudos

Code on cluster runs indefinitely

Hello. I've created a custom cluster (m4.large). When I try to execute some code on this cluster, the behaviour is as follows:
- The cluster starts and shows a running status
- I run code, for example print("Hello")
- The code runs indefinitely
- I click interrupt, it st...

Latest Reply

Yogasathyandrun
New Contributor II

yesterday

0 kudos

The fact that print("Hello") eventually works but SELECT 1 never completes suggests the cluster may be running but not fully initialized for Spark workloads. A few things I’d check first:
- Cluster Event Log for any provisioning or startup errors.
- Spark U...

by DazzaiDe • New Contributor III

Saturday

127 Views
1 reply
0 kudos

DAB best practices suggestion

We're currently setting up Databricks Asset Bundles (DAB) with a CI/CD pipeline using Azure DevOps. Our planned development workflow is as follows: Main branch → Developer creates a feature branch → Implement changes → Create a Pull Request → Senior de...

Latest Reply

balajij8
Contributor III

Saturday

0 kudos

You can create Databricks Asset Bundles that are decoupled by domain, managed via multi target declarations within configuration and also driven by immutable, versioned artifacts stored securely within Unity Catalog Volumes. You can rely on explicit ...

by animeshjain • New Contributor

Saturday

209 Views
3 replies
0 kudos

Bundle deployment overwrites artifacts while jobs are running - best practices?

Hi everyone, I'm using #Declarative Automation Bundles (DAB) to deploy data pipelines, and I've run into an issue with concurrent job runs and deployment. What happened: I started a job that depends on a wheel file built by the bundle (timestamped artifa...

Latest Reply

sudhaktr
New Contributor II

Saturday

0 kudos

Do you have source_linked_deployment set as false? That's probably causing it.

by VikasM • New Contributor

Thursday

600 Views
14 replies
6 kudos

Resolved! PySpark AnalysisException: Ambiguous reference to field t when parsing nested JSON

I'm working on a personal data engineering project using Kafka, Spark Structured Streaming, and Docker. The application consumes two Kafka topics that originate from an external market-data websocket source:
- a trade stream
- a candlestick (kline/OHLCV) st...

Latest Reply

balajij8
Contributor III

Friday

6 kudos

When Spark Structured Streaming writes to file sinks, it generally uses a phased commit: it writes temporary files to the output directory, then writes metadata with references, and finally commits by moving/renaming the temp files to their final names. You...
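The write-then-rename step described in this reply can be illustrated on a local filesystem. This is a minimal sketch of the general pattern only, not Spark's own commit code; the function name `commit_file` and the `.tmp-` prefix are hypothetical, and it assumes the temp file and the final file live on the same filesystem so that the rename is atomic.

```python
import os
import tempfile


def commit_file(final_path: str, data: bytes) -> None:
    """Write data to a temp file in the target directory, then rename it into place."""
    out_dir = os.path.dirname(os.path.abspath(final_path))
    # Phase 1: write to a temporary file next to the final destination.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before committing
        # Phase 2: commit by renaming; readers never see a half-written final file.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the temp file if the write or the rename failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A reader that lists the output directory between the two phases sees only the temp file, which is why consumers of such a directory typically ignore temporary names and rely on the committed ones.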

by GabeMatch • New Contributor

Friday

133 Views
0 replies
0 kudos

Lakeflow Connect native connectors (TikTok, Meta Ads, Google Ads) - one table per account

We want to leverage these connectors to pull in marketing spend data. But the docs seem to say that the destination must be unique based on accounts. For TikTok, we have a hundred accounts... each account will have a destination table for each object. ...

by AustinBen • New Contributor

Friday

88 Views
0 replies
0 kudos

Streaming Amazon DocumentDB to Databricks in near real time - what's the best approach?

Hi everyone, I'm looking for advice from anyone who has implemented near real-time ingestion from Amazon DocumentDB into Databricks. Our current architecture is:
- Application → Amazon DocumentDB
- Python AWS Lambda functions capture changes from DocumentDB
- L...

by Rupa0503 • New Contributor III

Friday

128 Views
2 replies
1 kudos

Implementing Row Level Security using ABAC

I have to implement row-level security on one or more tables based on roles, and we don't want to create separate copies of the data for each user. How can I implement this, and what code can I use?

Latest Reply

Louis_Frolio
Databricks Employee

Friday

1 kudos

Hi @Rupa0503 , Yes, you can do row-level security across one table or many in Unity Catalog without copying data per role. @balajij8 pointed you in the right architectural direction (ABAC with governed tags, a reusable row-filter function, and centr...

by gaurang033 • New Contributor II

03-15-2026 4:43:49 PM

2017 Views
3 replies
2 kudos

How to access snapshots in Iceberg tables?

I have created an Iceberg table in Databricks and inserted a bunch of values into it. How do I list the snapshots and other metadata of the table?
create table raw.landing.emp_ice(id int, name string) using iceberg
The following doesn't work https://iceber...

Latest Reply

Louis_Frolio
Databricks Employee

Friday

2 kudos

@gaurang033 , I believe my solution gets you going in the right direction. Please give it a read and let me know. Cheers, Louis.

by Félix_banqi • New Contributor

Thursday

155 Views
3 replies
0 kudos

Is there a way to deactivate Genie auto-correction?

Genie keeps breaking my code, sometimes making it almost impossible to write code. Sometimes it behaves normally, but sometimes it auto-corrects constantly, inserting unwanted code. Is there any way to fix it? I know it's a bug, but I also don't kno...

Latest Reply

Ashwin_DSA
Databricks Employee

Friday

0 kudos

Hi @Félix_banqi, Sorry you are facing this issue. That definitely doesn’t sound like the intended experience. I would like to understand the issue better to give you a better steer. Is there an example you can share? In the meantime, given that you h...

by Dhivyadharshini • New Contributor II

Thursday

200 Views
2 replies
2 kudos

Spark UI Troubleshooting: Data Skew vs Cluster Resource Bottlenecks

How can Spark UI metrics be used to distinguish data skew from insufficient cluster resources? When a Databricks job is slow, we usually look at Spark UI metrics such as task duration, shuffle read/write, spilled bytes, GC time, executor CPU utilizati...

Latest Reply

Vibiksha
New Contributor II

Friday

2 kudos

A simple way to troubleshoot a slow Spark job using Spark UI is:
1. Check task duration
- A few very slow tasks → Likely data skew.
- Most tasks are slow → Likely cluster resource or execution issue.
2. Check Spark UI metrics
- Large differences in shuffle read/task ...
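The task-duration check in this reply can be sketched as a small helper. This is only an illustration, assuming the per-task durations of one stage have been copied out of the Spark UI as a list of seconds; the function name and both thresholds are hypothetical values chosen for the example, not Spark settings.

```python
from statistics import median


def classify_task_durations(durations_s, skew_ratio=5.0, slow_median_s=60.0):
    """Rough triage of one stage's task durations, given in seconds.

    skew_ratio and slow_median_s are illustrative thresholds, not Spark defaults.
    """
    if not durations_s:
        raise ValueError("need at least one task duration")
    typical = median(durations_s)
    longest = max(durations_s)
    # A few tasks far slower than the typical task points to data skew.
    if typical > 0 and longest / typical >= skew_ratio:
        return "likely data skew"
    # Uniformly slow tasks point to a cluster resource or execution issue.
    if typical >= slow_median_s:
        return "likely resource or execution bottleneck"
    return "no obvious problem"


print(classify_task_durations([10, 11, 9, 12, 300]))
print(classify_task_durations([120, 130, 125, 118]))
```

Comparing the slowest task with the median rather than the mean keeps a single straggler from hiding itself by inflating the average.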

by Pratikmsbsvm • Contributor

02-08-2026 7:53:53 AM

5234 Views
2 replies
1 kudos

Data Migration from SAP S/4HANA to Databricks

Could someone please help me design the migration of SAP S/4HANA to Databricks? How should this be designed, and what do we need to consider in the LLD?
1. How does the data need to be extracted, and with which tool? Near-real-time replication is required.
2. Each layer for Da...

Latest Reply

SteveOstrowski
Databricks Employee

Thursday

1 kudos

Hi @Pratikmsbsvm, Here is an updated view of the options for moving SAP S/4HANA data into Databricks, including the SAP and Databricks partnership path that is now the recommended low-friction approach. I will cover the integration options first, the...

by nsiddamsetti • New Contributor

Thursday

92 Views
0 replies
0 kudos

Databricks Data Engineer Professional Exam

Hi all, I am going to take the Databricks Data Engineer Professional exam, so I need some guidance from anyone who has already taken it, especially recently. Kindly reach out to me. Thanks.
