Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Shivani_Komma99
by Visitor
  • 97 Views
  • 4 replies
  • 1 kudos

Unable to see a folder in DBFS

Hi Team, We have a few scripts stored in a folder on a DBFS path. Recently, we've noticed that when we navigate to this path manually, the folder appears to be empty, and we are unable to see the scripts. However, the jobs that reference and access the...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @Shivani_Komma99, Thanks for flagging this. Based on the behaviour you described, this appears to be consistent with a DBFS browser UI issue/regression rather than a problem with the underlying files, especially since the files are still accessibl...

3 More Replies
aditi_mokashi
by Visitor
  • 43 Views
  • 0 replies
  • 0 kudos

Urgent: Installing Lakebridge on Databricks

Hi, I want to install Databricks Lakebridge on my Databricks environment and use the analyze and transpile commands through a Python script. The use case is that we need to create an automated pipeline that will migrate the existing scripts from snowfla...

darek554
by New Contributor
  • 110 Views
  • 1 reply
  • 0 kudos

Code on cluster runs indefinitely

Hello. I've created a custom cluster (m4.large). When I try to execute some code on this cluster, the behaviour is as follows:
- Cluster starts and shows running status
- I run code, for example print("Hello")
- Code runs indefinitely
- I click interrupt, it st...

Latest Reply
Yogasathyandrun
New Contributor II
  • 0 kudos

The fact that print("Hello") eventually works but SELECT 1 never completes suggests the cluster may be running but not fully initialized for Spark workloads. A few things I’d check first:
- Cluster Event Log for any provisioning or startup errors.
- Spark U...

DazzaiDe
by New Contributor III
  • 127 Views
  • 1 reply
  • 0 kudos

DAB best practices suggestion

We're currently setting up Databricks Asset Bundles (DAB) with a CI/CD pipeline using Azure DevOps. Our planned development workflow is as follows: Main branch → Developer creates a feature branch → Implement changes → Create a Pull Request → Senior de...

Latest Reply
balajij8
Contributor III
  • 0 kudos

You can create Databricks Asset Bundles that are decoupled by domain, managed via multi-target declarations in the configuration, and driven by immutable, versioned artifacts stored securely in Unity Catalog Volumes. You can rely on explicit ...

animeshjain
by New Contributor
  • 209 Views
  • 3 replies
  • 0 kudos

Bundle deployment overwrites artifacts while jobs are running - best practices?

Hi everyone, I'm using #Declarative Automation Bundles (DAB) to deploy data pipelines, and I've run into an issue with concurrent job runs and deployment. What happened: I started a job that depends on a wheel file built by the bundle (timestamped artifa...

Latest Reply
sudhaktr
New Contributor II
  • 0 kudos

Do you have source_linked_deployment set as false? That's probably causing it.

2 More Replies
VikasM
by New Contributor
  • 600 Views
  • 14 replies
  • 6 kudos

Resolved! PySpark AnalysisException: Ambiguous reference to field t when parsing nested JSON

I'm working on a personal data engineering project using Kafka, Spark Structured Streaming, and Docker. The application consumes two Kafka topics that originate from an external market-data websocket source:
- a trade stream
- a candlestick (kline/OHLCV) st...

Latest Reply
balajij8
Contributor III
  • 6 kudos

When Spark Structured Streaming writes to file sinks, it generally uses a phased commit: temporary files are written to the output directory, metadata referencing them is written next, and a final commit moves/renames the temp files to their final names. You...

13 More Replies
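
The write-then-rename step described in the reply above can be illustrated with a minimal Python sketch. This is a generic illustration of the pattern and not Spark's actual implementation: the function name `commit_file` and the `.tmp-` prefix are hypothetical, and the metadata step mentioned in the reply is left out.

```python
import os
import tempfile


def commit_file(final_path: str, data: bytes) -> None:
    """Write data to a temp file in the target directory, then rename it into place.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    out_dir = os.path.dirname(final_path) or "."
    # Phase 1: write to a temporary file next to the final location, so the
    # rename below stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Phase 2: the commit. os.replace swaps the file in as a single rename,
        # so readers see either no file or the complete file.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the temp file if the write or the rename failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A reader that lists the output directory between the two phases would see only the `.tmp-` file, which is why consumers of such sinks typically ignore temporary names.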
Rupa0503
by New Contributor III
  • 128 Views
  • 2 replies
  • 1 kudos

Implementing Row Level Security using ABAC

I have to implement row-level security on one or more tables based on roles, and we don't want to create separate copies of the data for different users. How can I implement this, and what code can I use?

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hi @Rupa0503, Yes, you can do row-level security across one table or many in Unity Catalog without copying data per role. @balajij8 pointed you in the right architectural direction (ABAC with governed tags, a reusable row-filter function, and centr...

1 More Replies
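
The idea of a reusable row-filter function from the reply above can be modelled in plain Python. This is only a conceptual sketch and not Unity Catalog syntax: the `User` class, the `admins` group, and the `region_<value>` group naming are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset  # hypothetical group names, e.g. {"region_us"}


def region_row_filter(user: User, row_region: str) -> bool:
    """Return True if the user may see a row whose region column is row_region."""
    if "admins" in user.groups:  # assumed admin group that sees every row
        return True
    return f"region_{row_region.lower()}" in user.groups


def visible_rows(user: User, rows: list) -> list:
    """Apply the same predicate to any table that has a 'region' column."""
    return [row for row in rows if region_row_filter(user, row["region"])]
```

Because the predicate depends only on the caller's groups and one column value, the same function can be attached to many tables, and no per-role copy of the data is needed.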
gaurang033
by New Contributor II
  • 2017 Views
  • 3 replies
  • 2 kudos

How to access snapshots in Iceberg tables?

I have created an Iceberg table in Databricks and inserted a bunch of values into it. How do I list the snapshots and other metadata of the table?
create table raw.landing.emp_ice(id int, name string ) using iceberg
The following doesn't work: https://iceber...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

@gaurang033 , I believe my solution gets you going in the right direction.  Please give it a read and let me know.  Cheers, Louis.

2 More Replies
Félix_banqi
by New Contributor
  • 155 Views
  • 3 replies
  • 0 kudos

Is there a way to deactivate Genie autocorrection?

Genie keeps breaking my code, sometimes making it almost impossible to write code. Sometimes it behaves normally, but at other times it autocorrects constantly, inserting unwanted code. Is there any way to fix it? I know it's a bug, but I also don't kno...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @Félix_banqi, Sorry you are facing this issue. That definitely doesn’t sound like the intended experience. I would like to understand the issue better to give you a better steer. Is there an example you can share? In the meantime, given that you h...

2 More Replies
Dhivyadharshini
by New Contributor II
  • 200 Views
  • 2 replies
  • 2 kudos

Spark UI Troubleshooting: Data Skew vs Cluster Resource Bottlenecks

How can Spark UI metrics be used to distinguish data skew from insufficient cluster resources? When a Databricks job is slow, we usually look at Spark UI metrics such as task duration, shuffle read/write, spilled bytes, GC time, executor CPU utilizati...

Latest Reply
Vibiksha
New Contributor II
  • 2 kudos

A simple way to troubleshoot a slow Spark job using the Spark UI is:
Check task duration
- A few very slow tasks → likely data skew.
- Most tasks are slow → likely cluster resource or execution issue.
Check Spark UI metrics
- Large differences in shuffle read/task ...

1 More Replies
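
The task-duration rule of thumb from the reply above can be written down as a small Python heuristic. It is a rough sketch under stated assumptions: the function name `diagnose` and both thresholds (`skew_ratio`, `slow_median_s`) are illustrative choices rather than values from the thread or from Spark, and the durations would have to be copied from the stage's task list in the Spark UI.

```python
from statistics import median


def diagnose(task_durations_s, skew_ratio=5.0, slow_median_s=60.0):
    """Classify one stage from its per-task durations (in seconds).

    skew_ratio and slow_median_s are assumed thresholds; tune them per workload.
    """
    if not task_durations_s:
        raise ValueError("need at least one task duration")
    med = median(task_durations_s)
    longest = max(task_durations_s)
    # A few tasks far slower than the typical task points to skew.
    if med > 0 and longest / med >= skew_ratio:
        return "likely data skew"
    # Uniformly slow tasks point to a resource or execution bottleneck.
    if med >= slow_median_s:
        return "likely resource or execution bottleneck"
    return "no obvious signal"
```

For example, a stage with durations `[10, 11, 9, 12, 300]` is flagged as skew, while `[120, 130, 125, 118]` is flagged as a resource or execution bottleneck. Task duration alone is a first pass; the other metrics named in the question (shuffle read/write, spill, GC time) would still need checking.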
Pratikmsbsvm
by Contributor
  • 5234 Views
  • 2 replies
  • 1 kudos

Data Migration from SAP S/4HANA to Databricks

Could someone please help me design the migration of SAP S/4HANA to Databricks? How should this be designed, and what do we need to consider in the LLD?
1. How does the data need to be extracted, and with which tool? Near-real-time replication is required.
2. Each layer for Da...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @Pratikmsbsvm, Here is an updated view of the options for moving SAP S/4HANA data into Databricks, including the SAP and Databricks partnership path that is now the recommended low-friction approach. I will cover the integration options first, the...

1 More Replies