Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

jeremy98
by New Contributor III
  • 2 Views
  • 1 reply
  • 0 kudos

How to read the CDF logs in a DLT pipeline?

Hi Community, how do I read the CDF logs in materialized views created by a DLT pipeline? Thanks for your time,

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @jeremy98, To read the Change Data Feed (CDF) logs in materialized views created by a Delta Live Tables (DLT) pipeline, you can follow these steps:   Enable Change Data Feed: Ensure that the change data feed is enabled on the base tables of the ma...
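
A minimal sketch of that first step plus the read, assuming a placeholder base table name main.sales.orders_base; the readChangeFeed option and the _change_type/_commit_version metadata columns are standard Delta Change Data Feed features:

```python
# Enable Change Data Feed on a base table feeding the materialized view
# (table name is a placeholder for illustration).
spark.sql("""
    ALTER TABLE main.sales.orders_base
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

# Read the change feed from a chosen starting version.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 1)
    .table("main.sales.orders_base")
)
changes.select("_change_type", "_commit_version", "_commit_timestamp").show()
```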

mkEngineer
by New Contributor III
  • 29 Views
  • 1 reply
  • 0 kudos

Refresh options for Power BI from a Databricks workflow using Azure Databricks

Hi! I have a workflow that includes my medallion architecture and DLT. Currently, I have a separate notebook for refreshing my Power BI semantic model, which works based on the method described in Refresh a PowerBI dataset from Azure Databricks. Howe...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @mkEngineer, have you reviewed this documentation: https://learn.microsoft.com/en-us/azure/databricks/partners/bi/power-bi Also, I don't think serverless compute for notebooks will work for your connection with Power BI. You might need to set up a Se...
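
For context on what such a refresh notebook typically does, here is a minimal sketch that queues a semantic model refresh through the Power BI REST API; the token handling, secret scope, and workspace/dataset IDs are all placeholder assumptions:

```python
import requests

# Placeholders: obtain a real Azure AD token for the Power BI API
# (e.g. via MSAL or a service principal) and fill in your own IDs.
access_token = dbutils.secrets.get(scope="pbi", key="access-token")
group_id = "<workspace-id>"
dataset_id = "<dataset-id>"

resp = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}/datasets/{dataset_id}/refreshes",
    headers={"Authorization": f"Bearer {access_token}"},
    json={"notifyOption": "MailOnFailure"},
)
resp.raise_for_status()  # HTTP 202 means the refresh was accepted and queued
```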

stevomcnevo007
by New Contributor
  • 100 Views
  • 4 replies
  • 1 kudos

agents.deploy NOT_FOUND: The directory being accessed is not found. error

I keep getting the following error, although the model definitely exists and the version and model names are correct: RestException: NOT_FOUND: The directory being accessed is not found. when calling # Deploy the model to the review app and a model...

Latest Reply
stevomcnevo007
New Contributor
  • 1 kudos

Checked the config.yml file and it looks like this:
agent_prompt: "Use functions to interact with questions about eggs / henhouse."
llm_endpoint: "databricks-meta-llama-3-3-70b-instruct"
warehouse_id: "1c7bf12e78b673de"
uc_functions: "main.egg_shop.*"
St...

3 More Replies
kasiviss42
by Visitor
  • 16 Views
  • 1 reply
  • 0 kudos

Unity Credential Scope id not found in thread locals

I am facing this issue: [UNITY_CREDENTIAL_SCOPE_MISSING_SCOPE] Missing Credential Scope. Unity Credential Scope id not found in thread locals. The issue occurs when we try to list files using dbutils.fs.ls, and also at times when we try to write o...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @kasiviss42, Are you using any Scala code in your notebook? The error [UNITY_CREDENTIAL_SCOPE_MISSING_SCOPE] Missing Credential Scope. Unity Credential Scope id not found in thread locals that you are encountering when using dbutils.fs.ls and whil...

soumiknow
by New Contributor III
  • 28 Views
  • 1 reply
  • 0 kudos

How to resolve a 'connection refused' error while using a google-cloud library in a Databricks notebook?

I want to use the google-cloud-bigquery library in my PySpark code, though I know that the spark-bigquery-connector is available. The reason I want to use it is that a Databricks 15.4 LTS cluster comes with the 0.22.2-SNAPSHOT version of spark-bigquery-connector, wh...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @soumiknow, To resolve the 'connection refused' error when using the google-cloud-bigquery library in your Databricks notebook, you need to ensure that your Databricks cluster is properly configured to authenticate with Google Cloud Platform (GCP)...
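
A minimal sketch of that configuration, assuming a service-account key file has been uploaded to a path the cluster can read (the path and project ID are placeholders):

```python
import os
from google.cloud import bigquery

# Point the client library at a service-account key. Without explicit
# credentials, google-auth may fall back to the GCE metadata server, which is
# unreachable from Databricks and can surface as a 'connection refused' error.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/dbfs/FileStore/keys/gcp-sa.json"

client = bigquery.Client(project="my-gcp-project")
result = client.query("SELECT 1 AS ok").result()
for row in result:
    print(row.ok)
```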

Phani1
by Valued Contributor II
  • 30 Views
  • 1 reply
  • 0 kudos

Access data cross-cloud.

Hi All, we have a use case where we need to connect AWS Databricks to a GCP storage bucket to access the data. In Databricks, we're trying to use external locations and storage credentials, but it seems like AWS Databricks only supports AWS storage b...

Latest Reply
szymon_dybczak
Contributor III
  • 0 kudos

Hi @Phani1, you can use Delta Sharing. That way you can create a share that allows you to access data stored in GCS, and it's governed by the UC permissions model. See What is Delta Sharing? | Databricks on AWS. You can also use a legacy approach, but it doesn'...
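
A minimal sketch of the provider-side setup for Databricks-to-Databricks sharing, assuming the GCS-backed table is already registered in a Unity Catalog metastore on the GCP side; the catalog, share, and recipient names are placeholders, and the recipient's sharing identifier (elided here) comes from the consuming AWS metastore:

```python
# Run on the workspace that can already reach the GCS data (provider side).
spark.sql("CREATE SHARE IF NOT EXISTS cross_cloud_share")
spark.sql("ALTER SHARE cross_cloud_share ADD TABLE gcp_catalog.analytics.events")

# Register the consuming (AWS) metastore as a recipient and grant access.
spark.sql("CREATE RECIPIENT IF NOT EXISTS aws_consumer USING ID 'aws:us-east-1:<metastore-uuid>'")
spark.sql("GRANT SELECT ON SHARE cross_cloud_share TO RECIPIENT aws_consumer")
```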

zmsoft
by New Contributor III
  • 25 Views
  • 2 replies
  • 0 kudos

How to load a Power BI dataset into Databricks

Hi there, I would like to know how to load a Power BI dataset into Databricks. Thanks & Regards, zmsoft

Latest Reply
jack533
New Contributor III
  • 0 kudos

It's not feasible, in my opinion. It is feasible to load a table from Databricks into a Power BI dataset, but not the other way around.

1 More Reply
svm_varma
by Visitor
  • 32 Views
  • 1 reply
  • 1 kudos

Azure Databricks quota restrictions on compute for an Azure for Students subscription

Hi All, regarding creating clusters in Databricks, I'm getting a quota error. I have tried to increase quotas in the region where the resource is hosted but am still unable to increase the limit. Is there any workaround, or could you help select the right cluster ...

Latest Reply
szymon_dybczak
Contributor III
  • 1 kudos

Hi @svm_varma, you can try to create a Standard_DS3_v2 cluster. It has 4 cores, and your current subscription limit for the given region is 6 cores. The one you're trying to create needs 8 cores, hence the quota exceeded exception. You can also...

guiferviz
by New Contributor II
  • 60 Views
  • 1 reply
  • 0 kudos

How to Determine if Materialized View is Performing Full or Incremental Refresh?

I'm currently testing materialized views and I need some help understanding the refresh behavior. Specifically, I want to know if my materialized view is querying the full table (performing a full refresh) or just doing an incremental refresh. From so...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @guiferviz, To determine the type of refresh used, you can query the Delta Live Tables event log. Look for the event_type called planning_information to see the technique used for the refresh. The techniques include:   FULL_RECOMPUTE: Indicates a ...
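
A sketch of that event log query, assuming a Unity Catalog materialized view named main.gold.daily_sales (a placeholder); event_log() is the table-valued function documented for DLT pipelines:

```python
# Each planning_information event records which refresh technique was chosen
# (e.g. FULL_RECOMPUTE vs. an incremental technique) for that update.
planning = spark.sql("""
    SELECT timestamp, details:planning_information
    FROM event_log(TABLE(main.gold.daily_sales))
    WHERE event_type = 'planning_information'
    ORDER BY timestamp DESC
""")
planning.show(truncate=False)
```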

hprasad
by New Contributor III
  • 37 Views
  • 1 reply
  • 0 kudos

Optimize Cluster Uptime by Avoiding Unwanted Library or Jar Installations

Whenever we discuss clusters or nodes in any service, we need to address the cluster bootstrap process. Traditionally, this involves configuring each node using a startup script (startup.sh). In this context, installing libraries in the cluster is par...

Data Engineering
cluster
job
jobs
Nodes
Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

For further details on managing init scripts and optimizing the bootstrap process, you can refer to the Databricks documentation on init scripts. This documentation provides recommendations for using built-in platform features instead of init scripts...
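
As one concrete example of the built-in features the documentation recommends over init scripts, libraries can be attached as cluster-scoped libraries through the API; a sketch using the Databricks Python SDK, where the cluster ID and package are placeholders and databricks-sdk is assumed to be installed:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import Library, PythonPyPiLibrary

w = WorkspaceClient()

# Attach a PyPI package as a cluster-scoped library; Databricks installs it
# on every node, so no startup-script installation step is needed.
w.libraries.install(
    cluster_id="<cluster-id>",
    libraries=[Library(pypi=PythonPyPiLibrary(package="pandas==2.2.2"))],
)
```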

Omri
by New Contributor
  • 38 Views
  • 2 replies
  • 0 kudos

Optimizing a complex pyspark join

I have a complex join that I'm trying to optimize. df1 has columns id, main_key, col1, col1_isnull, col2, col2_isnull, ..., col30. df2 has columns id, main_key, col1, col2, ..., col_30. I'm trying to run this SQL query on PySpark: select df1.id, df2.id from df1 join df2 on df1.m...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Continuation of my comments:
Shuffle Hash Join: Prefer a shuffle hash join over a sort-merge join if applicable; this can be more efficient in certain scenarios: spark.conf.set("spark.sql.join.preferSortMergeJoin", "false")
Data Skew Remediation: Iden...
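
Beyond the global setting, the same preference can be scoped to a single join with a hint; a minimal sketch reusing the df1/df2 names from the question (the AQE skew settings are an assumption about what the truncated remediation point covers):

```python
# Ask Spark to use a shuffle hash join for this join only, avoiding the sort
# phase of a sort-merge join when partitions fit comfortably in memory.
joined = (
    df1.join(df2.hint("SHUFFLE_HASH"), on="main_key")
    .select(df1.id.alias("df1_id"), df2.id.alias("df2_id"))
)

# For skewed main_key values, adaptive query execution can split skewed
# partitions automatically (enabled by default on recent runtimes).
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
```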

1 More Reply
filipniziol
by Contributor III
  • 116 Views
  • 1 reply
  • 1 kudos

Resolved! Is dbutils.notebook.run() supported from a local Spark Connect environment (VS Code)?

Hi everyone,I’m experimenting with the Databricks VS Code extension, using Spark Connect to run code locally in my Python environment while connecting to a Databricks cluster. I’m trying to call one notebook from another via: notebook_params = { ...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @filipniziol, it is confirmed that dbutils.notebook.run relies on the full Databricks notebook context, which is not available in a local Spark Connect session. Therefore, running a notebook with dbutils.notebook.run is not possible in a local env...
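
One commonly suggested workaround (an assumption on my part, not taken from the truncated reply above) is to trigger the child notebook as a one-time job run through the REST API, which does work from a local IDE; a sketch with the Databricks Python SDK, with all paths and IDs as placeholders:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # reuses the same auth profile as the VS Code extension

run = w.jobs.submit(
    run_name="run-child-notebook",
    tasks=[
        jobs.SubmitTask(
            task_key="child",
            existing_cluster_id="<cluster-id>",
            notebook_task=jobs.NotebookTask(
                notebook_path="/Workspace/Users/<me>/child_notebook",
                base_parameters={"param1": "value1"},
            ),
        )
    ],
).result()  # blocks until the run finishes
print(run.state.life_cycle_state)
```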

singhanuj2803
by New Contributor III
  • 46 Views
  • 1 reply
  • 1 kudos

Apache Spark SQL query to get organization hierarchy

I'm currently diving deep into Spark SQL and its capabilities, and I'm facing an interesting challenge. I'm eager to learn how to write CTE recursive queries in Spark SQL, but after thorough research, it seems that Spark doesn't natively support recu...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @singhanuj2803, It is correct that Spark SQL does not natively support recursive Common Table Expressions (CTEs). However, there are some workarounds and alternative methods you can use to achieve similar results.   Using DataFrame API with Loops:...
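
A minimal sketch of the loop-based workaround, assuming an employees table with id, manager_id, and name columns (all names hypothetical):

```python
from pyspark.sql import functions as F

emp = spark.table("main.hr.employees")

# Seed: top-level employees (no manager), at level 0.
hierarchy = emp.filter(F.col("manager_id").isNull()).withColumn("level", F.lit(0))
frontier = hierarchy

# Expand one management level per iteration, bounded to guard against cycles.
for level in range(1, 20):
    nxt = (
        emp.alias("e")
        .join(frontier.alias("f"), F.col("e.manager_id") == F.col("f.id"))
        .select("e.*")
        .withColumn("level", F.lit(level))
    )
    if nxt.isEmpty():
        break
    hierarchy = hierarchy.unionByName(nxt)
    frontier = nxt

hierarchy.orderBy("level", "id").show()
```

Each pass joins the previous frontier back to the base table, which is the usual substitute for a recursive CTE; for deep hierarchies, caching or checkpointing the frontier keeps the growing query plan manageable.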

singhanuj2803
by New Contributor III
  • 47 Views
  • 1 reply
  • 1 kudos

How to run stored procedure in Azure Database for PostgreSQL using Azure Databricks Notebook

We have stored procedures available in Azure Database for PostgreSQL, and we want to call, run, or execute these PostgreSQL stored procedures in Azure Databricks through a notebook. We are attempting to run PostgreSQL stored procedures through Azure Databr...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

To execute a PostgreSQL stored procedure from an Azure Databricks notebook, you need to follow these steps: Required Libraries: You need to install the psycopg2 library, which is a PostgreSQL adapter for Python. This can be done using the %pip install...
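
A minimal sketch of those steps, with the host, database, and procedure names as placeholders and the password read from a secret scope (assuming psycopg2-binary was installed first with %pip install psycopg2-binary):

```python
import psycopg2

# Connect to Azure Database for PostgreSQL; Azure requires SSL by default.
conn = psycopg2.connect(
    host="myserver.postgres.database.azure.com",
    port=5432,
    dbname="mydb",
    user="admin_user",
    password=dbutils.secrets.get(scope="pg", key="password"),
    sslmode="require",
)
try:
    with conn.cursor() as cur:
        # CALL invokes a stored procedure (PostgreSQL 11+).
        cur.execute("CALL my_schema.my_procedure(%s, %s)", ("arg1", 42))
    conn.commit()  # the CALL runs in a transaction unless autocommit is on
finally:
    conn.close()
```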


Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group