Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Charansai
by New Contributor III
  • 20 Views
  • 1 reply
  • 0 kudos

Pipelines not included in Databricks Asset Bundles deployment

Hi all, I'm working with Databricks Asset Bundles (DAB) to build and deploy Jobs and pipelines across multiple environments in Azure Databricks. I can successfully deploy Jobs using bundles. However, when I try to deploy pipelines, I notice that the bun...

Latest Reply
cdn_yyz_yul
New Contributor III

This example helped me deploy ETL pipelines as tasks in jobs to different workspaces: bundle-examples/lakeflow_pipelines_python at main · databricks/bundle-examples · GitHub

  • 0 kudos
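For reference, the pattern in that example boils down to declaring the pipeline under resources.pipelines in databricks.yml and pointing a job task at it, so `databricks bundle deploy` creates both together. A minimal sketch; all bundle, pipeline, and path names below are placeholders, not taken from the thread:

```yaml
# databricks.yml -- minimal sketch; names are placeholders
bundle:
  name: my_bundle

resources:
  pipelines:
    etl_pipeline:
      name: etl_pipeline_${bundle.target}
      libraries:
        - notebook:
            path: ./src/dlt_pipeline.ipynb

  jobs:
    etl_job:
      name: etl_job
      tasks:
        - task_key: refresh_pipeline
          pipeline_task:
            # Reference the pipeline declared above so it is deployed
            # together with the job.
            pipeline_id: ${resources.pipelines.etl_pipeline.id}

targets:
  dev:
    default: true
```

If a pipeline defined in a separate YAML file still doesn't deploy, also check that the file is picked up by the bundle's include paths.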
Brahmareddy
by Esteemed Contributor
  • 92 Views
  • 2 replies
  • 5 kudos

Future of Movie Discovery: How I Built an AI Movie Recommendation Agent on Databricks Free Edition

As a data engineer deeply passionate about how data and AI can come together to create real-world impact, I’m excited to share my project for the Databricks Free Edition Hackathon 2025 — Future of Movie Discovery (FMD). Built entirely on Databricks F...

Latest Reply
hasnat_unifeye
New Contributor

Hi @Brahmareddy, really enjoyed your hackathon demo. You've set a high bar for NLP-focused projects. I picked up a lot from your approach, and it's definitely given me ideas to try out. For my hackathon entry, I took a similar direction using pyspark.m...

  • 5 kudos
1 More Replies
Hubert-Dudek
by Esteemed Contributor III
  • 25511 Views
  • 14 replies
  • 12 kudos

Resolved! dbutils or other magic way to get notebook name or cell title inside notebook cell

Not sure it exists, but maybe there is some trick to get these directly from Python code: the notebook name and the cell title. I'm just working on a logger script shared between notebooks, and it could make my life a bit easier.

Latest Reply
rtullis
New Contributor II

I got the solution to work in terms of printing the notebook that I was running; however, what if you have notebook A that calls a function that prints the notebook name, and you run notebook B that %runs notebook A? I get notebook B's name when...

  • 12 kudos
13 More Replies
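For readers landing here: the trick discussed in the thread reads the notebook context, which is an internal, unsupported API and may change between DBR versions. A minimal sketch:

```python
# Minimal sketch using the (internal, unsupported) notebook context API.
# The context describes the top-level notebook attached to the session,
# which is why notebook B %run-ning notebook A still reports B's path:
# %run executes A inline inside B's context.
ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
notebook_path = ctx.notebookPath().get()        # e.g. "/Users/me/notebook_b"
notebook_name = notebook_path.rsplit("/", 1)[-1]
print(notebook_name)
```

Cell titles are not exposed through this context, only the notebook path.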
kahrees
by New Contributor
  • 70 Views
  • 3 replies
  • 3 kudos

DATA_SOURCE_NOT_FOUND Error with MongoDB (Suggestions in other similar posts have not worked)

I am trying to load data from MongoDB into Spark. I am using the Community/Free version of Databricks, so my Jupyter notebook is in a Chrome browser. Here is my code: from pyspark.sql import SparkSession spark = SparkSession.builder \ .config("spar...

Latest Reply
K_Anudeep
Databricks Employee

Hey @kahrees, good day! I tested this internally and was able to reproduce the issue. You're getting [DATA_SOURCE_NOT_FOUND] ... mongodb because the MongoDB Spark connector jar isn't actually on your cluster's classpath. On D...

  • 3 kudos
2 More Replies
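As the reply says, the fix is to get the connector onto the cluster classpath, not into the notebook's builder config: on Databricks the SparkSession already exists, so builder-time spark.jars.packages is silently ignored. A minimal sketch, assuming the connector has been installed as a cluster/environment library from Maven (coordinate org.mongodb.spark:mongo-spark-connector_2.12:10.3.0 is an assumption; match your Spark/Scala version) and with placeholder connection details:

```python
# Minimal sketch, assuming the MongoDB Spark connector 10.x is installed on
# the cluster as a Maven library. Setting spark.jars.packages inside the
# notebook is too late: the session already exists, so the jar never loads.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (
    spark.read.format("mongodb")  # connector 10.x uses "mongodb"; 3.x used "mongo"
    .option("connection.uri", "mongodb+srv://user:pass@host/db")  # placeholder
    .option("database", "db")          # placeholder
    .option("collection", "coll")      # placeholder
    .load()
)
df.printSchema()
```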
eyalholzmann
by New Contributor
  • 98 Views
  • 3 replies
  • 1 kudos

Does VACUUM on Delta Lake also clean Iceberg metadata when using Iceberg Uniform feature?

I'm working with Delta tables using the Iceberg Uniform feature to enable Iceberg-compatible reads. I’m trying to understand how metadata cleanup works in this setup.Specifically, does the VACUUM operation—which removes old Delta Lake metadata based ...

Latest Reply
Louis_Frolio
Databricks Employee

Here's how to approach cleaning and maintaining Apache Iceberg metadata on Databricks, and how it differs from Delta workflows. First, know your table type: for Unity Catalog–managed Iceberg tables, Databricks runs table maintenance for you (predicti...

  • 1 kudos
2 More Replies
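To make the Delta side concrete: per the reply, the Iceberg-facing UniForm metadata is maintained by Databricks itself, so the only maintenance you run by hand is the usual Delta kind. A minimal sketch with a placeholder table name:

```python
# Minimal sketch: routine Delta maintenance on a UniForm-enabled table.
# VACUUM prunes unreferenced *data* files older than the retention window;
# it is not an Iceberg expire_snapshots. Table name is a placeholder.
spark.sql("VACUUM main.analytics.events RETAIN 168 HOURS")   # 7-day window
spark.sql("DESCRIBE DETAIL main.analytics.events").show(truncate=False)
```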
pooja_bhumandla
by New Contributor III
  • 41 Views
  • 1 reply
  • 0 kudos

Should I enable Liquid Clustering based on table size distribution?

Hi everyone, I'm evaluating whether Liquid Clustering would be beneficial for my tables based on their sizes. Below is the size distribution of tables in my environment:
  • Large (> 1 TB): 3 tables
  • Medium (10 GB – 1 TB): 284 tables
  • Small (< 10 GB): 17,26...

Latest Reply
Louis_Frolio
Databricks Employee

Greetings @pooja_bhumandla. Based on your size distribution, enabling Liquid Clustering can provide meaningful gains, but you'll get the highest ROI by prioritizing your medium and large tables first and selectively applying it to small tables where q...

  • 0 kudos
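For the medium and large tables the reply prioritizes, enabling Liquid Clustering on an existing Delta table is a two-statement change. A minimal sketch with placeholder table and column names:

```python
# Minimal sketch: turn on Liquid Clustering for an existing Delta table and
# trigger an incremental re-layout. Names are placeholders; pick clustering
# keys that match your most common filters and joins.
spark.sql("ALTER TABLE main.sales.orders CLUSTER BY (customer_id, order_date)")
spark.sql("OPTIMIZE main.sales.orders")  # clusters data incrementally; safe to re-run
```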
Naveenkumar1811
by New Contributor
  • 24 Views
  • 1 reply
  • 0 kudos

Can we change the ownership of a Databricks-managed secret to an SP in Azure Databricks?

Hi Team, earlier we faced an issue where a jar file (created by an old employee) in the workspace directory was used as a library on a cluster run from an SP. Since the employee left the org and the ID got removed, even though the SP is part of ADMI...

Latest Reply
Coffee77
Contributor III

That's the reason I try to deploy most resources with service principal accounts when using Databricks Asset Bundles. Avoid human identities whenever possible, because they can indeed go away... I think you'll have to create another s...

  • 0 kudos
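To make the reply's suggestion concrete, here is a minimal sketch using the Python databricks-sdk, assuming you recreate the scope under automation (as far as I know a scope's creator cannot be reassigned; access is governed via ACLs instead). Scope name, key, and principal are placeholders:

```python
# Minimal sketch: recreate the secret scope as the service principal and
# grant it MANAGE, so nothing depends on a human identity. Names are
# placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.workspace import AclPermission

w = WorkspaceClient()  # authenticate as the service principal

w.secrets.create_scope(scope="etl-scope")
w.secrets.put_secret(scope="etl-scope", key="db-password", string_value="...")
# Grant another SP (or group) MANAGE on the scope:
w.secrets.put_acl(scope="etl-scope", principal="<sp-application-id>",
                  permission=AclPermission.MANAGE)
```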
bidek56
by Contributor
  • 199 Views
  • 5 replies
  • 1 kudos

Resolved! Location of spark.scheduler.allocation.file

In DBR 16.4 LTS, I am trying to add the following Spark config: spark.scheduler.allocation.file: file:/Workspace/init/fairscheduler.xml. But the all-purpose cluster throws this error: Spark error: Driver down cause: com.databricks.backend.daemon.dri...

Latest Reply
mark_ott
Databricks Employee

Here are some solutions that avoid DBFS. Yes, there are ways to use the Spark scheduler allocation file on Databricks without DBFS, but options are limited and depend on your environment and access controls. Alternatives to DBFS for schedu...

  • 1 kudos
4 More Replies
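Once the allocation file does load (for example from a local driver path populated by a cluster init script rather than /Workspace), pools are selected per thread. A minimal sketch, assuming cluster Spark config like `spark.scheduler.mode FAIR` and `spark.scheduler.allocation.file file:/local_disk0/fairscheduler.xml`, with placeholder pool names matching a hypothetical fairscheduler.xml:

```python
# Minimal sketch: assign fair-scheduler pools per thread. Pool names
# ("etl", "adhoc") are placeholders defined in the allocation file.
import threading

def run_query(pool: str, sql: str):
    # Pool assignment is a thread-local property in Spark.
    spark.sparkContext.setLocalProperty("spark.scheduler.pool", pool)
    spark.sql(sql).collect()

t1 = threading.Thread(target=run_query, args=("etl", "SELECT 1"))
t2 = threading.Thread(target=run_query, args=("adhoc", "SELECT 2"))
t1.start(); t2.start(); t1.join(); t2.join()
```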
Yuki
by Contributor
  • 100 Views
  • 4 replies
  • 1 kudos

Is there any way to run jobs from github actions and catch the results?

Hi all, is there any way to run jobs from GitHub Actions and catch the results? Of course, I can do this if I use the API or CLI. But I found this action for notebooks: https://github.com/marketplace/actions/run-databricks-notebook Compared to this, wri...

Latest Reply
Yuki
Contributor

OK, thank you for your advice; I will consider using asset bundles for this.

  • 1 kudos
3 More Replies
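A middle ground between raw REST calls and the notebook action is a small Python step run by the workflow, using the databricks-sdk to trigger the job and fail the step on a bad result. A minimal sketch with a placeholder job ID, assuming DATABRICKS_HOST and DATABRICKS_TOKEN are supplied via repo secrets:

```python
# Minimal sketch: trigger a job, wait for it, and exit nonzero on failure
# so the GitHub Actions step fails. Job ID is a placeholder.
import sys
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import RunResultState

w = WorkspaceClient()  # auth via DATABRICKS_HOST / DATABRICKS_TOKEN env vars

run = w.jobs.run_now(job_id=123456789).result()  # blocks until the run finishes
ok = run.state.result_state == RunResultState.SUCCESS
print(f"Run finished: {run.state.result_state}")
sys.exit(0 if ok else 1)
```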
Naveenkumar1811
by New Contributor
  • 90 Views
  • 2 replies
  • 0 kudos

What is the best practice for maintaining Delta tables loaded by streaming?

Hi Team, we have our Bronze (append), Silver (append), and Gold (merge) tables loaded continuously using Spark streaming with a processing-time trigger (3 secs). We also run maintenance jobs on the tables, like OPTIMIZE and VACUUM, and we perform DELETE for som...

Latest Reply
Naveenkumar1811
New Contributor

Hi Mark, but the real problem is that our streaming job runs 24x7, 365 days a year, and we can't afford any further latency in the data flowing to the gold layer. We don't have any window to pause or slow our streaming, and we continuously get the data feed actually s...

  • 0 kudos
1 More Replies
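For what it's worth, OPTIMIZE and VACUUM can usually run from a separate scheduled job while an append-only stream keeps writing: Delta's optimistic concurrency lets file-compacting commits coexist with appends, so no pause window is needed for those. It's the DELETEs that downstream streaming readers must be told to tolerate. A minimal sketch with placeholder table names:

```python
# Minimal sketch: maintenance as a separate scheduled job alongside a 24x7
# append stream. OPTIMIZE rewrites files without changing data, so it does
# not conflict with concurrent appends. Table names are placeholders.
spark.sql("OPTIMIZE bronze.events")
spark.sql("VACUUM bronze.events RETAIN 168 HOURS")

# Downstream streaming reads over tables that also receive DELETEs need to
# skip commits that modify existing records (the deletes are not propagated):
stream = (
    spark.readStream.format("delta")
    .option("skipChangeCommits", "true")
    .table("bronze.events")
)
```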
hidden
by New Contributor II
  • 44 Views
  • 1 replies
  • 0 kudos

DLT parameterization from job parameters

I have created a DLT pipeline notebook which creates tables based on a config file that holds the configuration of the tables to be created. Now what I want is to run my pipeline every 30 min for 4 tables from the config and every 3 hours...

Latest Reply
Coffee77
Contributor III

Define parameters in the job as usual and then try to capture them in DLT using code similar to this: spark.conf.get("PARAMETER_NAME", "PARAMETER_DEFAULT_VALUE"). It should get the parameter value from the job if it exists; otherwise it'll set the defau...

  • 0 kudos
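A minimal sketch of that pattern, assuming the job forwards its parameter into the pipeline's configuration, which the notebook reads with spark.conf.get. The key, default, and table names below are placeholders:

```python
# Minimal sketch: parameterize a DLT notebook from pipeline configuration.
import dlt

# Value injected by the job into the pipeline configuration; falls back to
# the default when the pipeline runs standalone. Key/default are placeholders.
refresh_group = spark.conf.get("refresh_group", "every_30_min")

@dlt.table(name=f"orders_{refresh_group}")
def orders():
    # Only materialize the tables belonging to this run's trigger cadence.
    return spark.read.table("source.orders").where(f"group = '{refresh_group}'")
```

With two jobs (one every 30 minutes, one every 3 hours) passing different values for the key, the same pipeline code serves both schedules.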
santosh_bhosale
by Visitor
  • 38 Views
  • 2 replies
  • 0 kudos

Issue with Unity Catalog on Azure

When I create a Databricks workspace on Azure and try to log in at https://accounts.azuredatabricks.net/, it redirects to my workspace. Yet on the Azure subscription I am the owner; I created this Azure subscription, and the Databricks workspace is also cr...

Latest Reply
Coffee77
Contributor III

Clearly, you don't have account admin permissions. Click the workspace drop-down and check whether you can see and click "Manage Account" to confirm, but it is very likely you are not allowed access. You must be an Azure Global Adm...

  • 0 kudos
1 More Replies
leenack
by New Contributor
  • 203 Views
  • 7 replies
  • 2 kudos

No rows returned when calling Databricks procedure via .NET API and Simba ODBC driver

I created a simple Databricks procedure that should return a single value: "SELECT 1 AS result;". When I call this procedure from my .NET API using ExecuteReader, ExecuteAdapter, or ExecuteScalar, the call completes without any errors, but no rows are r...

Latest Reply
Coffee77
Contributor III

So @leenack, the best option so far is to refactor part of your code from stored procedures to functions, specifically the part that queries data. Exactly what I proposed in previous comments. Thanks @matt for your response.

  • 2 kudos
6 More Replies
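To illustrate the suggested refactor: move the value-returning logic into a SQL function, which any ODBC client can then consume with a plain SELECT (so ExecuteScalar sees a row). A minimal sketch with placeholder catalog/schema/function names:

```python
# Minimal sketch: replace the "return a value" part of a stored procedure
# with a SQL scalar function. Names are placeholders.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.default.get_result()
    RETURNS INT
    RETURN 1
""")
# A .NET/ODBC client can now read it back with an ordinary query:
spark.sql("SELECT main.default.get_result() AS result").show()
```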
Allen123Maria_1
by New Contributor
  • 97 Views
  • 2 replies
  • 0 kudos

Optimizing Azure Functions for Performance and Cost with Variable Workloads

Hey, everyone!! I use Azure Functions in a project where the workloads change a lot. Sometimes it's quiet, and other times we get a lot of traffic. Azure Functions is very scalable, but I've had some trouble with cold starts and keeping costs down. I'm ...

Latest Reply
susanrobert3

Hey!!! Cold starts on Azure Functions Premium can still bite if your instances go idle long enough, even with pre-warmed instances. What usually helps is bumping the `preWarmedInstanceCount` to at least 1 per plan (so there's always a warm worker), an...

  • 0 kudos
1 More Replies
