Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ShashiPrakash
by New Contributor II
  • 3470 Views
  • 2 replies
  • 1 kudos

Resolved! Unity Catalog Table in Databricks Asset Bundle

I am looking to deploy Unity Catalog schemas and tables via Databricks Asset Bundles (DAB). We can do schema evolution of tables via notebooks as well, but we already have 1000+ notebooks and implementing this via notebooks will be an effort, hence I was look...

Latest Reply
ShashiPrakash
New Contributor II
  • 1 kudos

Thanks for the prompt response @saurabh18cs. Yes, that was the alternative I was considering. I believe it will be the warehouses group command, which I will explore. Will you be able to share any best practice document to manage the SQL project file, w...

1 More Replies
RobCox
by New Contributor II
  • 1061 Views
  • 2 replies
  • 0 kudos

DAB - Common cluster configs possible?

I've been trying various solutions and am perhaps just thinking about this the wrong way. We're migrating over from Synapse, where we're used to having a defined set of DBX cluster profiles to run our jobs against; these are all job clusters created v...

Latest Reply
saurabh18cs
Honored Contributor III
  • 0 kudos

Hi, you can also parametrize your job clusters:

job_clusters:
  - job_cluster_key: Job_cluster
    new_cluster:
      spark_version: ${var.spark_version}
      spark_conf: ${var.spark_configuration}
      azure_attributes:
        ...

1 More Replies
ShivangiB
by New Contributor III
  • 1186 Views
  • 3 replies
  • 0 kudos

Zorder and Liquid Clustering Performance while reading and writing data

When I am writing to a liquid clustering table, it is taking more time compared to Z-order.

Latest Reply
ShivangiB
New Contributor III
  • 0 kudos

We are trying to understand the overall behavior of liquid clustering.

2 More Replies
DatabricksQuery
by New Contributor
  • 584 Views
  • 1 reply
  • 0 kudos

Databricks Job Listener Concept for Tracking Personal Jobs

Hello everyone, I want to know if any listener mechanism in Databricks can track the configuration of Databricks jobs deployed through CI/CD. With the help of this listener, we can track our custom jobs that are not part of the CI/CD process. This way,...

Latest Reply
saurabh18cs
Honored Contributor III
  • 0 kudos

Hi, I don't think Databricks provides a built-in listener mechanism to track changes to job configurations directly. However, you can implement a custom solution to monitor and track changes to Databricks jobs deployed through CI/CD pipelines using ...

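The custom tracking approach suggested in the reply can be sketched in plain Python. This is a minimal illustration, not a Databricks API: `diff_job_settings` is a hypothetical helper that compares two snapshots of a job's settings (e.g. as returned by the Databricks Jobs API) and reports which keys changed, which is the core of any polling-based "listener".

```python
def diff_job_settings(old: dict, new: dict) -> dict:
    """Return {key: (old_value, new_value)} for settings that changed.

    Both arguments are job-settings dicts, e.g. the ``settings`` field
    of a job fetched from the Jobs API at two points in time.
    """
    changed = {}
    for key in set(old) | set(new):
        if old.get(key) != new.get(key):
            changed[key] = (old.get(key), new.get(key))
    return changed

# Example: compare two snapshots of the same job
before = {"name": "etl", "max_concurrent_runs": 1}
after = {"name": "etl", "max_concurrent_runs": 4, "tags": {"owner": "de"}}
print(diff_job_settings(before, after))
```

A scheduled job could run this diff against stored snapshots and alert on any job whose settings changed outside the CI/CD pipeline.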
khishore
by Contributor
  • 6591 Views
  • 9 replies
  • 6 kudos

Resolved! I haven't received my certificate or the badge for Databricks Certified Data Engineer Associate

Hi @Lindsay Olson @Kaniz Fatma, I have cleared my Databricks Certified Data Engineer Associate on 29 October 2022 but haven't received my badge or certificate yet. Can you please help? Thanks

Latest Reply
gokul2
New Contributor III
  • 6 kudos

Hi @Lindsay Olson @Kaniz Fatma, I have cleared my Databricks Certified Data Engineer Associate on 01 December 2024. You have shared my certificate to this mail id (927716@congizant.com) on December 2, but my organization has blocked external sites, ki...

8 More Replies
chethankumar
by New Contributor III
  • 3605 Views
  • 4 replies
  • 1 kudos

How to execute SQL statements using Terraform

Is there a way to execute SQL statements using Terraform? I can see it is possible using the API as below: https://docs.databricks.com/api/workspace/statementexecution/executestatement but I want to know if there is a straightforward way to run it like the code provi...

Latest Reply
KartikeyaJain
New Contributor III
  • 1 kudos

The official Databricks provider in Terraform only allows you to create SQL queries, not execute them. To actually run queries, you can either use the http provider to make API calls to the Databricks REST API to execute SQL queries, or alternatively, if...

3 More Replies
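For reference, the HTTP call the reply describes can come from any client, not just Terraform's http provider. A minimal sketch, assuming a valid warehouse ID and token at call time: the helper below only builds the JSON body that the Statement Execution API (`POST /api/2.0/sql/statements`) expects, which is the same body a Terraform `http` resource would send.

```python
import json

def build_statement_payload(warehouse_id: str, statement: str,
                            wait_timeout: str = "30s") -> str:
    """Build the JSON body for POST /api/2.0/sql/statements.

    wait_timeout controls how long the API call waits synchronously
    for the statement to finish before returning a pending result.
    """
    return json.dumps({
        "warehouse_id": warehouse_id,
        "statement": statement,
        "wait_timeout": wait_timeout,
    })

body = build_statement_payload("abc123", "SELECT 1")
print(body)
```

In Terraform, the same body would be passed as the `request_body` of an `http` data source or a `null_resource` provisioner calling the endpoint with a bearer token.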
naga93
by New Contributor
  • 2118 Views
  • 1 reply
  • 0 kudos

How to read Delta Lake table with Spaces/Special Characters in Column Names in Dremio

Hello, I am currently writing a Delta Lake table from Databricks to Unity Catalog using PySpark 3.5.0 (15.4 LTS Databricks runtime). We want the EXTERNAL Delta Lake tables to be readable from both UC and Dremio. Our Dremio build version is 25.0.6. The ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi naga93, how are you doing today? As per my understanding, you've done a great job navigating all the tricky parts of Delta + Unity Catalog + Dremio integration! You're absolutely right to set minReaderVersion to 2 and disable deletion vectors to m...

surajitDE
by Contributor
  • 1442 Views
  • 1 reply
  • 0 kudos

How can we change from GC to G1GC in serverless

My DLT jobs are experiencing throttling due to the following error message: [GC (GCLocker Initiated GC) [PSYoungGen: 5431990K->102912K(5643264K)] 9035507K->3742053K(17431552K), 0.1463381 secs] [Times: user=0.29 sys=0.00, real=0.14 secs]. I came across s...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi surajitDE, how are you doing today? As per my understanding, you're absolutely right to look into the GC (Garbage Collection) behavior. When you're seeing messages like GCLocker Initiated GC and frequent young-gen collections, it usually means your...

drag7ter
by Contributor
  • 1619 Views
  • 3 replies
  • 0 kudos

Overwriting delta table takes a lot of time

I'm simply trying to overwrite data in a delta table. The table is not really huge: it has 50 million rows and is 1.9 GB in size. For running this code I use various cluster configurations, starting from a 1-node cluster (64 GB, 16 vCPUs), and I also tried to s...

Latest Reply
thackman
Databricks Partner
  • 0 kudos

1) You might need to cache the dataframe so it's not recomputed for the write. 2) What type of cloud storage are you using? We've noticed slow delta writes as well. We are using Azure standard storage, which is backed by spinning disks. It's limited to...

2 More Replies
PaoloF
by New Contributor II
  • 1735 Views
  • 3 replies
  • 0 kudos

Resolved! Re-Ingest Autoloader files foreachbatch

Hi all, I'm using Auto Loader to ingest files; each file contains changed data from a table, and I merge it into a delta table. It works fine. But if I want to re-ingest all the files (deleting the checkpoint location, for example), I need to re-ingest the f...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Glad to help!

2 More Replies
dkxxx-rc
by Contributor
  • 2309 Views
  • 2 replies
  • 1 kudos

Can't "run all below" - "command is part of a batch that is still running"

Weirdness in Databricks on AWS. In a notebook that is doing absolutely nothing, I click the "Run All Above" or "Run All Below" button on a cell, and it won't do anything at all except pop up a little message near the general "Run All" button, saying...

(screenshot attached: dkxxxrc_0-1743452390146.png)
Latest Reply
Advika
Community Manager
  • 1 kudos

Hello @dkxxx-rc! Can you check if any background processes are still running in your notebook that might be interfering with new executions? If you are using Databricks Runtime 14.0 or above, cells run in batches, so any error halts execution, and in...

1 More Replies
Prabakar
by Databricks Employee
  • 3590 Views
  • 1 reply
  • 2 kudos

Accessing the regions that are disabled by default in AWS from Databricks

Accessing the regions that are disabled by default in AWS from Databricks. In AWS we have 4 regions that are disabled by default. You must first enable them before you can create and manage resources. The following regions are disabled by default: Africa...

Latest Reply
AndreaCuda
New Contributor II
  • 2 kudos

Hello - We are looking to deploy and run Databricks in AWS in Bahrain or the UAE. Is this possible? This post is older, so I'm wondering if this is now a viable option.

JooseSauli
by New Contributor II
  • 2148 Views
  • 3 replies
  • 3 kudos

How to make .py files available for import?

Hello, I've looked around but cannot find an answer. In my Azure Databricks workspace, users have Python notebooks which all make use of the same helper functions and classes. Instead of housing the helper code in notebooks and having %run magics in ...

Latest Reply
JooseSauli
New Contributor II
  • 3 kudos

Hi Brahmareddy, thanks for your reply. Your second approach is quite close to what I already tried earlier. Your post got me to do some more testing, and although I don't know how to set the sys.path via the init script (it says here and here that it'...

2 More Replies
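The sys.path approach discussed in this thread can be demonstrated locally. A minimal sketch, assuming the shared helper code lives as plain .py files in some folder (simulated here with a temp directory; in Databricks this would be a Workspace or Repos folder, and `my_helpers` is a hypothetical module name): append the folder to sys.path, then import the helper like any normal module instead of using %run.

```python
import sys
import tempfile
from pathlib import Path

# Simulate a folder of shared helper modules (in Databricks this would
# be a Workspace/Repos folder containing plain .py files).
helpers_dir = Path(tempfile.mkdtemp())
(helpers_dir / "my_helpers.py").write_text("def add(a, b):\n    return a + b\n")

# Make the folder importable, then import the helper normally.
sys.path.append(str(helpers_dir))
import my_helpers

print(my_helpers.add(2, 3))  # prints 5
```

The same pattern works from a notebook cell; the main design question the thread raises is where to do the sys.path.append (per notebook vs. cluster init script) so every notebook sees the folder.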
MDV
by Databricks Partner
  • 1079 Views
  • 2 replies
  • 0 kudos

Problem with df.first() or collect() when collation is different from UTF8_BINARY

I'm getting an error when I want to select first() or collect() from a dataframe when using a collation different from UTF8_BINARY. Example that reproduces the issue; this works:

df_result = spark.sql(f"""
    SELECT 'en-us' AS ET...

Latest Reply
SP_6721
Honored Contributor II
  • 0 kudos

Hi @MDV, I guess the issue likely comes from how non-default collations like UTF8_LCASE behave during serialization when using first() or collect(). As a workaround, wrap the value in a subquery and re-cast the collation back to UTF8_BINARY before acce...

1 More Replies
21f3001806
by New Contributor III
  • 1309 Views
  • 3 replies
  • 1 kudos

Resolved! Dynamic inference tasks in workflows using dabs

I have some workflows where we use dynamic inference to set task values or capture job execution counts or output rows. I can set these dynamic values using the UI, but can I do the same at the time of DABs workflow creation? Can you...

Latest Reply
21f3001806
New Contributor III
  • 1 kudos

Thanks @ashraf1395, I got the idea of what I was looking for.

2 More Replies