Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

William_Scardua
by Valued Contributor
  • 3077 Views
  • 3 replies
  • 2 kudos

What Data Quality framework do you use/recommend?

Hi guys, in your opinion, what is the best Data Quality framework (or technique) you'd recommend?

Data Engineering
dataquality
Latest Reply
chanukya-pekala
Contributor II
  • 2 kudos

DQ is interesting. There are a lot of options in this space. Soda and Great Expectations integrate fairly well with a Databricks setup. I personally try to use DataFrame abstractions for validation. We used the deequ tool, which is very simple to use, just p...

2 More Replies
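The DataFrame-abstraction approach the reply describes can be illustrated with deequ-style checks. The sketch below is a hypothetical plain-Python stand-in (lists of dicts in place of a DataFrame), not the deequ API itself; the check names only mirror deequ's vocabulary:

```python
# Deequ-style data quality checks sketched in plain Python.
# `rows` stands in for a DataFrame: a list of dicts, one per record.

def check_completeness(rows, column):
    """Fraction of rows where `column` is not None (1.0 for an empty input)."""
    if not rows:
        return 1.0
    return sum(1 for r in rows if r.get(column) is not None) / len(rows)

def check_uniqueness(rows, column):
    """True if no non-null value of `column` appears more than once."""
    values = [r.get(column) for r in rows if r.get(column) is not None]
    return len(values) == len(set(values))

rows = [
    {"id": 1, "email": "a@x.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@x.com"},
]

assert check_completeness(rows, "id") == 1.0       # id is fully populated
assert check_completeness(rows, "email") == 2 / 3  # one NULL email
assert check_uniqueness(rows, "id")                # no duplicate ids
```

In deequ (or PyDeequ) the same constraints would run distributed over a Spark DataFrame; the point here is only the shape of the checks.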
Samael
by New Contributor II
  • 779 Views
  • 2 replies
  • 1 kudos

Query a "partition metadata logging" enabled external parquet table on Databricks SQL

Hi there, we have a pretty large Hive-partitioned Parquet table on S3; we followed the documentation to recreate the table with partition metadata logging on Unity Catalog. We're using Databricks Runtime 16.4 LTS, but despite the release note mentioning that...

Latest Reply
Samael
New Contributor II
  • 1 kudos

Thanks for helping! Setting table properties unfortunately didn't do the trick. We ended up having a view that points to the latest partition for fast queries, like this: SELECT * FROM parquet.`s3://bucket/prefix/partition_column_date=20250616/` We haven't f...

1 More Replies
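The workaround in the reply can be written as a view over the newest partition directory. A sketch (the view name is hypothetical; the bucket/prefix path and partition value are the placeholders from the reply):

```sql
-- Hypothetical view pinning queries to the latest partition directory.
-- The path must be updated (view recreated) whenever a new partition lands.
CREATE OR REPLACE VIEW latest_partition_view AS
SELECT *
FROM parquet.`s3://bucket/prefix/partition_column_date=20250616/`;
```

This trades partition discovery for a fixed path, which is why it is fast; it does not fix the underlying partition metadata logging issue.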
kenmyers-8451
by Contributor
  • 846 Views
  • 4 replies
  • 0 kudos

dynamically create file path for sql_task

I am trying to make a reusable workflow where I can run a merge script for any number of tables. The idea is that I tell the workflow the table name and/or path, and it can reference that in the file path field. The simplified idea is below: resource...

Latest Reply
jtirila
New Contributor II
  • 0 kudos

Oh, never mind, I got it working. Just using single quotes around the {{  }} part solves it (I guess double quotes would work as well). I think I tried this yesterday but probably ran into another issue with dashes in task names: https://community.d...

3 More Replies
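Based on the reply, a sketch of how that quoting might look in the bundle definition. The task key, path, and parameter name here are hypothetical; the point is only the single quotes around the templated segment so YAML does not misparse the braces:

```yaml
# Hypothetical sql_task in a Databricks Asset Bundle job definition.
# Single quotes keep YAML happy; the {{ }} reference is still resolved at run time.
tasks:
  - task_key: merge_table
    sql_task:
      file:
        path: '/Repos/project/merge_scripts/{{job.parameters.table_name}}.sql'
      warehouse_id: ${var.warehouse_id}
```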
jommo
by New Contributor
  • 3680 Views
  • 2 replies
  • 0 kudos

Exploring Data Quality Frameworks in Databricks

I’m currently investigating solutions for Data Quality (DQ) within the Databricks environment and would love to hear what frameworks or approaches you are using for this purpose. In the past, I’ve worked with Deequ, but I’ve noticed that it’s not as w...

Latest Reply
dataoculus_app
New Contributor III
  • 0 kudos

GE and other DQ tools will fire a lot of SQL queries, increasing cost and adding delays, so it depends on what your requirements are. Happy to discuss more if you are interested, as I am also going to make such a tool available to the Databricks community as well...

1 More Replies
AniruddhaGI
by New Contributor II
  • 1783 Views
  • 1 replies
  • 0 kudos

Workspace allows DBFS path to install in Databricks 16.4 LTS

Feature: Library installation using requirements.txt on DB Runtime 16.4 LTS. Affected areas: workspace isolation, library management. Steps to reproduce: upload a wheel file to DBFS, put the requirements.txt file in the Workspace and put the DBFS path in require...

Data Engineering
library
Security
Workspace
Latest Reply
AniruddhaGI
New Contributor II
  • 0 kudos

I would like to know if workspace isolation is a priority, given that only Databricks 14.3 and lower allow installation via DBFS. Why should requirements.txt allow you to install libraries or packages via a DBFS path? Could someone please explain why th...

Pedro1
by New Contributor II
  • 2452 Views
  • 2 replies
  • 0 kudos

databricks_grants fails because it keeps track of a removed principal

Hi all, my Terraform script fails on a databricks_grants with the error: "Error: cannot update grants: Could not find principal with name DataUsers". The principal DataUsers does not exist anymore because it was previously deleted by Terraform. Bo...

Latest Reply
wkeifenheim-og
New Contributor II
  • 0 kudos

I'm here searching for a similar but different issue, so this is just a suggestion of something to try: have you tried setting a depends_on argument within your databricks_grants block?

1 More Replies
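The reply's suggestion, sketched in HCL. The resource names, schema, and privileges below are hypothetical; the idea is that an explicit depends_on ties the grant's lifecycle to the principal's, so Terraform orders destroys and updates consistently:

```hcl
# Hypothetical sketch: depends_on makes Terraform create the grant after the
# group and destroy it before the group, avoiding grants that reference a
# principal that has already been removed.
resource "databricks_group" "data_users" {
  display_name = "DataUsers"
}

resource "databricks_grants" "schema_grants" {
  schema = "main.analytics"

  grant {
    principal  = databricks_group.data_users.display_name
    privileges = ["USE_SCHEMA", "SELECT"]
  }

  depends_on = [databricks_group.data_users]
}
```

Note that referencing the group attribute inside the grant already creates an implicit dependency; the explicit depends_on only matters when the grant refers to the principal by a plain string.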
pooja_bhumandla
by New Contributor II
  • 404 Views
  • 1 replies
  • 1 kudos

Deletion Vectors on Partitioned Tables

Are Deletion Vectors supported for partitioned delta tables in Databricks?

Latest Reply
paolajara
Databricks Employee
  • 1 kudos

Hi @pooja_bhumandla, yes, deletion vectors are supported for partitioned Delta tables in Databricks. They come as part of a storage optimization that allows delete, update, and merge operations to mark existing rows as removed or changed without rewrit...

rcostanza
by New Contributor III
  • 776 Views
  • 1 replies
  • 1 kudos

Resolved! DataFrame.localCheckpoint() and cluster autoscaling at odds with each other

I have a notebook where at the beginning I load several dataframes and cache them using localCheckpoint(). I run this notebook using an all-purpose cluster with autoscaling enabled, with a minimum of 1 worker and a maximum of 2. The cluster often autoscale...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 1 kudos

Hi @rcostanza, you're facing a common issue with autoscaling clusters and cached data locality. There are several approaches to address this. Preventing downscaling during execution: 1. Disable autoscaling temporarily - you can disable autoscaling programm...

hpant
by New Contributor III
  • 927 Views
  • 2 replies
  • 1 kudos

Is it possible to create an external volume using a Databricks Asset Bundle?

Is it possible to create an external volume using a Databricks Asset Bundle? I have this code from the databricks.yml file, which is working perfectly fine for a managed volume:    resources:      volumes:        bronze_checkpoints_volume:          catalog_name: ...

Latest Reply
nayan_wylde
Honored Contributor III
  • 1 kudos

bundle:
  name: my_azure_volume_bundle
resources:
  volumes:
    my_external_volume:
      catalog_name: main
      schema_name: my_schema
      name: my_external_volume
      volume_type: EXTERNAL
      storage_location: abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path>...

1 More Replies
Ivan_Pyrog
by New Contributor
  • 1332 Views
  • 2 replies
  • 0 kudos

Azure Event Hub throws Timeout Exception: Timed out waiting for a node assignment. Call: describeTopi

Hello team, we are researching the streaming capabilities of our data platform and currently need to read data from EVH (Event Hub) with our Databricks notebooks. Unfortunately there seems to be an error somewhere due to a Timeout Exception: Tim...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

@Ivan_Pyrog what's the full error message as per the Spark driver log, and what is your Kafka broker version? I suspect you may actually be hitting a client-server incompatibility.

1 More Replies
kwasi
by New Contributor II
  • 19287 Views
  • 10 replies
  • 2 kudos

Kafka timeout

Hello, I am trying to read topics from a Kafka stream but I am getting the timeout error below. java.util.concurrent.ExecutionException: kafkashaded.org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: describeT...

Latest Reply
VZLA
Databricks Employee
  • 2 kudos

What's your Kafka Broker version and which Kafka client is in use (spark's, python-kafka, kafka-confluent,...) ?

9 More Replies
himanshu_k
by New Contributor
  • 6214 Views
  • 3 replies
  • 0 kudos

Clarification Needed: Ensuring Correct Pagination with Offset and Limit in PySpark

Hi community, I hope you're all doing well. I'm currently engaged in a PySpark project where I'm implementing pagination-like functionality using the offset and limit functions. My aim is to retrieve data between a specified starting_index and ending_...

Latest Reply
Mathias_Peters
Contributor II
  • 0 kudos

Hi, did you find an answer to this question? I am having similar problems and a slow solution that I need to improve upon. Thanks in advance.

2 More Replies
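The offset/limit arithmetic behind the pagination described in the post can be sketched independently of Spark. This is a plain-Python stand-in (the function name is hypothetical); in PySpark the same numbers would feed df.offset(...).limit(...), and, importantly, the result is only deterministic over a globally sorted dataset, since Spark guarantees no stable row order otherwise:

```python
# Pagination arithmetic: map (starting_index, ending_index) to (offset, limit)
# for the half-open range [starting_index, ending_index).

def page_bounds(starting_index, ending_index):
    """Return (offset, limit); raises if the range is inverted."""
    if ending_index < starting_index:
        raise ValueError("ending_index must be >= starting_index")
    return starting_index, ending_index - starting_index

data = list(range(100))  # stand-in for a sorted dataset

offset, limit = page_bounds(20, 30)
page = data[offset:offset + limit]
assert page == list(range(20, 30))  # rows 20..29
```

With a DataFrame the equivalent would be roughly df.orderBy(key).offset(offset).limit(limit) (DataFrame.offset exists from Spark 3.4), where the orderBy is what makes the pages reproducible.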
Reza
by New Contributor III
  • 13117 Views
  • 11 replies
  • 6 kudos

Resolved! How can I search in a specific folder in Databricks?

There is a keyword search option in Databricks that searches for a command or word in the entire workspace. How can I search for a command in a specific folder or repository?

Latest Reply
Jensz007
New Contributor II
  • 6 kudos

@Atanu I agree with nelsoncardenas; the problem is not solved, and the answer currently only tells us to raise a feature request. Would it be possible to at least link the feature request raised by nelsoncardenas to this post/answer? ...

10 More Replies
saicharandeepb
by New Contributor III
  • 1676 Views
  • 0 replies
  • 0 kudos

Implementing ADB Autoloader with Managed File Notification Mode for UC Ext Location (public preview)

Hi everyone, I'm planning to implement Azure Databricks Auto Loader using the Databricks-managed file notification mode for an external location registered in Unity Catalog. I understand this feature is currently in public preview, and I'd love to hea...

nayan_wylde
by Honored Contributor III
  • 684 Views
  • 3 replies
  • 0 kudos

Installing Maven packages in a UC-enabled Standard mode cluster

Curious whether anyone has faced issues installing Maven packages in a UC-enabled cluster. Traditionally we used to install Maven packages from an Artifactory repo. I am trying to install the same package from a UC-enabled cluster (Standard mode). It worked whe...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @nayan_wylde, yes, this is a common challenge when transitioning to Unity Catalog (UC) enabled clusters. The installation of Maven packages from Artifactory repositories does work differently in UC environments, but there are several approaches you c...

2 More Replies
