Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

William_Scardua
by Valued Contributor
  • 3077 Views
  • 3 replies
  • 2 kudos

What Data Quality framework do you use/recommend?

Hi guys, in your opinion, what is the best Data Quality framework (or technique) you'd recommend?

Data Engineering
dataquality
Latest Reply
chanukya-pekala
Contributor II
  • 2 kudos

DQ is interesting. There are a lot of options in this space. Soda and Great Expectations integrate fairly well with a Databricks setup. I personally try to use DataFrame abstractions for validation. We used the deequ tool, which is very simple to use, just p...

2 More Replies
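The DataFrame-abstraction approach the reply describes can be illustrated with deequ-style checks. The sketch below is a hypothetical plain-Python stand-in (lists of dicts in place of a DataFrame), not the deequ API itself; the check names only mirror deequ's vocabulary:

```python
# Deequ-style data quality checks sketched in plain Python.
# `rows` stands in for a DataFrame: a list of dicts, one per record.

def check_completeness(rows, column):
    """Fraction of rows where `column` is not None (1.0 for an empty input)."""
    if not rows:
        return 1.0
    return sum(1 for r in rows if r.get(column) is not None) / len(rows)

def check_uniqueness(rows, column):
    """True if no non-null value of `column` appears more than once."""
    values = [r.get(column) for r in rows if r.get(column) is not None]
    return len(values) == len(set(values))

rows = [
    {"id": 1, "email": "a@x.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@x.com"},
]

assert check_completeness(rows, "id") == 1.0       # id is fully populated
assert check_completeness(rows, "email") == 2 / 3  # one NULL email
assert check_uniqueness(rows, "id")                # no duplicate ids
```

In deequ (or PyDeequ) the same constraints would run distributed over a Spark DataFrame; the point here is only the shape of the checks.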
Samael
by New Contributor II
  • 779 Views
  • 2 replies
  • 1 kudos

Query a "partition metadata logging" enabled external parquet table on Databricks SQL

Hi there, we have a pretty large Hive-partitioned Parquet table on S3; we followed the documentation to recreate the table with partition metadata logging on Unity Catalog. We're using Databricks Runtime 16.4 LTS, but despite the release note mentioning that...

Latest Reply
Samael
New Contributor II
  • 1 kudos

Thanks for helping! Setting table properties unfortunately didn't do the trick. We ended up having a view that points to the latest partition for fast queries, like this: SELECT * FROM parquet.`s3://bucket/prefix/partition_column_date=20250616/` We haven't f...

1 More Replies
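The workaround in the reply can be written as a view over the newest partition directory. A sketch (the view name is hypothetical; the bucket/prefix path and partition value are the placeholders from the reply):

```sql
-- Hypothetical view pinning queries to the latest partition directory.
-- The path must be updated (view recreated) whenever a new partition lands.
CREATE OR REPLACE VIEW latest_partition_view AS
SELECT *
FROM parquet.`s3://bucket/prefix/partition_column_date=20250616/`;
```

This trades partition discovery for a fixed path, which is why it is fast; it does not fix the underlying partition metadata logging issue.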
kenmyers-8451
by Contributor
  • 846 Views
  • 4 replies
  • 0 kudos

dynamically create file path for sql_task

I am trying to make a reusable workflow where I can run a merge script for any number of tables. The idea is that I tell the workflow the table name and/or path, and it can reference that in the file path field. The simplified idea is below: resource...

Latest Reply
jtirila
New Contributor II
  • 0 kudos

Oh, never mind, I got it working. Just using single quotes around the {{  }} part solves it (I guess double quotes would work as well). I think I tried this yesterday but probably ran into another issue with dashes in task names: https://community.d...

3 More Replies
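Based on the reply, a sketch of how that quoting might look in the bundle definition. The task key, path, and parameter name here are hypothetical; the point is only the single quotes around the templated segment so YAML does not misparse the braces:

```yaml
# Hypothetical sql_task in a Databricks Asset Bundle job definition.
# Single quotes keep YAML happy; the {{ }} reference is still resolved at run time.
tasks:
  - task_key: merge_table
    sql_task:
      file:
        path: '/Repos/project/merge_scripts/{{job.parameters.table_name}}.sql'
      warehouse_id: ${var.warehouse_id}
```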
jommo
by New Contributor
  • 3680 Views
  • 2 replies
  • 0 kudos

Exploring Data Quality Frameworks in Databricks

I’m currently investigating solutions for Data Quality (DQ) within the Databricks environment and would love to hear what frameworks or approaches you are using for this purpose. In the past, I’ve worked with Deequ, but I’ve noticed that it’s not as w...

Latest Reply
dataoculus_app
New Contributor III
  • 0 kudos

GE and other DQ tools will fire a lot of SQL queries, increasing cost and adding delays, so it depends on what your requirements are. Happy to discuss more if you are interested, as I am also going to make such a tool available to the Databricks community as well...

1 More Replies
AniruddhaGI
by New Contributor II
  • 1783 Views
  • 1 replies
  • 0 kudos

Workspace allows DBFS path to install in Databricks 16.4 LTS

Feature: Library installation using requirements.txt on DB Runtime 16.4 LTS. Affected areas: workspace isolation, library management. Steps to reproduce: upload a wheel file to DBFS, put the requirements.txt file in the Workspace and put the DBFS path in require...

Data Engineering
library
Security
Workspace
Latest Reply
AniruddhaGI
New Contributor II
  • 0 kudos

I would like to know if workspace isolation is a priority, given that only Databricks 14.3 and lower allow installation via DBFS. Why should requirements.txt allow you to install libraries or packages via a DBFS path? Could someone please explain why th...

Pedro1
by New Contributor II
  • 2452 Views
  • 2 replies
  • 0 kudos

databricks_grants fails because it keeps track of a removed principal

Hi all, my Terraform script fails on a databricks_grants with the error: "Error: cannot update grants: Could not find principal with name DataUsers". The principal DataUsers does not exist anymore because it was previously deleted by Terraform. Bo...

Latest Reply
wkeifenheim-og
New Contributor II
  • 0 kudos

I'm here searching for a similar but different issue, so this is just a suggestion of something to try: have you tried setting a depends_on argument within your databricks_grants block?

1 More Replies
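The reply's suggestion, sketched in HCL. The resource names, schema, and privileges below are hypothetical; the idea is that an explicit depends_on ties the grant's lifecycle to the principal's, so Terraform orders destroys and updates consistently:

```hcl
# Hypothetical sketch: depends_on makes Terraform create the grant after the
# group and destroy it before the group, avoiding grants that reference a
# principal that has already been removed.
resource "databricks_group" "data_users" {
  display_name = "DataUsers"
}

resource "databricks_grants" "schema_grants" {
  schema = "main.analytics"

  grant {
    principal  = databricks_group.data_users.display_name
    privileges = ["USE_SCHEMA", "SELECT"]
  }

  depends_on = [databricks_group.data_users]
}
```

Note that referencing the group attribute inside the grant already creates an implicit dependency; the explicit depends_on only matters when the grant refers to the principal by a plain string.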
pooja_bhumandla
by New Contributor II
  • 404 Views
  • 1 replies
  • 1 kudos

Deletion Vectors on Partitioned Tables

Are Deletion Vectors supported for partitioned delta tables in Databricks?

Latest Reply
paolajara
Databricks Employee
  • 1 kudos

Hi @pooja_bhumandla, yes, deletion vectors are supported for partitioned Delta tables in Databricks. They come as part of a storage optimization that allows delete, update, and merge operations to mark existing rows as removed or changed without rewrit...

rcostanza
by New Contributor III
  • 776 Views
  • 1 replies
  • 1 kudos

Resolved! DataFrame.localCheckpoint() and cluster autoscaling at odds with each other

I have a notebook where at the beginning I load several dataframes and cache them using localCheckpoint(). I run this notebook using an all-purpose cluster with autoscaling enabled, with a minimum of 1 worker and a maximum of 2. The cluster often autoscale...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 1 kudos

Hi @rcostanza, you're facing a common issue with autoscaling clusters and cached data locality. There are several approaches to address this. Preventing downscaling during execution: 1. Disable autoscaling temporarily - you can disable autoscaling programm...

hpant
by New Contributor III
  • 927 Views
  • 2 replies
  • 1 kudos

Is it possible to create an external volume using a Databricks Asset Bundle?

Is it possible to create an external volume using a Databricks Asset Bundle? I have this code from the databricks.yml file, which is working perfectly fine for a managed volume:    resources:      volumes:        bronze_checkpoints_volume:          catalog_name: ...

Latest Reply
nayan_wylde
Honored Contributor III
  • 1 kudos

bundle:
  name: my_azure_volume_bundle
resources:
  volumes:
    my_external_volume:
      catalog_name: main
      schema_name: my_schema
      name: my_external_volume
      volume_type: EXTERNAL
      storage_location: abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path>...

1 More Replies
Ivan_Pyrog
by New Contributor
  • 1332 Views
  • 2 replies
  • 0 kudos

Azure Event Hub throws Timeout Exception: Timed out waiting for a node assignment. Call: describeTopi

Hello team, we are researching the streaming capabilities of our data platform and currently need to read data from EVH (Event Hub) with our Databricks notebooks. Unfortunately there seems to be an error somewhere due to a Timeout Exception: Tim...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

@Ivan_Pyrog what's the full error message as per the Spark driver log, and what is your Kafka broker version? I suspect you may actually be hitting a client-server incompatibility.

1 More Replies
kwasi
by New Contributor II
  • 19287 Views
  • 10 replies
  • 2 kudos

Kafka timeout

Hello, I am trying to read topics from a Kafka stream but I am getting the timeout error below. java.util.concurrent.ExecutionException: kafkashaded.org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: describeT...

Latest Reply
VZLA
Databricks Employee
  • 2 kudos

What's your Kafka Broker version and which Kafka client is in use (spark's, python-kafka, kafka-confluent,...) ?

9 More Replies
himanshu_k
by New Contributor
  • 6214 Views
  • 3 replies
  • 0 kudos

Clarification Needed: Ensuring Correct Pagination with Offset and Limit in PySpark

Hi community, I hope you're all doing well. I'm currently engaged in a PySpark project where I'm implementing pagination-like functionality using the offset and limit functions. My aim is to retrieve data between a specified starting_index and ending_...

Latest Reply
Mathias_Peters
Contributor II
  • 0 kudos

Hi, did you find an answer to this question? I am having similar problems and a slow solution that I need to improve upon. Thanks in advance.

2 More Replies
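The offset/limit arithmetic behind the pagination described in the post can be sketched independently of Spark. This is a plain-Python stand-in (the function name is hypothetical); in PySpark the same numbers would feed df.offset(...).limit(...), and, importantly, the result is only deterministic over a globally sorted dataset, since Spark guarantees no stable row order otherwise:

```python
# Pagination arithmetic: map (starting_index, ending_index) to (offset, limit)
# for the half-open range [starting_index, ending_index).

def page_bounds(starting_index, ending_index):
    """Return (offset, limit); raises if the range is inverted."""
    if ending_index < starting_index:
        raise ValueError("ending_index must be >= starting_index")
    return starting_index, ending_index - starting_index

data = list(range(100))  # stand-in for a sorted dataset

offset, limit = page_bounds(20, 30)
page = data[offset:offset + limit]
assert page == list(range(20, 30))  # rows 20..29
```

With a DataFrame the equivalent would be roughly df.orderBy(key).offset(offset).limit(limit) (DataFrame.offset exists from Spark 3.4), where the orderBy is what makes the pages reproducible.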
Reza
by New Contributor III
  • 13117 Views
  • 11 replies
  • 6 kudos

Resolved! How can I search in a specific folder in Databricks?

There is a keyword search option in Databricks that searches for a command or word in the entire workspace. How can I search for a command in a specific folder or repository?

Latest Reply
Jensz007
New Contributor II
  • 6 kudos

@Atanu I agree with nelsoncardenas; the problem is not solved, and the answer currently only tells us to raise a feature request. Would it be possible to at least link the feature request raised by nelsoncardenas to this post/answer? ...

10 More Replies
saicharandeepb
by New Contributor III
  • 1676 Views
  • 0 replies
  • 0 kudos

Implementing ADB Autoloader with Managed File Notification Mode for UC Ext Location (public preview)

Hi everyone, I'm planning to implement Azure Databricks Auto Loader using the Databricks-managed file notification mode for an external location registered in Unity Catalog. I understand this feature is currently in public preview, and I'd love to hea...

nayan_wylde
by Honored Contributor III
  • 684 Views
  • 3 replies
  • 0 kudos

Installing Maven packages in a UC-enabled Standard mode cluster

Curious whether anyone has faced issues installing Maven packages in a UC-enabled cluster. Traditionally we used to install Maven packages from an Artifactory repo. I am trying to install the same package from a UC-enabled cluster (Standard mode). It worked whe...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @nayan_wylde, yes, this is a common challenge when transitioning to Unity Catalog (UC) enabled clusters. The installation of Maven packages from Artifactory repositories does work differently in UC environments, but there are several approaches you c...

2 More Replies
