Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

GeKo
by Contributor
  • 3197 Views
  • 8 replies
  • 4 kudos

Resolved! How to specify the runtime version for a serverless job

Hello, if I understood correctly... using a serverless cluster always comes with the latest runtime version by default. Now I need to stick to e.g. runtime version 15.4 for a certain job, which gets deployed via asset bundles. How do I specify/config...

Data Engineering
assetbundle
serverless
Latest Reply
GeKo
Contributor
  • 4 kudos
7 More Replies
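For later readers: serverless compute doesn't let you pin a classic DBR number like 15.4; instead it exposes serverless environment versions. A minimal sketch of pinning one in a bundle, assuming the jobs `environments` block (all resource names are placeholders; check the exact keys against the current asset bundle docs):

```yaml
# Hypothetical databricks.yml fragment: pin the serverless environment
# version for one job. "my_job" and the paths are invented.
resources:
  jobs:
    my_job:
      tasks:
        - task_key: main
          spark_python_task:
            python_file: ../src/main.py
          environment_key: default
      environments:
        - environment_key: default
          spec:
            client: "2"   # serverless environment version, not a DBR number
```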
Avinash_Narala
by Valued Contributor II
  • 3544 Views
  • 9 replies
  • 1 kudos

Redshift Stored Procedure Migration to Databricks

Hi, I want to migrate Redshift SQL stored procedures to Databricks. As Databricks doesn't support the concept of SQL stored procedures, how can I do so?

Latest Reply
nayan_wylde
Esteemed Contributor
  • 1 kudos

The Databricks docs show that procedures are in public preview and require runtime 17.0 and above: https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-create-procedure

8 More Replies
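Building on the preview the reply links to, a hedged sketch of what a migrated Redshift procedure might look like; the table, column, and procedure names are invented for illustration, and the syntax should be checked against the linked CREATE PROCEDURE page:

```sql
-- Hypothetical migration sketch (requires DBR 17.0+, public preview).
CREATE OR REPLACE PROCEDURE refresh_daily_sales(IN run_date DATE)
LANGUAGE SQL
AS BEGIN
  -- idempotent reload of one day's aggregate
  DELETE FROM daily_sales WHERE sale_date = run_date;
  INSERT INTO daily_sales
  SELECT sale_date, SUM(amount)
  FROM raw_sales
  WHERE sale_date = run_date
  GROUP BY sale_date;
END;

CALL refresh_daily_sales(DATE'2025-06-01');
```

On runtimes below 17.0, the usual fallback is rewriting each procedure as a parameterized notebook or SQL file run as a job task.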
JothyGanesan
by New Contributor III
  • 2123 Views
  • 4 replies
  • 1 kudos

Resolved! Streaming data - Merge in Target - DLT

We have streaming inputs coming from streaming tables and also the table from apply_changes. In our target, there is only one table, which needs to be merged with all the sources. Each source provides different columns in our target table. Challenge: Ev...

Latest Reply
vd1
New Contributor II
  • 1 kudos

Can this cause concurrent write issues? Updating the same table from multiple streams?

3 More Replies
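Outside of DLT, the per-source operation each stream ultimately needs is a plain Delta MERGE that touches only that source's column subset; a hedged sketch with invented table and column names:

```sql
-- Hypothetical merge for one source; each other source would update its own
-- column subset the same way against the shared target.
MERGE INTO target_table t
USING source_a_updates s
ON t.key_id = s.key_id
WHEN MATCHED THEN UPDATE SET
  t.source_a_col = s.source_a_col,
  t.updated_at   = s.updated_at
WHEN NOT MATCHED THEN INSERT (key_id, source_a_col, updated_at)
                      VALUES (s.key_id, s.source_a_col, s.updated_at);
```

As the last reply notes, concurrent merges from several streams into one Delta table can conflict; a common workaround is serializing the writes, e.g. one foreachBatch sink that unions the sources before merging.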
RevathiTiger
by New Contributor II
  • 4425 Views
  • 3 replies
  • 1 kudos

Expectations vs Great expectations with Databricks DLT pipelines

Hi All, We are working on creating a DQ framework on DLT pipelines in Databricks. Databricks DLT pipelines read incoming data from Kafka/file sources. Once data is ingested, data validation must happen on top of the ingested data. The customer is evalu...

Latest Reply
chanukya-pekala
Contributor III
  • 1 kudos

If you have decided to use DLT, it handles micro-batching and checkpointing for you. But typically we can take more control if you rewrite the logic using Auto Loader or Structured Streaming, custom-checkpointing the file directory and maintain yo...

2 More Replies
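For comparison with Great Expectations, DLT's built-in expectations are declared inline on the dataset; a minimal sketch, with invented table, constraint, and path names:

```sql
-- Hypothetical DLT SQL table with inline expectations: rows violating
-- valid_id are dropped, and violation counts surface in the event log.
CREATE OR REFRESH STREAMING TABLE ingested_events (
  CONSTRAINT valid_id        EXPECT (event_id IS NOT NULL) ON VIOLATION DROP ROW,
  CONSTRAINT valid_timestamp EXPECT (event_ts > '2020-01-01')
)
AS SELECT * FROM STREAM read_files('/Volumes/raw/landing/events/');
```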
lmu
by New Contributor III
  • 2490 Views
  • 11 replies
  • 3 kudos

Resolved! Write on External Table with Row Level Security fails

Hey, we are experiencing issues with writing to external tables when using Unity Catalog and row-level security. As soon as we stop using the serverless compute instance, we receive the following error when writing (overwrite, append and upsert): E...

Latest Reply
lmu
New Contributor III
  • 3 kudos

After further testing, it was found that the dedicated access mode (formerly single user) either does not work or exhibits strange behaviour. In one scenario, the 16.4 cluster with dedicated access mode could write in append mode but not overwrite, a...

10 More Replies
William_Scardua
by Valued Contributor
  • 3524 Views
  • 3 replies
  • 2 kudos

Which Data Quality framework do you use/recommend?

Hi guys, in your opinion, what is the best Data Quality framework (or technique) that you recommend?

Data Engineering
dataquality
Latest Reply
chanukya-pekala
Contributor III
  • 2 kudos

DQ is interesting. There are a lot of options in this space. Soda and Great Expectations are fairly well integrated with the Databricks setup. I personally try to use dataframe abstractions for validating. We used the deequ tool, which is very simple to use, just p...

2 More Replies
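To illustrate the "dataframe abstraction" style of validation the reply describes, here is a hand-rolled pure-Python sketch of two deequ-style constraints (deequ itself runs these on Spark dataframes; the data and function names here are invented):

```python
# Minimal deequ-style checks over rows-as-dicts: completeness (fraction of
# non-null values) and uniqueness (no duplicate non-null values).
def check_completeness(rows, column):
    """Fraction of rows where `column` is present and not None."""
    if not rows:
        return 1.0
    return sum(1 for r in rows if r.get(column) is not None) / len(rows)

def check_uniqueness(rows, column):
    """True if every non-null value in `column` is distinct."""
    vals = [r[column] for r in rows if r.get(column) is not None]
    return len(vals) == len(set(vals))

rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": None},
    {"id": 2, "name": "c"},
]
print(check_completeness(rows, "name"))  # ≈ 0.667 (2 of 3 names present)
print(check_uniqueness(rows, "id"))      # False (id=2 appears twice)
```

The same two checks map directly onto deequ's `Completeness` and `Uniqueness` analyzers when you move them onto Spark.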
Samael
by New Contributor II
  • 968 Views
  • 2 replies
  • 1 kudos

Query a "partition metadata logging" enabled external parquet table on Databricks SQL

Hi there, We have a pretty large Hive-partitioned Parquet table on S3; we followed the documentation to recreate the table with partition metadata logging on Unity Catalog. We're using Databricks Runtime 16.4 LTS, but despite the release notes mentioning that...

Latest Reply
Samael
New Contributor II
  • 1 kudos

Thanks for helping! Setting table properties unfortunately didn't do the trick. We ended up having a view that points to the latest partition, like this, for fast queries: SELECT * FROM parquet.`s3://bucket/prefix/partition_column_date=20250616/` We haven't f...

1 More Replies
kenmyers-8451
by Contributor
  • 1088 Views
  • 4 replies
  • 0 kudos

Dynamically create a file path for sql_task

I am trying to make a reusable workflow where I can run a merge script for any number of tables. The idea is I tell the workflow the table name and/or path to it and it can reference that in the file path field. The simplified idea is below: resource...

Latest Reply
jtirila
New Contributor II
  • 0 kudos

Oh, never mind, I got it working. Just using single quotes around the {{  }} part solves it (I guess double quotes would work as well). I think I tried this yesterday but probably ran into another issue with dashes in task names: https://community.d...

3 More Replies
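The working pattern from the reply, sketched as a bundle fragment; the job, parameter, and path names are placeholders, and the point is only the single quotes around the {{ }} reference so the YAML parser doesn't trip over it:

```yaml
# Hypothetical databricks.yml fragment: quoting the {{ }} reference keeps
# YAML happy while Jobs substitutes the parameter value at run time.
resources:
  jobs:
    merge_job:
      parameters:
        - name: table_name
          default: customers
      tasks:
        - task_key: run_merge
          sql_task:
            warehouse_id: ${var.warehouse_id}
            file:
              path: '../sql/merge_{{job.parameters.table_name}}.sql'
```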
jommo
by New Contributor
  • 4219 Views
  • 2 replies
  • 0 kudos

Exploring Data Quality Frameworks in Databricks

I’m currently investigating solutions for Data Quality (DQ) within the Databricks environment and would love to hear what frameworks or approaches you are using for this purpose. In the past, I’ve worked with Deequ, but I’ve noticed that it’s not as w...

Latest Reply
dataoculus_app
New Contributor III
  • 0 kudos

GE and other DQ tools will fire a lot of SQL, increasing cost and adding delays, so it depends on what your requirements are. Happy to discuss more if you are interested, as I am also going to make such a tool available to the Databricks community as well ...

1 More Replies
Pedro1
by New Contributor II
  • 2760 Views
  • 2 replies
  • 0 kudos

databricks_grants fails because it keeps track of a removed principal

Hi all, My Terraform script fails on a databricks_grants with the error: "Error: cannot update grants: Could not find principal with name DataUsers". The principal DataUsers does not exist anymore because it was previously deleted by Terraform. Bo...

Latest Reply
wkeifenheim-og
New Contributor II
  • 0 kudos

I'm here searching for a similar but different issue, so this is just a suggestion of something to try: have you tried setting a depends_on argument within your databricks_grants block?

1 More Replies
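The suggestion above, sketched in Terraform; the group and schema resources are invented for illustration. `depends_on` makes Terraform order create/destroy so the grant is removed before the principal it references disappears:

```hcl
# Hypothetical sketch: tie the grant to the principal's own resource so
# Terraform never leaves a grant tracking an already-deleted principal.
resource "databricks_group" "data_users" {
  display_name = "DataUsers"
}

resource "databricks_grants" "schema_grants" {
  schema = databricks_schema.this.id

  grant {
    principal  = databricks_group.data_users.display_name
    privileges = ["USE_SCHEMA", "SELECT"]
  }

  depends_on = [databricks_group.data_users]
}
```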
pooja_bhumandla
by New Contributor III
  • 525 Views
  • 1 replies
  • 1 kudos

Deletion Vectors on Partitioned Tables

Are Deletion Vectors supported for partitioned delta tables in Databricks?

Latest Reply
paolajara
Databricks Employee
  • 1 kudos

Hi @pooja_bhumandla , Yes, deletion vectors are supported for partitioned Delta tables in Databricks. They come as part of a storage optimization that allows delete, update, and merge operations to mark existing rows as removed or changed without rewrit...

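For anyone landing here, deletion vectors are toggled per table with a table property; a sketch with a placeholder table name:

```sql
-- Enable deletion vectors on an existing Delta table (partitioned or not);
-- the three-level name is a placeholder.
ALTER TABLE my_catalog.my_schema.my_partitioned_table
  SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true');
```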
rcostanza
by New Contributor III
  • 1046 Views
  • 1 replies
  • 1 kudos

Resolved! DataFrame.localCheckpoint() and cluster autoscaling at odds with each other

I have a notebook where, at the beginning, I load several dataframes and cache them using localCheckpoint(). I run this notebook on an all-purpose cluster with autoscaling enabled, with a minimum of 1 worker and a maximum of 2. The cluster often autoscale...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 1 kudos

Hi @rcostanza, you're facing a common issue with autoscaling clusters and cached data locality. There are several approaches to address this. Preventing downscaling during execution: 1. Disable autoscaling temporarily - you can disable autoscaling programm...

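The simplest of the approaches above is a fixed-size cluster, since localCheckpoint() stores blocks on executors and loses them when autoscaling removes a node; a hedged cluster-spec sketch (node type and worker count are placeholders):

```yaml
# Hypothetical job-cluster fragment: a fixed worker count instead of
# autoscale {min_workers: 1, max_workers: 2} avoids losing checkpoint blocks.
new_cluster:
  spark_version: 15.4.x-scala2.12
  node_type_id: Standard_DS3_v2   # assumption: pick your cloud's node type
  num_workers: 2                  # fixed size; no autoscale block
```

Alternatively, DataFrame.checkpoint() (with a checkpoint directory on cloud storage) survives executor loss, unlike localCheckpoint(), at the cost of extra I/O.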
hpant
by New Contributor III
  • 1105 Views
  • 2 replies
  • 1 kudos

Is it possible to create an external volume using a Databricks asset bundle?

Is it possible to create an external volume using a Databricks asset bundle? I have this code from the databricks.yml file, which is working perfectly fine for a managed volume: resources: volumes: bronze_checkpoints_volume: catalog_name: ...

Latest Reply
nayan_wylde
Esteemed Contributor
  • 1 kudos

bundle:
  name: my_azure_volume_bundle
resources:
  volumes:
    my_external_volume:
      catalog_name: main
      schema_name: my_schema
      name: my_external_volume
      volume_type: EXTERNAL
      storage_location: abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path>...

1 More Replies
Ivan_Pyrog
by New Contributor
  • 1638 Views
  • 2 replies
  • 0 kudos

Azure Event Hub throws Timeout Exception: Timed out waiting for a node assignment. Call: describeTopi

Hello team, We are researching the streaming capabilities of our data platform and currently need to read data from EVH (Event Hub) with our Databricks notebooks. Unfortunately, there seems to be an error somewhere due to a Timeout Exception: Tim...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

@Ivan_Pyrog what's the full error message as per the Spark driver log, and what is your Kafka broker version? I suspect you may actually be hitting a client-server incompatibility.

1 More Replies
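A frequent cause of this exact timeout with Event Hubs is reaching the wrong endpoint: the Kafka-compatible surface listens on port 9093 over SASL_SSL with the connection string as the password. A hedged sketch of the connector options (namespace, topic, and connection string are placeholders; the shaded class name matches the `kafkashaded.` prefix in the error above):

```python
# Hypothetical Kafka options for reading Azure Event Hubs from Databricks.
EH_NAMESPACE = "my-namespace"            # assumption: your EH namespace
CONNECTION_STRING = "Endpoint=sb://..."  # assumption: your EH connection string

kafka_options = {
    # Event Hubs' Kafka endpoint is port 9093, not the AMQP port.
    "kafka.bootstrap.servers": f"{EH_NAMESPACE}.servicebus.windows.net:9093",
    "subscribe": "my-topic",
    "kafka.security.protocol": "SASL_SSL",
    "kafka.sasl.mechanism": "PLAIN",
    # Literal username "$ConnectionString"; the connection string is the password.
    "kafka.sasl.jaas.config": (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="$ConnectionString" password="{CONNECTION_STRING}";'
    ),
    "kafka.request.timeout.ms": "60000",
}
# On a cluster: spark.readStream.format("kafka").options(**kafka_options).load()
```

If the options are right and it still times out, check that the workspace's network path (firewall/private endpoint) can actually reach port 9093 on the namespace.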
kwasi
by New Contributor II
  • 20951 Views
  • 10 replies
  • 2 kudos

Kafka timeout

Hello, I am trying to read topics from a Kafka stream but I am getting the timeout error below. java.util.concurrent.ExecutionException: kafkashaded.org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: describeT...

Latest Reply
VZLA
Databricks Employee
  • 2 kudos

What's your Kafka broker version, and which Kafka client is in use (Spark's, kafka-python, confluent-kafka, ...)?

9 More Replies
