Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

michelleliu
by New Contributor III
  • 2823 Views
  • 3 replies
  • 2 kudos

Resolved! DLT Performance Issue

I've been seeing patterns in DLT process time in all my pipelines, as in the attached screenshot. Each data point is an "update" that's set to "continuous". The process time keeps increasing until a point, then drops back to the desired level. This w...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 2 kudos

Hi @michelleliu, this sawtooth pattern in DLT processing times is actually quite common and typically indicates one of several underlying issues. Here are the most likely causes and solutions. Common causes: 1. Memory pressure & garbage collection. Process...

2 More Replies
alau131
by New Contributor
  • 1451 Views
  • 2 replies
  • 2 kudos

How to dynamically have the parent notebook call on a child notebook?

Hi! I would like help, please, on how to dynamically call one notebook from another in Databricks and have the parent notebook get the dataframe results from the child notebook. Some background info is that I have a main python notebook and multiple SQ...

Latest Reply
jameshughes
Databricks Partner
  • 2 kudos

What you are looking to do is really not the intent of notebooks and you cannot pass complex data types between notebooks. You would need to persist your data frame from the child notebook so your parent notebook could retrieve the results after the ...

1 More Replies
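The persist-and-retrieve pattern the reply describes can be sketched as below. This is only one common workaround, assuming a Databricks runtime where `spark` and `dbutils` exist; the notebook path and view name are hypothetical, and the child is expected to register a global temp view and exit with its name:

```python
# Sketch: pass a DataFrame from a child notebook back to its parent.
# Hypothetical names; the child notebook would end with something like:
#     df.createOrReplaceGlobalTempView("child_result")
#     dbutils.notebook.exit("child_result")

def run_child_and_fetch(spark, dbutils, child_path, timeout_seconds=600):
    """Run the child notebook, then read back the global temp view it registered."""
    view_name = dbutils.notebook.run(child_path, timeout_seconds)
    # Global temp views live in the reserved `global_temp` database.
    return spark.table(f"global_temp.{view_name}")

# Usage on a cluster (illustrative path):
# result_df = run_child_and_fetch(spark, dbutils, "/Repos/project/child_notebook")
```

Because `spark` and `dbutils` are passed in rather than referenced globally, the flow can be exercised outside Databricks with stand-ins.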
Abel_Martinez
by Contributor
  • 22898 Views
  • 10 replies
  • 10 kudos

Resolved! Why I'm getting connection timeout when connecting to MongoDB using MongoDB Connector for Spark 10.x from Databricks

I'm able to connect to MongoDB using org.mongodb.spark:mongo-spark-connector_2.12:3.0.2 and this code: df = spark.read.format("com.mongodb.spark.sql.DefaultSource").option("uri", jdbcUrl). It works well, but if I install the latest MongoDB Spark Connector ve...

Latest Reply
ravisharma1024
New Contributor II
  • 10 kudos

I was facing the same issue; now it is resolved, and thanks to @Abel_Martinez. I am using code like the below: df = spark.read.format("mongodb") \ .option('spark.mongodb.read.connection.uri', "mongodb+srv://*****:*****@******/?retryWrites=true&w=majori...

9 More Replies
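The connector-10.x read the reply shows can be sketched as a small options builder. The option keys below mirror the reply; the URI, database, and collection names are placeholders, and option naming may differ across connector versions:

```python
# Sketch: build read options for the MongoDB Spark Connector 10.x
# (format "mongodb"), mirroring the reply above. All values are placeholders.

def mongo_read_options(uri, database, collection):
    return {
        "spark.mongodb.read.connection.uri": uri,
        "database": database,
        "collection": collection,
    }

# Usage on a cluster (not runnable outside one):
# df = (spark.read.format("mongodb")
#       .options(**mongo_read_options("mongodb+srv://user:pass@host/", "mydb", "mycol"))
#       .load())
```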
vanverne
by New Contributor II
  • 3025 Views
  • 3 replies
  • 1 kudos

Assistance with Capturing Auto-Generated IDs in Databricks SQL

Hello, I am currently working on a project where I need to insert multiple rows into a table and capture the auto-generated IDs for each row. I am using the Databricks SQL connector. Here is a simplified version of my current workflow: I create a temporary...

Latest Reply
vanverne
New Contributor II
  • 1 kudos

Thanks for the reply, Alfonso. I noticed you mentioned "Below are a few alternatives...", however, I am not seeing those. Please let me know if I am missing something. Also, do you know if Databricks is working on supporting the RETURNING clause soon...

2 More Replies
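While a RETURNING clause is unavailable, one workaround (a sketch, not an official API; the table and column names here are hypothetical) is to tag each inserted row with a client-generated batch key and then select the identity values back:

```python
# Sketch: capture auto-generated IDs without RETURNING by tagging rows
# with a client-generated batch key. Table/column names are illustrative;
# the cursor execution (via databricks-sql-connector) is shown in comments.
import uuid

def build_insert_and_fetch(table, payloads, batch_col="batch_key"):
    """Return (INSERT statement, follow-up SELECT) sharing one batch key."""
    batch_key = str(uuid.uuid4())
    values = ", ".join(f"('{p}', '{batch_key}')" for p in payloads)
    insert_sql = f"INSERT INTO {table} (payload, {batch_col}) VALUES {values}"
    fetch_sql = f"SELECT id, payload FROM {table} WHERE {batch_col} = '{batch_key}'"
    return insert_sql, fetch_sql

# With a databricks-sql-connector cursor (illustrative):
# insert_sql, fetch_sql = build_insert_and_fetch("my_schema.events", rows)
# cursor.execute(insert_sql)
# cursor.execute(fetch_sql)  # rows now carry their generated ids
```

Note the literal interpolation here is only for brevity; real code should use the connector's parameter binding.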
Yannic
by New Contributor
  • 1180 Views
  • 1 reply
  • 0 kudos

Delete a directory in DBFS recursively from Azure

I have an Azure storage mounted to DBFS. I want to delete a directory inside recursively. I tried both dbutils.fs.rm(f"/mnt/data/to/delete", True) and %fs rm -r /mnt/data/to/delete. In both cases I get the following exception: AzureException: hadoop_azur...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @Yannic, Azure Blob Storage doesn't have true directories - it simulates them through blob naming conventions, which can cause issues with recursive deletion operations. Try the below: delete files first, then the directory. def delete_directory_recursive(...

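The files-first, depth-first delete the reply outlines can be sketched with the filesystem operations injected, so the traversal logic is plain Python; `dbutils.fs.ls`/`dbutils.fs.rm` are only wired in at the call site (shown in comments, with a hypothetical path):

```python
# Sketch: delete files before their parent "directories", depth-first.
# `list_fn(path)` returns entries as {"path": str, "is_dir": bool};
# `remove_fn(path, recurse)` removes one entry. These stand in for
# dbutils.fs.ls and dbutils.fs.rm so the logic is testable anywhere.

def delete_directory_recursively(list_fn, remove_fn, path):
    for entry in list_fn(path):
        if entry["is_dir"]:
            delete_directory_recursively(list_fn, remove_fn, entry["path"])
        else:
            remove_fn(entry["path"], False)
    # All children are gone; remove the (now empty) directory itself.
    remove_fn(path, True)

# On Databricks (illustrative wiring):
# delete_directory_recursively(
#     lambda p: [{"path": f.path, "is_dir": f.isDir()} for f in dbutils.fs.ls(p)],
#     lambda p, r: dbutils.fs.rm(p, r),
#     "/mnt/data/to/delete")
```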
Sainath368
by Contributor
  • 1161 Views
  • 1 reply
  • 0 kudos

Data Skipping- Partitioned tables

Hi all, I have a question: how can we modify delta.dataSkippingStatsColumns and compute statistics for a partitioned delta table in Databricks? I want to understand the process and best practices for changing this setting and ensuring accurate statist...

Latest Reply
paolajara
Databricks Employee
  • 0 kudos

Hi, delta.dataSkippingStatsColumns specifies a comma-separated list of column names for which Delta Lake collects statistics. It can improve performance, since it supersedes the default behavior of analyzing the first...

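A minimal sketch of the two steps usually involved: set the property, then recompute file statistics. The table and column names are placeholders, and `ANALYZE TABLE ... COMPUTE DELTA STATISTICS` should be verified against your runtime version before relying on it:

```python
# Sketch: statements to change delta.dataSkippingStatsColumns and then
# backfill data-skipping statistics. Table/column names are hypothetical.

def stats_columns_statements(table, columns):
    cols = ",".join(columns)
    return [
        f"ALTER TABLE {table} SET TBLPROPERTIES "
        f"('delta.dataSkippingStatsColumns' = '{cols}')",
        # Recomputes per-file stats for existing data on newer runtimes.
        f"ANALYZE TABLE {table} COMPUTE DELTA STATISTICS",
    ]

# On a cluster (illustrative):
# for stmt in stats_columns_statements("main.sales.events", ["event_ts", "country"]):
#     spark.sql(stmt)
```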
GeKo
by Contributor
  • 5712 Views
  • 8 replies
  • 4 kudos

Resolved! how to specify the runtime version for serverless job

Hello, if I understood correctly, using a serverless cluster always comes with the latest runtime version by default. Now I need to stick to e.g. runtime version 15.4 for a certain job, which gets deployed via asset bundles. How do I specify/config...

Data Engineering
assetbundle
serverless
Latest Reply
GeKo
Contributor
  • 4 kudos

7 More Replies
Avinash_Narala
by Databricks Partner
  • 4452 Views
  • 9 replies
  • 1 kudos

Redshift Stored Procedure Migration to Databricks

Hi, I want to migrate Redshift SQL stored procedures to Databricks. As Databricks doesn't support the concept of SQL stored procedures, how can I do so?

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 1 kudos

Databricks docs show that procedures are in public preview and require runtime 17.0 and above: https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-create-procedure

8 More Replies
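Until SQL procedures fit your runtime, one common migration pattern is to rewrite each Redshift procedure body as a Python function that issues the equivalent parameterized SQL. A sketch with an entirely hypothetical procedure (table, columns, and logic are illustrative, not the asker's actual procedure):

```python
# Sketch: a Redshift stored procedure rewritten as a Python function that
# runs the equivalent SQL via spark.sql. All names are hypothetical.

def upsert_customer_proc(spark, customer_id, name):
    """Stand-in for a procedure that upserts one customer row."""
    stmts = [
        f"MERGE INTO dim_customer t "
        f"USING (SELECT {int(customer_id)} AS id, '{name}' AS name) s "
        "ON t.id = s.id "
        "WHEN MATCHED THEN UPDATE SET t.name = s.name "
        "WHEN NOT MATCHED THEN INSERT (id, name) VALUES (s.id, s.name)",
    ]
    for stmt in stmts:
        spark.sql(stmt)
    return stmts
```

Injecting `spark` keeps the function callable from a job, a notebook, or a test harness with a stand-in session.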
JothyGanesan
by New Contributor III
  • 3071 Views
  • 4 replies
  • 1 kudos

Resolved! Streaming data - Merge in Target - DLT

We have streaming inputs coming from streaming tables and also the table from apply_changes. In our target there is only one table which needs to be merged with all the sources. Each source provides different columns in our target table. Challenge: Ev...

Latest Reply
vd1
New Contributor II
  • 1 kudos

Can this cause concurrent write issues? Updating the same table from multiple streams?

3 More Replies
RevathiTiger
by Databricks Partner
  • 5498 Views
  • 3 replies
  • 1 kudos

Expectations vs Great expectations with Databricks DLT pipelines

Hi all, we are working on creating a DQ framework on DLT pipelines in Databricks. DLT pipelines read incoming data from Kafka / file sources. Once data is ingested, data validation must happen on top of the ingested data. The customer is evalu...

Latest Reply
chanukya-pekala
Contributor III
  • 1 kudos

If you have decided to use DLT, it handles micro-batching and checkpointing for you. But you can typically take more control if you rewrite the logic using Auto Loader or Structured Streaming, custom-checkpointing the file directory and maintaining yo...

2 More Replies
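For the DLT side of this comparison, expectations are usually declared as a rules dict and attached with a decorator. A sketch, with illustrative rule names and a pure helper for reporting (the DLT decorator itself only runs inside a pipeline, so it is shown in comments):

```python
# Sketch: data-quality rules as data, attachable via DLT expectations.
# Rule names/expressions are illustrative.

RULES = {
    "valid_id": "id IS NOT NULL",
    "valid_ts": "event_ts > '2020-01-01'",
}

def combined_expectation(rules):
    """Collapse a rules dict into one SQL predicate (handy for ad-hoc checks)."""
    return " AND ".join(f"({expr})" for expr in rules.values())

# Inside a DLT pipeline (not runnable outside one):
# import dlt
# @dlt.table
# @dlt.expect_all_or_drop(RULES)
# def clean_events():
#     return spark.readStream.table("raw_events")
```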
lmu
by Databricks Partner
  • 4052 Views
  • 11 replies
  • 3 kudos

Resolved! Write on External Table with Row Level Security fails

Hey, we are experiencing issues with writing to external tables when using Unity Catalog and Row Level Security. As soon as we stop using the serverless compute instance, we receive the following error for writing (overwrite, append and upsert): E...

Latest Reply
lmu
Databricks Partner
  • 3 kudos

After further testing, it was found that the dedicated access mode (formerly single user) either does not work or exhibits strange behaviour. In one scenario, the 16.4 cluster with dedicated access mode could write in append mode but not overwrite, a...

10 More Replies
William_Scardua
by Valued Contributor
  • 4767 Views
  • 3 replies
  • 2 kudos

What Data Quality Framework do you use/recommend?

Hi guys, in your opinion, what is the best Data Quality Framework (or technique) you would recommend?

Data Engineering
dataquality
Latest Reply
chanukya-pekala
Contributor III
  • 2 kudos

DQ is interesting. There are a lot of options in this space. Soda and Great Expectations integrate fairly well with a Databricks setup. I personally try to use dataframe abstractions for validating. We used the deequ tool, which is very simple to use; just p...

2 More Replies
Samael
by New Contributor II
  • 1415 Views
  • 2 replies
  • 1 kudos

Query a "partition metadata logging" enabled external parquet table on Databricks SQL

Hi there, we have a pretty large Hive-partitioned parquet table on S3. We followed the documentation to recreate the table with partition metadata logging on Unity Catalog. We're using Databricks Runtime 16.4 LTS, but despite the release note mentioning that...

Latest Reply
Samael
New Contributor II
  • 1 kudos

Thanks for helping! Setting table properties unfortunately didn't do the trick. We ended up having a view that points to the latest partition like this for fast queries: SELECT * FROM parquet.`s3://bucket/prefix/partition_column_date=20250616/`. We haven't f...

1 More Replies
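The latest-partition view workaround can be sketched as a small DDL builder; the bucket, prefix, and partition column are placeholders, and `max()` assumes the partition values sort lexicographically (true for zero-padded date strings like `20250616`):

```python
# Sketch: pin a view to the newest partition of a hive-partitioned
# parquet table. Paths and column names are illustrative.

def latest_partition_view_ddl(view_name, base_path, partition_col, partitions):
    latest = max(partitions)  # lexicographic max works for yyyymmdd strings
    return (f"CREATE OR REPLACE VIEW {view_name} AS "
            f"SELECT * FROM parquet.`{base_path}/{partition_col}={latest}/`")

# On a cluster (illustrative):
# spark.sql(latest_partition_view_ddl(
#     "analytics.latest_events", "s3://bucket/prefix",
#     "partition_column_date", ["20250615", "20250616"]))
```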
kenmyers-8451
by Contributor II
  • 2078 Views
  • 4 replies
  • 0 kudos

dynamically create file path for sql_task

I am trying to make a reusable workflow where I can run a merge script for any number of tables. The idea is I tell the workflow the table name and/or path to it and it can reference that in the file path field. The simplified idea is below: resource...

Latest Reply
jtirila
New Contributor II
  • 0 kudos

Oh, never mind, I got it working. Just using single quotes around the {{  }} part solves it (I guess double quotes would work as well). I think I tried this yesterday but probably ran into another issue with dashes in task names: https://community.d...

3 More Replies
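The substitution the reply relies on can be illustrated with a tiny renderer. This only mimics what the Jobs service does when it expands `{{job.parameters.*}}` references; the parameter name and paths are hypothetical, and the single-quoting advice in the reply applies to the YAML, not to this code:

```python
# Sketch: mimic {{job.parameters.<name>}} expansion inside a sql_task
# file path. The real expansion happens server-side in Databricks Jobs.
import re

def render_path(template, params):
    return re.sub(r"\{\{\s*job\.parameters\.(\w+)\s*\}\}",
                  lambda m: params[m.group(1)], template)

# In the job YAML, quote the templated value, e.g.:
# sql_task:
#   file:
#     path: '/Repos/project/merge_scripts/{{job.parameters.table}}.sql'
```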
jommo
by New Contributor
  • 5613 Views
  • 2 replies
  • 0 kudos

Exploring Data Quality Frameworks in Databricks

I’m currently investigating solutions for Data Quality (DQ) within the Databricks environment and would love to hear what frameworks or approaches you are using for this purpose. In the past, I’ve worked with Deequ, but I’ve noticed that it’s not as w...

Latest Reply
dataoculus_app
New Contributor III
  • 0 kudos

GE and other DQ tools will fire a lot of SQL queries, increasing cost and adding delays, so it depends on what your requirements are. Happy to discuss more if you are interested, as I am also going to make such a tool available to the Databricks community as well ...

1 More Replies