Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Alex79 (New Contributor II)
  • 2423 Views
  • 2 replies
  • 0 kudos

Get Job Run output through Rest API call

I have a simple notebook reading a dataframe as input and returning another dataframe, which is as follows: from pyspark.sql import SparkSession; import pandas as pd, json; spark = SparkSession.builder.appName("Pandas to Spark DataFrame Conversion")...

Latest Reply
Vidhi_Khaitan
Databricks Employee
  • 0 kudos

Hi team, the error {"error_code": "INVALID_PARAMETER_VALUE", "message": "Retrieving the output of runs with multiple tasks is not supported..."} means the job you're triggering (job_id = 'my_job_id') is a multi-task job (even if it has only one task). In such cas...

1 More Replies
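As the reply explains, for a multi-task job you fetch the parent run via /api/2.1/jobs/runs/get, read each task's run_id, and then call /api/2.1/jobs/runs/get-output once per task. A minimal sketch, with the workspace host, token, and run IDs as placeholders:

```python
import json
import urllib.request

def task_run_ids(run_info: dict) -> dict:
    """Map task_key -> run_id from a /api/2.1/jobs/runs/get response."""
    return {t["task_key"]: t["run_id"] for t in run_info.get("tasks", [])}

def get_json(host: str, token: str, path: str, **params) -> dict:
    """GET a Jobs API endpoint and decode the JSON response."""
    query = "&".join(f"{k}={v}" for k, v in params.items())
    req = urllib.request.Request(
        f"{host}{path}?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Hypothetical usage (host, token, and run_id are placeholders):
# run = get_json("https://<workspace-url>", "<token>", "/api/2.1/jobs/runs/get", run_id=123)
# for key, rid in task_run_ids(run).items():
#     out = get_json("https://<workspace-url>", "<token>",
#                    "/api/2.1/jobs/runs/get-output", run_id=rid)
#     print(key, out.get("notebook_output"))
```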
cool_cool_cool (New Contributor II)
  • 3128 Views
  • 3 replies
  • 0 kudos

Databricks Workflow is stuck on the first task and doesn't do any workload

Heya, I have a workflow in Databricks with 2 tasks. They are configured to run on the same job cluster, and the second task depends on the first. I have a weird behavior that happened twice now - the job takes a long time (it usually finishes within 30...

Latest Reply
Sri_M
New Contributor II
  • 0 kudos

@cool_cool_cool I am facing the same issue as well. Is this issue resolved for you? If yes, can you please let me know what action you have taken?

2 More Replies
lorenz (New Contributor III)
  • 14901 Views
  • 8 replies
  • 3 kudos

Resolved! Databricks approaches to CDC

I'm interested in learning more about Change Data Capture (CDC) approaches with Databricks. Can anyone provide insights on the best practices and recommendations for utilizing CDC effectively in Databricks? Are there any specific connectors or tools ...

Latest Reply
Deekay
New Contributor II
  • 3 kudos

Hi @jcozar, thank you so much for your response. I have some queries; it would be really helpful if you can share your thoughts. How are you segregating the tables from raw to bronze? Suppose Debezium is capturing CDCs from 100 tables, all changes are ...

7 More Replies
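A common Databricks pattern for the raw-to-bronze step discussed in the replies is deduplicating the change feed to the newest event per key and applying it with a Delta MERGE. A minimal sketch, assuming a staged changes view with columns id, payload, op ('c'/'u'/'d'), and ts; all table, view, and column names are hypothetical, not from the thread:

```python
# Hedged sketch: apply only the newest change per key to a bronze Delta table.
MERGE_SQL = """
MERGE INTO bronze.customers AS t
USING (
  SELECT id, payload, op, ts FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) AS rn
    FROM changes
  ) WHERE rn = 1
) AS s
ON t.id = s.id
WHEN MATCHED AND s.op = 'd' THEN DELETE
WHEN MATCHED THEN UPDATE SET t.payload = s.payload, t.ts = s.ts
WHEN NOT MATCHED AND s.op <> 'd' THEN INSERT (id, payload, ts) VALUES (s.id, s.payload, s.ts)
"""
# In a notebook: spark.sql(MERGE_SQL)
```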
lezwon (Contributor)
  • 1444 Views
  • 2 replies
  • 3 kudos

Resolved! Install custom wheel from dbfs in serverless environment

Hey folks, I have a job that runs on serverless compute. I have also created a wheel file with custom functions, which I require in this job. I see that from here, we cannot install libraries for a task and must use notebook-scoped libraries. So wha...

Latest Reply
loui_wentzel
Databricks Partner
  • 3 kudos

Is your dbfs mounted? Otherwise, try uploading it to your workspace's "Shared" folder - this is a common place to put these sorts of files. dbfs is slowly being phased out and is no longer part of best practices.

1 More Replies
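Following the reply's suggestion, one route is to upload the wheel to a workspace path and install it notebook-scoped; the path below is hypothetical:

```python
# Hedged sketch: install a wheel from a workspace path instead of dbfs.
# In a Databricks notebook cell this would be the magic command:
#   %pip install /Workspace/Shared/libs/myfuncs-0.1.0-py3-none-any.whl
#   %restart_python   # restart so the freshly installed package is importable

def pip_install_command(wheel_path: str) -> str:
    """Build the notebook magic line for a given wheel path."""
    return f"%pip install {wheel_path}"
```

For serverless jobs deployed via asset bundles, the wheel may alternatively be declared as a task environment dependency rather than installed in the notebook.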
pooja_bhumandla (Databricks Partner)
  • 912 Views
  • 3 replies
  • 0 kudos

Auto tuning of file size

Why are maxFileSize and minFileSize different from targetFileSize after optimization? What is the significance of targetFileSize? "numRemovedFiles": "2099", "numRemovedBytes": "29658974681", "p25FileSize": "29701688", "numDeletionVectorsRemoved": "0", "m...

Latest Reply
loui_wentzel
Databricks Partner
  • 0 kudos

There could be several different reasons, but mainly, it's because grouping arbitrary data into some target file size is, well... arbitrary. Imagine I gave you a large container of sand and some empty buckets, and asked you to move the sand from the co...

2 More Replies
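On the significance of targetFileSize: it is a goal that OPTIMIZE bin-packs toward, not a hard bound, which is why the observed min/max file sizes straddle it. A hedged sketch of setting it explicitly; the table name and size are placeholders:

```python
# Assumption: main.sales is a stand-in table name; '128mb' is an example size.
STATEMENTS = [
    "ALTER TABLE main.sales SET TBLPROPERTIES ('delta.targetFileSize' = '128mb')",
    "OPTIMIZE main.sales",
]
# In a notebook: for s in STATEMENTS: spark.sql(s)
# OPTIMIZE groups existing files toward the target, so maxFileSize and
# minFileSize will still differ from targetFileSize after compaction.
```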
SreedharVengala (New Contributor III)
  • 31718 Views
  • 11 replies
  • 7 kudos

PGP Encryption / Decryption in Databricks

Is there a way to Decrypt / Encrypt Blob files in Databricks using Key stored in Key Vault. What libraries need to be used? Any code snippets? Links?

Latest Reply
Junpei_Liang
New Contributor II
  • 7 kudos

Does anyone have an update on this?

10 More Replies
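One commonly used approach (an assumption, not confirmed in the thread) is the third-party python-gnupg package with the key material pulled from a Key-Vault-backed secret scope; the scope, key, and file names below are hypothetical:

```python
# %pip install python-gnupg   # third-party wrapper around the gpg binary

def decrypted_name(path: str) -> str:
    """Strip a trailing .pgp/.gpg suffix to derive the output file name."""
    for ext in (".pgp", ".gpg"):
        if path.endswith(ext):
            return path[: -len(ext)]
    return path + ".decrypted"

# In a Databricks notebook (secret scope/key names are assumptions):
# import gnupg
# gpg = gnupg.GPG()
# gpg.import_keys(dbutils.secrets.get("kv-scope", "pgp-private-key"))
# src = "/dbfs/mnt/blob/data.csv.pgp"
# with open(src, "rb") as f:
#     gpg.decrypt_file(f, passphrase=dbutils.secrets.get("kv-scope", "pgp-passphrase"),
#                      output=decrypted_name(src))
```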
Ramki (New Contributor)
  • 448 Views
  • 1 reply
  • 0 kudos

Lakeflow clarification

Are there options to modify the streaming table after it has been created by the Lakeflow pipeline? In the use case I'm trying to solve, I need to add delta.enableIcebergCompatV2 and delta.universalFormat.enabledFormats to the target streaming table....

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @Ramki, yes, you can modify a streaming table created by a LakeFlow pipeline, especially when the pipeline is in triggered mode (not running continuously). In your case, you want to add the following Delta table properties: TBLPROPERTIES ( 'delta....

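The properties named in the question can also be declared on the streaming table inside the pipeline definition itself, so they persist across pipeline updates. A sketch, with the table and source names as placeholders:

```python
PROPS = {
    "delta.enableIcebergCompatV2": "true",
    "delta.universalFormat.enabledFormats": "iceberg",
}
# Inside the pipeline source file:
# import dlt
#
# @dlt.table(name="target_table", table_properties=PROPS)  # name is hypothetical
# def target_table():
#     return spark.readStream.table("source_table")
#
# For an already-created table, ALTER TABLE ... SET TBLPROPERTIES with the
# same pairs (run while the pipeline is not updating) is the other route.
```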
michelleliu (New Contributor III)
  • 2449 Views
  • 3 replies
  • 2 kudos

Resolved! DLT Performance Issue

I've been seeing patterns in DLT process time in all my pipelines, as in the attached screenshot. Each data point is an "update" that's set to "continuous". The process time keeps increasing until a point and then drops back to the desired level. This w...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 2 kudos

Hi @michelleliu This sawtooth pattern in DLT processing times is actually quite common and typically indicates one of several underlying issues. Here are the most likely causes and solutions. Common causes: 1. Memory pressure & garbage collection. Process...

2 More Replies
alau131 (New Contributor)
  • 1195 Views
  • 2 replies
  • 2 kudos

How to dynamically have the parent notebook call on a child notebook?

Hi! I would like help on how to dynamically call one notebook from another in Databricks and have the parent notebook get the dataframe results from the child notebook. Some background info: I have a main Python notebook and multiple SQ...

Latest Reply
jameshughes
Databricks Partner
  • 2 kudos

What you are looking to do is really not the intent of notebooks and you cannot pass complex data types between notebooks. You would need to persist your data frame from the child notebook so your parent notebook could retrieve the results after the ...

1 More Replies
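Since dbutils.notebook.run() can only return a string, a workaround consistent with the reply is to persist the child's dataframe and hand back the table name; the notebook path and naming scheme below are assumptions:

```python
def handoff_table(job: str, run_id: str) -> str:
    """Per-run table name so concurrent runs don't clobber each other."""
    return f"tmp.{job}_{run_id}"

# --- child notebook ---
# df = spark.sql("SELECT ...")
# df.write.mode("overwrite").saveAsTable(handoff_table("etl", run_id))
# dbutils.notebook.exit(handoff_table("etl", run_id))

# --- parent notebook ---
# name = dbutils.notebook.run("/Shared/child_notebook", 600, {"run_id": run_id})
# result_df = spark.table(name)
```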
Abel_Martinez (Contributor)
  • 22096 Views
  • 10 replies
  • 10 kudos

Resolved! Why am I getting a connection timeout when connecting to MongoDB using MongoDB Connector for Spark 10.x from Databricks

I'm able to connect to MongoDB using org.mongodb.spark:mongo-spark-connector_2.12:3.0.2 and this code: df = spark.read.format("com.mongodb.spark.sql.DefaultSource").option("uri", jdbcUrl). It works well, but if I install the latest MongoDB Spark Connector ve...

Latest Reply
ravisharma1024
New Contributor II
  • 10 kudos

I was facing the same issue; now it is resolved, thanks to @Abel_Martinez. I am using code like the below: df = spark.read.format("mongodb") \ .option('spark.mongodb.read.connection.uri', "mongodb+srv://*****:*****@******/?retryWrites=true&w=majori...

9 More Replies
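For reference, the accepted fix boils down to the renamed source and option key in connector 10.x; the connection string below is a placeholder:

```python
READ_OPTIONS = {
    # 10.x renames the uri option; 3.x used "uri" with com.mongodb.spark.sql.DefaultSource
    "spark.mongodb.read.connection.uri":
        "mongodb+srv://<user>:<password>@<cluster>/?retryWrites=true&w=majority",
    "database": "mydb",       # hypothetical
    "collection": "events",   # hypothetical
}
# df = spark.read.format("mongodb").options(**READ_OPTIONS).load()
```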
vanverne (New Contributor II)
  • 2707 Views
  • 3 replies
  • 1 kudos

Assistance with Capturing Auto-Generated IDs in Databricks SQL

Hello, I am currently working on a project where I need to insert multiple rows into a table and capture the auto-generated IDs for each row. I am using the databricks-sql-connector. Here is a simplified version of my current workflow: I create a temporary...

Latest Reply
vanverne
New Contributor II
  • 1 kudos

Thanks for the reply, Alfonso. I noticed you mentioned "Below are a few alternatives...", however, I am not seeing them. Please let me know if I am missing something. Also, do you know if Databricks is working on supporting the RETURNING clause soon...

2 More Replies
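Until a RETURNING clause exists, one workaround (an assumption, not from the thread) is tagging each inserted batch with a client-generated UUID and selecting the identity values back by that tag; the table and column names are hypothetical:

```python
import uuid

def insert_and_fetch_ids_sql(batch_id: str):
    """Build the tagged INSERT and the follow-up SELECT for one batch."""
    insert = (
        "INSERT INTO main.orders (batch_id, payload) "
        f"SELECT '{batch_id}', payload FROM staging_rows"
    )
    fetch = f"SELECT id FROM main.orders WHERE batch_id = '{batch_id}'"
    return insert, fetch

batch = str(uuid.uuid4())
# With databricks-sql-connector:
# insert_sql, fetch_sql = insert_and_fetch_ids_sql(batch)
# cursor.execute(insert_sql)
# cursor.execute(fetch_sql)
# ids = [row[0] for row in cursor.fetchall()]
```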
Yannic (New Contributor)
  • 994 Views
  • 1 reply
  • 0 kudos

Delete a directory in DBFS recursively from Azure

I have an Azure storage account mounted to DBFS. I want to delete a directory inside it recursively. I tried both dbutils.fs.rm(f"/mnt/data/to/delete", True) and %fs rm -r /mnt/data/to/delete. In both cases I get the following exception: AzureException: hadoop_azur...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @Yannic Azure Blob Storage doesn't have true directories - it simulates them through blob naming conventions, which can cause issues with recursive deletion operations. Try the below: delete files first, then the directory. def delete_directory_recursive(...

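The reply's files-first strategy can be sketched as a depth-first walk; dbutils.fs is injected as a parameter here so the logic is plain Python (directory entries from dbutils.fs.ls carry a trailing slash in their name):

```python
def delete_recursively(fs, path: str) -> None:
    """Delete files first, then the (simulated) directory entries themselves.
    Pass dbutils.fs as `fs` in a notebook."""
    for entry in fs.ls(path):
        if entry.name.endswith("/"):   # directory marker
            delete_recursively(fs, entry.path)
        else:
            fs.rm(entry.path)
    fs.rm(path)

# In a notebook: delete_recursively(dbutils.fs, "/mnt/data/to/delete")
```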
Sainath368 (Contributor)
  • 954 Views
  • 1 reply
  • 0 kudos

Data Skipping- Partitioned tables

Hi all, I have a question - how can we modify delta.dataSkippingStatsColumns and compute statistics for a partitioned Delta table in Databricks? I want to understand the process and best practices for changing this setting and ensuring accurate statist...

Latest Reply
paolajara
Databricks Employee
  • 0 kudos

Hi, delta.dataSkippingStatsColumns specifies a comma-separated list of column names for which Delta Lake collects statistics. It can improve performance by enabling data skipping on those columns, since it supersedes the default behavior of analyzing the first...

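A hedged sketch of the usual sequence (table and column names are placeholders): set the property, then recompute statistics so files written before the change are covered too:

```python
STATEMENTS = [
    "ALTER TABLE main.events SET TBLPROPERTIES "
    "('delta.dataSkippingStatsColumns' = 'event_date,customer_id')",
    "ANALYZE TABLE main.events COMPUTE DELTA STATISTICS",
]
# In a notebook: for s in STATEMENTS: spark.sql(s)
# Partition columns don't need file statistics for pruning; point
# dataSkippingStatsColumns at the non-partition columns you filter on.
```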
GeKo (Contributor)
  • 4889 Views
  • 8 replies
  • 4 kudos

Resolved! how to specify the runtime version for serverless job

Hello, if I understood correctly... using a serverless cluster always comes with the latest runtime version, by default. Now I need to stick to e.g. runtime version 15.4 for a certain job, which gets deployed via asset bundles. How do I specify/config...

Data Engineering
assetbundle
serverless
Latest Reply
GeKo
Contributor
  • 4 kudos

7 More Replies
Avinash_Narala (Databricks Partner)
  • 4060 Views
  • 9 replies
  • 1 kudos

Redshift Stored Procedure Migration to Databricks

Hi, I want to migrate Redshift SQL stored procedures to Databricks. Since Databricks doesn't support the concept of SQL stored procedures, how can I do so?

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 1 kudos

The Databricks docs show that procedures are in Public Preview and require runtime 17.0 and above: https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-create-procedure

8 More Replies
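For reference, a sketch of the preview syntax the linked docs describe; all names here are hypothetical, and since the feature is in Public Preview the details may change:

```python
CREATE_PROC = """
CREATE OR REPLACE PROCEDURE main.etl.refresh_daily(run_date DATE)
LANGUAGE SQL
AS BEGIN
  DELETE FROM main.etl.daily WHERE d = run_date;
  INSERT INTO main.etl.daily SELECT * FROM main.etl.staging WHERE d = run_date;
END
"""
# On runtime 17.0+:
# spark.sql(CREATE_PROC)
# spark.sql("CALL main.etl.refresh_daily(DATE'2024-01-01')")
```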