Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

NancyX
by New Contributor II
  • 870 Views
  • 2 replies
  • 0 kudos

How to pass Dynamic parameters like job.run_id to a pipeline_task in Databricks workflow job?

Is it possible to pass dynamic parameters, such as job.run_id to a pipeline_task within a Databricks Workflow job?

Latest Reply
dataminion01
New Contributor II
  • 0 kudos

Yes, it's in the Parameters section.

1 More Replies
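One way to picture what the reply points at (a sketch only, not the exact Jobs API shape; the task key and parameter name here are hypothetical) is a task definition that forwards the run ID through a `{{job.run_id}}` dynamic value reference, which Databricks resolves at run time:

```python
import json
import re

# Sketch: a Jobs-API-style task definition that forwards the run ID to a
# pipeline task via a dynamic value reference. The task_key, pipeline_id
# placeholder, and parameter name are hypothetical; the exact placement of
# parameters (job-level vs task-level) is an assumption.
task_definition = {
    "task_key": "run_my_pipeline",
    "pipeline_task": {
        "pipeline_id": "<your-pipeline-id>",
    },
    # Databricks substitutes {{job.run_id}} when the job actually runs.
    "parameters": [
        {"name": "parent_run_id", "default": "{{job.run_id}}"},
    ],
}

def referenced_dynamic_values(obj):
    """Collect the {{...}} dynamic value references used in a definition."""
    return sorted(set(re.findall(r"\{\{(.*?)\}\}", json.dumps(obj))))

print(referenced_dynamic_values(task_definition))  # ['job.run_id']
```

Whether a pipeline_task consumes job parameters directly, or the value must instead go through the pipeline's own configuration, depends on how the pipeline is set up.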
syazwansuhaimi
by New Contributor
  • 3207 Views
  • 1 replies
  • 0 kudos

Massive increase in the number of "GetBlobProperties" operations

I had a massive increase in the volume of "GetBlobProperties" operations in my Azure Blob Storage account. The storage logs indicate that all the extra operations have IPs attributed to my Databricks resource group. I haven't made any changes to my r...

Latest Reply
Vidhi_Khaitan
Databricks Employee
  • 0 kudos

A massive increase in "GetBlobProperties" operations in your Azure Blob Storage account could be due to the following:

1. Delta Tables and _delta_log Metadata Access. If you're using Delta Lake, Databricks reads blob properties (e.g., last-modified time, s...

GregTyndall
by New Contributor II
  • 5044 Views
  • 9 replies
  • 5 kudos

Resolved! Materialized View Refresh - NUM_JOINS_THRESHOLD_EXCEEDED?

I have a very basic view with 3 inner joins that will only do a full refresh. Is there a limit to the number of joins you can have and still get an incremental refresh?

"incrementalization_issues": [{"issue_type": "INCREMENTAL_PLAN_REJECTED_BY_COST_MO...

Latest Reply
_DatabricksUser
New Contributor III
  • 5 kudos

@GregTyndall - how did you get that level of detail (incrementalization_issues) for the MV build?

8 More Replies
lezwon
by Contributor
  • 2553 Views
  • 5 replies
  • 1 kudos

Resolved! Unable to install custom wheel in serverless environment

Hey guys, I have created a custom wheel to hold my common code. Since I cannot install task libraries on a serverless environment, I am installing this library in multiple notebooks using %pip install. What I do is I upload the library to a volume in...

Latest Reply
jameshughes
Databricks Partner
  • 1 kudos

@lezwon - Very interesting, as I have been wanting to do this and didn't attempt it because it was listed as not supported. Can you confirm which cloud provider you are using: AWS, Azure, or GCP?

4 More Replies
Ramukamath1988
by New Contributor II
  • 1274 Views
  • 3 replies
  • 0 kudos

Resolved! vacuum does not work as expected

The delta.logRetentionDuration (default 30 days) is generally not set on any table in my workspace. As per the documentation, you can time travel within the log retention window provided delta.deletedFileRetentionDuration is also set to 30 days. Which ...

Latest Reply
Ramukamath1988
New Contributor II
  • 0 kudos

This is precisely my observation after vacuuming. I do understand these 2 parameters, but it's not working as expected. Even after vacuuming (retention of 30 days) we can go back 2 months, and logs are retained for more than 3 months.

2 More Replies
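The behavior described above (time travel still working 2 months back) is consistent with cleanup being lazy rather than timer-driven: reading an old version needs both its log entry (subject to delta.logRetentionDuration) and its data files (removed only when VACUUM actually runs past delta.deletedFileRetentionDuration). A stdlib-only illustration of that arithmetic, an assumption-laden simplification rather than Delta internals:

```python
from datetime import datetime, timedelta

# Sketch of the retention arithmetic behind Delta time travel (an
# illustration, not Databricks internals). A version stays readable while
# BOTH its log entry survives (log cleanup itself happens lazily, at
# checkpoint time, not on a timer) and its data files have not yet been
# purged by an actual VACUUM run. If VACUUM never ran, files linger far
# past the nominal 30-day window.
def can_time_travel(version_ts, now,
                    log_retention=timedelta(days=30),
                    last_vacuum=None):
    log_available = now - version_ts <= log_retention
    # simplification: treat versions older than the last VACUUM as purged
    files_available = last_vacuum is None or version_ts >= last_vacuum
    return log_available and files_available

now = datetime(2025, 6, 1)
assert not can_time_travel(datetime(2025, 3, 1), now)   # log entry aged out
assert can_time_travel(datetime(2025, 5, 20), now)      # within both windows
```

In this model the observation in the thread, logs retained past 3 months, would simply mean no checkpoint-triggered log cleanup has happened yet.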
chinmay0924
by New Contributor III
  • 1826 Views
  • 4 replies
  • 0 kudos

mapInPandas returning an intermittent error related to data type interconversion

```
File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 346, in _create_array
    return pa.Array.from_pandas(
           ^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/array.pxi", line 1126, in pyarrow.lib.Array.from_pandas
File "pyarrow/array.pxi", line 3...
```

Latest Reply
Raghavan93513
Databricks Employee
  • 0 kudos

Hi @chinmay0924 Good day! Could you please confirm the following: Does the ID column incorrectly contain strings, which PyArrow fails to convert to integers (int64)? Is the data processed in both dataframes exactly the same? Additionally, could you pro...

3 More Replies
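The reply's first question, a stray string in an otherwise-int64 column, is a common cause of intermittent Arrow conversion failures: the error only fires on partitions where a bad value lands. A stdlib-only sketch (the row data and function name are made up for illustration) for spotting mixed-type columns before they reach `pa.Array.from_pandas`:

```python
from collections import defaultdict

# Stdlib-only sketch: scan rows for columns holding more than one Python
# type, the kind of inconsistency (e.g., "3" mixed into an integer "id"
# column) that can make pyarrow's Array.from_pandas fail intermittently
# inside mapInPandas, depending on which partition the bad value hits.
def mixed_type_columns(rows):
    types = defaultdict(set)
    for row in rows:
        for col, val in row.items():
            if val is not None:
                types[col].add(type(val).__name__)
    return {col: sorted(t) for col, t in types.items() if len(t) > 1}

rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": "3", "name": "c"},   # string sneaks into an integer column
]
print(mixed_type_columns(rows))  # {'id': ['int', 'str']}
```

In a real pipeline the same check would run on a sample of each partition before the mapInPandas step, or the column would simply be cast to an explicit dtype up front.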
oneill
by New Contributor II
  • 874 Views
  • 2 replies
  • 0 kudos

Resolved! SET a parameter in BEGIN END statement

Hello, how do you set a parameter in a BEGIN ... END statement? For example, the following query fails:

begin
SET ansi_mode = true;
end;

with: Cannot resolve variable `ANSI_MODE` on search path `SYSTEM`.`SESSION`. SQLSTATE: 42883

Latest Reply
Vinay_M_R
Databricks Employee
  • 0 kudos

Hello @oneill  There is currently no supported workaround to dynamically change system/session parameters such as ansi_mode within a BEGIN ... END block in Databricks SQL procedures or scripts. Can you set these parameters before executing any proced...

1 More Replies
Yuki
by Contributor
  • 1990 Views
  • 2 replies
  • 1 kudos

Unable to connect to Amazon S3 using Spark

I can't connect to Amazon S3. I'm referencing and following this document: https://docs.databricks.com/gcp/en/connect/storage/amazon-s3. But I can't access S3. I believe the credentials are correct because I have verified that I can access ...

Latest Reply
Yuki
Contributor
  • 1 kudos

Hi Isi, thank you for your response, I really appreciate it. Apologies, I didn't explain my concern clearly. What I'm trying to confirm is whether the instance profile overrides the spark.conf settings defined in a notebook. For example, I want to a...

1 More Replies
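On the question of notebook `spark.conf` settings versus the instance profile: Hadoop's S3A connector supports per-bucket options of the form `fs.s3a.bucket.<bucket>.<option>` that override the global `fs.s3a.<option>` keys. The sketch below models only that lookup order; the bucket name and values are hypothetical, and whether the instance profile is consulted before or after such keys depends on the configured `fs.s3a.aws.credentials.provider` chain, not on this simplification:

```python
# Sketch of Hadoop S3A per-bucket option resolution: a key of the form
# fs.s3a.bucket.<bucket>.<option> overrides the global fs.s3a.<option>.
# Bucket name and credential values here are hypothetical placeholders.
conf = {
    "fs.s3a.aws.credentials.provider":
        "com.amazonaws.auth.InstanceProfileCredentialsProvider",
    # notebook-level override scoped to a single bucket
    "fs.s3a.bucket.my-other-bucket.access.key": "AKIA-EXAMPLE",
}

def s3a_option(conf, bucket, option):
    """Per-bucket key wins; otherwise fall back to the global key."""
    per_bucket = f"fs.s3a.bucket.{bucket}.{option}"
    return conf.get(per_bucket, conf.get(f"fs.s3a.{option}"))

print(s3a_option(conf, "my-other-bucket", "access.key"))  # 'AKIA-EXAMPLE'
print(s3a_option(conf, "some-bucket", "access.key"))      # None
```

Scoping credentials per bucket this way is one approach to mixing an instance profile for most access with explicit keys for a single bucket.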
trang_le
by Databricks Employee
  • 1617 Views
  • 1 replies
  • 0 kudos

Announcing a new portfolio of Generative AI learning offerings on Databricks Academy

Today, we launched new Generative AI (including LLMs) learning offerings for everyone from technical and business leaders to data practitioners, such as Data Scientis...

Latest Reply
adb_newbie
New Contributor III
  • 0 kudos

Where can I find all the scripts/notebooks presented in the course "Large Language Models (LLMs): Application through Production"?

maarko
by New Contributor II
  • 1661 Views
  • 1 replies
  • 0 kudos

Inconsistent Decimal Comparison Behavior Between SQL Warehouse (Photon) and Spark Clusters

I'm seeing non-deterministic behavior when running the same query in SQL Warehouse (Photon) vs. interactive/job clusters (non-Photon), specifically involving a LEFT OUTER JOIN and a DECIMAL comparison in a WHERE clause. I have two views: View A: cont...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @maarko This is a fascinating issue that points to several potential causes related to differences between Photon and standard Spark execution engines, particularly around decimal handling and parallelism.

Root Causes
1. Decimal Precision and Scale ...

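The precision-and-scale explanation in the reply can be shown with the standard library alone (an illustration of the general decimal pitfall, not of Photon internals): if two engines carry a value through a join at different scales, the same equality predicate can flip.

```python
from decimal import Decimal, ROUND_HALF_UP

# Illustration (stdlib only, not Photon internals): the same source value
# compared at scale 2 vs scale 6 fails an equality check, while
# normalizing both sides to one scale makes it pass. Engine differences
# in inferred precision/scale can therefore change WHERE-clause results.
raw = Decimal("123.456789")
at_scale_2 = raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)  # 123.46
at_scale_6 = raw                                                    # 123.456789

print(at_scale_2 == at_scale_6)   # False: scales differ
print(at_scale_2 ==
      at_scale_6.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))  # True
```

The usual fix on the SQL side is the analogous move: CAST both sides of the comparison to an explicit, identical DECIMAL(p, s) so neither engine has to infer one.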
amitpm
by New Contributor
  • 782 Views
  • 1 replies
  • 0 kudos

Lakeflow Connect - Column filtering

Hi community, I am interested in learning more about the feature mentioned at the recent summit about query pushdown in Lakeflow Connect for SQL Server. I believe this feature will allow selecting only the required columns from source tables. I...

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hey @amitpm According to the documentation, this feature is currently in Public Preview, so if your Databricks account has access to public preview features, you can reach out to support to enable it and start testing performance. Setup guide for Lake...

SenthilJ
by New Contributor III
  • 6384 Views
  • 2 replies
  • 1 kudos

Databricks Deep Clone

Hi, I am working on a DR design for Databricks in Azure. The recommendation from Databricks is to use Deep Clone to clone the Unity Catalog tables (within or across catalogs). My design is to ensure that DR is managed across different regions, i.e. pri...

Data Engineering
Disaster Recovery
Unity Catalog
Latest Reply
Isi
Honored Contributor III
  • 1 kudos

Hi, in my opinion, Databricks Deep Clone does not currently support cloning Unity Catalog tables natively across different metastores (each region having its own metastore). Deep Clone requires that both source and target belong to the same metastore ...

1 More Replies
arun_6482
by New Contributor
  • 2817 Views
  • 1 replies
  • 0 kudos

NPIP_TUNNEL_SETUP_FAILURE

Hello Databricks team, I have configured Databricks in AWS, but while creating a cluster I'm getting the error below. Could you please help fix this issue?

Error: VM setup failed due to Ngrok setup timeout. Please check your network configuration and try again o...

Latest Reply
mani_22
Databricks Employee
  • 0 kudos

@arun_6482 The error you have shared suggests that there is a network issue in your Databricks deployment within your AWS account. Please review the documentation provided below and ensure that all your routes and ports are configured correctly. Doc:...

kavithai
by New Contributor II
  • 1234 Views
  • 3 replies
  • 2 kudos
Latest Reply
Isi
Honored Contributor III
  • 2 kudos

Hey @kavithai Sometimes there are limitations in the laws of each country regarding "sharing" data outside private clouds or regions, which make it impossible to transmit data outside of your private networks. This is especially true for banks, which...

2 More Replies
noorbasha534
by Valued Contributor II
  • 742 Views
  • 1 replies
  • 0 kudos

Global INIT script on sql warehouse

Dear all, is it possible to configure a global INIT script on a SQL warehouse? If not, how can I achieve my requirement below? For example, this script will have 2 key/value pairs defined: src_catalog_name=ABC, tgt_catalog_name=DEF. I want these 2 to be ref...

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hello @noorbasha534 Unfortunately, in SQL Warehouses you can't attach an init script that automatically runs when the warehouse starts (similar to what you can do with clusters). However, there are a few alternatives you can consider: Session Variable...
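One further alternative, client-side substitution, is sketched below; it is an assumption about how you submit SQL, not a warehouse feature. The variable names match the example in the question; the table names are hypothetical:

```python
from string import Template

# Sketch (stdlib only): keep the key/value pairs from the question in
# client-side config and substitute them into each statement before it is
# submitted to the SQL warehouse. This stands in for the missing global
# init script; the table names below are placeholders.
params = {"src_catalog_name": "ABC", "tgt_catalog_name": "DEF"}

stmt = Template(
    "INSERT INTO ${tgt_catalog_name}.schema.t "
    "SELECT * FROM ${src_catalog_name}.schema.t"
).substitute(params)

print(stmt)  # INSERT INTO DEF.schema.t SELECT * FROM ABC.schema.t
```

The same pairs could instead live in SQL session variables declared at the start of each session, which keeps the substitution inside the warehouse rather than in client code.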
