Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

NancyX
by New Contributor II
  • 870 Views
  • 2 replies
  • 0 kudos

How to pass Dynamic parameters like job.run_id to a pipeline_task in Databricks workflow job?

Is it possible to pass dynamic parameters, such as job.run_id to a pipeline_task within a Databricks Workflow job?

Latest Reply
dataminion01
New Contributor II
  • 0 kudos

Yes, it's in the Parameters section.

1 More Replies
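One way to picture what the reply points at (a sketch only, not the exact Jobs API shape; the task key and parameter name here are hypothetical) is a task definition that forwards the run ID through a `{{job.run_id}}` dynamic value reference, which Databricks resolves at run time:

```python
import json
import re

# Sketch: a Jobs-API-style task definition that forwards the run ID to a
# pipeline task via a dynamic value reference. The task_key, pipeline_id
# placeholder, and parameter name are hypothetical; the exact placement of
# parameters (job-level vs task-level) is an assumption.
task_definition = {
    "task_key": "run_my_pipeline",
    "pipeline_task": {
        "pipeline_id": "<your-pipeline-id>",
    },
    # Databricks substitutes {{job.run_id}} when the job actually runs.
    "parameters": [
        {"name": "parent_run_id", "default": "{{job.run_id}}"},
    ],
}

def referenced_dynamic_values(obj):
    """Collect the {{...}} dynamic value references used in a definition."""
    return sorted(set(re.findall(r"\{\{(.*?)\}\}", json.dumps(obj))))

print(referenced_dynamic_values(task_definition))  # ['job.run_id']
```

Whether a pipeline_task consumes job parameters directly, or the value must instead go through the pipeline's own configuration, depends on how the pipeline is set up.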
syazwansuhaimi
by New Contributor
  • 3207 Views
  • 1 replies
  • 0 kudos

Massive increase in the number of "GetBlobProperties" operations

I had a massive increase in the volume of "GetBlobProperties" operations in my Azure Blob Storage account. The storage logs indicate that all the extra operations have IPs attributed to my Databricks resource group. I haven't made any changes to my r...

Latest Reply
Vidhi_Khaitan
Databricks Employee
  • 0 kudos

A massive increase in "GetBlobProperties" operations in your Azure Blob Storage account could be due to the following:

1. Delta Tables and _delta_log Metadata Access. If you're using Delta Lake, Databricks reads blob properties (e.g., last-modified time, s...

GregTyndall
by New Contributor II
  • 5044 Views
  • 9 replies
  • 5 kudos

Resolved! Materialized View Refresh - NUM_JOINS_THRESHOLD_EXCEEDED?

I have a very basic view with 3 inner joins that will only do a full refresh. Is there a limit to the number of joins you can have and still get an incremental refresh?

"incrementalization_issues": [{"issue_type": "INCREMENTAL_PLAN_REJECTED_BY_COST_MO...

Latest Reply
_DatabricksUser
New Contributor III
  • 5 kudos

@GregTyndall - how did you get that level of detail (incrementalization_issues) for the MV build?

8 More Replies
lezwon
by Contributor
  • 2553 Views
  • 5 replies
  • 1 kudos

Resolved! Unable to install custom wheel in serverless environment

Hey guys, I have created a custom wheel to hold my common code. Since I cannot install task libraries on a serverless environment, I am installing this library in multiple notebooks using %pip install. What I do is I upload the library to a volume in...

Latest Reply
jameshughes
Databricks Partner
  • 1 kudos

@lezwon - Very interesting, as I have been wanting to do this and didn't attempt it because it was listed as not supported. Can you confirm which cloud provider you are using: AWS, Azure, or GCP?

4 More Replies
Ramukamath1988
by New Contributor II
  • 1274 Views
  • 3 replies
  • 0 kudos

Resolved! vacuum does not work as expected

The delta.logRetentionDuration (default 30 days) is generally not set on any table in my workspace. As per the documentation, you can time travel within the log retention window provided delta.deletedFileRetentionDuration is also set to 30 days. Which ...

Latest Reply
Ramukamath1988
New Contributor II
  • 0 kudos

This is precisely my observation after vacuuming. I do understand these 2 parameters, but it's not working as expected. Even after vacuuming (retention of 30 days) we can go back 2 months, and logs are retained for more than 3 months.

2 More Replies
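The behavior described above (time travel still working 2 months back) is consistent with cleanup being lazy rather than timer-driven: reading an old version needs both its log entry (subject to delta.logRetentionDuration) and its data files (removed only when VACUUM actually runs past delta.deletedFileRetentionDuration). A stdlib-only illustration of that arithmetic, an assumption-laden simplification rather than Delta internals:

```python
from datetime import datetime, timedelta

# Sketch of the retention arithmetic behind Delta time travel (an
# illustration, not Databricks internals). A version stays readable while
# BOTH its log entry survives (log cleanup itself happens lazily, at
# checkpoint time, not on a timer) and its data files have not yet been
# purged by an actual VACUUM run. If VACUUM never ran, files linger far
# past the nominal 30-day window.
def can_time_travel(version_ts, now,
                    log_retention=timedelta(days=30),
                    last_vacuum=None):
    log_available = now - version_ts <= log_retention
    # simplification: treat versions older than the last VACUUM as purged
    files_available = last_vacuum is None or version_ts >= last_vacuum
    return log_available and files_available

now = datetime(2025, 6, 1)
assert not can_time_travel(datetime(2025, 3, 1), now)   # log entry aged out
assert can_time_travel(datetime(2025, 5, 20), now)      # within both windows
```

In this model the observation in the thread, logs retained past 3 months, would simply mean no checkpoint-triggered log cleanup has happened yet.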
chinmay0924
by New Contributor III
  • 1826 Views
  • 4 replies
  • 0 kudos

mapInPandas returning an intermittent error related to data type interconversion

```
File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 346, in _create_array
    return pa.Array.from_pandas(
           ^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/array.pxi", line 1126, in pyarrow.lib.Array.from_pandas
File "pyarrow/array.pxi", line 3...
```

Latest Reply
Raghavan93513
Databricks Employee
  • 0 kudos

Hi @chinmay0924 Good day! Could you please confirm the following: Does the ID column incorrectly contain strings, which PyArrow fails to convert to integers (int64)? Is the data processed in both dataframes exactly the same? Additionally, could you pro...

3 More Replies
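The reply's first question, a stray string in an otherwise-int64 column, is a common cause of intermittent Arrow conversion failures: the error only fires on partitions where a bad value lands. A stdlib-only sketch (the row data and function name are made up for illustration) for spotting mixed-type columns before they reach `pa.Array.from_pandas`:

```python
from collections import defaultdict

# Stdlib-only sketch: scan rows for columns holding more than one Python
# type, the kind of inconsistency (e.g., "3" mixed into an integer "id"
# column) that can make pyarrow's Array.from_pandas fail intermittently
# inside mapInPandas, depending on which partition the bad value hits.
def mixed_type_columns(rows):
    types = defaultdict(set)
    for row in rows:
        for col, val in row.items():
            if val is not None:
                types[col].add(type(val).__name__)
    return {col: sorted(t) for col, t in types.items() if len(t) > 1}

rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": "3", "name": "c"},   # string sneaks into an integer column
]
print(mixed_type_columns(rows))  # {'id': ['int', 'str']}
```

In a real pipeline the same check would run on a sample of each partition before the mapInPandas step, or the column would simply be cast to an explicit dtype up front.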
oneill
by New Contributor II
  • 874 Views
  • 2 replies
  • 0 kudos

Resolved! SET a parameter in BEGIN END statement

Hello, how do you set a parameter in a BEGIN ... END statement? For example, the following query fails:

begin
SET ansi_mode = true;
end;

with: Cannot resolve variable `ANSI_MODE` on search path `SYSTEM`.`SESSION`. SQLSTATE: 42883

Latest Reply
Vinay_M_R
Databricks Employee
  • 0 kudos

Hello @oneill  There is currently no supported workaround to dynamically change system/session parameters such as ansi_mode within a BEGIN ... END block in Databricks SQL procedures or scripts. Can you set these parameters before executing any proced...

1 More Replies
Yuki
by Contributor
  • 1990 Views
  • 2 replies
  • 1 kudos

Unable to connect to Amazon S3 using Spark

I can't connect to Amazon S3. I'm referencing and following this document: https://docs.databricks.com/gcp/en/connect/storage/amazon-s3. But I can't access S3. I believe the credentials are correct because I have verified that I can access ...

Latest Reply
Yuki
Contributor
  • 1 kudos

Hi Isi, thank you for your response, I really appreciate it. Apologies, I didn't explain my concern clearly. What I'm trying to confirm is whether the instance profile overrides the spark.conf settings defined in a notebook. For example, I want to a...

1 More Replies
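On the question of notebook `spark.conf` settings versus the instance profile: Hadoop's S3A connector supports per-bucket options of the form `fs.s3a.bucket.<bucket>.<option>` that override the global `fs.s3a.<option>` keys. The sketch below models only that lookup order; the bucket name and values are hypothetical, and whether the instance profile is consulted before or after such keys depends on the configured `fs.s3a.aws.credentials.provider` chain, not on this simplification:

```python
# Sketch of Hadoop S3A per-bucket option resolution: a key of the form
# fs.s3a.bucket.<bucket>.<option> overrides the global fs.s3a.<option>.
# Bucket name and credential values here are hypothetical placeholders.
conf = {
    "fs.s3a.aws.credentials.provider":
        "com.amazonaws.auth.InstanceProfileCredentialsProvider",
    # notebook-level override scoped to a single bucket
    "fs.s3a.bucket.my-other-bucket.access.key": "AKIA-EXAMPLE",
}

def s3a_option(conf, bucket, option):
    """Per-bucket key wins; otherwise fall back to the global key."""
    per_bucket = f"fs.s3a.bucket.{bucket}.{option}"
    return conf.get(per_bucket, conf.get(f"fs.s3a.{option}"))

print(s3a_option(conf, "my-other-bucket", "access.key"))  # 'AKIA-EXAMPLE'
print(s3a_option(conf, "some-bucket", "access.key"))      # None
```

Scoping credentials per bucket this way is one approach to mixing an instance profile for most access with explicit keys for a single bucket.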
trang_le
by Databricks Employee
  • 1617 Views
  • 1 replies
  • 0 kudos

Announcing a new portfolio of Generative AI learning offerings on Databricks Academy

Today, we launched new Generative AI (including LLMs) learning offerings for everyone from technical and business leaders to data practitioners, such as Data Scientis...

Latest Reply
adb_newbie
New Contributor III
  • 0 kudos

Where can I find all the scripts/notebooks presented in the course "Large Language Models (LLMs): Application through Production"?

maarko
by New Contributor II
  • 1661 Views
  • 1 replies
  • 0 kudos

Inconsistent Decimal Comparison Behavior Between SQL Warehouse (Photon) and Spark Clusters

I'm seeing non-deterministic behavior when running the same query in SQL Warehouse (Photon) vs. interactive/job clusters (non-Photon), specifically involving a LEFT OUTER JOIN and a DECIMAL comparison in a WHERE clause. I have two views: View A: cont...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @maarko This is a fascinating issue that points to several potential causes related to differences between Photon and standard Spark execution engines, particularly around decimal handling and parallelism.

Root Causes
1. Decimal Precision and Scale ...

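The precision-and-scale explanation in the reply can be shown with the standard library alone (an illustration of the general decimal pitfall, not of Photon internals): if two engines carry a value through a join at different scales, the same equality predicate can flip.

```python
from decimal import Decimal, ROUND_HALF_UP

# Illustration (stdlib only, not Photon internals): the same source value
# compared at scale 2 vs scale 6 fails an equality check, while
# normalizing both sides to one scale makes it pass. Engine differences
# in inferred precision/scale can therefore change WHERE-clause results.
raw = Decimal("123.456789")
at_scale_2 = raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)  # 123.46
at_scale_6 = raw                                                    # 123.456789

print(at_scale_2 == at_scale_6)   # False: scales differ
print(at_scale_2 ==
      at_scale_6.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))  # True
```

The usual fix on the SQL side is the analogous move: CAST both sides of the comparison to an explicit, identical DECIMAL(p, s) so neither engine has to infer one.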
amitpm
by New Contributor
  • 782 Views
  • 1 replies
  • 0 kudos

Lakeflow Connect - Column filtering

Hi community, I am interested in learning more about the feature mentioned at the recent summit about query pushdown in Lakeflow Connect for SQL Server. I believe this feature will allow selecting only the required columns from source tables. I...

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hey @amitpm According to the documentation, this feature is currently in Public Preview, so if your Databricks account has access to public preview features, you can reach out to support to enable it and start testing performance. Setup guide for Lake...

SenthilJ
by New Contributor III
  • 6384 Views
  • 2 replies
  • 1 kudos

Databricks Deep Clone

Hi, I am working on a DR design for Databricks in Azure. The recommendation from Databricks is to use Deep Clone to clone the Unity Catalog tables (within or across catalogs). My design is to ensure that DR is managed across different regions, i.e. pri...

Data Engineering
Disaster Recovery
Unity Catalog
Latest Reply
Isi
Honored Contributor III
  • 1 kudos

Hi, in my opinion, Databricks Deep Clone does not currently support cloning Unity Catalog tables natively across different metastores (each region having its own metastore). Deep Clone requires that both source and target belong to the same metastore ...

1 More Replies
arun_6482
by New Contributor
  • 2817 Views
  • 1 replies
  • 0 kudos

NPIP_TUNNEL_SETUP_FAILURE

Hello Databricks team, I have configured Databricks in AWS, but while creating a cluster I'm getting the error below. Could you please help fix this issue?

Error: VM setup failed due to Ngrok setup timeout. Please check your network configuration and try again o...

Latest Reply
mani_22
Databricks Employee
  • 0 kudos

@arun_6482 The error you have shared suggests that there is a network issue in your Databricks deployment within your AWS account. Please review the documentation provided below and ensure that all your routes and ports are configured correctly. Doc:...

kavithai
by New Contributor II
  • 1234 Views
  • 3 replies
  • 2 kudos
Latest Reply
Isi
Honored Contributor III
  • 2 kudos

Hey @kavithai Sometimes there are limitations in the laws of each country regarding "sharing" data outside private clouds or regions, which make it impossible to transmit data outside of your private networks. This is especially true for banks, which...

2 More Replies
noorbasha534
by Valued Contributor II
  • 742 Views
  • 1 replies
  • 0 kudos

Global INIT script on sql warehouse

Dear all, is it possible to configure a global INIT script on a SQL warehouse? If not, how can I achieve my requirement below? For example, this script will have 2 key/value pairs defined: src_catalog_name=ABC, tgt_catalog_name=DEF. I want these 2 to be ref...

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hello @noorbasha534 Unfortunately, in SQL Warehouses you can't attach an init script that automatically runs when the warehouse starts (similar to what you can do with clusters). However, there are a few alternatives you can consider: Session Variable...
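One further alternative, client-side substitution, is sketched below; it is an assumption about how you submit SQL, not a warehouse feature. The variable names match the example in the question; the table names are hypothetical:

```python
from string import Template

# Sketch (stdlib only): keep the key/value pairs from the question in
# client-side config and substitute them into each statement before it is
# submitted to the SQL warehouse. This stands in for the missing global
# init script; the table names below are placeholders.
params = {"src_catalog_name": "ABC", "tgt_catalog_name": "DEF"}

stmt = Template(
    "INSERT INTO ${tgt_catalog_name}.schema.t "
    "SELECT * FROM ${src_catalog_name}.schema.t"
).substitute(params)

print(stmt)  # INSERT INTO DEF.schema.t SELECT * FROM ABC.schema.t
```

The same pairs could instead live in SQL session variables declared at the start of each session, which keeps the substitution inside the warehouse rather than in client code.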
