Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

AxelBrsn
by Databricks Partner
  • 6032 Views
  • 5 replies
  • 2 kudos

Why are materialized views created in __databricks_internal?

Hello, I have a question: why are materialized views created in the "__databricks_internal" catalog? We specified the catalog and schemas in the DLT pipeline.

Data Engineering
catalog
Delta Live Table
materialized views
Latest Reply
Yogesh_Verma_
Contributor II
  • 2 kudos

Hello, Materialized views created by Delta Live Tables (DLT) are stored in the __databricks_internal catalog for a few key reasons: Separation: This keeps system-generated tables (like materialized views) separate from your own tables and views, so you...

  • 2 kudos
4 More Replies
loinguyen3182
by New Contributor II
  • 3766 Views
  • 2 replies
  • 0 kudos

Spark Streaming Error Listing in GCS

I have run into an error listing _delta_log when Spark reads a stream in Delta format from GCS. This is the full log of the issue: org.apache.spark.sql.streaming.StreamingQueryException: Failed to get result: java.io.IOException: Error ...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

The key contributing factors to this issue, according to internal investigations and customer tickets, include: Large Number of Log Files in _delta_log: Delta Lake maintains a JSON transaction log that grows with every commit. The more files present...

  • 0 kudos
1 More Replies
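
The reply above names the number of files in _delta_log as a contributing factor. A minimal sketch for gauging that, assuming you already have a listing of file names from the log directory (obtained with whatever storage client you use): it counts JSON commit files newer than the latest checkpoint. The function name `commits_since_checkpoint` and the sample file names are hypothetical; the naming pattern (a 20-digit zero-padded version followed by `.json` or `.checkpoint...parquet`) follows the Delta Lake transaction log protocol.

```python
import re

# Delta log entries are named with a zero-padded 20-digit version number.
_COMMIT = re.compile(r"^(\d{20})\.json$")
_CHECKPOINT = re.compile(r"^(\d{20})\.checkpoint(\.[^/]*)?\.parquet$")


def commits_since_checkpoint(file_names):
    """Return (latest_checkpoint_version, commits_after_it) for a _delta_log listing.

    latest_checkpoint_version is None when no checkpoint file is present,
    in which case every commit file is counted.
    """
    commit_versions = []
    checkpoint_versions = []
    for name in file_names:
        match = _COMMIT.match(name)
        if match:
            commit_versions.append(int(match.group(1)))
            continue
        match = _CHECKPOINT.match(name)
        if match:
            checkpoint_versions.append(int(match.group(1)))
    if not checkpoint_versions:
        return None, len(commit_versions)
    latest = max(checkpoint_versions)
    return latest, sum(1 for version in commit_versions if version > latest)


# Hypothetical listing, for illustration only.
listing = [
    "00000000000000000009.json",
    "00000000000000000010.checkpoint.parquet",
    "00000000000000000010.json",
    "00000000000000000011.json",
    "00000000000000000012.json",
    "_last_checkpoint",
]
print(commits_since_checkpoint(listing))
```

A large second value suggests that readers have to list and replay many commit files; this sketch only measures that and does not change the table.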
sunnyj
by New Contributor III
  • 1227 Views
  • 1 reply
  • 0 kudos

delta live table pipeline

I am very confused about the answer. Can anyone help me with this?

Data Engineering
axal_r
axel_r
Latest Reply
ilir_nuredini
Honored Contributor
  • 0 kudos

Hello sunnyj, the correct answer is B) At least one notebook library to be executed. This is because a Delta Live Tables pipeline requires at least one notebook assigned to it that contains a table definition using @dlt.table (or the sql sy...

  • 0 kudos
ankitmit
by New Contributor III
  • 7054 Views
  • 7 replies
  • 3 kudos

DLT Apply Changes

Hi, In DLT, how do we specify which columns we don't want to overwrite when using the “apply changes” operation (in the attached example, we want to avoid overwriting the “created_time” column)? I am using this sample code dlt.apply_changes(...

Latest Reply
brunoillipronti
New Contributor II
  • 3 kudos

Same here, it's kinda ridiculous that apply_changes doesn't support a parameter to update certain columns... how come that is not a priority since this was released? 

  • 3 kudos
6 More Replies
ceediii
by New Contributor II
  • 1880 Views
  • 3 replies
  • 1 kudos

Resolved! Declarative Pipeline Asset Bundle Root Folder

Hi everyone, In the new declarative pipeline UI (preview), we have the option to define a root folder. My resource asset bundle is currently defined as: resources: jobs: my_job: name: "(${var.branch}) my_job" tasks: - task_key...

Latest Reply
ilir_nuredini
Honored Contributor
  • 1 kudos

You are welcome, glad it helped! Best, Ilir

  • 1 kudos
2 More Replies
QLA_SethParker
by Databricks Partner
  • 3056 Views
  • 2 replies
  • 0 kudos

Resolved! Error Creating Table

We are a current Databricks customer (Azure Databricks) experiencing an issue when creating a table. We have an existing Metastore in the Central region.  All other Workspaces in this Metastore/Region are behind Private Endpoints.  We are trying to c...

Latest Reply
QLA_SethParker
Databricks Partner
  • 0 kudos

Hi Lou, Thank you so much for your detailed reply, and I apologize for leaving this open for so long. I got wrapped up in another project and am just getting back to this. I was able to resolve it, at least in my situation, last night, so I wanted to ...

  • 0 kudos
1 More Replies
jeremy98
by Honored Contributor
  • 3533 Views
  • 2 replies
  • 0 kudos

How to Optimize Batch Inference for Per-Item ML Models in Databricks

Hi everyone, I’m relatively new to Databricks. I worked with it a few months ago, and today I encountered an issue in our system. Basically, we have multiple ML models — one for each item — and we want to run inference in a more efficient way, ideall...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Databricks offers unified capabilities for both real-time and batch inference across traditional ML models and large language models (LLMs) using Mosaic AI Model Serving and AI Functions (notably the ai_query function). For your use case (n items, n ...

  • 0 kudos
1 More Replies
ShivangiB1
by New Contributor III
  • 632 Views
  • 1 reply
  • 0 kudos

VACUUM operations on existing table after enabling predictive optimization

Hey Team, I have a table with files that were deleted 7 days ago, when predictive optimization was not enabled. Today I enabled predictive optimization. How long will it take to perform VACUUM on the existing data?

Latest Reply
shivyadav
New Contributor II
  • 0 kudos

Once predictive optimization is enabled and the table is in use, VACUUM will be automatically triggered 7 days later. But if a table with predictive optimization enabled has no active usage, the optimization process does not detect the table or its pr...

  • 0 kudos
joedata
by New Contributor
  • 3918 Views
  • 1 reply
  • 0 kudos

pywin32

A Python module called pywin32 enables users to read an Excel file, make changes to specific cells, execute a Refresh All, which refreshes all the data connections, and save the changes made to the Excel file. This cannot be used on Databricks because ...

Latest Reply
shivyadav
New Contributor II
  • 0 kudos

Have you tried openpyxl? It seems a good fit, as it covers all the requirements you mentioned in the post; we have used it in one of our applications with Databricks.

  • 0 kudos
NancyX
by New Contributor II
  • 1013 Views
  • 2 replies
  • 0 kudos

How to pass Dynamic parameters like job.run_id to a pipeline_task in Databricks workflow job?

Is it possible to pass dynamic parameters, such as job.run_id to a pipeline_task within a Databricks Workflow job?

Latest Reply
dataminion01
New Contributor II
  • 0 kudos

Yes, it's in the Parameters section.

  • 0 kudos
1 More Replies
syazwansuhaimi
by New Contributor
  • 3270 Views
  • 1 reply
  • 0 kudos

Massive increase in the number of "GetBlobProperties" operations

I had a massive increase in the volume of "GetBlobProperties" operations in my Azure Blob Storage account. The storage logs indicate that all the extra operations have IPs attributed to my Databricks resource group. I haven't made any changes to my r...

Latest Reply
Vidhi_Khaitan
Databricks Employee
  • 0 kudos

A massive increase in "GetBlobProperties" operations in your Azure Blob Storage account could be due to the following: 1. Delta Tables and _delta_log Metadata Access: If you're using Delta Lake, Databricks reads blob properties (e.g., last-modified time, s...

  • 0 kudos
GregTyndall
by New Contributor II
  • 5617 Views
  • 9 replies
  • 5 kudos

Resolved! Materialized View Refresh - NUM_JOINS_THRESHOLD_EXCEEDED?

I have a very basic view with 3 inner joins that will only do a full refresh. Is there a limit to the number of joins you can have and still get an incremental refresh? "incrementalization_issues": [{"issue_type": "INCREMENTAL_PLAN_REJECTED_BY_COST_MO...

Latest Reply
_DatabricksUser
New Contributor III
  • 5 kudos

@GregTyndall - how did you get that level of detail (incrementalization_issues) for the MV build?

  • 5 kudos
8 More Replies
lezwon
by Contributor
  • 3002 Views
  • 5 replies
  • 1 kudos

Resolved! Unable to install custom wheel in serverless environment

Hey guys, I have created a custom wheel to hold my common code. Since I cannot install task libraries on a serverless environment, I am installing this library in multiple notebooks using %pip install. What I do is I upload the library to a volume in...

Latest Reply
jameshughes
Databricks Partner
  • 1 kudos

@lezwon - Very interesting, as I have been wanting to do this and didn't attempt due to finding it was listed as not supported.  Can you confirm what cloud provider you are using? AWS, Azure, GCP?

  • 1 kudos
4 More Replies
Ramukamath1988
by New Contributor II
  • 1484 Views
  • 3 replies
  • 0 kudos

Resolved! vacuum does not work as expected

The delta.logRetentionDuration (default 30 days) is generally not set on any table in my workspace. As per the documentation, you can time travel within the log retention duration, provided delta.deletedFileRetentionDuration is also set to 30 days. Which ...

Latest Reply
Ramukamath1988
New Contributor II
  • 0 kudos

This is precisely my observation after vacuuming. I do understand these 2 parameters, but it's not working as expected. Even after vacuuming (retention for 30 days) we can go back 2 months, and logs are retained for more than 3 months.

  • 0 kudos
2 More Replies
chinmay0924
by New Contributor III
  • 2205 Views
  • 4 replies
  • 0 kudos

mapInPandas returning an intermittent error related to data type interconversion

File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 346, in _create_array return pa.Array.from_pandas( ^^^^^^^^^^^^^^^^^^^^^ File "pyarrow/array.pxi", line 1126, in pyarrow.lib.Array.from_pandas File "pyarrow/array.pxi", line 3...

Latest Reply
Raghavan93513
Databricks Employee
  • 0 kudos

Hi @chinmay0924, Good day! Could you please confirm the following: Does the ID column incorrectly contain strings, which PyArrow fails to convert to integers (int64)? Is the data processed in both dataframes exactly the same? Additionally, could you pro...

  • 0 kudos
3 More Replies
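
The first question in the reply above (whether the ID column contains strings that cannot be converted to int64) can be checked before a batch is handed back to Arrow. A minimal sketch in plain Python, assuming the column values are available as an iterable (for example via `pdf["id"].tolist()` inside the mapInPandas function, where `pdf` and `"id"` are placeholder names); the helper `find_non_integer_values` is hypothetical and not part of PySpark or PyArrow.

```python
def find_non_integer_values(values, limit=10):
    """Return up to `limit` (position, value) pairs that are not clean integers.

    None and NaN are treated as nulls and skipped; bool is flagged because it
    is rarely intended in an integer ID column.
    """
    offenders = []
    for position, value in enumerate(values):
        # NaN is the only float that is not equal to itself.
        if value is None or (isinstance(value, float) and value != value):
            continue
        is_int = isinstance(value, int) and not isinstance(value, bool)
        # Floats such as 4.0 are accepted only when they carry no fraction.
        is_whole_float = isinstance(value, float) and value.is_integer()
        if not (is_int or is_whole_float):
            offenders.append((position, value))
            if len(offenders) >= limit:
                break
    return offenders


# Hypothetical sample of an ID column with two bad entries.
sample_ids = [1, 2, "3", None, 4.0, "abc", 5]
print(find_non_integer_values(sample_ids))
```

Logging the offending positions per batch may help explain why the error is intermittent: only batches that happen to contain such values would fail.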