Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

seapen
by New Contributor II
  • 160 Views
  • 1 replies
  • 0 kudos

[Question]: Get permissions for a schema containing backticks via the API

I am unsure if this is specific to the Java SDK, but I am having issues checking effective permissions on the following schema (note the trailing backtick): databricks_dev.test_schema`. In Scala I have the following example test: test("attempting to access schema with backtick") ...

Latest Reply
seapen
New Contributor II
  • 0 kudos

Update: Interestingly, if I URL-encode _twice_ it appears to work, e.g.: test("attempting to access schema with backtick") { val client = new WorkspaceClient() client.config().setHost("redacted").setToken("redacted") val name = "databricks...
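For anyone hitting the same thing, here is a minimal sketch of that double-encoding workaround, written against the Python SDK rather than the Java SDK used above; the schema name comes from the report, and whether the Python SDK needs the same workaround is an assumption.

```python
# Sketch of the double-encoding workaround, via the Python SDK instead of the Java SDK.
from urllib.parse import quote

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.catalog import SecurableType

w = WorkspaceClient()  # host/token resolved from the environment

# Schema whose name literally ends with a backtick, as in the original report.
raw_name = "databricks_dev.test_schema`"

# URL-encode the full name twice before calling the effective-permissions API;
# a single encoding is what reportedly fails for names containing backticks.
encoded_twice = quote(quote(raw_name, safe=""), safe="")

perms = w.grants.get_effective(SecurableType.SCHEMA, encoded_twice)
for assignment in perms.privilege_assignments or []:
    print(assignment.principal, assignment.privileges)
```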

lezwon
by Contributor
  • 202 Views
  • 2 replies
  • 1 kudos

Resolved! Databricks Serverless: Package import fails from notebook in subfolder after wheel installation

I have a Python package installed via wheel file in a Databricks serverless environment. The package imports work fine when my notebook is in the root directory, but fail when the notebook is in a subfolder. How can I fix this? src/ ├── datalake_util...

Latest Reply
lezwon
Contributor
  • 1 kudos

It appears that there is a pre-installed package called datalake_utils available within Databricks. I had to rename my package to something else, and it worked like a charm.
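If you suspect a similar clash, checking which package an import name actually resolves to confirms it quickly; a minimal sketch, using the datalake_utils name from this thread:

```python
# Quick check of which installed package an import name actually resolves to.
import importlib.util

spec = importlib.util.find_spec("datalake_utils")
print(spec.origin if spec else "datalake_utils not found")
# If the printed path is outside your wheel's install location, the name is being
# shadowed by a pre-installed package, and renaming your own package (as done
# above) sidesteps the clash.
```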

1 More Replies
AxelBrsn
by New Contributor III
  • 3741 Views
  • 5 replies
  • 1 kudos

Why are materialized views created in __databricks_internal?

Hello, I have a question about why materialized views are created in the "__databricks_internal" catalog. We specified the catalog and schemas in the DLT pipeline.

Data Engineering
catalog
Delta Live Table
materialized views
Latest Reply
Yogesh_378691
New Contributor III
  • 1 kudos

Hello, Materialized views created by Delta Live Tables (DLT) are stored in the __databricks_internal catalog for a few key reasons: Separation: This keeps system-generated tables (like materialized views) separate from your own tables and views, so you...
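In day-to-day use you still read the materialized view through the pipeline's target catalog and schema rather than the internal catalog; a minimal sketch with placeholder names:

```python
# The materialized view is published to the pipeline's target catalog/schema; only its
# backing data lives under __databricks_internal. The three-part name below is a
# placeholder for whatever target your DLT pipeline is configured with.
df = spark.read.table("my_catalog.my_schema.my_materialized_view")
df.show()
```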

4 More Replies
fostermink
by New Contributor II
  • 944 Views
  • 6 replies
  • 0 kudos

Spark AWS S3 folder partition pruning doesn't work

Hi, I have a use case where my Spark job runs on AWS EMR and reads from an S3 path: some-bucket/some-path/region=na/days=1. During my read, I pass DataFrame df = sparkSession.read().option("mergeSchema", true).parquet("some-bucket/some-path...

Latest Reply
lingareddy_Alva
Honored Contributor II
  • 0 kudos

In your case, Spark isn't automatically pruning partitions because: Missing Partition Discovery: For Spark to perform partition pruning when reading directly from paths (without a metastore table), you need to explicitly tell it about the partition st...
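A minimal sketch of that partition-discovery fix, assuming the layout from the question (region/days directories under the table root); bucket and path are placeholders:

```python
# Point the read at the table root with a basePath so Spark treats region/days as
# partition columns, then filter on them so the partitions can be pruned.
df = (
    spark.read
    .option("mergeSchema", "true")
    .option("basePath", "s3://some-bucket/some-path/")
    .parquet("s3://some-bucket/some-path/")
)

pruned = df.where("region = 'na' AND days = 1")
pruned.explain(True)  # the physical plan should show PartitionFilters on region/days
```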

5 More Replies
loinguyen3182
by New Contributor II
  • 906 Views
  • 2 replies
  • 0 kudos

Spark Streaming Error Listing in GCS

I have run into a problem where listing _delta_log fails when Spark reads a stream in Delta format from GCS. This is the full log of the issue: org.apache.spark.sql.streaming.StreamingQueryException: Failed to get result: java.io.IOException: Error ...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

The key contributing factors to this issue, according to internal investigations and customer tickets, include: Large Number of Log Files in _delta_log: Delta Lake maintains a JSON transaction log that grows with every commit. The more files present...
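If a bloated _delta_log is indeed the root cause, tightening log retention and checkpoint frequency on the source table is one lever; a sketch with a placeholder table name and example values (not recommendations):

```python
# Keep the transaction log listing small by expiring old log entries sooner and
# checkpointing more often on the Delta table being streamed.
spark.sql("""
    ALTER TABLE my_catalog.my_schema.my_source_table SET TBLPROPERTIES (
        'delta.logRetentionDuration' = 'interval 7 days',
        'delta.checkpointInterval'   = '10'
    )
""")
```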

1 More Replies
sunnyj
by New Contributor III
  • 165 Views
  • 1 replies
  • 0 kudos

delta live table pipeline

I am very confused about the answer. Can anyone help me with this?

(attached screenshot: sunnyj_0-1750764423110.png)
Data Engineering
axal_r
axel_r
Latest Reply
ilir_nuredini
Valued Contributor
  • 0 kudos

Hello sunnyj, The correct answer is B) At least one notebook library to be executed. This is because a Delta Live Tables pipeline requires at least one notebook to be assigned to it, and that notebook must contain a table definition using @dlt.table (or the SQL sy...
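For illustration, roughly the smallest pipeline source notebook that satisfies that requirement (table name is a placeholder):

```python
# One table definition decorated with @dlt.table is enough for the pipeline to run.
import dlt

@dlt.table(comment="Minimal table so the pipeline has something to execute")
def my_first_table():
    return spark.range(10).withColumnRenamed("id", "value")
```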

ankitmit
by New Contributor III
  • 2652 Views
  • 7 replies
  • 3 kudos

DLT Apply Changes

Hi, In DLT, how do we specify which columns we don't want to overwrite when using the "apply changes" operation (in the attached example, we want to avoid overwriting the "created_time" column)? I am using this sample code: dlt.apply_changes(...

Latest Reply
brunoillipronti
New Contributor II
  • 3 kudos

Same here, it's kinda ridiculous that apply_changes doesn't support a parameter to update certain columns... how come that is not a priority since this was released? 
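For context, roughly what the call under discussion looks like; table, key, and column names are placeholders. The column_list / except_column_list parameters only control which source columns are written to the target at all, which is why, as this thread notes, there is currently no parameter that leaves an existing target value such as created_time untouched while the other columns are updated.

```python
# Typical apply_changes call for reference (placeholder names throughout).
import dlt

dlt.create_streaming_table("customers_silver")

dlt.apply_changes(
    target="customers_silver",
    source="customers_bronze",
    keys=["customer_id"],
    sequence_by="event_ts",
    except_column_list=["_rescued_data"],  # columns dropped from the target entirely
    stored_as_scd_type=1,
)
```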

6 More Replies
ceediii
by New Contributor II
  • 318 Views
  • 3 replies
  • 1 kudos

Resolved! Declarative Pipeline Asset Bundle Root Folder

Hi everyone, In the new declarative pipeline UI (preview), we have the option to define a root folder. My resource asset bundle is currently defined as: resources: jobs: my_job: name: "(${var.branch}) my_job" tasks: - task_key...

(attached screenshot: ceediii_0-1750694376542.png)
Latest Reply
ilir_nuredini
Valued Contributor
  • 1 kudos

You are welcome, great that it helped! Best, Ilir

2 More Replies
sandy311
by New Contributor III
  • 771 Views
  • 2 replies
  • 0 kudos

Install python packages on serverless compute in DLT pipelines (using asset bundles)

Has anyone figured out how to install packages on serverless compute using asset bundles, similar to how we handle it for jobs or job tasks? I didn't see any direct option for this, apart from installing packages manually within a notebook. I tried ins...

Data Engineering
DLT Serverless
Latest Reply
sandy311
New Contributor III
  • 0 kudos

I know this works with tasks like notebooks, Python scripts, etc., but it won't work with DLT pipelines.
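Until there is a bundle-level option for pipelines, the manual route mentioned in the question is the one that works inside a DLT source notebook (serverless included); a sketch, with the wheel path and package name as placeholders:

```python
# Install the wheel at the top of the pipeline's source notebook with %pip,
# then import it as usual. Path and package name are hypothetical.
%pip install /Workspace/Shared/dist/my_package-0.1.0-py3-none-any.whl

import my_package  # hypothetical package installed above
```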

1 More Replies
QLA_SethParker
by New Contributor III
  • 457 Views
  • 2 replies
  • 0 kudos

Resolved! Error Creating Table

We are a current Databricks customer (Azure Databricks) experiencing an issue when creating a table. We have an existing Metastore in the Central region.  All other Workspaces in this Metastore/Region are behind Private Endpoints.  We are trying to c...

(attached screenshot: SethParker02_0-1748985494231.png)
Latest Reply
QLA_SethParker
New Contributor III
  • 0 kudos

Hi Lou, Thank you so much for your detailed reply, and I apologize for leaving this open for so long. I got wrapped up in another project and am just getting back to this. I was able to resolve it, at least in my situation, last night, so I wanted to ...

1 More Replies
969091
by New Contributor
  • 30045 Views
  • 9 replies
  • 8 kudos

Send custom emails from a Databricks notebook without using a third-party SMTP server. We would like to utilize Databricks' existing SMTP or the Databricks API.

We want to use the existing Databricks SMTP server, or find out whether the Databricks API can be used to send custom emails. Databricks Workflows sends email notifications on success, failure, etc. of jobs but cannot send custom emails. So we want to send custom emails to di...

Latest Reply
pk13
New Contributor II
  • 8 kudos

Hello @mido1978, I am also in need of something similar. I have two tables: one has the details of the recipient and another has some log data, and they are identified by a common key. Emails need to be sent with the mentioned log to the recipient for whom it'...
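A sketch of the data-side half of what is described above: join the recipient table to the log table on their shared key and collect each recipient's log lines into a single message body. Table and column names are placeholders, and the actual delivery mechanism (SMTP, an API, etc.) is left out since that is the open question in this thread.

```python
# Build one message body per recipient from two tables joined on a common key.
from pyspark.sql import functions as F

recipients = spark.table("ops.email_recipients")   # columns: key, email_address
logs = spark.table("ops.job_logs")                 # columns: key, log_line

per_recipient = (
    logs.join(recipients, "key")
        .groupBy("key", "email_address")
        .agg(F.concat_ws("\n", F.collect_list("log_line")).alias("body"))
)

for row in per_recipient.collect():
    print(f"To: {row.email_address}\n\n{row.body}\n")
```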

8 More Replies
jeremy98
by Honored Contributor
  • 510 Views
  • 2 replies
  • 0 kudos

How to Optimize Batch Inference for Per-Item ML Models in Databricks

Hi everyone, I’m relatively new to Databricks. I worked with it a few months ago, and today I encountered an issue in our system. Basically, we have multiple ML models — one for each item — and we want to run inference in a more efficient way, ideall...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Databricks offers unified capabilities for both real-time and batch inference across traditional ML models and large language models (LLMs) using Mosaic AI Model Serving and AI Functions (notably the ai_query function). For your use case (n items, n ...
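As one alternative to the ai_query route mentioned above, a common pattern for the "one model per item" case is to group the scoring data by item and load that item's model inside a pandas function, so each group is scored by its own model; a sketch in which the table, column, and registered-model names are hypothetical placeholders.

```python
# Score each item's rows with that item's own MLflow model via grouped applyInPandas.
import mlflow
import pandas as pd

def score_one_item(pdf: pd.DataFrame) -> pd.DataFrame:
    item_id = pdf["item_id"].iloc[0]
    # Hypothetical convention: one registered model per item.
    model = mlflow.pyfunc.load_model(f"models:/demand_model_{item_id}/1")
    pdf["prediction"] = model.predict(pdf.drop(columns=["item_id"]))
    return pdf

features = spark.table("ml.item_features")  # must include an item_id column
scored = features.groupBy("item_id").applyInPandas(
    score_one_item,
    schema=features.schema.add("prediction", "double"),
)
scored.write.mode("overwrite").saveAsTable("ml.item_predictions")
```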

1 More Replies
pvalcheva
by New Contributor
  • 468 Views
  • 0 replies
  • 0 kudos

Simba Spark Driver fails for big datasets in Excel

Hello, I am getting the following error when I want to extract data from Databricks via VBA code. The code for the connection is:
Option Explicit
Const adStateClosed = 0
Public CnAdo As New ADODB.Connection
Dim DSN_name As String
Dim WB As Workbook
Dim das...

(attached screenshot: pvalcheva_0-1750755864726.png)
