Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Marvin_T
by New Contributor III
  • 21293 Views
  • 3 replies
  • 2 kudos

Resolved! Disabling query caching for SQL Warehouse

Hello everybody, I am currently trying to run some performance tests on queries in Databricks on Azure. For my tests, I am using a Classic SQL Warehouse in the SQL Editor. I have created two views that contain the same data but have different structur...

Latest Reply
Marvin_T
New Contributor III
  • 2 kudos

Now that you say it, they are probably executing the same query plan. And yes, restarting the warehouse does work in theory, but it isn't a nice solution. I guess I will do some restarts and build averages to get a fair comparison for now.

2 More Replies
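The averaging workaround from the reply above can be sketched in plain Python. `run_query` is a stand-in for timing a real warehouse query; on Databricks SQL you can additionally try `SET use_cached_result = false` in the session to skip the result cache, though warehouse-level disk caching can still influence timings.

```python
import time

def average_runtime(run_query, repeats=5):
    """Time `run_query()` `repeats` times and return the mean in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # stand-in for executing the query under test
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)
```

Running each variant several times and comparing means (rather than single runs) makes residual caching effects wash out, which is what the thread converges on.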
KristiLogos
by Contributor
  • 1137 Views
  • 2 replies
  • 0 kudos

Netsuite error - The driver could not open a JDBC connection. Check the URL

I'm trying to connect to Netsuite2 with the JDBC driver I added to my cluster. I'm testing this in my Netsuite sandbox and I have the code below, but it keeps saying: requirement failed: The driver could not open a JDBC connection. Check the URL: jdbc:...

Latest Reply
TheOC
Contributor III
  • 0 kudos

Hey @KristiLogos, I had a little search online and found this, which may be useful: https://stackoverflow.com/questions/79236996/pyspark-jdbc-connection-to-netsuite2-com-fails-with-failed-to-login-using-tba In short, it seems that a token based connection...

1 More Replies
seapen
by New Contributor II
  • 1244 Views
  • 1 replies
  • 0 kudos

[Question]: Get permissions for a schema containing backticks via the API

I am unsure if this is specific to the Java SDK, but I am having issues checking effective permissions on the following schema: databricks_dev.test_schema` In Scala I have the following example test: test("attempting to access schema with backtick") ...

Latest Reply
seapen
New Contributor II
  • 0 kudos

Update: Interestingly, if I URL-encode _twice_ it appears to work, e.g.: test("attempting to access schema with backtick") { val client = new WorkspaceClient() client.config().setHost("redacted").setToken("redacted") val name = "databricks...

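The double-encoding workaround from the update above can be reproduced with the standard library alone: the schema name contains a backtick, and in this case a single URL-encode was not enough for the permissions endpoint.

```python
from urllib.parse import quote

name = "databricks_dev.test_schema`"

# One pass turns the backtick into %60 ...
once = quote(name, safe="")
# ... and a second pass escapes the percent sign itself, giving %2560,
# which is what the workaround in the update sends to the API.
twice = quote(once, safe="")
```

Needing the second pass usually means something in the client or server stack decodes the path once before routing, so the value that reaches the permissions check has been un-escaped one time too many.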
lezwon
by Contributor
  • 827 Views
  • 2 replies
  • 1 kudos

Resolved! Databricks Serverless: Package import fails from notebook in subfolder after wheel installation

I have a Python package installed via wheel file in a Databricks serverless environment. The package imports work fine when my notebook is in the root directory, but fail when the notebook is in a subfolder. How can I fix this? src/ ├── datalake_util...

Latest Reply
lezwon
Contributor
  • 1 kudos

It appears that there is a pre-installed package called datalake_utils available within Databricks. I had to rename my package to something else, and it worked like a charm.

1 More Replies
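The root cause here, a name collision with a pre-installed package, is easy to check for before shipping a wheel. A small stdlib-only helper (the function name is illustrative):

```python
import importlib.util

def name_is_taken(package_name: str) -> bool:
    """True if some installed module/package already claims this top-level name."""
    return importlib.util.find_spec(package_name) is not None
```

Running this on a Databricks cluster for a candidate name like `datalake_utils` before building the wheel would have surfaced the clash immediately.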
AxelBrsn
by New Contributor III
  • 4936 Views
  • 5 replies
  • 1 kudos

Why materialized views are created in __databricks_internal ?

Hello, I have a question: why are materialized views created in the "__databricks_internal" catalog? We specified the catalog and schema in the DLT pipeline.

Data Engineering
catalog
Delta Live Table
materialized views
Latest Reply
Yogesh_Verma_
Contributor II
  • 1 kudos

Hello, materialized views created by Delta Live Tables (DLT) are stored in the __databricks_internal catalog for a few key reasons: Separation: this keeps system-generated tables (like materialized views) separate from your own tables and views, so you...

4 More Replies
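For context, a hedged sketch of the relevant pipeline settings (field names as in current DLT pipeline configs; all values illustrative). Even with catalog and target set like this, the backing data for a materialized view lives under `__databricks_internal`, while the readable object is published to the catalog and schema you configure:

```json
{
  "name": "my_pipeline",
  "catalog": "my_catalog",
  "target": "my_schema",
  "libraries": [{"notebook": {"path": "/Repos/me/pipelines/definitions"}}]
}
```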
fostermink
by New Contributor II
  • 2195 Views
  • 6 replies
  • 0 kudos

Spark aws s3 folder partition pruning doesn't work

Hi, I have a use case where my Spark job runs on AWS EMR and reads from an S3 path: some-bucket/some-path/region=na/days=1. During my read, I pass DataFrame df = sparkSession.read().option("mergeSchema", true).parquet("some-bucket/some-path...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

In your case, Spark isn't automatically pruning partitions because of: Missing Partition Discovery: for Spark to perform partition pruning when reading directly from paths (without a metastore table), you need to explicitly tell it about the partition st...

5 More Replies
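The fix usually suggested for path-style partitions is to read from the partition root with `basePath` set, then filter on the partition columns so Spark can prune instead of listing everything. A hedged sketch (the pyspark calls are shown as comments; paths mirror the ones in the question):

```python
# Options for the read: basePath tells Spark where the partition tree
# starts, so region=/days= directories are discovered as partition columns.
read_options = {
    "mergeSchema": "true",
    "basePath": "s3://some-bucket/some-path",
}

# df = (sparkSession.read.options(**read_options)
#         .parquet("s3://some-bucket/some-path")
#         .where("region = 'na' AND days = 1"))  # pruned via partition filters
```

The key change from the original snippet is reading the root path rather than the fully qualified `region=na/days=1` subdirectory: only then do the partition columns exist in the schema for Spark to prune on.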
loinguyen3182
by New Contributor II
  • 2962 Views
  • 2 replies
  • 0 kudos

Spark Streaming Error Listing in GCS

I have run into a problem with listing errors on _delta_log when Spark reads a Delta-format stream in GCS. This is the full log of the issue: org.apache.spark.sql.streaming.StreamingQueryException: Failed to get result: java.io.IOException: Error ...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

The key contributing factors to this issue, according to internal investigations and customer tickets, include: Large Number of Log Files in _delta_log: Delta Lake maintains a JSON transaction log that grows with every commit. The more files present...

1 More Replies
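One commonly suggested mitigation for a large `_delta_log`, sketched as a config fragment (table name illustrative; both are standard Delta table properties): checkpoint more frequently and shorten log retention so the listing the stream performs stays small.

```sql
ALTER TABLE my_catalog.my_schema.events SET TBLPROPERTIES (
  'delta.checkpointInterval' = '10',               -- checkpoint every 10 commits
  'delta.logRetentionDuration' = 'interval 7 days' -- default is 30 days
);
```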
sunnyj
by New Contributor III
  • 888 Views
  • 1 replies
  • 0 kudos

delta live table pipeline

I am very confused about the answer, can anyone help me with this ?

(screenshot attachment: sunnyj_0-1750764423110.png)
Data Engineering
axal_r
axel_r
Latest Reply
ilir_nuredini
Honored Contributor
  • 0 kudos

Hello sunnyj, The correct answer is B) At least one notebook library to be executed. This is because a Delta Live Tables pipeline requires at least one notebook assigned to it that contains a table definition using @dlt.table (or the SQL sy...

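To make option B concrete, here is a pseudocode-level sketch of the smallest possible source notebook (it runs only inside a DLT pipeline, where `dlt` and `spark` are provided by the runtime; all names illustrative):

```python
import dlt  # available only on DLT pipeline clusters

@dlt.table(comment="Minimal table definition; one of these is enough")
def my_first_table():
    # `spark` is injected by the pipeline runtime
    return spark.range(10)
```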
ankitmit
by New Contributor III
  • 4951 Views
  • 7 replies
  • 3 kudos

DLT Apply Changes

Hi, In DLT, how do we specify which columns we don't want to overwrite when using the "apply changes" operation (in the attached example, we want to avoid overwriting the "created_time" column)? I am using this sample code dlt.apply_changes(...

Latest Reply
brunoillipronti
New Contributor II
  • 3 kudos

Same here. It's kind of ridiculous that apply_changes doesn't support a parameter for updating only certain columns... how has that not been a priority since this was released?

6 More Replies
ceediii
by New Contributor II
  • 1187 Views
  • 3 replies
  • 1 kudos

Resolved! Declarative Pipeline Asset Bundle Root Folder

Hi everyone, In the new declarative pipeline UI (preview), we have the option to define a root folder. My resource asset bundle is currently defined as:
resources:
  jobs:
    my_job:
      name: "(${var.branch}) my_job"
      tasks:
        - task_key...

(screenshot attachment: ceediii_0-1750694376542.png)
Latest Reply
ilir_nuredini
Honored Contributor
  • 1 kudos

You are welcome, glad it helped! Best, Ilir

2 More Replies
QLA_SethParker
by New Contributor III
  • 1720 Views
  • 2 replies
  • 0 kudos

Resolved! Error Creating Table

We are a current Databricks customer (Azure Databricks) experiencing an issue when creating a table. We have an existing Metastore in the Central region.  All other Workspaces in this Metastore/Region are behind Private Endpoints.  We are trying to c...

(screenshot attachment: SethParker02_0-1748985494231.png)
Latest Reply
QLA_SethParker
New Contributor III
  • 0 kudos

Hi Lou, Thank you so much for your detailed reply, and I apologize for leaving this open for so long. I got wrapped up in another project and am just getting back to this. I was able to resolve it, at least in my situation, last night, so I wanted to ...

1 More Replies
jeremy98
by Honored Contributor
  • 2056 Views
  • 2 replies
  • 0 kudos

How to Optimize Batch Inference for Per-Item ML Models in Databricks

Hi everyone, I’m relatively new to Databricks. I worked with it a few months ago, and today I encountered an issue in our system. Basically, we have multiple ML models — one for each item — and we want to run inference in a more efficient way, ideall...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Databricks offers unified capabilities for both real-time and batch inference across traditional ML models and large language models (LLMs) using Mosaic AI Model Serving and AI Functions (notably the ai_query function). For your use case (n items, n ...

1 More Replies
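A hedged, pure-Python sketch of the fan-out pattern the reply points at: group rows by item, then score each group with that item's own model. On Databricks the same shape maps onto `groupBy("item_id").applyInPandas(...)` or batched `ai_query` calls; everything below, names included, is illustrative.

```python
from collections import defaultdict

def score_per_item(rows, models):
    """rows: iterable of (item_id, features); models: item_id -> callable model."""
    grouped = defaultdict(list)
    for item_id, features in rows:
        grouped[item_id].append(features)
    # Each item's rows are scored in one batch by that item's own model,
    # so each model is loaded/applied once per group rather than per row.
    return {item_id: [models[item_id](f) for f in feats]
            for item_id, feats in grouped.items()}
```

Grouping first is what makes this efficient at scale: the per-item model load cost is amortized over all of that item's rows instead of being paid on every prediction.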
ShivangiB1
by New Contributor III
  • 430 Views
  • 1 replies
  • 0 kudos

VACUUM operations on existing table after enabling predictive optimization

Hey Team, I have a table with files deleted 7 days ago, but predictive optimization was not enabled at the time. Today I enabled predictive optimization; how long will it take to perform VACUUM on the existing data?

Latest Reply
shivyadav
New Contributor II
  • 0 kudos

Once predictive optimization is enabled and the table is in use, VACUUM will be automatically triggered 7 days later. But if a table with predictive optimization enabled has no active usage, the optimization process does not detect the table or its pr...

joedata
by New Contributor
  • 3424 Views
  • 1 replies
  • 0 kudos

pywin32

A Python module called pywin32 enables users to read an Excel file, make changes to specific cells, execute a Refresh All (which refreshes all the data connections), and save the changes to the Excel file. This cannot be used on Databricks because ...

Latest Reply
shivyadav
New Contributor II
  • 0 kudos

Have you tried openpyxl? It seems a good fit for the requirements you mentioned in the post; we have used it in one of our applications with Databricks.

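A hedged sketch of the openpyxl route: it can read a workbook, change specific cells, and save, all without a Windows/Excel installation (unlike pywin32). One caveat worth noting: openpyxl edits the file directly and cannot trigger Excel's "Refresh All" on data connections; that part still needs Excel itself. The example below builds a workbook in memory so it is self-contained; with a real file you would pass a `.xlsx` path instead of the buffer.

```python
from io import BytesIO
from openpyxl import Workbook, load_workbook

# Create a small workbook in memory (stand-in for an existing .xlsx file).
buf = BytesIO()
wb = Workbook()
ws = wb.active
ws["A1"] = "total"
ws["B1"] = 42
wb.save(buf)

# Reopen it and edit a specific cell, as you would with a real file path.
buf.seek(0)
wb2 = load_workbook(buf)
wb2.active["B1"] = 43
```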
NancyX
by New Contributor II
  • 756 Views
  • 2 replies
  • 0 kudos

How to pass Dynamic parameters like job.run_id to a pipeline_task in Databricks workflow job?

Is it possible to pass dynamic parameters, such as job.run_id to a pipeline_task within a Databricks Workflow job?

Latest Reply
dataminion01
New Contributor II
  • 0 kudos

Yes, it's in the Parameters section.

1 More Replies
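A hedged sketch of the shape this takes in a job definition (names illustrative; `{{job.run_id}}` is a Databricks dynamic value reference). The job-level parameter is set in the Parameters section and resolved at run time; whether a given task type picks it up can depend on the runtime, so treat this as a starting point:

```yaml
resources:
  jobs:
    my_job:
      parameters:
        - name: parent_run_id
          default: "{{job.run_id}}"
      tasks:
        - task_key: run_pipeline
          pipeline_task:
            pipeline_id: ${resources.pipelines.my_pipeline.id}
```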
