Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

allyallen
by New Contributor III
  • 4052 Views
  • 5 replies
  • 0 kudos

Resolved! Variable Compute clusters within a Job

We have 3 possible compute clusters that we can run a notebook against. They are of varying sizes, and the one that the notebook uses will depend on the size of the data being processed. We "t-shirt size" each tenant based on their data size (S, M, L) and c...

Latest Reply
allyallen
New Contributor III
  • 0 kudos

Hi @eniwoke That's a great solution, thank you so much! Our process is now as follows: NB1 gets the tenant t-shirt size and sets the cluster_id for each size as a variable. The notebook then loops through each tenant and, using the Databricks API, updates ...
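A minimal sketch of the pattern described in that reply, assuming one pre-created cluster per t-shirt size; the cluster IDs, `job_id`, and `task_key` below are hypothetical placeholders, not the poster's actual values:

```python
# Sketch: pick an existing cluster by tenant "t-shirt size" and repoint a
# job's task at it via the Jobs API (POST /api/2.1/jobs/update). Note that
# partial updates of nested fields like `tasks` may replace the whole array,
# so verify the behavior against the Jobs API reference.

CLUSTER_BY_SIZE = {  # assumption: one pre-created cluster per size
    "S": "0101-000000-small01",
    "M": "0101-000000-medium1",
    "L": "0101-000000-large01",
}

def cluster_for_size(size: str) -> str:
    """Resolve a tenant's t-shirt size to a cluster_id."""
    return CLUSTER_BY_SIZE[size.upper()]

def build_update_payload(job_id: int, task_key: str, size: str) -> dict:
    """Request body for /api/2.1/jobs/update: repoint one task's cluster."""
    return {
        "job_id": job_id,
        "new_settings": {
            "tasks": [
                {
                    "task_key": task_key,
                    "existing_cluster_id": cluster_for_size(size),
                }
            ]
        },
    }

# The actual call would look something like:
# requests.post(f"{host}/api/2.1/jobs/update",
#               headers={"Authorization": f"Bearer {token}"},
#               json=build_update_payload(123, "process_tenant", "M"))
```

Looping this over tenants, as the reply describes, just means calling `build_update_payload` once per tenant before each run.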
4 More Replies
Steffen
by New Contributor III
  • 3457 Views
  • 4 replies
  • 1 kudos

Resolved! DictionaryFilters Pushdown on Views

Hello, I have a very simple table with time-series data with three columns: id (long): unique id of the signal; ts (unix timestamp): timestamp of the event in unix timestamp format; value (double): value of the signal at the given timestamp. For every second ther...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Steffen , this happens because you're applying functions to the ts attribute (FLOOR, from_unix_timestamp, etc.), which hides the raw ts from Spark's optimizer, so it can't push down filters. If you can, try to add an additional attribute to your u...
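A sketch of that suggestion (table and column names assumed from the post): keep the raw `ts` column in the view alongside any derived columns, so consumers can filter on `ts` and Spark can push the predicate down to the scan. The view DDL is shown as SQL inside Python strings for illustration:

```python
# Assumption: a source table signals(id BIGINT, ts BIGINT, value DOUBLE) as
# described in the post. Wrapping ts in a function inside the view hides the
# raw column from the optimizer, so filters on the derived column cannot be
# pushed down to the file scan.

# Problematic: every filter must go through FLOOR(...), so no pushdown on ts.
view_without_pushdown = """
CREATE OR REPLACE VIEW signals_by_minute AS
SELECT id, FLOOR(ts / 60) * 60 AS minute_ts, value
FROM signals
"""

# Better: also expose the raw ts. Consumers filter on ts (pushed down to the
# scan) and still get the derived minute_ts for grouping or display.
view_with_pushdown = """
CREATE OR REPLACE VIEW signals_by_minute AS
SELECT id, ts, FLOOR(ts / 60) * 60 AS minute_ts, value
FROM signals
"""
```

With the second view, a query like `WHERE ts BETWEEN a AND b` can be evaluated at the scan, instead of materializing every row first.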
3 More Replies
ShankarM
by Contributor
  • 2493 Views
  • 3 replies
  • 0 kudos

DBR version 10.4 impact

Hi, for one of our projects which is in production we are using DBR 10.4, for which EOL was March 18th, 2025. I wanted to know whether there will be any impact to existing workloads which are running in production. If yes, can you let me know the impact and risk...

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hello @ShankarM Actually, there is no official End of Life (EoL) date provided by Databricks. If you check the documentation I referenced in my previous message, EoL is the next phase after End of Support (EoS), but Databricks does not announce a spe...

2 More Replies
om_bk_00
by New Contributor III
  • 2058 Views
  • 5 replies
  • 1 kudos

Resolved! passing job parameters through the terminal to a job

I am having trouble overriding the job parameters that are deployed in my local workspace. E.g. I have a job that fills tables with data; the parameters given to it are random and I would like to override them when I run through my terminal: databricks b...

Latest Reply
EduardoSB
New Contributor II
  • 1 kudos

Hi! I just found this post because I'm having trouble trying to pass custom values to some parameters in my jobs. I guess databricks bundle run <job_name> --python-params "--param1=value1,--param2=value2,..." should work, shouldn't it? Is any other e...
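A hedged sketch of building such an invocation. Which flag applies depends on the task type and the CLI version you have installed (the flag names below are my understanding; check `databricks bundle run --help` on your CLI before relying on them):

```python
# Sketch: build a `databricks bundle run` command that overrides parameters.
# Assumed flags (verify against your CLI's --help output):
#   --params key=value,...      job-level parameters
#   --notebook-params ...       notebook task base parameters
#   --python-params ...         spark_python_task / python_wheel_task argv
import shlex

def bundle_run_cmd(job_key: str, params: dict) -> list:
    """argv for overriding job-level parameters on a bundle-deployed job."""
    joined = ",".join(f"{k}={v}" for k, v in params.items())
    return ["databricks", "bundle", "run", job_key, "--params", joined]

cmd = bundle_run_cmd("fill_tables", {"env": "dev", "run_date": "2025-01-01"})
print(shlex.join(cmd))
```

One common gotcha: `--python-params` feeds the task's argv, while job parameters declared in the bundle YAML are overridden with the job-level flag, so the right choice depends on how the job defines its parameters.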
4 More Replies
adhi_databricks
by Contributor
  • 5453 Views
  • 7 replies
  • 1 kudos

Resolved! Requirement to run a databricks job from another job based on custom conditions using DAB

Hi everyone, I'm using Databricks Asset Bundles to deploy a job that includes a run_job_task, which requires a job_id to trigger another job. For different targets (dev, staging, prod), I need to pass different job_ids dynamically. To achieve this, I'v...

Latest Reply
adhi_databricks
Contributor
  • 1 kudos

Hey folks, thanks for the help here. I was able to solve this issue by updating the Databricks CLI to the latest version. Thanks once again!
6 More Replies
liu
by Contributor
  • 3310 Views
  • 2 replies
  • 1 kudos

Resolved! I encountered an error when trying to use dbutils to operate on files with a file: prefix.

When I execute the statement dbutils.fs.ls("file:/tmp/") I receive the following error: ExecutionError: (java.lang.SecurityException) Cannot use com.databricks.backend.daemon.driver.WorkspaceLocalFileSystem - local filesystem access is forbidden. Does an...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @liu , which type of cluster are you using? Which access mode? Your compute must have Dedicated (formerly Single User) access mode.
1 More Replies
noorbasha534
by Valued Contributor II
  • 772 Views
  • 5 replies
  • 0 kudos

DQ anomaly detection : _quality_monitoring_summary table DDL

Dears, does anyone have the DDL for the _quality_monitoring_summary table? This is created by the DQ anomaly detector. Since the detector was trying to create a managed table, which is not allowed in the environment I work in, I am attempting to create this on ...

Latest Reply
Yogesh_Verma_
Contributor II
  • 0 kudos

Hi, the _quality_monitoring_summary table is an internal table created by the Data Quality Anomaly Detector in Databricks Lakehouse Monitoring. Unfortunately, the full DDL is not publicly documented in detail, and trying to manually create it can lead...
4 More Replies
ismaelhenzel
by Contributor III
  • 7497 Views
  • 4 replies
  • 11 kudos

Resolved! DELTA LIVE TABLES - MATERIALIZED VIEW DOES NOT UPDATE INCREMENTALLY!

I'm very disappointed with this framework. The documentation is inadequate, and it has many limitations. I want to run materialized views with incremental updates, but DLT insists on performing a full recompute. Why is it doing this? Here is the log ...

Latest Reply
1ct0
New Contributor II
  • 11 kudos

I'm seeing a subtype of EXCESSIVE_OPERATOR_NESTING that is preventing incremental updates. Is there any documentation so that these issues can be resolved?
3 More Replies
manish1987c
by New Contributor III
  • 2643 Views
  • 6 replies
  • 1 kudos

Delta Live Table - Flow detected an update or delete to one or more rows in the source table

I have created a pipeline where I am ingesting the data from bronze to silver using SCD 1; however, when I am trying to create the gold table as DLT it is giving me the error "Flow 'user_silver' has FAILED fatally. An error occurred because we detected ...

Latest Reply
Pat
Esteemed Contributor
  • 1 kudos

Streaming tables in Delta Live Tables (DLT) only support append-only operations in the SOURCE. The error occurs because: 1. Your silver table uses SCD Type 1, which performs UPDATE and DELETE operations on existing records. 2. Your gold table is defined ...
5 More Replies
ShivangiB1
by New Contributor III
  • 2982 Views
  • 3 replies
  • 0 kudos

Embed Databricks AI/BI dashboard in external website and validate using service principal

Hey team, I tried embedding my AI/BI Databricks dashboard in SharePoint and it worked. But I don't want to validate using my credentials; can I use a service principal to validate?

Latest Reply
Advika
Databricks Employee
  • 0 kudos

Hello @ShivangiB1! You can publish the dashboard using a Service Principal via API, which allows you to embed it in SharePoint without requiring individual user logins. For more details, please refer to the documentation here: https://docs.databricks....
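A hedged sketch of that flow. The Lakeview endpoint path and field names below are assumptions from my reading of the REST docs, so verify them against the current Databricks REST API reference; the request would be authenticated with a token obtained for the service principal:

```python
# Sketch: publish an AI/BI (Lakeview) dashboard via REST so it can be
# embedded without per-user login. Endpoint path and field names are
# assumptions, not confirmed; check the current API reference.

def build_publish_request(dashboard_id: str, warehouse_id: str) -> dict:
    """Method, path, and body for the publish call (assumed shape)."""
    return {
        "method": "POST",
        "path": f"/api/2.0/lakeview/dashboards/{dashboard_id}/published",
        "body": {
            # embed_credentials=True: viewers run the dashboard with the
            # publisher's (here, the service principal's) credentials
            # instead of their own.
            "embed_credentials": True,
            "warehouse_id": warehouse_id,
        },
    }

req = build_publish_request("abc123", "wh-456")
# e.g. requests.request(req["method"], host + req["path"],
#                       headers={"Authorization": f"Bearer {sp_token}"},
#                       json=req["body"])
```

The key design point, per the reply, is that publishing happens as the service principal, so SharePoint viewers never authenticate individually.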
2 More Replies
prasanna_r
by New Contributor
  • 1541 Views
  • 1 replies
  • 0 kudos

Resolved! Download all pages of a multi-page dashboard

Hi, I have created a multi-page dashboard in Databricks. I want to download all the pages of the dashboard as a single PDF file. But when I export the dashboard I get it only in .json format. Is there a way to download all the pages as a PDF file?

Latest Reply
ilir_nuredini
Honored Contributor
  • 0 kudos

Hello @prasanna_r , currently Databricks does not support exporting dashboards in PDF format. What I can suggest is to use the browser's print-to-PDF feature, or take screenshots and save them via Word -> PDF. Then you can use the PDF programmatically h...
drag7ter
by Contributor
  • 3591 Views
  • 7 replies
  • 0 kudos

Disable ssl for federated connection on Amazon Redshift

Here is a doc on how to set up the connection and foreign catalog, but there is no mention of how to disable SSL for the connection: https://docs.databricks.com/en/query-federation/redshift.html When I set up the connection and foreign catalog I get this error,...

Latest Reply
system_is_down
New Contributor II
  • 0 kudos

Hey @Alberto_Umana just checking in on this again. Anything new on this? I've tried creating catalogs and connections via UI, REST API, and CLI as well but none have worked to disable SSL. The documentation references this ability here: https://docs....

6 More Replies
Sainath368
by Contributor
  • 1278 Views
  • 1 replies
  • 0 kudos

ANALYZE TABLE <table_name> COMPUTE STATISTICS- Data loading

Hi, I want some clarification regarding running ANALYZE TABLE <table_name> COMPUTE STATISTICS. Can anyone please help me understand if this command will throw errors or cause issues while data is loading into the table at the time of execution? Any i...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Sainath368 , you can safely run the ANALYZE command! Here is a detailed explanation. Concurrency between ANALYZE TABLE and write/update operations: 1. Delta Lake's ACID transactions: Delta Lake provides ACID (Atomicity, Consistency, Isolation, Durability...
mac_delvalle
by New Contributor II
  • 1905 Views
  • 4 replies
  • 3 kudos

Resolved! Add Spark Configurations Serverless Compute

Hi everyone,We’re in the process of migrating from all-purpose clusters to serverless compute in Databricks. On our all-purpose clusters, we’ve been setting specific Spark configurations (e.g., via the cluster’s advanced options). However, we’ve noti...

Labels: Data Engineering, clusters, serverless, spark
Latest Reply
nayan_wylde
Esteemed Contributor
  • 3 kudos

I think you will not be able to set Spark configurations on the cluster in serverless, but you can put this in a notebook: spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
3 More Replies
weakliemg
by New Contributor II
  • 1774 Views
  • 2 replies
  • 0 kudos

databricks bundle install: Error: Maximum file size of 524288000 exceeded

I have a job that's running some ML classification models. This uses PyTorch 2.5.0. I've configured the project with that dependency. I can deploy my job to our dev system from my laptop and all goes well. When I run this off our CI/CD server, for so...

Latest Reply
weakliemg
New Contributor II
  • 0 kudos

Thanks, but why does this behavior not happen locally? Also, the bundle config doesn't reference torch; it's used in code and included as a dev dependency in pyproject.toml. My libraries are just this: libraries: - whl: ../dist/*....
1 More Replies
