Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

joeyslaptop
by New Contributor II
  • 11104 Views
  • 6 replies
  • 3 kudos

How to add a column to a new table containing the original source filenames in Databricks.

If this isn't the right spot to post this, please move it or refer me to the right area. I recently learned about the "_metadata.file_name". It's not quite what I need. I'm creating a new table in Databricks and want to add a USR_File_Name column cont...

Data Engineering
Databricks
filename
import
SharePoint
Upload
Latest Reply
Debayan
Databricks Employee
  • 3 kudos

Hi, could you please elaborate on the expectation here?

5 More Replies
allyallen
by New Contributor III
  • 5028 Views
  • 5 replies
  • 0 kudos

Resolved! Variable Compute clusters within a Job

We have 3 possible compute clusters that we can run a notebook against. They are of varying sizes, and the one that the notebook uses will depend on the size of the data being processed. We "t-shirt size" each tenant based on their data size (S, M, L) and c...

Latest Reply
allyallen
New Contributor III
  • 0 kudos

Hi @eniwoke That's a great solution, thank you so much! Our process is now as follows: NB1 gets the tenant t-shirt size and sets the cluster_id for each size as a variable. The notebook then loops through each tenant and, using the Databricks API, updates ...
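
The size-to-cluster lookup described in this reply could be sketched roughly as below. This is only an illustration under assumptions: the cluster IDs, the tenant names, and the helper names `cluster_for` and `plan_runs` are hypothetical, and because the excerpt is cut off before it says what the API call updates, the sketch stops at choosing a cluster ID and makes no API request.

```python
# Hypothetical cluster IDs, one per t-shirt size (placeholders, not real IDs).
CLUSTER_BY_SIZE = {
    "S": "cluster-small-id",
    "M": "cluster-medium-id",
    "L": "cluster-large-id",
}


def cluster_for(size: str) -> str:
    """Return the cluster ID for a tenant's t-shirt size (case-insensitive)."""
    key = size.strip().upper()
    if key not in CLUSTER_BY_SIZE:
        raise ValueError(f"Unknown t-shirt size: {size!r}")
    return CLUSTER_BY_SIZE[key]


def plan_runs(tenant_sizes: dict[str, str]) -> list[tuple[str, str]]:
    """Pair each tenant with the cluster its size maps to, in input order."""
    return [(tenant, cluster_for(size)) for tenant, size in tenant_sizes.items()]


if __name__ == "__main__":
    # Hypothetical tenants; in the post these come from the first notebook (NB1).
    tenants = {"tenant_a": "S", "tenant_b": "L"}
    for tenant, cluster_id in plan_runs(tenants):
        print(tenant, cluster_id)
```

Keeping the mapping in one dictionary means an unknown size fails loudly before any job is touched.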

4 More Replies
Steffen
by New Contributor III
  • 4369 Views
  • 4 replies
  • 1 kudos

Resolved! DictionaryFilters Pushdown on Views

Hello, I have a very simple table with time series data with three columns: id (long): unique id of signal; ts (unix timestamp): timestamp of the event in unix timestamp format; value (double): value of the signal at the given timestamp. For every second ther...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Steffen, this happens because you're applying functions such as FLOOR, from_unix_timestamp, etc. to the ts attribute, which hides the raw ts from Spark's optimizer, so it can't push down filters. If you can, try to add additional attribute to your u...
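
The reply's own suggestion is cut off in this excerpt. One general way to keep a filter on the raw column, sketched here under assumptions, is to convert the time window into unix-second bounds up front and compare `ts` against those literals, so that no function wraps `ts` in the predicate. The helper names `unix_bounds` and `raw_ts_predicate` are hypothetical, and naive timestamps are assumed to be UTC.

```python
from datetime import datetime, timezone


def unix_bounds(start_iso: str, end_iso: str) -> tuple[int, int]:
    """Convert an ISO-8601 window into (start, end) unix seconds.

    Naive inputs are treated as UTC (an assumption of this sketch).
    """
    def to_unix(text: str) -> int:
        dt = datetime.fromisoformat(text)
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)
        return int(dt.timestamp())

    start, end = to_unix(start_iso), to_unix(end_iso)
    if end <= start:
        raise ValueError("end must be after start")
    return start, end


def raw_ts_predicate(start_iso: str, end_iso: str, column: str = "ts") -> str:
    """Build a half-open range predicate on the raw timestamp column."""
    start, end = unix_bounds(start_iso, end_iso)
    return f"{column} >= {start} AND {column} < {end}"


if __name__ == "__main__":
    print(raw_ts_predicate("2024-01-01T00:00:00", "2024-01-02T00:00:00"))
```

Whether the filter is then actually pushed down depends on the view definition and the engine, so the query plan is still worth checking.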

3 More Replies
ShankarM
by Databricks Partner
  • 3743 Views
  • 3 replies
  • 0 kudos

DBR version 10.4 impact

Hi, for one of our projects, which is in production, we are using DBR 10.4, for which EOL was Mar 18th, 2025. I wanted to know whether there will be any impact on existing workloads that are running in production. If yes, can you let me know the impact and risk...

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hello @ShankarM Actually, there is no official End of Life (EoL) date provided by Databricks. If you check the documentation I referenced in my previous message, EoL is the next phase after End of Support (EoS), but Databricks does not announce a spe...

2 More Replies
om_bk_00
by New Contributor III
  • 3739 Views
  • 5 replies
  • 1 kudos

Resolved! passing job parameters through the terminal to a job

I am having trouble overriding the job parameters that are deployed in my local workspace. E.g., I have a job that fills tables with data; the parameters given to it are random, and I would like to override them when I run it through my terminal: databricks b...

Latest Reply
EduardoSB
New Contributor II
  • 1 kudos

Hi! I just found this post because I'm having trouble passing custom values to some parameters in my jobs. I guess databricks bundle run <job_name> --python-params "--param1=value1,--param2=value2,..." should work, shouldn't it? Is any other e...

4 More Replies
adhi_databricks
by Contributor
  • 6855 Views
  • 7 replies
  • 1 kudos

Resolved! Requirement to run a databricks job from another job based on custom conditions using DAB

Hi everyone, I'm using Databricks Asset Bundles to deploy a job that includes a run_job_task, which requires a job_id to trigger another job. For different targets (dev, staging, prod), I need to pass different job_ids dynamically. To achieve this, I’v...

Latest Reply
adhi_databricks
Contributor
  • 1 kudos

Hey folks, thanks for the help here. I was able to solve this issue by updating the Databricks CLI to the latest version. Thanks once again!

6 More Replies
liu
by Databricks Partner
  • 4015 Views
  • 2 replies
  • 1 kudos

Resolved! I encountered an error when trying to use dbutils to operate on files with a file: prefix.

When I execute the statement: dbutils.fs.ls("file:/tmp/") I receive the following error: ExecutionError: (java.lang.SecurityException) Cannot use com.databricks.backend.daemon.driver.WorkspaceLocalFileSystem - local filesystem access is forbidden. Does an...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @liu, which type of cluster are you using? Which access mode? Your compute must have Dedicated (formerly single user) access mode.

1 More Replies
noorbasha534
by Valued Contributor II
  • 1345 Views
  • 5 replies
  • 0 kudos

DQ anomaly detection : _quality_monitoring_summary table DDL

Dears, does anyone have the DDL for the _quality_monitoring_summary table? This is created by the DQ anomaly detector. Since the detector was trying to create a managed table, which is not allowed in the environment I work in, I am attempting to create this on ...

Latest Reply
Yogesh_Verma_
Contributor II
  • 0 kudos

Hi, the _quality_monitoring_summary table is an internal table created by the Data Quality Anomaly Detector in Databricks Lakehouse Monitoring. Unfortunately, the full DDL is not publicly documented in detail, and trying to manually create it can lead...

4 More Replies
ismaelhenzel
by Valued Contributor
  • 9684 Views
  • 4 replies
  • 11 kudos

Resolved! DELTA LIVE TABLES - MATERIALIZED VIEW DOES NOT INCREMENT NOTHING !

I'm very disappointed with this framework. The documentation is inadequate, and it has many limitations. I want to run materialized views with incremental updates, but DLT insists on performing a full recompute. Why is it doing this? Here is the log ...

Latest Reply
1ct0
New Contributor II
  • 11 kudos

I'm seeing a subtype of EXCESSIVE_OPERATOR_NESTING that is preventing incremental updates. Is there any documentation that would help resolve these issues?

3 More Replies
manish1987c
by New Contributor III
  • 3810 Views
  • 6 replies
  • 1 kudos

Delta Live Table - Flow detected an update or delete to one or more rows in the source table

I have created a pipeline where I am ingesting the data from bronze to silver using SCD 1; however, when I try to create the gold table as DLT, it gives me the error "Flow 'user_silver' has FAILED fatally. An error occurred because we detected ...

Latest Reply
Pat
Esteemed Contributor
  • 1 kudos

Streaming tables in Delta Live Tables (DLT) only support append-only operations in the SOURCE. The error occurs because: 1. Your silver table uses SCD Type 1, which performs UPDATE and DELETE operations on existing records. 2. Your gold table is defined ...

5 More Replies
ShivangiB1
by New Contributor III
  • 3757 Views
  • 3 replies
  • 0 kudos

Embed Databricks AI/BI dashboard in external website and validate using service principal

Hey Team, I tried embedding my AI/BI Databricks dashboard in SharePoint and it worked. But I don't want to validate using my credentials; can I use a service principal to validate?

Latest Reply
Advika
Community Manager
  • 0 kudos

Hello @ShivangiB1! You can publish the dashboard using a Service Principal via API, which allows you to embed it in SharePoint without requiring individual user logins. For more details, please refer to the documentation here: https://docs.databricks....

2 More Replies
drag7ter
by Contributor
  • 5006 Views
  • 7 replies
  • 0 kudos

Disable ssl for federated connection on Amazon Redshift

Here is a doc on how to set up a connection and foreign catalog, but there is no mention of how to disable SSL for the connection. https://docs.databricks.com/en/query-federation/redshift.html When I set up the connection and foreign catalog I get this error,...

Latest Reply
system_is_down
New Contributor II
  • 0 kudos

Hey @Alberto_Umana just checking in on this again. Anything new on this? I've tried creating catalogs and connections via UI, REST API, and CLI as well but none have worked to disable SSL. The documentation references this ability here: https://docs....

6 More Replies
Sainath368
by Contributor
  • 1565 Views
  • 1 replies
  • 0 kudos

ANALYZE TABLE <table_name> COMPUTE STATISTICS- Data loading

Hi, I want some clarification regarding running ANALYZE TABLE <table_name> COMPUTE STATISTICS. Can anyone please help me understand if this command will throw errors or cause issues while data is loading into the table at the time of execution? Any i...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Sainath368, you can safely run the ANALYZE command! Here is a detailed explanation: Concurrency Between ANALYZE TABLE and Write/Update Operations. 1. Delta Lake’s ACID Transactions: Delta Lake provides ACID (Atomicity, Consistency, Isolation, Durability...

mac_delvalle
by New Contributor II
  • 3604 Views
  • 4 replies
  • 3 kudos

Resolved! Add Spark Configurations Serverless Compute

Hi everyone, We’re in the process of migrating from all-purpose clusters to serverless compute in Databricks. On our all-purpose clusters, we’ve been setting specific Spark configurations (e.g., via the cluster’s advanced options). However, we’ve noti...

Data Engineering
clusters
serverless
spark
Latest Reply
nayan_wylde
Esteemed Contributor II
  • 3 kudos

I think you will not be able to set Spark configurations at the cluster level on serverless. But you can set this in a notebook: spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

3 More Replies
weakliemg
by New Contributor II
  • 2355 Views
  • 2 replies
  • 0 kudos

databricks bundle install: Error: Maximum file size of 524288000 exceeded

I have a job that's running some ML classification models. This uses PyTorch 2.5.0. I've configured the project with that dependency. I can deploy my job to our dev system from my laptop and all goes well. When I run this off our CI/CD server, for so...

Latest Reply
weakliemg
New Contributor II
  • 0 kudos

Thanks but why does this behavior not happen locally? Also, the bundle config doesn't reference torch, it's used in code and included as a dev dependency in pyproject.toml. My libraries are just this: libraries: - whl: ../dist/*....

1 More Replies