Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

William_Scardua
by Valued Contributor
  • 13 Views
  • 0 replies
  • 0 kudos

Collecting Job Usage Metrics Without Unity Catalog

Hi, I would like to request assistance on how to collect usage metrics and job execution data for my Databricks environment. We are currently not using Unity Catalog, but I would still like to monitor and analyze usage. Could you please provide guidance...

NamNguyenCypher
by New Contributor II
  • 27 Views
  • 2 replies
  • 1 kudos

Adding column masks to a column using the DLT Python create_streaming_table API

I'm having difficulty adding a mask function to columns while creating streaming tables with the DLT Python method create_streaming_table(). It does not work: the streaming table is created, but no column is masked: def prepare_column_pro...

Latest Reply
LRALVA
Contributor
  • 1 kudos

@NamNguyenCypher Delta Live Tables’ Python API does not currently honor column-mask metadata embedded in a PySpark StructType. Masking (and row filters) on DLT tables are only applied when you define your table with a DDL-style schema that includes a...
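The reply is truncated, but the workaround it describes is to pass a DDL-style schema string (rather than a StructType) so the MASK clause is part of the table definition. A minimal hypothetical sketch, with made-up table, column, and mask-function names:

```python
# Hypothetical sketch: pass a DDL-style schema string (not a StructType) so that
# DLT can apply the column mask. Table, column, and mask-function names are
# placeholders, not from the original post.
def ddl_schema_with_mask(mask_fn: str) -> str:
    """Build a DDL schema string that attaches a mask function to the email column."""
    return (
        "customer_id BIGINT, "
        f"email STRING MASK {mask_fn}, "
        "created_at TIMESTAMP"
    )

schema = ddl_schema_with_mask("main.security.mask_email")

# Inside a DLT pipeline (not runnable outside Databricks):
# import dlt
# dlt.create_streaming_table(name="customers_masked", schema=schema)
```

The point is that the mask lives in the DDL text of the schema, so DLT sees it at table-definition time instead of as StructType metadata.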

1 More Replies
LasseL
by New Contributor III
  • 1234 Views
  • 3 replies
  • 0 kudos

How to use change data feed when schema is changing between delta table versions?

How to use the change data feed when the delta table schema changes between delta table versions? I tried to read the change data feed in parts (in the code snippet I read version 1372, because the 1371 and 1373 schema versions are different), but I'm getting the error Unsupporte...

Latest Reply
LRALVA
Contributor
  • 0 kudos

@LasseL When you read from the change data feed in batch mode, Delta Lake always uses a single schema: by default, it uses the latest table version's schema, even if you're only reading an older version. On Databricks Runtime ≥ 12.2 LTS with column mapping e...
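The "read in parts" workaround from the question can be sketched as: group contiguous table versions that share a schema, then read each range with its own startingVersion/endingVersion. The version/schema pairs below are illustrative; in practice they come from DESCRIBE HISTORY or the Delta log.

```python
# Group contiguous table versions that share the same schema, so each
# range can be read from the change feed without hitting a schema change.
def schema_stable_ranges(versions):
    """versions: list of (version, schema_id) tuples sorted by version.
    Returns [(start_version, end_version), ...], one per unchanged-schema run."""
    ranges = []
    start, current = versions[0][0], versions[0][1]
    prev = versions[0][0]
    for v, s in versions[1:]:
        if s != current:
            ranges.append((start, prev))
            start, current = v, s
        prev = v
    ranges.append((start, prev))
    return ranges

# Illustrative history matching the question: 1372's schema differs from 1371/1373.
history = [(1370, "s1"), (1371, "s1"), (1372, "s2"), (1373, "s3")]

# Each (start, end) range could then be read separately, e.g.:
# spark.read.format("delta").option("readChangeFeed", "true") \
#      .option("startingVersion", start).option("endingVersion", end).load(path)
```

Each range then has one stable schema, which sidesteps the unsupported-schema-change error for batch CDF reads.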

2 More Replies
cookiebaker
by Visitor
  • 117 Views
  • 3 replies
  • 3 kudos

Some DLT pipelines suddenly seem to take runtime 16.1 instead of 15.4 since last night (CET)

Hello, suddenly since last night on some of our DLT pipelines we're getting failures saying that our hive_metastore control table cannot be found. All of our DLTs are set up the same (serverless), and one Shared Compute on runtime version 15.4. For ...

Latest Reply
JeanSeb
Visitor
  • 3 kudos

Any update on this one? We're also experiencing the same kind of issue since yesterday evening, with serverless DLT referring to Hive metastore tables.

2 More Replies
Christian_C
by New Contributor
  • 331 Views
  • 5 replies
  • 0 kudos

Google Pub Sub and Delta live table

I am using Delta Live Tables and Pub/Sub to ingest messages from 30 different topics in parallel. I noticed that initialization time can be very long, around 15 minutes. Does someone know how to reduce initialization time in DLT? Thank you.

Latest Reply
BigRoux
Databricks Employee
  • 0 kudos

Classic clusters can take up to seven minutes to be acquired, configured, and deployed, with most of this time spent waiting for the cloud service to allocate virtual machines. In contrast, serverless clusters typically start in under eight seconds. ...

4 More Replies
vziog
by New Contributor
  • 77 Views
  • 5 replies
  • 1 kudos

Unexpected SKU Names in Usage Table for Job Cost Calculation

I'm trying to calculate the cost of a job using the usage and list_prices system tables, but I'm encountering some unexpected behavior that I can't explain. When I run a job using a shared cluster, the sku_name in the usage table is PREMIUM_JOBS_SERVE...

Latest Reply
vziog
New Contributor
  • 1 kudos

Thank you all for your replies. @LRALVA, what about what @Walter_C and @mnorland mentioned about enabling serverless tasks? Is this possible, and how?

4 More Replies
santhiya
by Visitor
  • 32 Views
  • 1 reply
  • 0 kudos

CPU usage and idle time metrics from system tables

I need to get my compute metrics, not from the UI... The system tables don't have much information, and node_timeline has per-minute records, so it's difficult to calculate each compute's CPU usage per day. Is there any way we can get the CPU usage, CPU idle time, M...

Latest Reply
BigRoux
Databricks Employee
  • 0 kudos

To calculate CPU usage, CPU idle time, and memory usage per cluster per day, you can use the system.compute.node_timeline system table. However, since the data in this table is recorded at per-minute granularity, it’s necessary to aggregate the data ...
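The reply is cut off, but the aggregation it describes is a plain per-day GROUP BY over the per-minute samples. A minimal sketch of that rollup in plain Python (in practice it would be a SQL query over system.compute.node_timeline; the field names cpu_user_percent and mem_used_percent here are assumptions, so verify against the table's actual columns):

```python
from collections import defaultdict

# Per-day rollup over per-minute samples, sketched with plain dicts.
# Real rows would come from system.compute.node_timeline; the field names
# used here are assumed, not confirmed against the table schema.
def daily_cpu_summary(rows):
    """rows: iterable of dicts with cluster_id, date, cpu_user_percent, mem_used_percent.
    Returns {(cluster_id, date): {"avg_cpu": ..., "avg_idle": ..., "avg_mem": ...}}."""
    acc = defaultdict(lambda: {"cpu": 0.0, "mem": 0.0, "n": 0})
    for r in rows:
        key = (r["cluster_id"], r["date"])
        acc[key]["cpu"] += r["cpu_user_percent"]
        acc[key]["mem"] += r["mem_used_percent"]
        acc[key]["n"] += 1
    return {
        k: {
            "avg_cpu": v["cpu"] / v["n"],
            "avg_idle": 100.0 - v["cpu"] / v["n"],  # idle approximated as 100 - busy
            "avg_mem": v["mem"] / v["n"],
        }
        for k, v in acc.items()
    }
```

The same shape in SQL would be a GROUP BY on cluster_id and the date of the minute-level timestamp, averaging the percentage columns.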

ankit001mittal
by New Contributor III
  • 39 Views
  • 1 reply
  • 0 kudos

DLT Publish event log to metastore

Hi guys, I am trying to use the DLT "Publish event log to metastore" feature, and I noticed it creates a table with the logs for each DLT pipeline separately. Does it mean it maintains a separate log table for all the DLT tables (in our case, we have 100...

Latest Reply
SP_6721
New Contributor II
  • 0 kudos

Hi @ankit001mittal, yes, you're right: when you enable the "Publish Event Log to Metastore" option for DLT pipelines, Databricks creates a separate event log table for each pipeline. So, if you have thousands of pipelines, you'll see thousands of log ...

holychs
by New Contributor III
  • 141 Views
  • 2 replies
  • 0 kudos

Repairing running workflow with few failed child jobs

I have a parent job that calls multiple child jobs in a workflow. Out of 10 child jobs, one has failed and the other 9 are still running. I want to repair the failed child task. Can I do that while the other child jobs are running?

Latest Reply
Brahmareddy
Honored Contributor III
  • 0 kudos

Hi holychs, how are you doing today? As per my understanding, yes, in Databricks Workflows, if you're running a multi-task job (like your parent job triggering multiple child tasks), you can repair only the failed task without restarting the entire j...
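Besides the Repair run button in the UI, the same repair can be triggered through the Jobs API (POST /api/2.1/jobs/runs/repair), passing only the failed task keys in rerun_tasks. A hedged sketch, with placeholder host, token, run ID, and task key, and the network call left commented out:

```python
import json

# Build the request body for the Jobs "repair run" endpoint
# (POST /api/2.1/jobs/runs/repair): rerun only the named failed tasks.
# run_id and task keys below are placeholders for illustration.
def repair_payload(run_id: int, failed_task_keys):
    return {"run_id": run_id, "rerun_tasks": list(failed_task_keys)}

payload = repair_payload(123456, ["child_job_7"])
body = json.dumps(payload)

# The actual call (requires a workspace host and a PAT; not run here):
# import requests
# requests.post(
#     "https://<workspace-host>/api/2.1/jobs/runs/repair",
#     headers={"Authorization": "Bearer <token>"},
#     data=body,
# )
```

Tasks not listed in rerun_tasks are left alone, which is what allows the still-running siblings to keep going.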

1 More Replies
vivi007
by New Contributor
  • 58 Views
  • 1 reply
  • 0 kudos

Can we have a depend-on for jobs to run on two different dabs?

If there are two different DABs, can we have a dependency for a job from one DAB to run after a job from another DAB? Similar to how tasks can depend on each other to run one after the other in the same DAB. Can we have the same for two differ...

Latest Reply
LRALVA
Contributor
  • 0 kudos

@vivi007 Yes, you can create dependencies between jobs in different DABs (Databricks Asset Bundles), but this requires a different approach than task dependencies within a single DAB. Since DABs are designed to be independently deployable units, direc...
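One common pattern (a hedged sketch, since the reply is truncated) is to wrap the other bundle's job in a run_job_task inside your own job, and hang your tasks off it with depends_on. All names and the job ID below are placeholders:

```yaml
# Hypothetical fragment in bundle B: first run the job deployed by bundle A
# (referenced by job ID), then run B's own work after it completes.
resources:
  jobs:
    downstream_job:
      name: downstream-job
      tasks:
        - task_key: wait_for_bundle_a
          run_job_task:
            job_id: 111222333  # ID of the job deployed by the other DAB
        - task_key: run_own_work
          depends_on:
            - task_key: wait_for_bundle_a
          notebook_task:
            notebook_path: ./src/downstream.py
```

The coupling point is the job ID (or a lookup of it), since the two bundles deploy independently and don't share resource references.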

drii_cavalcanti
by New Contributor III
  • 40 Views
  • 0 replies
  • 0 kudos

Databricks App with DAB

Hi all, I am trying to deploy a DBX app via DAB; however, source_code_path seems not to be parsed correctly into the app configuration.
dbx_dash/
  resources/
    app.yml
  src/
    app.yaml
    app.py
  databricks.yml
resources/app.yml: resources:apps: m...

ShreevaniRao
by New Contributor II
  • 4708 Views
  • 13 replies
  • 4 kudos

Newbie learning DLT pipelines

Hello, I am learning to create DLT pipelines with different graphs using a 14-day trial of premium Databricks. I currently have one graph: Mat view -> Streaming Table -> Mat view. When I ran this pipeline (serverless compute) the 1st time, ran...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 4 kudos

Use this: https://www.youtube.com/watch?v=iqf_QHC7tgQ&list=PL2IsFZBGM_IGpBGqxhkiNyEt4AuJXA0Gg It will help you a lot.

12 More Replies
ktagseth
by New Contributor
  • 110 Views
  • 3 replies
  • 0 kudos

dbutils.fs.mv inefficient with ADLS

dbutils.fs.mv with ADLS currently copies the file and then deletes the old one. This incurs costs and has a lot of overhead versus using the rename functionality in ADLS, which is instant and doesn't incur the extra costs involved with writing the 'new' data....

Latest Reply
BigRoux
Databricks Employee
  • 0 kudos

The tool is really meant for DBFS and is only accessible from within Databricks. If I had to guess, the idea is that most folks will not be using DBFS for production or sensitive data (for a host of good reasons), and as such there has not been a big ...

2 More Replies
fedemgp
by New Contributor
  • 340 Views
  • 1 reply
  • 0 kudos

Configure verbose audit logs through terraform

Hi everyone, I was looking into the databricks_workspace_conf Terraform resource to configure Verbose Audit Logs (and avoid changing it through the UI). However, when I attempted to apply this configuration I encountered the following error: Error: cannot...

Latest Reply
TheRealOliver
New Contributor III
  • 0 kudos

@fedemgp I was able to turn the desired setting on and off with Terraform with the code in this GitHub Gist. I'm using Databricks Terraform provider version 1.74.0, and my Databricks runs on Google Cloud.
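The linked Gist isn't reproduced here, but such configurations are typically a databricks_workspace_conf resource with the verbose-audit-logs key in custom_config. A hedged sketch; the key name "enableVerboseAuditLogs" is an assumption to verify against the provider docs:

```hcl
# Hedged sketch (not the linked Gist): toggling verbose audit logs through
# the workspace configuration resource. Verify the key name before use.
resource "databricks_workspace_conf" "this" {
  custom_config = {
    "enableVerboseAuditLogs" = "true"
  }
}
```

Setting the value to "false" (or removing the resource) turns the feature back off.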

ankit001mittal
by New Contributor III
  • 57 Views
  • 1 reply
  • 0 kudos

DLT Query History

Hi guys, I can see that in DLT pipelines we have a query history section where we can see the duration of each table and the number of rows read. Is this information stored somewhere in the system catalogs? Can I query this information?

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

This might be available in the query history system table: https://docs.databricks.com/aws/en/admin/system-tables/query-history

