Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi, I would like to request assistance on how to collect usage metrics and job execution data for my Databricks environment. We are currently not using Unity Catalog, but I would still like to monitor and analyze usage. Could you please provide guidance...
I'm having difficulty adding a mask function to columns while creating streaming tables with the DLT Python method create_streaming_table(). I do it like this, but it does not work: the streaming table is created, but no column is masked: def prepare_column_pro...
@NamNguyenCypher Delta Live Tables’ Python API does not currently honor column-mask metadata embedded in a PySpark StructType. Masking (and row filters) on DLT tables are only applied when you define your table with a DDL-style schema that includes a...
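For illustration, a minimal sketch of that DDL-schema approach, assuming Unity Catalog and a pre-created SQL mask UDF (the table, column, catalog, and function names here are placeholders):

```python
import dlt

# Passing a DDL string (not a StructType) to `schema` lets DLT apply the
# column mask. `main.security.mask_email` is a hypothetical SQL UDF that
# must already exist.
dlt.create_streaming_table(
    name="customers_masked",
    schema="""
        customer_id BIGINT,
        email STRING MASK main.security.mask_email
    """,
)
```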
How do I use the change data feed when the Delta table schema changes between table versions? I tried to read the change data feed in parts (in the code snippet I read version 1372, because versions 1371 and 1373 have different schemas), but I get the error Unsupporte...
@LasseL When you read from the change data feed in batch mode, Delta Lake always uses a single schema:
- By default, it uses the latest table version's schema, even if you're only reading an older version.
- On Databricks Runtime ≥ 12.2 LTS with column mapping e...
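As a concrete illustration of reading one version at a time in batch mode (the table name is a placeholder; the version number comes from the question):

```python
# Pin the batch CDF read to version 1372 only. Note that the schema used
# for the read is still resolved per the rules above, not per the pinned
# version.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 1372)
    .option("endingVersion", 1372)
    .table("my_catalog.my_schema.my_table")  # placeholder name
)
changes.show()
```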
Hello, suddenly since last night some of our DLT pipelines are failing, saying that our hive_metastore control table cannot be found. All of our DLTs are set up the same (serverless), plus one Shared Compute cluster on runtime version 15.4. For ...
I am using Delta Live Tables and Pub/Sub to ingest messages from 30 different topics in parallel. I noticed that initialization time can be very long, around 15 minutes. Does someone know how to reduce initialization time in DLT? Thank you.
Classic clusters can take up to seven minutes to be acquired, configured, and deployed, with most of this time spent waiting for the cloud service to allocate virtual machines. In contrast, serverless clusters typically start in under eight seconds. ...
I'm trying to calculate the cost of a job using the usage and list_prices system tables, but I'm encountering some unexpected behavior that I can't explain. When I run a job using a shared cluster, the sku_name in the usage table is PREMIUM_JOBS_SERVE...
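For reference, the usual starting point is to join the two system tables on SKU and price validity window; a hedged sketch (column names follow the documented system.billing schemas, and it does not address the SKU discrepancy described above):

```python
# Approximate per-job cost: DBUs consumed times the list price in effect
# at the time the usage was recorded.
job_costs = spark.sql("""
    SELECT
      u.usage_metadata.job_id AS job_id,
      u.sku_name,
      SUM(u.usage_quantity * p.pricing.default) AS estimated_cost
    FROM system.billing.usage u
    JOIN system.billing.list_prices p
      ON u.sku_name = p.sku_name
     AND u.usage_start_time >= p.price_start_time
     AND (p.price_end_time IS NULL OR u.usage_start_time < p.price_end_time)
    WHERE u.usage_metadata.job_id IS NOT NULL
    GROUP BY u.usage_metadata.job_id, u.sku_name
""")
job_costs.show()
```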
I need to get my compute metrics, but not from the UI... The system tables don't have much information, and node_timeline records metrics per minute, so it's difficult to calculate each compute's CPU usage per day. Is there any way we can get the CPU usage, CPU idle time, M...
To calculate CPU usage, CPU idle time, and memory usage per cluster per day, you can use the system.compute.node_timeline system table. However, since the data in this table is recorded at per-minute granularity, it’s necessary to aggregate the data ...
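A hedged sketch of that aggregation (column names follow the documented node_timeline schema; idle time is approximated as 100 minus the busy and wait percentages):

```python
# Roll per-minute node samples up to one row per cluster per day.
daily = spark.sql("""
    SELECT
      cluster_id,
      DATE(start_time) AS usage_date,
      AVG(cpu_user_percent + cpu_system_percent) AS avg_cpu_busy_pct,
      AVG(100 - cpu_user_percent - cpu_system_percent - cpu_wait_percent)
        AS avg_cpu_idle_pct,
      AVG(mem_used_percent) AS avg_mem_used_pct
    FROM system.compute.node_timeline
    GROUP BY cluster_id, DATE(start_time)
    ORDER BY cluster_id, usage_date
""")
daily.show()
```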
Hi guys, I am trying to use the DLT Publish Event Log to Metastore feature, and I noticed it creates a table with the logs for each DLT pipeline separately. Does it mean it maintains a separate log table for all the DLT tables (in our case, we have 100...
Hi @ankit001mittal Yes, you're right, when you enable the "Publish Event Log to Metastore" option for DLT pipelines, Databricks creates a separate event log table for each pipeline. So, if you have thousands of pipelines, you'll see thousands of log ...
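If a single consolidated view is needed, one option is to union the per-pipeline tables; a minimal sketch, assuming the logs land in one schema and share a name prefix (both the schema name and the prefix are placeholders for your naming convention):

```python
from functools import reduce

# Find every published event log table in the schema and union them.
log_tables = [
    t.name
    for t in spark.catalog.listTables("monitoring.dlt_event_logs")  # placeholder
    if t.name.startswith("event_log_")                              # placeholder
]
all_logs = reduce(
    lambda left, right: left.unionByName(right, allowMissingColumns=True),
    [spark.table(f"monitoring.dlt_event_logs.{name}") for name in log_tables],
)
all_logs.createOrReplaceTempView("all_dlt_event_logs")
```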
I have a parent job that calls multiple child jobs in a workflow. Out of 10 child jobs, one has failed and the other nine are still running. I want to repair the failed child task. Can I do that while the other child jobs are running?
Hi holychs, how are you doing today? As per my understanding, yes, in Databricks Workflows, if you're running a multi-task job (like your parent job triggering multiple child tasks), you can repair only the failed task without restarting the entire j...
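For illustration, repairing a single failed task through the Databricks SDK for Python (the run ID and task key are placeholders):

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Rerun only the failed task of an existing job run; tasks not listed in
# rerun_tasks are left untouched.
w.jobs.repair_run(
    run_id=123456789,             # placeholder: the parent job's run ID
    rerun_tasks=["child_job_3"],  # placeholder: task_key of the failed task
)
```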
If there are two different DABs, can we have a dependency so that a job from one DAB runs after a job from another DAB? Similar to how tasks can depend on each other to run one after the other in the same DAB. Can we have the same for two differ...
@vivi007 Yes, you can create dependencies between jobs in different DABs (Databricks Asset Bundles), but this requires a different approach than task dependencies within a single DAB. Since DABs are designed to be independently deployable units, direc...
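One common workaround is to look up the upstream job's ID and wrap it in a run_job_task in the downstream bundle; a hedged YAML sketch (the job names, the variable, and the notebook path are placeholders):

```yaml
# In the downstream bundle: run the other bundle's job first, then continue.
resources:
  jobs:
    downstream_job:
      name: downstream-job
      tasks:
        - task_key: run_upstream
          run_job_task:
            job_id: ${var.upstream_job_id}  # ID of the job deployed by the other DAB
        - task_key: main_work
          depends_on:
            - task_key: run_upstream
          notebook_task:
            notebook_path: ../src/main_work.py
```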
Hi all, I am trying to deploy a DBX app via DAB; however, source_code_path seems not to be parsed correctly into the app configuration.

dbx_dash/
  resources/
    app.yml
  src/
    app.yaml
    app.py
  databricks.yml

resources/app.yml:

resources:
  apps:
    m...
Hello, I am learning to create DLT pipelines with different graphs using a 14-day trial of premium Databricks. I currently have one graph: Mat view -> Streaming Table -> Mat view. When I ran this pipeline (serverless compute) the 1st time, ran...
dbutils.fs.mv with ADLS currently copies the file and then deletes the old one. This incurs costs and has a lot of overhead compared to the rename operation in ADLS, which is instant and doesn't incur the extra costs of writing the 'new' data...
The tool is really meant for DBFS and is only accessible from within Databricks. If I had to guess, the idea is that most folks will not be using DBFS for production or sensitive data (for a host of good reasons), and as such there has not been a big ...
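As a workaround, the ADLS Gen2 rename API can be called directly; a minimal sketch, assuming the azure-storage-file-datalake package, a DefaultAzureCredential that can reach the account, and placeholder account, container, and path names:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# ADLS Gen2 rename is a metadata-only operation: no data is rewritten.
service = DataLakeServiceClient(
    account_url="https://myaccount.dfs.core.windows.net",  # placeholder
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("my-container")        # placeholder
file_client = fs.get_file_client("raw/old_name.parquet")   # placeholder
# The new name must be given as "<filesystem>/<full path>".
file_client.rename_file("my-container/raw/new_name.parquet")
```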
Hi everyone, I was looking into the databricks_workspace_conf Terraform resource to configure Verbose Audit Logs (and avoid changing it through the UI). However, when I attempted to apply this configuration I encountered the following error: Error: cannot...
@fedemgp I was able to turn the desired setting on and off with Terraform with this code: GitHub Gist. I'm using Databricks Terraform provider version 1.74.0, and my Databricks workspace runs on Google Cloud.
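The gist itself isn't reproduced here; for orientation, a minimal configuration along these lines matches the provider docs (the exact setting key is hedged as the commonly used one):

```hcl
resource "databricks_workspace_conf" "this" {
  custom_config = {
    "enableVerboseAuditLogs" = "true"
  }
}
```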
Hi guys, I can see that DLT pipelines have a query history section where we can see the duration of each table and the number of rows read. Is this information stored somewhere in the system catalog? Can I query it?