Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Yuki
by New Contributor III
  • 439 Views
  • 2 replies
  • 2 kudos

What do you think about continuing to use an instance profile for S3 multipart upload?

My team is currently using an instance profile to upload data to S3 since we only have Hive Metastore. I like Unity Catalog a lot, but my code uses multipart upload to S3 for efficiency. https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview...

Latest Reply
Yuki
New Contributor III
  • 2 kudos

Hi @lingareddy_Alva, thank you for your excellent response. I really appreciated it. I couldn't find the mention that says "Instance profiles are still supported but should be used for specific, advanced access cases." I will use it for now, recognizi...

1 More Replies
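For context, a minimal sketch (untested) of the multipart-upload pattern under discussion, using boto3, which resolves instance-profile credentials automatically on an EC2-backed cluster. The bucket, key, and local path are hypothetical placeholders.

import boto3

s3 = boto3.client("s3")
bucket, key = "my-bucket", "path/to/large-file.bin"  # hypothetical

# Start the multipart upload and send fixed-size parts.
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
parts = []
part_size = 100 * 1024 * 1024  # 100 MiB; every part except the last must be >= 5 MiB

with open("/local_disk0/large-file.bin", "rb") as f:  # hypothetical local path
    part_number = 1
    while True:
        chunk = f.read(part_size)
        if not chunk:
            break
        resp = s3.upload_part(Bucket=bucket, Key=key, PartNumber=part_number,
                              UploadId=mpu["UploadId"], Body=chunk)
        parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
        part_number += 1

s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
                             MultipartUpload={"Parts": parts})

Note that boto3's higher-level upload_file already switches to multipart above a size threshold, which may be simpler than managing parts by hand.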
zmsoft
by Contributor
  • 720 Views
  • 3 replies
  • 0 kudos

How to set a DLT pipeline warning alert?

Hi there, the example description of custom event hooks in the documentation is not clear enough; I do not know how to implement it inside Python functions. event-hooks  My code: %python # Read the insertion of data raw_user_delta_streaming = spark.rea...

Latest Reply
Priyanka_Biswas
Databricks Employee
  • 0 kudos

Hi @zmsoft, the event hook provided, user_event_hook, must be a Python callable that accepts exactly one parameter - a dictionary representation of the event that triggered the execution of this event hook. The return value of the event hook has no s...

2 More Replies
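As a minimal sketch of the shape Priyanka describes, assuming the documented dlt.on_event_hook decorator (event hooks are a preview feature; verify against current docs). The alerting call is a placeholder.

import dlt

# An event hook must be a Python callable that accepts exactly one argument:
# a dictionary representation of the event. Its return value is ignored.
@dlt.on_event_hook(max_allowable_consecutive_failures=None)
def warn_on_pipeline_events(event):
    if event.get("level") in ("WARN", "ERROR"):
        # Placeholder: replace with a real alert (webhook, email, etc.).
        print(f"{event.get('level')}: {event.get('message')}")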
William_Scardua
by Valued Contributor
  • 482 Views
  • 1 reply
  • 0 kudos

Collecting Job Usage Metrics Without Unity Catalog

Hi, I would like to request assistance on how to collect usage metrics and job execution data for my Databricks environment. We are currently not using Unity Catalog, but I would still like to monitor and analyze usage. Could you please provide guidance...

Latest Reply
lingareddy_Alva
Honored Contributor II
  • 0 kudos

Hi @William_Scardua, here's a comprehensive overview of how to collect usage and job-execution metrics in Databricks without Unity Catalog, using REST APIs, audit logs, system tables, and built-in monitoring features. In summary, you can retrieve: 1. Job...

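As one concrete example of the REST-API route in that overview, a hedged sketch that lists recent job runs via the Jobs 2.1 API; host and token are placeholders.

import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                        # placeholder

resp = requests.get(
    f"{HOST}/api/2.1/jobs/runs/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"limit": 25, "expand_tasks": "true"},
)
resp.raise_for_status()
for run in resp.json().get("runs", []):
    state = run.get("state", {})
    print(run["run_id"], state.get("result_state"),
          run.get("start_time"), run.get("execution_duration"))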
NamNguyenCypher
by New Contributor II
  • 591 Views
  • 2 replies
  • 2 kudos

Resolved! Adding column masks using the DLT Python create_streaming_table API

I'm having difficulty adding a mask function to columns while creating streaming tables using the DLT Python method create_streaming_table() like this, but it does not work; the streaming table is created but no column is masked: def prepare_column_pro...

Latest Reply
lingareddy_Alva
Honored Contributor II
  • 2 kudos

@NamNguyenCypher Delta Live Tables’ Python API does not currently honor column-mask metadata embedded in a PySpark StructType. Masking (and row filters) on DLT tables is only applied when you define your table with a DDL-style schema that includes a...

1 More Replies
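A sketch of the DDL-style schema approach the reply points to; table, column, and mask-function names are hypothetical.

import dlt

# Column masks apply when the schema is given as a SQL DDL string with a
# MASK clause, not as a PySpark StructType.
dlt.create_streaming_table(
    name="customers_masked",                         # hypothetical
    schema="""
        id BIGINT,
        email STRING MASK main.security.mask_email,  -- hypothetical mask UDF
        created_at TIMESTAMP
    """,
)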
vziog
by New Contributor II
  • 714 Views
  • 5 replies
  • 1 kudos

Unexpected SKU Names in Usage Table for Job Cost Calculation

I'm trying to calculate the cost of a job using the usage and list_prices system tables, but I'm encountering some unexpected behavior that I can't explain. When I run a job using a shared cluster, the sku_name in the usage table is PREMIUM_JOBS_SERVE...

Latest Reply
vziog
New Contributor II
  • 1 kudos

Thank you all for your replies. @lingareddy_Alva, what about what @Walter_C and @mnorland mentioned about enabling serverless tasks? Is this possible, and how?

4 More Replies
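For reference, a hedged sketch of the usage/list_prices join this thread is about, runnable from a notebook; column names follow the documented system-table schemas and should be verified (a stricter join would also match cloud and currency_code).

job_costs = spark.sql("""
    SELECT
        u.usage_metadata.job_id AS job_id,
        u.sku_name,
        SUM(u.usage_quantity * lp.pricing.default) AS est_list_cost
    FROM system.billing.usage u
    JOIN system.billing.list_prices lp
      ON u.sku_name = lp.sku_name
     AND u.usage_start_time >= lp.price_start_time
     AND (lp.price_end_time IS NULL OR u.usage_start_time < lp.price_end_time)
    WHERE u.usage_metadata.job_id IS NOT NULL
    GROUP BY 1, 2
""")
job_costs.display()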
santhiya
by New Contributor
  • 377 Views
  • 1 reply
  • 0 kudos

CPU usage and idle time metrics from system tables

I need to get my compute metrics, not from the UI... the system tables don't have much information; node_timeline has per-minute metrics, so it's difficult to calculate each compute's CPU usage per day. Is there any way we can get the CPU usage, CPU idle time, M...

Latest Reply
BigRoux
Databricks Employee
  • 0 kudos

To calculate CPU usage, CPU idle time, and memory usage per cluster per day, you can use the system.compute.node_timeline system table. However, since the data in this table is recorded at per-minute granularity, it’s necessary to aggregate the data ...

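A hedged sketch of the daily roll-up BigRoux describes, assuming the documented system.compute.node_timeline columns; the idle figure is an approximation.

daily = spark.sql("""
    SELECT
        cluster_id,
        DATE(start_time) AS day,
        AVG(cpu_user_percent + cpu_system_percent) AS avg_cpu_busy_pct,
        100 - AVG(cpu_user_percent + cpu_system_percent + cpu_wait_percent)
            AS approx_cpu_idle_pct,
        AVG(mem_used_percent) AS avg_mem_used_pct
    FROM system.compute.node_timeline
    GROUP BY cluster_id, DATE(start_time)
""")
daily.display()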
ankit001mittal
by New Contributor III
  • 336 Views
  • 1 reply
  • 0 kudos

DLT Publish event log to metastore

Hi guys, I am trying to use the DLT "Publish event log to metastore" feature, and I noticed it creates a table with the logs for each DLT pipeline separately. Does it mean it maintains a separate log table for all the DLT pipelines (in our case, we have 100...

Latest Reply
SP_6721
Contributor
  • 0 kudos

Hi @ankit001mittal, yes, you're right: when you enable the "Publish Event Log to Metastore" option for DLT pipelines, Databricks creates a separate event log table for each pipeline. So, if you have thousands of pipelines, you'll see thousands of log ...

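For illustration, once a pipeline's event log is published to the metastore it can be queried like any other table; the three-level name below is hypothetical.

errors = spark.sql("""
    SELECT timestamp, level, message
    FROM my_catalog.my_schema.my_pipeline_event_log  -- hypothetical name
    WHERE level = 'ERROR'
    ORDER BY timestamp DESC
""")
errors.display()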
holychs
by New Contributor III
  • 354 Views
  • 2 replies
  • 0 kudos

Repairing a running workflow with a few failed child jobs

I have a parent job that calls multiple child jobs in a workflow. Out of 10 child jobs, one has failed and the other 9 are still running. I want to repair the failed child tasks. Can I do that while the other child jobs are running?

Latest Reply
Brahmareddy
Honored Contributor III
  • 0 kudos

Hi holychs, how are you doing today? As per my understanding, yes, in Databricks Workflows, if you're running a multi-task job (like your parent job triggering multiple child tasks), you can repair only the failed task without restarting the entire j...

1 More Replies
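A hedged sketch of the repair call itself via the Jobs 2.1 API. Note that repair normally targets a run that has finished, so whether it is accepted while sibling tasks are still running is exactly the open question here; run ID, task key, host, and token are placeholders.

import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                        # placeholder

resp = requests.post(
    f"{HOST}/api/2.1/jobs/runs/repair",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"run_id": 123456789,                    # placeholder parent run ID
          "rerun_tasks": ["failed_child_task"]},  # placeholder task key
)
resp.raise_for_status()
print(resp.json())  # a successful repair returns a repair_id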
vivi007
by New Contributor
  • 358 Views
  • 1 replies
  • 0 kudos

Can we have a depends-on for jobs across two different DABs?

If there are two different DABs, can we have a dependency for one job from one DAB to run after a job from another DAB? Similar to how tasks can depend on each other to run one after the other in the same DAB, can we have the same for two differ...

Latest Reply
lingareddy_Alva
Honored Contributor II
  • 0 kudos

@vivi007 Yes, you can create dependencies between jobs in different DABs (Databricks Asset Bundles), but this requires a different approach than task dependencies within a single DAB. Since DABs are designed to be independently deployable units, direc...

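One pattern for such a cross-bundle dependency, sketched with the Databricks SDK: a final task in the upstream DAB triggers the job deployed by the other DAB. The job ID is a hypothetical placeholder; a native "Run Job" task pointing at the other job is another option.

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # picks up notebook/job authentication on Databricks

# Trigger the downstream (DAB B) job and block until it finishes.
run = w.jobs.run_now(job_id=987654321).result()  # hypothetical job ID
print(run.state.result_state)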
ShreevaniRao
by New Contributor III
  • 6305 Views
  • 13 replies
  • 4 kudos

Newbie learning DLT pipelines

Hello, I am learning to create DLT pipelines using different graphs, using a 14-day trial version of premium Databricks. I currently have one graph: Mat view -> Streaming Table -> Mat view. When I ran this pipeline (serverless compute) the 1st time, ran...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 4 kudos

Use this: https://www.youtube.com/watch?v=iqf_QHC7tgQ&list=PL2IsFZBGM_IGpBGqxhkiNyEt4AuJXA0Gg It will help you a lot.

12 More Replies
ktagseth
by New Contributor II
  • 333 Views
  • 3 replies
  • 0 kudos

dbutils.fs.mv inefficient with ADLS

dbutils.fs.mv with ADLS currently copies the file and then deletes the old one. This incurs costs and has a lot of overhead vs. using the rename functionality in ADLS, which is instant and doesn't incur the extra costs involved with writing the 'new' data....

Latest Reply
BigRoux
Databricks Employee
  • 0 kudos

The tool is really meant for DBFS and is only accessible from within Databricks. If I had to guess, the idea is that most folks will not be using DBFS for production or sensitive data (for a host of good reasons), and as such there has not been a big ...

2 More Replies
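As a possible workaround, a sketch that calls the Hadoop FileSystem rename directly (via Spark's private _jvm/_jsc handles, so not a supported API); on ADLS Gen2 with hierarchical namespace a rename is a metadata operation rather than copy-and-delete. Paths are hypothetical.

hadoop_path = spark._jvm.org.apache.hadoop.fs.Path
src = hadoop_path("abfss://container@account.dfs.core.windows.net/dir/old_name")
dst = hadoop_path("abfss://container@account.dfs.core.windows.net/dir/new_name")

fs = src.getFileSystem(spark._jsc.hadoopConfiguration())
ok = fs.rename(src, dst)  # returns False rather than raising on some failures
print("renamed:", ok)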
fedemgp
by New Contributor
  • 597 Views
  • 1 reply
  • 0 kudos

Configure verbose audit logs through Terraform

Hi everyone, I was looking into the databricks_workspace_conf Terraform resource to configure Verbose Audit Logs (and avoid changing it through the UI). However, I attempted to apply this configuration and encountered the following error: Error: cannot...

Latest Reply
TheRealOliver
Contributor
  • 0 kudos

@fedemgp I was able to turn the desired setting on and off with Terraform with this code: GitHub Gist. I'm using Databricks Terraform provider version 1.74.0, and my Databricks runs on Google Cloud.

gm_co
by New Contributor
  • 854 Views
  • 1 reply
  • 0 kudos

Bar chart data labels in percent

Hello, I am currently working with bar visualizations in a new workbook editor. When I use labels, I can see the count of rows returned, and hovering over them shows the percentage of the two values returned. How can I make the percentage display on ...

Latest Reply
Advika
Databricks Employee
  • 0 kudos

Hello @gm_co! Were you able to sort this out? You can display % in two ways: in the General settings, check the box for Normalize values to percentage; or, as you have enabled Labels, just set the Data labels to {{ @@yPercent }}. This will show the percent...

MrJava
by New Contributor III
  • 13804 Views
  • 17 replies
  • 13 kudos

How to know who started a job run?

Hi there! We have different jobs/workflows configured in our Databricks workspace running on AWS and would like to know who actually started a job run. Are they started by a user or a service principal using curl? Currently one can only see who is t...

Latest Reply
jeremy98
Honored Contributor
  • 13 kudos

Any news on this feature?

16 More Replies
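For workspaces with system tables enabled, a hedged sketch of the audit-log query that answers this; service/action names and columns follow the documented system.access.audit schema but should be verified.

starters = spark.sql("""
    SELECT event_time,
           user_identity.email      AS started_by,
           request_params['job_id'] AS job_id
    FROM system.access.audit
    WHERE service_name = 'jobs'
      AND action_name = 'runNow'
    ORDER BY event_time DESC
""")
starters.display()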
VicS
by New Contributor III
  • 933 Views
  • 6 replies
  • 3 kudos

Resolved! How can I use Terraform to assign an external location to multiple workspaces?

How can I use Terraform to assign an external location to multiple workspaces? When I create an external location with Terraform, I do not see any option to directly link workspaces. It also only links to the workspace of the Databricks profile that I...

Latest Reply
TheRealOliver
Contributor
  • 3 kudos

@Walter_C I think you need to use the databricks_workspace_binding resource for that multi-workspace binding. I was able to achieve it in Terraform. The resource docs seem to agree with the result that I have. My Databricks runs on Google Cloud. My Terraform...

5 More Replies
