Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ChristianRRL
by Honored Contributor
  • 79 Views
  • 2 replies
  • 0 kudos

Get task_run_id that is nested in a job_run task

Hi, I'm wondering if there is an easier way to accomplish this. I can use Dynamic Value reference to pull the run_id of Parent 1 into Parent 2; however, what I'm looking for is for Child 1's task run_id to be referenced within Parent 2. Currently I am ...

Latest Reply
anuj_lathi
Databricks Employee
  • 0 kudos

Hi — good question. The cleanest way to do this is with task values, no REST API needed. Approach: Task Values (Recommended) In Child 1's notebook, capture its own run_id and set it as a task value: import json   ctx = json.loads(     dbutils.noteboo...
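The snippet above is cut off, so here is a minimal runnable sketch of the same task-values pattern. The `currentRunId` field name and the `child1_run_id` key are assumptions; inspect the context JSON your runtime actually returns before relying on them.

```python
import json

def task_run_id_from_context(context_json: str) -> str:
    """Extract this task's run id from the notebook context JSON.

    ASSUMPTION: the context exposes a "currentRunId" object with an
    "id" field; verify against your runtime's actual JSON.
    """
    ctx = json.loads(context_json)
    return str(ctx["currentRunId"]["id"])

# Inside Child 1 (Databricks only; dbutils does not exist locally):
# ctx_json = dbutils.notebook.entry_point.getDbutils().notebook() \
#     .getContext().toJson()
# dbutils.jobs.taskValues.set(key="child1_run_id",
#                             value=task_run_id_from_context(ctx_json))
```

Downstream tasks can then read the value with `dbutils.jobs.taskValues.get(...)`; whether a value set inside a run_job task is visible to the outer workflow depends on your setup, so test that boundary explicitly.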

1 More Replies
Steffen
by New Contributor III
  • 18 Views
  • 1 reply
  • 0 kudos

Partition optimization strategy for a task that massively inflates the size of a dataframe

Hello, we are facing some optimization problems for a workflow that interpolates raw measurements to one-second intervals. We are currently using dbl tempo for this, but had the same issues when doing a simple approach with a window function. We have the ...

Latest Reply
balajij8
Contributor
  • 0 kudos

Hi, key points below. Time Window Chunking: avoid interpolating a full week of data in a single Spark action. Split the workload into daily or 12-hour slices. This caps maximum memory pressure, enables parallel execution, and simplifies failure recovery....
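The chunking idea above can be sketched as a plain helper that cuts the full time range into fixed-width windows; each window then drives one bounded interpolation pass instead of one huge job. The slice width and the filter shown in the comment are illustrative.

```python
from datetime import datetime, timedelta

def time_slices(start: datetime, end: datetime, hours: int = 12):
    """Split [start, end) into fixed-width windows so each
    interpolation pass handles a bounded slice of data instead of
    the whole week at once."""
    slices = []
    cursor = start
    while cursor < end:
        upper = min(cursor + timedelta(hours=hours), end)
        slices.append((cursor, upper))
        cursor = upper
    return slices

# Each (lo, hi) pair then drives one filtered pass, e.g.
# df.where((df.ts >= lo) & (df.ts < hi)) before resampling with tempo.
```

Keeping the slices non-overlapping also makes failure recovery simple: a failed slice can be re-run alone.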

norbitek
by New Contributor II
  • 19 Views
  • 1 reply
  • 0 kudos

variant_explode_outer stopped working after the last DBX runtime patch

Hi all, I import the following JSON into a Delta table VARIANT column: { "data": [ { "group": 1, "manager": "no", "firstname": "John", "lastname": "Smith", "active": "false", ...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi,  I've been testing this on a workspace at my end and see exactly the same thing. I'd first recommend raising a support ticket for this.  In the meantime you can use the following workaround: I reproduced it on DBR 18.0 using readStream + cloudFil...

abhishek0306
by Visitor
  • 42 Views
  • 2 replies
  • 0 kudos

Databricks file-based trigger to SharePoint

Hi, can we create a file-based trigger from a SharePoint location for Excel files in Databricks? My need is to copy the Excel files from SharePoint to external volumes in Databricks, so can it be done using a trigger such that whenever the file drops in ...

Latest Reply
balajij8
Contributor
  • 0 kudos

@abhishek0306 SharePoint does not natively support the event notifications required for Databricks File Arrival Triggers. You can use the below: Azure Logic Apps: create a workflow with the "When a file is created in a folder" SharePoint trigger. The workflow...
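To make the hand-off concrete: after the SharePoint trigger fires, the Logic App's HTTP action can POST to the Databricks Jobs API `run-now` endpoint. A sketch of building that request body follows; the parameter name `source_file_url` is illustrative, since your job defines its own job-level parameters.

```python
import json

def run_now_payload(job_id: int, file_url: str) -> str:
    """Build the body for POST /api/2.1/jobs/run-now on the
    Databricks Jobs API, passing the new SharePoint file URL to the
    copy job as a job-level parameter.

    ASSUMPTION: "source_file_url" is a placeholder parameter name.
    """
    return json.dumps({
        "job_id": job_id,
        "job_parameters": {"source_file_url": file_url},
    })
```

The Logic App would supply a Databricks token (or service-principal credential) in the Authorization header; the triggered job then reads the parameter and copies the file into the external volume.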

1 More Replies
mordex
by New Contributor III
  • 138 Views
  • 2 replies
  • 0 kudos

Databricks workflows for APIs with different frequencies (cluster keeps restarting)

Hey everyone, I'm stuck with a Databricks workflow design and could use some advice. Currently, we are calling 70+ APIs. Right now the workflow looks something l...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

You're right that job clusters are the wrong fit here. The cold start time (including serverless, which is still 25-50s) makes anything under 5 minutes impractical when the cluster terminates between runs. The simplest approach: all-purpose cluster +...
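One way to act on this advice is to stop scheduling 70+ independent workflows and instead bucket the APIs by cadence, so each cadence becomes one job running many short tasks on a warm cluster. A minimal sketch, assuming each API is described by a dict with `name` and `frequency` keys (both names are illustrative):

```python
from collections import defaultdict

def group_by_frequency(api_configs):
    """Bucket API definitions by polling cadence so each bucket
    becomes one scheduled job (one warm cluster, many short tasks)
    instead of 70+ independent workflows, each paying its own
    cluster cold start."""
    buckets = defaultdict(list)
    for cfg in api_configs:
        buckets[cfg["frequency"]].append(cfg["name"])
    return dict(buckets)
```

Each bucket then maps to one job schedule (e.g. a 5-minute job, an hourly job), which is what keeps the sub-5-minute cadences viable despite cold-start overhead.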

1 More Replies
holychs
by Databricks Partner
  • 369 Views
  • 2 replies
  • 0 kudos

Resolved! Run failed with error message Cluster was terminated. Reason: JOB_FINISHED (SUCCESS)

I am running a notebook through a workflow using an all-purpose cluster ("data_security_mode": "USER_ISOLATION"). I am seeing some strange behaviour with the cluster during the run. While the job is still running, the cluster gets terminated with the Reason: Re...

Data Engineering
clusterds
clusters
jobs
Workflows
Latest Reply
anuj_lathi
Databricks Employee
  • 0 kudos

Hi — the JOB_FINISHED (SUCCESS) termination reason is the key clue here. It means another job that was using the same all-purpose cluster finished, and its completion triggered the cluster termination — taking your still-running job down with it. Mos...
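One way to avoid this coupling is to give the job a dedicated job cluster instead of pointing it at a shared all-purpose cluster, so no other job's completion can tear it down mid-run. A sketch of the relevant task-settings fragment, following the Jobs API `new_cluster` shape; the DBR version and node type are placeholders:

```python
# Task settings fragment: "new_cluster" replaces "existing_cluster_id",
# so the cluster is created for this run and terminated only when this
# run finishes. Values below are placeholders, not recommendations.
job_task_settings = {
    "task_key": "my_notebook_task",
    "notebook_task": {"notebook_path": "/Workspace/path/to/notebook"},
    "new_cluster": {
        "spark_version": "15.4.x-scala2.12",   # placeholder DBR version
        "node_type_id": "Standard_DS3_v2",     # placeholder node type
        "num_workers": 2,
        "data_security_mode": "USER_ISOLATION",
    },
}
```

The trade-off is cold-start time per run; if several jobs must share compute, check which job's settings (or cluster policy) terminate the shared cluster on completion.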

1 More Replies
vamsi_simbus
by Databricks Partner
  • 134 Views
  • 2 replies
  • 1 kudos

Resolved! Drill-down support in Databricks SQL (Lakeview) Dashboards

Hi all, does Databricks SQL (Lakeview) Dashboards support native drill-down functionality (for example: Category → Subcategory → SKU)? Currently, we see support for cross-filtering, parameters, and drill-through within the same dataset, but hierarchica...

Latest Reply
anuj_lathi
Databricks Employee
  • 1 kudos

Hi — good question. You're right that Lakeview doesn't have native hierarchical drill-down (click Category → auto-expand to Subcategory → SKU). But you can get fairly close by combining the features you mentioned. Here are the practical patterns: 1. ...
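One of those patterns can be sketched as a parameter-driven query: a dashboard parameter picks the drill level, and the dataset aggregates to that depth. The hierarchy column names and the `sales`/`amount` identifiers below are illustrative, not from the original post.

```python
HIERARCHY = ["category", "subcategory", "sku"]  # illustrative columns

def drill_query(level: int, table: str = "sales") -> str:
    """Build the aggregation for one drill level. A dashboard
    parameter can then switch which query a widget runs,
    approximating hierarchical drill-down with the features Lakeview
    does have (parameters + cross-filtering)."""
    cols = ", ".join(HIERARCHY[: level + 1])
    return (f"SELECT {cols}, SUM(amount) AS total "
            f"FROM {table} GROUP BY {cols}")
```

In practice you would also filter each deeper level by the value clicked at the level above (via cross-filtering or a second parameter).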

1 More Replies
Phani1
by Databricks MVP
  • 167 Views
  • 3 replies
  • 0 kudos

Best Practices for Implementing Automated, Scalable, and Auditable Purge Mechanism on Azure Databric

 Hi All, I'm looking to implement an automated, scalable, and auditable purge mechanism on Azure Databricks to manage data retention, deletion and archival policies across our Unity Catalog-governed Delta tables.I've come across various approaches, s...

Latest Reply
AbhaySingh
Databricks Employee
  • 0 kudos

Here is my action plan if it helps!
Phase 1: Foundation
  • Migrate to UC managed tables (if not already)
  • Enable Predictive Optimization at catalog level
  • Set delta.deletedFileRetentionDuration per layer
Phase 2: Retention Policies
  • Enab...
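The per-layer retention step can be made auditable by generating the SQL a scheduled purge job would run per table. A minimal sketch; the medallion-layer retention values are assumptions for illustration, not recommended settings:

```python
# ASSUMPTION: retention hours per layer are illustrative placeholders.
RETENTION_HOURS = {"bronze": 7 * 24, "silver": 30 * 24, "gold": 90 * 24}

def purge_statements(catalog: str, schema: str, table: str, layer: str):
    """Emit the SQL a scheduled purge job would run for one table:
    set the deleted-file retention property, then VACUUM past it.
    Logging these statements (plus who/when) gives the audit trail."""
    fqn = f"{catalog}.{schema}.{table}"
    hours = RETENTION_HOURS[layer]
    return [
        f"ALTER TABLE {fqn} SET TBLPROPERTIES "
        f"('delta.deletedFileRetentionDuration' = 'interval {hours} hours')",
        f"VACUUM {fqn} RETAIN {hours} HOURS",
    ]
```

A driver job would iterate Unity Catalog tables, run each statement via `spark.sql`, and write the emitted statements to an audit table.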

2 More Replies
fdubourdeau
by New Contributor
  • 66 Views
  • 1 reply
  • 0 kudos

Resolved! Querying CDF on a Delta-Sharing table after data type change in the Table (INT to DECIMAL)

Hi, I am trying to query the CDF of a Delta Sharing table that has had a change in the data type of one of its columns. The change was from an INT to a DECIMAL. When reading the specific version where the schema change happened, I am receiving an error ment...

Latest Reply
anuj_lathi
Databricks Employee
  • 0 kudos

Hi — this is a known limitation of Change Data Feed. Here's what's happening and your options. Why This Happens Changing a column from INT to DECIMAL is a non-additive schema change. When reading CDF in batch mode, Delta Lake applies a single schema ...
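A common way around a single-schema batch read is to split the CDF version range so no batch straddles the schema-change version, then read each sub-range separately. A sketch of the range-splitting logic, with the CDF read options shown only as comments (table name is a placeholder):

```python
def split_at_schema_change(start_version: int, end_version: int,
                           change_version: int):
    """Split an inclusive CDF version range into sub-ranges that do
    not straddle the schema-change version, so each batch read sees
    one consistent schema."""
    if not (start_version <= change_version <= end_version):
        return [(start_version, end_version)]
    ranges = []
    if change_version > start_version:
        ranges.append((start_version, change_version - 1))
    ranges.append((change_version, end_version))
    return ranges

# Each (lo, hi) pair then maps onto a separate CDF read, e.g.
# spark.read.option("readChangeFeed", "true")
#      .option("startingVersion", lo).option("endingVersion", hi)
#      .table("catalog.schema.shared_table")  # placeholder name
```

Whether the provider's share exposes the pre-change versions at all is a separate question worth checking first.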

GvReddy
by Visitor
  • 52 Views
  • 1 replies
  • 0 kudos

Resolved! Guidance on App Deployment in Databricks Public Marketplace

Hello Team, hope you are doing well. I am currently learning Databricks and have developed an application in my local workspace under a Databricks Partner account, where I also have Marketplace Admin access. However, I am unsure about the process of pu...

Latest Reply
anuj_lathi
Databricks Employee
  • 0 kudos

Hi — great question! Here's what you need to know. Key Thing to Know First Currently, Databricks Apps (Streamlit, Dash, Gradio, etc.) listed on the Marketplace are first-party Databricks-owned apps only. External/partner app publishing is not yet sup...

SuMiT1
by New Contributor III
  • 2287 Views
  • 4 replies
  • 0 kudos

Unable to Create Secret Scope in Databricks – “Fetch request failed due to expired user session”

I'm trying to create an Azure Key Vault-backed Secret Scope in Databricks, but when I click Create, I get this error: "Fetch request failed due to expired user session". I've already verified my login and permissions. I also tried refreshing and re-signing i...

Latest Reply
AnandGNR
New Contributor II
  • 0 kudos

Hi @SuMiT1: I'm facing the exact same issue. Were you able to figure out the root cause? I'd appreciate any pointers to resolve this!

3 More Replies
NW1000
by New Contributor III
  • 391 Views
  • 6 replies
  • 0 kudos

Shorten Classic Cluster start up time

We use R notebooks to generate workflows, so we have to use classic clusters, and we need roughly 10 additional R packages in addition to 2 PyPI packages. It takes at least 10-20 minutes to start the cluster. We found most of the time was taken by the packag...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Hi @NW1000 , Glad you tried my suggestion, and thanks for sharing the details. 1. Why the init script failed This message: Init script failure: Cluster scoped init script ... failed: Script exit status is non-zero really just means that something ins...

5 More Replies
ChristianRRL
by Honored Contributor
  • 161 Views
  • 4 replies
  • 3 kudos

Resolved! Passing Parameters *between* Workflow run_job steps

Hi there, I'm trying to reference a task value - let's call it `output_path` (not known until programmatically generated by the code) - that is created in a nested task (Child 1) within a run_job (Parent 1) as an input parameter - let's call it `inpu...

Latest Reply
ChristianRRL
Honored Contributor
  • 3 kudos

Quick update: my question effectively boils down to: Do Databricks workflows have "global" variables that can be set programmatically from anywhere in the workflow (e.g. a nested notebook task inside a parent run_job task) during runtime and be referenc...

3 More Replies