Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

databrciks
by New Contributor III
  • 52 Views
  • 1 replies
  • 0 kudos

Delta table update

Hi Experts, I have around 100 tables in the bronze layer (DLT pipeline). We have created a silver layer of around 20 tables based on some logic. How can I run a specific silver-layer pipeline whenever an update happens in the bronz...

Latest Reply
anuj_lathi
Databricks Employee
  • 0 kudos

Hi, great question! This is a common pattern when you have a large medallion architecture with many bronze-to-silver dependencies. There are several approaches you can take, ranging from simple to more advanced. Option 1: Single DLT Pipeline wit...

prakharsachan
by Visitor
  • 40 Views
  • 0 replies
  • 0 kudos

Databricks Database synced tables

When I deploy synced tables and the pipelines that create their source tables using DABs for the first time, an error occurs saying that the source tables don't exist (because the pipeline hasn't run yet). What's the wor...

Steffen
by New Contributor III
  • 98 Views
  • 2 replies
  • 0 kudos

Partition optimization strategy for task that massively inflate size of dataframe

Hello, we are facing some optimization problems with a workflow that interpolates raw measurements to one-second intervals. We are currently using dbl tempo for this, but had the same issues with a simple window-function approach. We have the ...

Latest Reply
aleksandra_ch
Databricks Employee
  • 0 kudos

Hi @Steffen, perhaps you can classify ids by their data frequency, e.g. low, med, and high-frequency, and process them in different jobs? This would avoid skew. Also, as a quick test, I would suggest running it on serverless - it will scale ...

1 More Replies
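
A minimal sketch of the frequency-bucketing idea from the reply above, in plain Python. The function name, the thresholds, and the sample ids are all hypothetical and not from the thread; in practice the per-id row counts would likely come from a count grouped by the id column, and each bucket would be handed to its own job.

```python
from collections import defaultdict


def bucket_ids_by_frequency(row_counts, low_max, high_min):
    """Group ids into "low" / "med" / "high" buckets by their row count.

    row_counts: mapping of id -> number of raw measurements (assumed input).
    low_max:    ids with at most this many rows go to "low" (hypothetical threshold).
    high_min:   ids with at least this many rows go to "high" (hypothetical threshold).
    """
    buckets = defaultdict(list)
    for series_id, count in row_counts.items():
        if count <= low_max:
            buckets["low"].append(series_id)
        elif count >= high_min:
            buckets["high"].append(series_id)
        else:
            buckets["med"].append(series_id)
    return dict(buckets)


# Made-up counts purely for illustration.
row_counts = {"sensor_a": 120, "sensor_b": 45_000, "sensor_c": 3_600_000}
buckets = bucket_ids_by_frequency(row_counts, low_max=1_000, high_min=1_000_000)
```

Each list in `buckets` could then be used as a filter for a separate run, so that the high-frequency ids do not dominate the partitions of the others.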
ChristianRRL
by Honored Contributor
  • 66 Views
  • 1 replies
  • 0 kudos

Get task_run_id (or job_run_id) of a *launched* job_run task

Hi there, I'm finding this a bit trickier than originally expected and am hoping someone can help me understand if I'm missing something. I have 3 jobs: one orchestrator job (tasks are type run_job) and two "Parent" jobs (tasks are type notebook). parent1 run...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi, I ran into the same confusion and did some testing on this. Here's what I found: Task values don't cross the run_job boundary. So even if child1 sets a task value with dbutils.jobs.taskValues.set(), the orchestrator can't read it. But {{tasks.par...

SuMiT1
by New Contributor III
  • 2377 Views
  • 6 replies
  • 0 kudos

Unable to Create Secret Scope in Databricks – “Fetch request failed due to expired user session”

I’m trying to create an Azure Key Vault-backed Secret Scope in Databricks, but when I click Create, I get this error: "Fetch request failed due to expired user session". I’ve already verified my login and permissions. I also tried refreshing and re-signing i...

Latest Reply
SuMiT1
New Contributor III
  • 0 kudos

Hi @AnandGNR, here is the YouTube link for reference: https://www.youtube.com/watch?v=6HQCZNW7XwY&t=800s

5 More Replies
Akshatkumar69
by Visitor
  • 68 Views
  • 2 replies
  • 0 kudos

Metric views joins

I am currently working on a migration project from Power BI to AI/BI dashboards in Databricks. I am using metric views to recreate, in YAML, all the measures and DAX queries from my Power BI report, but the main prob...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Hey @Akshatkumar69, welcome to the community. You're not alone on this one, it is common with folks coming from Power BI. The key thing to understand is that AI/BI charts do expect a single data source, but that source can be a metric view that alrea...

1 More Replies
subray
by New Contributor
  • 81 Views
  • 3 replies
  • 0 kudos

databricks-connect serverless GRPC issue

Queries executed via Databricks Connect v17 (Spark Connect / gRPC) on serverless compute COMPLETE SUCCESSFULLY on the server side (Spark tasks finish, results are produced), but the Spark Connect gRPC channel FAILS TO DELIVER results back to the client ...

Latest Reply
anuj_lathi
Databricks Employee
  • 0 kudos

This is a well-known class of issue with gRPC/HTTP2 long-lived streams being killed by network intermediaries. The fact that the Databricks SQL Connector (poll-based HTTP/1.1) works perfectly while Spark Connect (gRPC/HTTP2 streaming) fails is the ke...

2 More Replies
ittzzmalind
by New Contributor II
  • 161 Views
  • 1 replies
  • 0 kudos

Resolved! Accessing Azure Databricks Workspace via Private Endpoint and On-Premises Proxy

Public access to the Azure Databricks workspace is currently disabled. Access is required through a Private Link (private endpoint – api_ui). A private endpoint has already been configured successfully: Virtual Network: Vnet-PE-ENDPOINT; Subnet: Snet-PE-...

Latest Reply
anuj_lathi
Databricks Employee
  • 0 kudos

This is a classic hub-spoke + on-premises hybrid networking scenario. Here's how to architect it end-to-end. Architecture Overview The traffic flow will be: VM (VNet-App) --> ExpressRoute/VPN Gateway --> On-Prem Proxy Server --> ExpressRoute/VPN Gate...

FAHADURREHMAN
by New Contributor III
  • 128 Views
  • 2 replies
  • 2 kudos

Resolved! DELTA Merge taking too much Time

Hi Legends, I have a time-series Delta table of 707.1 GiB, 7702 files, and 262 billion rows (mainly time-series data). This table is clustered on 2 columns (a timestamp column and a descriptive column). I have designed a pipeline which runs every w...

Latest Reply
anuj_lathi
Databricks Employee
  • 2 kudos

Great question -- slow MERGE is one of the most common Delta Lake performance issues. Here's a systematic checklist: 1. Partition Pruning in the MERGE Condition The #1 cause of slow MERGEs is missing the partition column in your ON clause. If your ta...

1 More Replies
ChristianRRL
by Honored Contributor
  • 193 Views
  • 5 replies
  • 1 kudos

Get task_run_id that is nested in a job_run task

Hi, I'm wondering if there is an easier way to accomplish this. I can use a dynamic value reference to pull the run_id of Parent 1 into Parent 2; however, what I'm looking for is for Child 1's task run_id to be referenced within Parent 2. Currently I am ...

Latest Reply
anuj_lathi
Databricks Employee
  • 1 kudos

Hi @ChristianRRL, you're absolutely right, and I apologize for the earlier suggestion. I've verified that task values from child jobs are not propagated back through run_job tasks. Your instinct about the REST API was correct. Here's the fix: Solutio...

4 More Replies
shan-databricks
by Databricks Partner
  • 193 Views
  • 3 replies
  • 0 kudos

Resolved! Invoking one job from another to execute a specific task

I have multiple tasks, each working with different tables. Each table has dependencies across Bronze, Silver, and Gold layers. I want to trigger and run a specific task independently, instead of running all tasks in the job. How can I do this? Also, ...

Latest Reply
rohan22sri
New Contributor II
  • 0 kudos

1. Go to the job and left-click the task you want to run.
2. Click the play button (highlighted in yellow in the attachment).
3. This ensures that you run only 1 task at a time and not the whole job.

2 More Replies
kevinleindecker
by New Contributor II
  • 283 Views
  • 4 replies
  • 1 kudos

SQL Warehouse error: "Cannot read properties of undefined (reading 'data')" when querying system tab

Queries that previously worked started failing in SQL Warehouse (Dashboards) without any changes on our side. The query succeeds, but fails to render results with the error: "Cannot read properties of undefined (reading 'data')". This happens with: - system.b...

Latest Reply
Esgario
New Contributor
  • 1 kudos

Same problem here. I have previously reported this issue, and it had been resolved at the time. However, the problem has now reoccurred. When ingesting large tables (over 100k rows), the system is unable to properly render the data, preventing the tab...

3 More Replies
AanchalSoni
by Databricks Partner
  • 280 Views
  • 7 replies
  • 6 kudos

Resolved! Primary key constraint not working

I've created a Lakeflow job to run 5 notebook tasks, one for each silver table: Customers, Accounts, Transactions, Loans and Branches. In the Customers notebook, after writing the data to a Delta table using Auto Loader, I'm applying the non null and primar...

Latest Reply
balajij8
Contributor
  • 6 kudos

@AanchalSoni Capturing the columns as a primary key helps users and tools understand relationships in the data. You can create the primary key with RELY so that, in some cases, the optimizer can skip redundant operations. Distinct Elimination: When you apply a DI...

6 More Replies
mjedy78
by New Contributor II
  • 2128 Views
  • 4 replies
  • 1 kudos

Transition from partitioned table to Liquid clustered table

Hi all, I have a table called classes, which is already partitioned on three different columns. I want to create a liquid clustered table, but as far as I understand from the documentation, and from Dany Lee and his team, it was not possible as of 2024 ...

Latest Reply
biancaorita
New Contributor II
  • 1 kudos

Is there a plan to implement a way to migrate to liquid clustering for an existing table that has traditional partitioning and that is quite large (over 4 TB)? Re-creating such tables from scratch is not always ideal.

3 More Replies
AnandGNR
by New Contributor III
  • 251 Views
  • 7 replies
  • 2 kudos

Unable to create secret scope -"Fetch request failed due expired user session"

Hi everyone, I’m trying to create an Azure Key Vault-backed secret scope in a Databricks Premium workspace, but I keep getting this error: "Fetch request failed due expired user session". Setup details: Databricks workspace: Premium; Azure Key Vault: Owner p...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @AnandGNR, try the following: go to your Key Vault, then under Firewalls and virtual networks enable "Allow trusted Microsoft services to bypass this firewall."

6 More Replies