Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

databrciks
by New Contributor III
  • 118 Views
  • 2 replies
  • 2 kudos

Resolved! Delta table update

Hi Experts, I have around 100 tables in the bronze layer (DLT pipeline). We have created around 20 silver-layer tables based on some logic. How can we run the specific pipeline in the silver layer whenever an update happens in the bronz...

Latest Reply
databrciks
New Contributor III
  • 2 kudos

Thanks @anuj_lathi for the detailed explanation. This helps a lot.

1 More Replies
BabyM
by New Contributor II
  • 53 Views
  • 0 replies
  • 0 kudos

Regarding Databricks Associate voucher @Jim Anderson

Dear Databricks Community Admin @Jim Anderson, I hope you are doing well. This is a request regarding the 50% certification coupon for the recent learning event, for which I completed the learning paths. It was mentioned that the coupon would be shared on 9th April,...

DineshOjha
by New Contributor III
  • 545 Views
  • 6 replies
  • 3 kudos

Resolved! Service Principal access notebooks created under /Workspace/Users

What permissions does a Service Principal need to run Databricks jobs that reference notebooks created by a user and stored in Git? Hi everyone, We are exploring the notebooks-first development approach with Databricks Bundles, and we’ve run into a wor...

Latest Reply
DineshOjha
New Contributor III
  • 3 kudos

Thank you so much Ashwin, this provides a lot of clarity. 1. Where to deploy Bundles in the workspace: we plan to deploy the bundle using a service principal, so we plan to deploy the bundle under /Workspace/<service_principal>. 1. Create notebooks under...

5 More Replies
prakharsachan
by New Contributor
  • 135 Views
  • 1 reply
  • 1 kudos

Resolved! Databricks Database synced tables

When I am deploying synced tables and the pipelines which create the source tables (used by the synced tables) using DABs for the first time, I get an error that the source tables don't exist (yes, because the pipeline hasn't run yet). What's the wor...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @prakharsachan, synced_database_table creation assumes the Unity Catalog source table referenced in spec.source_table_full_name already exists and is readable. The API treats this as the source table to sync from, and if it can’t be read, you’ll s...

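A minimal sketch of one way to handle the chicken-and-egg problem in the reply above: only deploy the synced-table resources whose Unity Catalog source table already exists, and defer the rest until after the pipeline's first run. The spec shape and table names are illustrative, not an official DABs mechanism.

```python
def deployable_synced_tables(synced_specs, existing_tables):
    """Split synced-table specs into (deployable, deferred) based on whether
    their source table already exists in the catalog."""
    deploy, defer = [], []
    for spec in synced_specs:
        target = deploy if spec["source_table_full_name"] in existing_tables else defer
        target.append(spec)
    return deploy, defer

specs = [
    {"name": "orders_synced", "source_table_full_name": "main.sales.orders"},
    {"name": "users_synced", "source_table_full_name": "main.crm.users"},
]
# Pretend only main.sales.orders exists after the first pipeline run.
ready, waiting = deployable_synced_tables(specs, {"main.sales.orders"})
print([s["name"] for s in ready], [s["name"] for s in waiting])
# → ['orders_synced'] ['users_synced']
```

In practice the `existing_tables` set would come from listing the schema in Unity Catalog before deployment.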
Steffen
by New Contributor III
  • 124 Views
  • 2 replies
  • 0 kudos

Partition optimization strategy for task that massively inflate size of dataframe

Hello, we are facing some optimization problems for a workflow that interpolates raw measurements to one-second intervals. We are currently using dbl tempo for this, but had the same issues with a simple approach using window functions. We have the ...

Latest Reply
aleksandra_ch
Databricks Employee
  • 0 kudos

Hi @Steffen, perhaps you can classify ids by their data frequency, e.g. low, med, and high-frequency, and process them in different jobs? This will avoid skewness. Also, as a quick test, I would suggest trying to run it on serverless - it will scale ...

1 More Replies
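The frequency-bucketing idea in the reply above can be sketched as follows. The thresholds and sensor names are made up for illustration; on Databricks the row counts would typically come from a `GROUP BY id` aggregation, and each bucket would feed a separately sized job.

```python
def bucket_ids_by_frequency(row_counts, low_max=1_000, med_max=100_000):
    """Classify each id as low/med/high frequency by its row count,
    so each class can be processed by a job sized for it (avoiding skew)."""
    buckets = {"low": [], "med": [], "high": []}
    for id_, count in row_counts.items():
        if count <= low_max:
            buckets["low"].append(id_)
        elif count <= med_max:
            buckets["med"].append(id_)
        else:
            buckets["high"].append(id_)
    return buckets

counts = {"sensor_a": 120, "sensor_b": 50_000, "sensor_c": 2_000_000}
print(bucket_ids_by_frequency(counts))
# → {'low': ['sensor_a'], 'med': ['sensor_b'], 'high': ['sensor_c']}
```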
ChristianRRL
by Honored Contributor
  • 100 Views
  • 1 reply
  • 0 kudos

Get task_run_id (or job_run_id) of a *launched* job_run task

Hi there, I'm finding this a bit trickier than originally expected and am hoping someone can help me understand if I'm missing something. I have 3 jobs: one orchestrator job (tasks are type run_job) and two "Parent" jobs (tasks are type notebook). parent1 run...

(attachments: task_run_id-poc-1.png, task_run_id-poc-2.png, task_run_id-poc-3.png)
Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi, I ran into the same confusion and did some testing on this. Here's what I found: Task values don't cross the run_job boundary. So even if child1 sets a task value with dbutils.jobs.taskValues.set(), the orchestrator can't read it. But {{tasks.par...

SuMiT1
by New Contributor III
  • 2436 Views
  • 6 replies
  • 0 kudos

Unable to Create Secret Scope in Databricks – “Fetch request failed due to expired user session”

I’m trying to create an Azure Key Vault-backed Secret Scope in Databricks, but when I click Create, I get this error: "Fetch request failed due to expired user session". I’ve already verified my login and permissions. I also tried refreshing and re-signing i...

Latest Reply
SuMiT1
New Contributor III
  • 0 kudos

Hi @AnandGNR, here is the YouTube link, refer to this: https://www.youtube.com/watch?v=6HQCZNW7XwY&t=800s

5 More Replies
Akshatkumar69
by New Contributor
  • 105 Views
  • 2 replies
  • 1 kudos

Metric views joins

I am currently working on a migration project from Power BI to AI/BI dashboards in Databricks. I am using metric views to recreate in YAML all the measures and DAX queries I have in my Power BI report, but the main prob...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hey @Akshatkumar69, welcome to the community. You're not alone on this one, it is common with folks coming from Power BI. The key thing to understand is that AI/BI charts do expect a single data source, but that source can be a metric view that alrea...

1 More Replies
subray
by New Contributor
  • 111 Views
  • 3 replies
  • 0 kudos

databricks-connect serverless gRPC issue

Queries executed via Databricks Connect v17 (Spark Connect / gRPC) on serverless compute COMPLETE SUCCESSFULLY on the server side (Spark tasks finish, results are produced), but the Spark Connect gRPC channel FAILS TO DELIVER results back to the client ...

Latest Reply
anuj_lathi
Databricks Employee
  • 0 kudos

This is a well-known class of issue with gRPC/HTTP2 long-lived streams being killed by network intermediaries. The fact that the Databricks SQL Connector (poll-based HTTP/1.1) works perfectly while Spark Connect (gRPC/HTTP2 streaming) fails is the ke...

2 More Replies
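The usual mitigation for the class of problem the reply describes is gRPC keepalive pings, which stop network intermediaries from silently dropping idle HTTP/2 streams. A minimal sketch of the relevant channel arguments is below; the option names are standard gRPC channel arguments, but note that Databricks Connect may not expose a supported hook for setting them, so treat this as background rather than a documented fix.

```python
def keepalive_channel_options(ping_interval_s=30, ping_timeout_s=10):
    """Standard gRPC channel arguments that send HTTP/2 keepalive pings,
    so proxies and firewalls see traffic on otherwise-idle streams."""
    return [
        ("grpc.keepalive_time_ms", ping_interval_s * 1000),       # ping every 30s
        ("grpc.keepalive_timeout_ms", ping_timeout_s * 1000),     # wait 10s for ack
        ("grpc.keepalive_permit_without_calls", 1),               # ping even when idle
    ]

print(keepalive_channel_options())
```

With the plain `grpc` library these would be passed as the `options` argument to `grpc.secure_channel(...)`.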
ittzzmalind
by New Contributor II
  • 306 Views
  • 1 reply
  • 0 kudos

Resolved! Accessing Azure Databricks Workspace via Private Endpoint and On-Premises Proxy

Public access to the Azure Databricks workspace is currently disabled. Access is required through a Private Link (private endpoint – api_ui). A private endpoint has already been configured successfully: Virtual Network: Vnet-PE-ENDPOINT, Subnet: Snet-PE-...

Latest Reply
anuj_lathi
Databricks Employee
  • 0 kudos

This is a classic hub-spoke + on-premises hybrid networking scenario. Here's how to architect it end-to-end. Architecture Overview The traffic flow will be: VM (VNet-App) --> ExpressRoute/VPN Gateway --> On-Prem Proxy Server --> ExpressRoute/VPN Gate...

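When debugging a private-endpoint flow like the one above, a quick first check is whether the workspace FQDN resolves to the private endpoint's IP from the client. A hedged sketch, where the hostname and expected address prefix are placeholders for your environment:

```python
import socket

def resolves_to_prefix(hostname, expected_prefix):
    """Return True/False if hostname resolves to an IPv4 address with the
    given prefix, or None if DNS resolution fails entirely."""
    try:
        ip = socket.gethostbyname(hostname)
    except socket.gaierror:
        return None
    return ip.startswith(expected_prefix)

# e.g. resolves_to_prefix("adb-1234567890.11.azuredatabricks.net", "10.")
# should be True when the private DNS zone is wired up correctly.
print(resolves_to_prefix("localhost", "127."))
# → True
```

If this returns False (public IP) the private DNS zone isn't being consulted; if it returns None the zone is missing the A record.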
FAHADURREHMAN
by New Contributor III
  • 291 Views
  • 2 replies
  • 2 kudos

Resolved! DELTA Merge taking too much Time

Hi Legends, I have a time-series Delta table of 707.1 GiB, 7702 files, and 262 billion rows. This table is clustered on 2 columns (a timestamp column and a descriptive column). I have designed a pipeline which runs every w...

Latest Reply
anuj_lathi
Databricks Employee
  • 2 kudos

Great question -- slow MERGE is one of the most common Delta Lake performance issues. Here's a systematic checklist: 1. Partition Pruning in the MERGE Condition The #1 cause of slow MERGEs is missing the partition column in your ON clause. If your ta...

1 More Replies
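The first item in the reply's checklist, putting the partition (or clustering) column in the MERGE `ON` clause so Delta can prune files, can be sketched as below. The table and column names are hypothetical; on Databricks you would execute the generated statement with `spark.sql(sql)`.

```python
def build_merge_sql(target, source, key_col, ts_col, lo, hi):
    """Build a Delta MERGE whose ON clause restricts the target scan to a
    time window, so only files overlapping [lo, hi) are rewritten."""
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        f"ON t.{key_col} = s.{key_col}\n"
        f"   AND t.{ts_col} >= '{lo}' AND t.{ts_col} < '{hi}'\n"  # enables pruning
        f"WHEN MATCHED THEN UPDATE SET *\n"
        f"WHEN NOT MATCHED THEN INSERT *"
    )

sql = build_merge_sql("main.ts.readings", "updates", "device_id",
                      "event_ts", "2025-04-01", "2025-04-08")
print(sql)
```

Without the time-window predicate, the MERGE has to consider every file in the 707 GiB table as a potential match.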
ChristianRRL
by Honored Contributor
  • 241 Views
  • 5 replies
  • 1 kudos

Get task_run_id that is nested in a job_run task

Hi, I'm wondering if there is an easier way to accomplish this. I can use a Dynamic Value reference to pull the run_id of Parent 1 into Parent 2; however, what I'm looking for is for Child 1's task run_id to be referenced within Parent 2. Currently I am ...

Latest Reply
anuj_lathi
Databricks Employee
  • 1 kudos

Hi @ChristianRRL  you're absolutely right, and I apologize for the earlier suggestion. I've verified that task values from child jobs are not propagated back through run_job tasks. Your instinct about the REST API was correct. Here's the fix: Solutio...

4 More Replies
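Since task values don't propagate back through run_job tasks, the REST-API route the reply points at means calling the Jobs API's `runs/get` on the child run and reading its `tasks` array. A minimal sketch of the parsing step, using a hand-written payload shaped like the documented response (the surrounding HTTP call and auth are omitted):

```python
def extract_task_run_ids(runs_get_response):
    """Map task_key -> run_id from a Jobs API runs/get response payload."""
    return {t["task_key"]: t["run_id"]
            for t in runs_get_response.get("tasks", [])}

# Illustrative payload; real responses come from
# GET /api/2.1/jobs/runs/get?run_id=<child_job_run_id>
sample = {
    "run_id": 1001,
    "tasks": [
        {"task_key": "child1", "run_id": 2001},
        {"task_key": "child2", "run_id": 2002},
    ],
}
print(extract_task_run_ids(sample))
# → {'child1': 2001, 'child2': 2002}
```

The orchestrator can obtain the child job's run_id from the run_job task's metadata, then resolve nested task run_ids this way.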
shan-databricks
by Databricks Partner
  • 338 Views
  • 3 replies
  • 0 kudos

Resolved! Invoking one job from another to execute a specific task

I have multiple tasks, each working with different tables. Each table has dependencies across Bronze, Silver, and Gold layers. I want to trigger and run a specific task independently, instead of running all tasks in the job. How can I do this? Also, ...

Latest Reply
rohan22sri
New Contributor II
  • 0 kudos

1. Go to the job and left-click on the task you want to run. 2. Click on the play button (highlighted in yellow in the attachment). 3. This makes sure that you run only one task at a time and not the whole job.

2 More Replies
kevinleindecker
by New Contributor II
  • 292 Views
  • 4 replies
  • 1 kudos

SQL Warehouse error: "Cannot read properties of undefined (reading 'data')" when querying system tab

Queries that previously worked started failing in SQL Warehouse (Dashboards) without any changes on our side. The query succeeds, but fails to render results with the error: "Cannot read properties of undefined (reading 'data')". This happens with: - system.b...

Latest Reply
Esgario
New Contributor
  • 1 kudos

Same problem here. I have previously reported this issue, and it had been resolved at the time. However, the problem has now reoccurred. When ingesting large tables (over 100k rows), the system is unable to properly render the data, preventing the tab...

3 More Replies