Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

abhijit007
by Databricks Partner
  • 578 Views
  • 2 replies
  • 2 kudos

Resolved! Redshift to Databricks Migration with Lakebridge

We are currently performing an assessment for a client’s Redshift to Databricks migration, and we would like to better understand the enhanced capabilities of Lakebridge for this use case. We would appreciate clarification on the following points: Scop...

Latest Reply
pradeep_singh
Contributor III
  • 2 kudos

There is a nice course on Partner Academy as well. It uses SQL Server as a target system for migration, but you can follow the same steps for Redshift. https://partner-academy.databricks.com/learn/courses/4326/lakebridge-for-sql-source-syste...

  • 2 kudos
1 More Replies
muaaz
by New Contributor II
  • 1394 Views
  • 5 replies
  • 1 kudos

Resolved! Registering Delta tables from external storage (GCS, S3, Azure Blob) in Databricks Unity Catalog

Hi everyone, I am currently working on a migration project from Azure Databricks to GCP Databricks, and I need some guidance from the community on best practices around registering external Delta tables into Unity Catalog. Currently I am doing this but...

Latest Reply
muaaz
New Contributor II
  • 1 kudos

Hi @Ashwin_DSA, thanks for the reply. The method you proposed sounds fine, but we are dealing with a very large volume of data: around 3 schemas, ~50 tenants, and over 100 tables. Since this data is being migrated from Azure to GCP, we would prefer to ...

  • 1 kudos
4 More Replies
prakharsachan
by New Contributor III
  • 571 Views
  • 3 replies
  • 0 kudos

Resolved! Accessing secrets (secret scope) in pipeline YAML file

How can I access secrets in a pipeline YAML file or directly in a Python script?

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @prakharsachan, in Declarative Automation Bundles YAML (formerly known as Databricks Asset Bundles) you can only define secret scopes: If you want to read secrets from a secret scope, you can use dbutils in a Python script: password = dbutils.secrets.ge...
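
As a rough illustration of the dbutils approach mentioned in this reply, here is a minimal sketch. The scope name `my-scope` and key `db-password` are hypothetical placeholders, not values from the thread. In a Databricks notebook or job, `dbutils` is already available; the small fake object exists only so the sketch can be run locally.

```python
class _FakeSecrets:
    """Local stand-in for dbutils.secrets (illustration only)."""

    def __init__(self, store):
        self._store = store

    def get(self, scope, key):
        return self._store[(scope, key)]


class FakeDbutils:
    """Local stand-in for the dbutils object that Databricks provides."""

    def __init__(self, store):
        self.secrets = _FakeSecrets(store)


def read_secret(dbutils, scope: str, key: str) -> str:
    """Read one secret value from a secret scope via dbutils.secrets.get."""
    return dbutils.secrets.get(scope=scope, key=key)


if __name__ == "__main__":
    # Hypothetical scope and key names; replace with your own.
    fake = FakeDbutils({("my-scope", "db-password"): "not-a-real-password"})
    password = read_secret(fake, "my-scope", "db-password")
    print(len(password))
```

On Databricks you would pass the real `dbutils` object instead of `FakeDbutils`. Passing it in as an argument, rather than relying on the global, keeps the helper testable outside a cluster.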

  • 0 kudos
2 More Replies
200649021
by New Contributor II
  • 478 Views
  • 1 reply
  • 1 kudos

Data System & Architecture - PySpark Assignment

Title: Spark Structured Streaming – Airport Counts by Country. This notebook demonstrates how to set up a Spark Structured Streaming job in Databricks Community Edition. It reads new CSV files from a Unity Catalog volume, processes them to count airport...

Latest Reply
amirabedhiafi
New Contributor III
  • 1 kudos

That's cool! Why not put it on Git?

  • 1 kudos
ChristianRRL
by Honored Contributor
  • 1482 Views
  • 6 replies
  • 2 kudos

Resolved! Get task_run_id that is nested in a job_run task

Hi, I'm wondering if there is an easier way to accomplish this. I can use a Dynamic Value reference to pull the run_id of Parent 1 into Parent 2; however, what I'm looking for is for Child 1's task run_id to be referenced within Parent 2. Currently I am ...

Latest Reply
anuj_lathi
Databricks Employee
  • 2 kudos

Hi @ChristianRRL, you're absolutely right, and I apologize for the earlier suggestion. I've verified that task values from child jobs are not propagated back through run_job tasks. Your instinct about the REST API was correct. Here's the fix: Solutio...
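
The reply's actual solution is truncated in this preview, so the following is only a sketch of what a REST-based lookup might look like, not the author's code. It assumes the run ID of the launched job run is already known (for example, passed in through a dynamic value reference) and uses the Jobs API `GET /api/2.1/jobs/runs/get` endpoint, whose response lists each task with its `task_key` and `run_id`. The helper names and the `child1` task key are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def get_run(host: str, token: str, run_id: int) -> dict:
    """Fetch one job run's metadata from the Jobs API (2.1)."""
    query = urllib.parse.urlencode({"run_id": run_id})
    request = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.1/jobs/runs/get?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def task_run_id(run: dict, task_key: str) -> Optional[int]:
    """Return the run_id of the task named task_key, or None if absent."""
    for task in run.get("tasks", []):
        if task.get("task_key") == task_key:
            return task.get("run_id")
    return None


if __name__ == "__main__":
    # Offline example with a hand-written payload shaped like the API response.
    sample_run = {
        "run_id": 1,
        "tasks": [
            {"task_key": "child1", "run_id": 101},
            {"task_key": "child2", "run_id": 102},
        ],
    }
    print(task_run_id(sample_run, "child1"))
```

Inside a job, `get_run` would be called with the workspace URL, a token, and the parent run ID, and the result passed to `task_run_id`. Keeping the lookup separate from the HTTP call lets it be checked without network access.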

  • 2 kudos
5 More Replies
ChristianRRL
by Honored Contributor
  • 622 Views
  • 2 replies
  • 2 kudos

Resolved! Get task_run_id (or job_run_id) of a *launched* job_run task

Hi there, I'm finding this a bit trickier than originally expected and am hoping someone can help me understand if I'm missing something. I have 3 jobs: one orchestrator job (tasks are type run_job); two "Parent" jobs (tasks are type notebook); parent1 run...

Latest Reply
emma_s
Databricks Employee
  • 2 kudos

Hi, I ran into the same confusion and did some testing on this. Here's what I found: Task values don't cross the run_job boundary. So even if child1 sets a task value with dbutils.jobs.taskValues.set(), the orchestrator can't read it. But {{tasks.par...

  • 2 kudos
1 More Replies
abhishek0306
by New Contributor
  • 720 Views
  • 4 replies
  • 0 kudos

Databricks file-based trigger for SharePoint

Hi, can we create a file-based trigger in Databricks on a SharePoint location for Excel files? I need to copy the Excel files from SharePoint to external volumes in Databricks. Can this be done using a trigger so that whenever the file drops in ...

Latest Reply
rohan22sri
New Contributor III
  • 0 kudos

File-based triggers in Databricks are designed to work with data that already resides in cloud storage (such as ADLS, S3, or GCS). In this case, since the source system is SharePoint, expecting a native file-based trigger from Databricks is not feasi...

  • 0 kudos
3 More Replies
Akshatkumar69
by New Contributor II
  • 1242 Views
  • 3 replies
  • 1 kudos

Resolved! Metric views joins

I am currently working on a migration project from Power BI to AI/BI dashboards in Databricks. I am using metric views to recreate, in YAML, all the measures and DAX queries from my Power BI report, but the main prob...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hey @Akshatkumar69, welcome to the community. You're not alone on this one, it is common with folks coming from Power BI. The key thing to understand is that AI/BI charts do expect a single data source, but that source can be a metric view that alrea...

  • 1 kudos
2 More Replies
databrciks
by New Contributor III
  • 580 Views
  • 2 replies
  • 2 kudos

Resolved! Delta table update

Hi experts, I have around 100 tables in the bronze layer (DLT pipeline). We have created around 20 silver-layer tables based on some logic. How can we run the specific silver-layer pipeline whenever an update happens in the bronz...

Latest Reply
databrciks
New Contributor III
  • 2 kudos

Thanks @anuj_lathi for the detailed explanation. This helps a lot.

  • 2 kudos
1 More Replies
IM_01
by Contributor III
  • 1211 Views
  • 6 replies
  • 2 kudos

Resolved! Lakeflow SDP expectations

Hi, is there a way to get the number of warned, dropped, and failed records for each expectation? Currently I only see an aggregated count.
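
The accepted answer is not visible in this preview, so the following is only one possible approach and not the thread's solution. It assumes the pipeline event log reports per-expectation metrics in the `details` JSON of `flow_progress` events, under `flow_progress.data_quality.expectations`, with `name`, `dataset`, `passed_records`, and `failed_records` fields. The function name and sample rows are hypothetical. Whether a failed record was warned or dropped depends on the action configured for that expectation.

```python
import json
from collections import defaultdict


def expectation_totals(details_rows):
    """Sum passed/failed record counts per (dataset, expectation name).

    details_rows: iterable of JSON strings shaped like the `details`
    column of flow_progress events (an assumption of this sketch).
    """
    totals = defaultdict(lambda: {"passed_records": 0, "failed_records": 0})
    for raw in details_rows:
        details = json.loads(raw)
        flow_progress = details.get("flow_progress") or {}
        data_quality = flow_progress.get("data_quality") or {}
        for expectation in data_quality.get("expectations", []):
            key = (expectation["dataset"], expectation["name"])
            totals[key]["passed_records"] += expectation.get("passed_records", 0)
            totals[key]["failed_records"] += expectation.get("failed_records", 0)
    return dict(totals)


if __name__ == "__main__":
    # Hand-written sample rows; real rows would come from the event log.
    rows = [
        json.dumps({"flow_progress": {"data_quality": {"expectations": [
            {"name": "valid_id", "dataset": "silver_orders",
             "passed_records": 90, "failed_records": 10},
        ]}}}),
        json.dumps({"flow_progress": {"status": "RUNNING"}}),
    ]
    for key, counts in expectation_totals(rows).items():
        print(key, counts)
```

Rows without a `data_quality` section are skipped, so the function can be fed every `flow_progress` event without pre-filtering.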

Latest Reply
IM_01
Contributor III
  • 2 kudos

Thanks Ashwin 

  • 2 kudos
5 More Replies
DineshOjha
by New Contributor III
  • 1406 Views
  • 6 replies
  • 3 kudos

Resolved! Service Principal access notebooks created under /Workspace/Users

What permissions does a Service Principal need to run Databricks jobs that reference notebooks created by a user and stored in Git? Hi everyone, we are exploring the notebooks‑first development approach with Databricks Bundles, and we’ve run into a wor...

Latest Reply
DineshOjha
New Contributor III
  • 3 kudos

Thank you so much Ashwin, this provides a lot of clarity. 1. Where to deploy Bundles in the workspace: We plan to deploy the bundle using a service principal, so we plan to deploy it under /Workspace/<service_principal>. 1. Create notebooks under...

  • 3 kudos
5 More Replies
prakharsachan
by New Contributor III
  • 478 Views
  • 1 reply
  • 1 kudos

Resolved! Databricks Database synced tables

When I deploy synced tables and the pipelines that create their source tables using DABs for the first time, an error occurs saying the source tables don't exist (because the pipeline hasn't run yet). What's the wor...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @prakharsachan, synced_database_table creation assumes the Unity Catalog source table referenced in spec.source_table_full_name already exists and is readable. The API treats this as the source table to sync from, and if it can’t be read, you’ll s...

  • 1 kudos
Steffen
by New Contributor III
  • 514 Views
  • 2 replies
  • 0 kudos

Partition optimization strategy for task that massively inflate size of dataframe

Hello, we are facing some optimization problems for a workflow that interpolates raw measurements to one-second intervals. We are currently using dbl-tempo for this, but had the same issues with a simple window-function approach. We have the ...

Latest Reply
aleksandra_ch
Databricks Employee
  • 0 kudos

Hi @Steffen, perhaps you can classify ids by their data frequency, e.g. low, med, and high-frequency, and process them in different jobs? This will avoid skewness. Also, as a quick test I would suggest running it on serverless - it will scale ...
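
The bucketing idea in this reply could be sketched roughly as below. The thresholds `low_max` and `med_max`, the function name, and the sample ids are hypothetical and would need tuning against the real data. In practice the per-id counts would more likely come from a Spark aggregation than from a Python list; plain Python is used here only to keep the sketch self-contained.

```python
from collections import Counter


def classify_ids(ids, low_max=1_000, med_max=100_000):
    """Group ids into low/med/high buckets by how many rows each id has.

    ids: iterable with one entry per raw measurement row.
    low_max / med_max: hypothetical row-count thresholds.
    """
    counts = Counter(ids)
    buckets = {"low": [], "med": [], "high": []}
    for key, row_count in counts.items():
        if row_count <= low_max:
            buckets["low"].append(key)
        elif row_count <= med_max:
            buckets["med"].append(key)
        else:
            buckets["high"].append(key)
    # Sort so the output is deterministic.
    return {name: sorted(members) for name, members in buckets.items()}


if __name__ == "__main__":
    rows = ["sensor_a"] * 1 + ["sensor_b"] * 5 + ["sensor_c"] * 50
    print(classify_ids(rows, low_max=2, med_max=10))
```

Each bucket can then be passed to a separate job or task, so that a handful of high-frequency ids do not dominate the partitions shared with low-frequency ones.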

  • 0 kudos
1 More Replies
SuMiT1
by New Contributor III
  • 3275 Views
  • 6 replies
  • 0 kudos

Unable to Create Secret Scope in Databricks – “Fetch request failed due to expired user session”

I’m trying to create an Azure Key Vault-backed secret scope in Databricks, but when I click Create, I get this error: "Fetch request failed due to expired user session". I’ve already verified my login and permissions. I also tried refreshing and re-signing i...

Latest Reply
SuMiT1
New Contributor III
  • 0 kudos

Hi @AnandGNR, here is the YouTube link for reference: https://www.youtube.com/watch?v=6HQCZNW7XwY&t=800s

  • 0 kudos
5 More Replies
subray
by New Contributor II
  • 832 Views
  • 3 replies
  • 0 kudos

Resolved! databricks-connect serverless GRPC issue

Queries executed via Databricks Connect v17 (Spark Connect / gRPC) on serverless compute COMPLETE SUCCESSFULLY on the server side (Spark tasks finish, results are produced), but the Spark Connect gRPC channel FAILS TO DELIVER results back to the client ...

Latest Reply
anuj_lathi
Databricks Employee
  • 0 kudos

This is a well-known class of issue with gRPC/HTTP2 long-lived streams being killed by network intermediaries. The fact that the Databricks SQL Connector (poll-based HTTP/1.1) works perfectly while Spark Connect (gRPC/HTTP2 streaming) fails is the ke...

  • 0 kudos
2 More Replies