Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

muaaz
by Visitor
  • 69 Views
  • 2 replies
  • 1 kudos

Registering Delta tables from external storage (GCS, S3, Azure Blob) in Databricks Unity Catalog

Hi everyone, I am currently working on a migration project from Azure Databricks to GCP Databricks, and I need some guidance from the community on best practices around registering external Delta tables into Unity Catalog. Currently I am doing this but...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @muaaz, On GCP Databricks, the SQL pattern you are using is fine, but the recommended best practice is to back it with a Unity Catalog external location instead of pointing tables directly at arbitrary gs:// paths. In practice, that means first cr...
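The pattern the reply describes can be sketched in SQL. This is an illustration only: the location, credential, and table names (gcs_delta_loc, gcp_cred, main.bronze.events) and the gs:// paths are placeholders, and the GCP storage credential is assumed to already exist in the metastore.

```sql
-- Sketch: register an external location first, then the table under it.
CREATE EXTERNAL LOCATION IF NOT EXISTS gcs_delta_loc
  URL 'gs://my-bucket/delta'
  WITH (STORAGE CREDENTIAL gcp_cred);

-- Register the existing Delta table at a path covered by that location;
-- the schema is read from the Delta log, so no column list is needed.
CREATE TABLE IF NOT EXISTS main.bronze.events
  LOCATION 'gs://my-bucket/delta/events';
```

With the external location in place, Unity Catalog governs access to the path itself rather than each table pointing at an arbitrary gs:// URI.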

1 More Replies
prakharsachan
by New Contributor
  • 43 Views
  • 2 replies
  • 0 kudos

Accessing secrets (secret scope) in a pipeline YAML file

How can I access secrets in a pipeline YAML file or directly in a Python script file?

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @prakharsachan, In Declarative Automation Bundles YAML (formerly known as Databricks Asset Bundles) you can only define secret scopes. If you want to read secrets from a secret scope, you can use dbutils in a Python script: password = dbutils.secrets.ge...
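A minimal sketch of the split the reply describes: the bundle YAML declares the scope as a resource, while the secret value is only read at runtime. The resource key and scope name here are assumptions for illustration; check them against your CLI version's bundle schema.

```yaml
# Sketch of a bundle resource file (names are placeholders):
resources:
  secret_scopes:
    my_scope:
      name: my-scope
```

At runtime, a Python script attached to the pipeline would then read a value with something like dbutils.secrets.get(scope="my-scope", key="db-password"); the bundle YAML itself never returns secret values.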

1 More Replies
200649021
by New Contributor II
  • 303 Views
  • 1 reply
  • 1 kudos

Data System & Architecture - PySpark Assignment

Title: Spark Structured Streaming – Airport Counts by Country. This notebook demonstrates how to set up a Spark Structured Streaming job in Databricks Community Edition. It reads new CSV files from a Unity Catalog volume, processes them to count airport...

Latest Reply
amirabedhiafi
  • 1 kudos

That's cool! Why not put it in Git?

prakharsachan
by New Contributor
  • 56 Views
  • 1 reply
  • 1 kudos

Pipeline config in DAB

I am deploying a DLT pipeline in a dev environment using DABs. The source code is in a Python script file. In the pipeline's YAML file the configuration key is set to true (with all correct indentation), yet the pipeline isn't deploying in continuous mode....

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @prakharsachan, continuous must be set inside the pipeline resource definition, not under configuration. The configuration block in an SDP (formerly DLT) pipeline definition is for Spark/pipeline settings (key-value string pairs passed to the runtime)...
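The distinction the reply draws can be sketched in a bundle pipeline resource; all names here are placeholders, not verified against any specific bundle.

```yaml
# Sketch of a bundle pipeline resource (names are placeholders):
resources:
  pipelines:
    my_pipeline:
      name: my-pipeline
      continuous: true          # pipeline-level setting: belongs here...
      configuration:            # ...not here; this block is only key/value
        my.custom.flag: "true"  # strings passed through to the runtime
```

Putting continuous: true under configuration just defines a string setting named "continuous" and leaves the pipeline in triggered mode.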

tsam
by Visitor
  • 54 Views
  • 1 reply
  • 0 kudos

Driver memory utilization grows continuously during job

I have a batch job that runs thousands of Deep Clone commands; it uses a ForEach task to run multiple Deep Clones in parallel. It was taking a very long time, and I realized that the driver was the main culprit, since it was using up all of its memory ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @tsam, I think your problem might be caused by the fact that each "CREATE OR REPLACE TABLE ... DEEP CLONE" call accumulates state on the driver even though you're not collecting data. The main culprits are: 1. Spark Plan / Query Plan Caching. Every S...
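One mitigation consistent with the reply is to run the clones in bounded batches so the driver can release cached plan state between batches. A minimal sketch: the batching helper is plain Python, while the Databricks-specific part is shown as comments (spark and tables_to_clone are assumed names, not from the thread).

```python
def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# On Databricks (sketch; `spark` and the table names are assumptions):
# for batch in chunked(tables_to_clone, 50):
#     for src, dst in batch:
#         spark.sql(f"CREATE OR REPLACE TABLE {dst} DEEP CLONE {src}")
#     spark.catalog.clearCache()  # drop cached plans/state between batches
```

Bounding the batch size also caps how many clone commands compete for driver memory at once.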

ChristianRRL
by Honored Contributor
  • 300 Views
  • 6 replies
  • 2 kudos

Resolved! Get task_run_id that is nested in a job_run task

Hi, I'm wondering if there is an easier way to accomplish this. I can use a Dynamic Value reference to pull the run_id of Parent 1 into Parent 2; however, what I'm looking for is for Child 1's task run_id to be referenced within Parent 2. Currently I am ...

Latest Reply
anuj_lathi
Databricks Employee
  • 2 kudos

Hi @ChristianRRL, you're absolutely right, and I apologize for the earlier suggestion. I've verified that task values from child jobs are not propagated back through run_job tasks. Your instinct about the REST API was correct. Here's the fix: Solutio...
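The REST-API route can be sketched as: call the Jobs API runs/get endpoint for the child job's run, then read each task's run_id out of the tasks array in the response. Below is just the parsing step against a made-up payload shaped like a Jobs API 2.1 runs/get response; the HTTP call and auth are omitted, and the task names are placeholders.

```python
def task_run_ids(run_info: dict) -> dict:
    """Map task_key -> run_id from a Jobs API 2.1 runs/get response."""
    return {t["task_key"]: t["run_id"] for t in run_info.get("tasks", [])}

# Example with a made-up payload shaped like a runs/get response:
sample = {"run_id": 111, "tasks": [{"task_key": "child1_task", "run_id": 222}]}
print(task_run_ids(sample))  # {'child1_task': 222}
```

Parent 2 can then look up the child task's run_id by its task_key instead of relying on task values crossing the run_job boundary.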

5 More Replies
ChristianRRL
by Honored Contributor
  • 142 Views
  • 2 replies
  • 2 kudos

Resolved! Get task_run_id (or job_run_id) of a *launched* job_run task

Hi there, I'm finding this a bit trickier than originally expected and am hoping someone can help me understand if I'm missing something. I have 3 jobs: one orchestrator job (tasks are type run_job), two "Parent" jobs (tasks are type notebook); parent1 run...

Latest Reply
emma_s
Databricks Employee
  • 2 kudos

Hi, I ran into the same confusion and did some testing on this. Here's what I found: Task values don't cross the run_job boundary. So even if child1 sets a task value with dbutils.jobs.taskValues.set(), the orchestrator can't read it. But {{tasks.par...

1 More Replies
abhishek0306
by New Contributor
  • 156 Views
  • 4 replies
  • 0 kudos

Databricks file-based trigger to SharePoint

Hi, Can we create a file-based trigger from a SharePoint location for Excel files from Databricks? My need is to copy the Excel files from SharePoint to external volumes in Databricks, so can it be done using a trigger that whenever the file drops in ...

Latest Reply
rohan22sri
New Contributor II
  • 0 kudos

File-based triggers in Databricks are designed to work with data that already resides in cloud storage (such as ADLS, S3, or GCS). In this case, since the source system is SharePoint, expecting a native file-based trigger from Databricks is not feasi...
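The copy-first pattern the reply implies is: pull the file out of SharePoint (for example via the Microsoft Graph API), land it in cloud storage, and let a Databricks file arrival trigger fire from there. A minimal sketch of just the Graph URL construction; site_id, drive_id, and file_path are placeholders, and authentication and the upload step are omitted.

```python
def sharepoint_file_url(site_id: str, drive_id: str, file_path: str) -> str:
    """Build a Microsoft Graph URL to download a SharePoint file (sketch;
    all three arguments are placeholders, auth headers not shown)."""
    return (f"https://graph.microsoft.com/v1.0/sites/{site_id}"
            f"/drives/{drive_id}/root:/{file_path}:/content")
```

An external scheduler (Azure Data Factory, Logic Apps, or a small job) would fetch that URL with an OAuth token and write the bytes to the cloud storage path backing the Databricks volume.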

3 More Replies
Akshatkumar69
by New Contributor
  • 168 Views
  • 3 replies
  • 1 kudos

Resolved! Metric views joins

I am currently working on a migration project from Power BI to AI/BI dashboards in Databricks. Now I am using metric views to create all the measures and DAX queries that I have in my Power BI report in YAML in the metric views, but the main prob...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hey @Akshatkumar69, welcome to the community. You're not alone on this one, it is common with folks coming from Power BI. The key thing to understand is that AI/BI charts do expect a single data source, but that source can be a metric view that alrea...
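A sketch of what the reply suggests: define the join inside the metric view itself, so the dashboard chart still reads from a single source. All names are placeholders, and the exact YAML keys should be checked against the current metric views documentation.

```yaml
# Sketch of a metric view definition with a join (names are placeholders):
version: 0.1
source: main.sales.fact_orders
joins:
  - name: customers
    source: main.sales.dim_customers
    on: source.customer_id = customers.customer_id
dimensions:
  - name: region
    expr: customers.region
measures:
  - name: total_revenue
    expr: SUM(amount)
```

The chart then points at this one metric view, and the fact-to-dimension join is resolved inside it rather than in the dashboard layer.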

2 More Replies
BabyM
by New Contributor II
  • 77 Views
  • 1 reply
  • 0 kudos

Regarding Databricks Associate voucher @Jim Anderson

Dear Databricks Community Admin @Jim Anderson, I hope you are doing well. This is a request regarding the 50% certification coupon for the recent learning event, for which I had completed the learning paths. It was mentioned that the coupon would be shared on 9th April,...

Latest Reply
Sumit_7
Honored Contributor II
  • 0 kudos

@BabyM Please raise a ticket at https://help.databricks.com/s/contact-us

databrciks
by New Contributor III
  • 161 Views
  • 2 replies
  • 2 kudos

Resolved! Delta table update

Hi Experts, I have around 100 tables in the bronze layer (DLT pipeline). We have created a silver layer of around 20 tables based on some logic. How do I run a specific pipeline in the silver layer whenever an update happens in the bronz...

Latest Reply
databrciks
New Contributor III
  • 2 kudos

Thanks @anuj_lathi for the detailed explanation. This helps a lot.

1 More Replies
DineshOjha
by New Contributor III
  • 580 Views
  • 6 replies
  • 3 kudos

Resolved! Service Principal access notebooks created under /Workspace/Users

What permissions does a Service Principal need to run Databricks jobs that reference notebooks created by a user and stored in Git? Hi everyone, We are exploring the notebooks-first development approach with Databricks Bundles, and we've run into a wor...

Latest Reply
DineshOjha
New Contributor III
  • 3 kudos

Thank you so much Ashwin, this provides a lot of clarity. 1. Where to deploy Bundles in the workspace: We plan to deploy the bundle using a service principal, so we plan to deploy the bundle under /Workspace/<service_principal>. 1. Create notebooks under...

5 More Replies
prakharsachan
by New Contributor
  • 173 Views
  • 1 reply
  • 1 kudos

Resolved! Databricks Database synced tables

When I am deploying synced tables and the pipelines which create the source tables (used by the synced tables) using DABs for the first time, an error occurs saying that the source tables don't exist (yes, because the pipeline hasn't run yet), so what's the wor...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @prakharsachan, synced_database_table creation assumes the Unity Catalog source table referenced in spec.source_table_full_name already exists and is readable. The API treats this as the source table to sync from, and if it can’t be read, you’ll s...
