Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

tsam
by New Contributor
  • 96 Views
  • 2 replies
  • 0 kudos

Driver memory utilization grows continuously during job

I have a batch job that runs thousands of Deep Clone commands; it uses a ForEach task to run multiple Deep Clones in parallel. It was taking a very long time, and I realized that the Driver was the main culprit, since it was using up all of its memory ...

tsam_2-1776095245905.png
Latest Reply
aleksandra_ch
Databricks Employee
  • 0 kudos

Hi @tsam , can you share a few details? Which DBR is the job on? How many DEEP CLONEs do you need to run in total? What is the parallelism of the for-each task? Are the cloned tables optimized (e.g. there is no "small file problem")? Can you share the Heap Hi...

  • 0 kudos
1 More Replies
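For context on the command in question, each clone in such a job is a single SQL statement; a minimal sketch with placeholder catalog/schema/table names:

```sql
-- Sketch (placeholder names): copies data files by reference and metadata to the target.
CREATE OR REPLACE TABLE target_catalog.target_schema.my_table
DEEP CLONE source_catalog.source_schema.my_table;
```

With thousands of these fanned out by a ForEach task, the driver coordinates the planning of every statement, which may explain why the memory pressure shows up there rather than on the workers.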
yit
by Databricks Partner
  • 1090 Views
  • 2 replies
  • 1 kudos

How to implement MERGE operations in Lakeflow Declarative Pipelines

Hey everyone, We’ve been using Autoloader extensively for a while, and now we’re looking to transition to full Lakeflow Declarative Pipelines. From what I’ve researched, the reader part seems straightforward and clear. For the writer, I understand that...

Latest Reply
saurabh18cs
Honored Contributor III
  • 1 kudos

Hi @yit , Lakeflow supports upsert/merge semantics natively for Delta tables, unlike forEachBatch. Instead of writing custom forEachBatch code, you declare the merge keys and update logic in your pipeline configuration. Lakeflow will automatically generate...

  • 1 kudos
1 More Replies
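A minimal SQL sketch of the declarative merge pattern the reply describes, using the DLT-style APPLY CHANGES API (table and column names here are placeholders, and the exact syntax available depends on your pipeline runtime):

```sql
-- Sketch: declarative upsert inside a declarative pipeline (placeholder names).
CREATE OR REFRESH STREAMING TABLE target_table;

APPLY CHANGES INTO target_table
FROM STREAM(source_updates)
KEYS (id)              -- merge key(s)
SEQUENCE BY updated_at -- ordering column used to resolve out-of-order updates
STORED AS SCD TYPE 1;  -- overwrite in place; TYPE 2 keeps history
```

The pipeline engine generates and maintains the underlying MERGE logic, which is what replaces the hand-written forEachBatch code.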
databrick_enthu
by New Contributor
  • 86 Views
  • 2 replies
  • 0 kudos

Cannot create streaming table

Hi, while trying to create a streaming table in a SQL notebook, I am getting the error below. Please can you assist in fixing it? The operation CREATE is not allowed: Cannot CREATE the Streaming Table `my_catalog`.`test_schema`.`emp` in Serverless Generic Compu...

Latest Reply
aleksandra_ch
Databricks Employee
  • 0 kudos

Hi @databrick_enthu , the error is unfortunately misleading. Materialized Views / Streaming Tables on Serverless Generic Compute are not yet available, even in Preview. For now, you have two possibilities: attach the notebook to a Serverless SQL Wareh...

  • 0 kudos
1 More Replies
yatharth
by New Contributor III
  • 64 Views
  • 2 replies
  • 0 kudos

Resolved! Bug Report: Incorrect “Next Run Time” Calculation for Interval Periodic Schedules

Summary: When configuring a Jobs schedule → Interval periodic trigger in Databricks, the “Next run” timestamp changes inconsistently when the interval value is modified, even though the base schedule (start time) remains the same. The next run appears ...

Latest Reply
yatharth
New Contributor III
  • 0 kudos

Thanks @Ashwin_DSA for that detailed answer; that surely solves my query, but I see this as a classic UX bug disguised as a feature.

  • 0 kudos
1 More Replies
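For intuition on why the displayed timestamp can jump when only the interval changes: an anchor-based scheduler (a hypothetical model for illustration, not necessarily Databricks' documented algorithm) derives the next run from the original start time, so a new interval re-buckets "now" onto a different grid:

```python
from datetime import datetime, timedelta
import math

def next_run(start: datetime, interval: timedelta, now: datetime) -> datetime:
    """First tick of the grid start + k*interval that lies strictly after now."""
    if now < start:
        return start
    completed = math.floor((now - start) / interval)  # intervals fully elapsed
    return start + (completed + 1) * interval
```

With the same start of midnight and "now" at 14:00, a 6-hour interval yields a next run of 18:00, while changing the interval to 4 hours moves it to 16:00, even though the start time never changed.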
Danish11052000
by Contributor
  • 52 Views
  • 1 reply
  • 1 kudos

Resolved! Missing workspaces in workspaces_latest but present in audit

While validating workspace coverage, we observed that some workspace_id exist in system.access.audit but return NULL in system.access.workspaces_latest.  select distinct a.workspace_id, w.workspace_name from system.access.audit a left join system.acc...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hey @Danish11052000, Yes... This is expected and documented behaviour, not a bug. system.access.workspaces_latest contains only active workspaces in the account. When a workspace is cancelled/removed from the account, its row is removed from this tab...

  • 1 kudos
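A compact way to list the affected IDs, sketched against the system table names mentioned above (an anti join keeps only rows with no match in the snapshot):

```sql
-- Workspace IDs seen in audit logs but absent from the latest snapshot
-- (i.e. cancelled/removed workspaces).
SELECT DISTINCT a.workspace_id
FROM system.access.audit AS a
LEFT ANTI JOIN system.access.workspaces_latest AS w
  ON a.workspace_id = w.workspace_id;
```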
SuShuang
by New Contributor II
  • 936 Views
  • 11 replies
  • 2 kudos

Resolved! What has happened to syntax coloring for SQL queries???

What has happened to syntax coloring for SQL queries??? It seems that everything is in blue, which is confusing and makes the code hard to read...

Latest Reply
SuShuang
New Contributor II
  • 2 kudos

Hello, any news on this topic?

  • 2 kudos
10 More Replies
abhijit007
by Databricks Partner
  • 93 Views
  • 2 replies
  • 2 kudos

Redshift to Databricks Migration with Lakebridge

We are currently performing an assessment for a client’s Redshift to Databricks migration, and we would like to better understand the enhanced capabilities of Lakebridge for this use case. We would appreciate clarification on the following points: Scop...

Latest Reply
pradeep_singh
Contributor III
  • 2 kudos

There is a nice course on Partner Academy as well. It uses SQL Server as the source system for migration, but you can follow the same steps for Redshift as well: https://partner-academy.databricks.com/learn/courses/4326/lakebridge-for-sql-source-syste...

  • 2 kudos
1 More Replies
IM_01
by Contributor III
  • 147 Views
  • 6 replies
  • 0 kudos
Latest Reply
IM_01
Contributor III
  • 0 kudos

Hi @Ashwin_DSA , thanks for the example. So it abstracts the metrics logic; can we make the group-by columns dynamic? The actual scenario is that metrics are calculated dynamically based on filters, like an all-columns or specific-columns selection. So is it good to go with...

  • 0 kudos
5 More Replies
AlexSantiago
by New Contributor II
  • 17000 Views
  • 22 replies
  • 4 kudos

spotify API get token - raw_input was called, but this frontend does not support input requests.

hello everyone, I'm trying to use Spotify's API to analyse my music data, but I'm receiving an error during authentication, specifically when I try to get the token; here is my code. Is it a Databricks bug? pip install spotipy from spotipy.oauth2 import SpotifyO...

Latest Reply
poja
New Contributor II
  • 4 kudos

By including such informative and discussion-based content, your site becomes a valuable resource for developers, tech enthusiasts, and learners who are looking for reliable solutions, ultimately improving engagement and strengthening your authority ...

  • 4 kudos
21 More Replies
muaaz
by New Contributor
  • 181 Views
  • 5 replies
  • 1 kudos

Resolved! Registering Delta tables from external storage GCS , S3 , Azure Blob in Databricks Unity Catalog

Hi everyone, I am currently working on a migration project from Azure Databricks to GCP Databricks, and I need some guidance from the community on best practices around registering external Delta tables into Unity Catalog. Currently I am doing this but...

Latest Reply
muaaz
New Contributor
  • 1 kudos

Hi @Ashwin_DSA , thanks for the reply. The method you proposed sounds fine, but we are dealing with a very large volume of data: around 3 schemas, ~50 tenants, and over 100 tables. Since this data is being migrated from Azure to GCP, we would prefer to ...

  • 1 kudos
4 More Replies
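For context, registering an existing Delta directory as a Unity Catalog external table is typically one statement per table; a sketch with placeholder names and path, assuming the location already holds a Delta table and an external location/storage credential is configured. At the scale described, the statements would be generated in a loop from an inventory of schemas and tables:

```sql
-- Sketch: register an existing Delta location as a UC external table (placeholders).
CREATE TABLE IF NOT EXISTS my_catalog.my_schema.my_table
USING DELTA
LOCATION 'gs://my-bucket/path/to/delta';
```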
prakharsachan
by New Contributor
  • 102 Views
  • 3 replies
  • 0 kudos

Accessing secrets(secret scope) in pipeline yml file

How can I access secrets in the pipeline YAML, or directly in a Python script file?

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @prakharsachan , in Declarative Automation Bundles YAML (formerly known as Databricks Asset Bundles) you can only define secret scopes. If you want to read secrets from a secret scope, you can use dbutils in a Python script: password = dbutils.secrets.ge...

  • 0 kudos
2 More Replies
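To make the reply's two halves concrete: the bundle can declare a secret scope as a resource, while the values themselves are read at runtime inside the script. A hypothetical sketch (scope and key names are placeholders; `secret_scopes` as a bundle resource assumes a recent CLI version):

```yaml
# databricks.yml fragment (hypothetical names)
resources:
  secret_scopes:
    my_scope:
      name: my-scope
```

Then inside the Python script: `password = dbutils.secrets.get(scope="my-scope", key="my-key")`, matching the reply's truncated snippet.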
200649021
by New Contributor II
  • 328 Views
  • 1 reply
  • 1 kudos

Data System & Architecture - PySpark Assignment

Title: Spark Structured Streaming – Airport Counts by Country. This notebook demonstrates how to set up a Spark Structured Streaming job in Databricks Community Edition. It reads new CSV files from a Unity Catalog volume, processes them to count airport...

Latest Reply
amirabedhiafi
New Contributor
  • 1 kudos

That's cool! Why not put it on Git?

  • 1 kudos
prakharsachan
by New Contributor
  • 76 Views
  • 1 reply
  • 1 kudos

pipeline config DAB

I am deploying a DLT pipeline in the dev environment using DABs. The source code is in a Python script file. In the pipeline's YAML file the configuration key is set to true (with all correct indentation), yet the pipeline isn't deploying in continuous mode....

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @prakharsachan , continuous must be set inside the pipeline resource definition, not under configuration. The configuration block in an SDP (formerly DLT) pipeline definition is for Spark/pipeline settings (key-value string pairs passed to the runtime)...

  • 1 kudos
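A minimal sketch of the distinction the reply draws, with a placeholder pipeline name:

```yaml
# databricks.yml fragment (hypothetical names)
resources:
  pipelines:
    my_pipeline:
      name: my_pipeline
      continuous: true    # belongs here, at the pipeline resource level
      configuration:      # key/value strings passed to the runtime only;
        my.flag: "true"   # a "continuous" key here is NOT a trigger setting
```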
ChristianRRL
by Honored Contributor
  • 445 Views
  • 6 replies
  • 2 kudos

Resolved! Get task_run_id that is nested in a job_run task

Hi, I'm wondering if there is an easier way to accomplish this. I can use a Dynamic Value reference to pull the run_id of Parent 1 into Parent 2; however, what I'm looking for is for Child 1's task run_id to be referenced within Parent 2. Currently I am ...

Latest Reply
anuj_lathi
Databricks Employee
  • 2 kudos

Hi @ChristianRRL , you're absolutely right, and I apologize for the earlier suggestion. I've verified that task values from child jobs are not propagated back through run_job tasks. Your instinct about the REST API was correct. Here's the fix: Solutio...

  • 2 kudos
5 More Replies