Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

yatharth
by New Contributor III
  • 47 Views
  • 2 replies
  • 0 kudos

Resolved! Bug Report: Incorrect “Next Run Time” Calculation for Interval Periodic Schedules

Summary: When configuring a Jobs schedule → Interval periodic trigger in Databricks, the “Next run” timestamp changes inconsistently when the interval value is modified, even though the base schedule (start time) remains the same. The next run appears ...

Latest Reply
yatharth
New Contributor III
  • 0 kudos

Thanks @Ashwin_DSA for that detailed answer; that surely solves my query, but I see this as a classic UX bug disguised as a feature.

1 More Replies
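For readers following this thread: the anchored behaviour the poster expects can be sketched in a few lines of plain Python. The next run is the start time plus a whole number of intervals, so changing the interval re-derives the timestamp from the same anchor rather than from "now". This is a minimal illustration of the expected semantics, not Databricks' actual scheduler code.

```python
from datetime import datetime, timedelta

def next_run(start: datetime, interval: timedelta, now: datetime) -> datetime:
    """First run strictly after `now`, anchored to `start` plus whole intervals."""
    if now < start:
        return start
    elapsed = now - start
    periods = elapsed // interval + 1  # whole intervals already elapsed, plus one
    return start + periods * interval

start = datetime(2024, 1, 1, 9, 0)
now = datetime(2024, 1, 1, 22, 30)
print(next_run(start, timedelta(hours=6), now))  # 2024-01-02 03:00:00
```

With this anchoring, switching the interval from 6h to 4h still counts intervals from the original 09:00 start, which is the consistent behaviour the bug report is asking for.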
Danish11052000
by Contributor
  • 38 Views
  • 1 reply
  • 1 kudos

Resolved! Missing workspaces in workspaces_latest but present in audit

While validating workspace coverage, we observed that some workspace_id values exist in system.access.audit but return NULL in system.access.workspaces_latest. select distinct a.workspace_id, w.workspace_name from system.access.audit a left join system.acc...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hey @Danish11052000, Yes... This is expected and documented behaviour, not a bug. system.access.workspaces_latest contains only active workspaces in the account. When a workspace is cancelled/removed from the account, its row is removed from this tab...

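The check the truncated SQL above performs is effectively an anti-join: workspace IDs that appear in the audit log but have no row in the latest snapshot. A minimal plain-Python sketch of the same idea (the IDs are hypothetical samples, not real workspace IDs):

```python
# Workspace IDs seen in audit events vs. rows in the "latest" snapshot
audit_workspace_ids = {"1111", "2222", "3333"}   # hypothetical sample
latest_workspace_ids = {"1111", "3333"}          # cancelled workspaces are absent

# Anti-join: present in audit history but missing from the snapshot
missing = sorted(audit_workspace_ids - latest_workspace_ids)
print(missing)  # ['2222']
```

Per the reply above, anything in `missing` is most likely a cancelled or removed workspace rather than a data-quality problem.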
SuShuang
by New Contributor II
  • 929 Views
  • 11 replies
  • 2 kudos

Resolved! What has happened to syntax coloring for SQL queries???

What has happened to syntax coloring for SQL queries??? It seems that everything is in blue, which is confusing and makes the code hard to read...

Latest Reply
SuShuang
New Contributor II
  • 2 kudos

Hello, any news in this topic?

10 More Replies
abhijit007
by Databricks Partner
  • 83 Views
  • 2 replies
  • 2 kudos

Redshift to Databricks Migration with Lakebridge

We are currently performing an assessment for a client’s Redshift to Databricks migration, and we would like to better understand the enhanced capabilities of Lakebridge for this use case. We would appreciate clarification on the following points: Scop...

Latest Reply
pradeep_singh
Contributor III
  • 2 kudos

There is a nice course on Partner Academy as well. It uses SQL Server as the source system for migration, but you can follow the same steps for Redshift. https://partner-academy.databricks.com/learn/courses/4326/lakebridge-for-sql-source-syste...

1 More Replies
IM_01
by Contributor III
  • 135 Views
  • 6 replies
  • 0 kudos
Latest Reply
IM_01
Contributor III
  • 0 kudos

Hi @Ashwin_DSA, thanks for the example. So it abstracts the logic of the metrics; can we make the group-by columns dynamic? The actual scenario is to calculate metrics dynamically based on filters, like selecting all or only specific columns. So is it good to go with...

5 More Replies
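The dynamic group-by question in this thread can be illustrated without Spark: pass the grouping columns as a runtime list and build the aggregation key from it. A plain-Python sketch (the column and metric names are made up); the same pattern maps to `df.groupBy(*group_cols)` in PySpark:

```python
from collections import defaultdict

def metrics_by(rows, group_cols, value_col):
    """Sum `value_col` grouped by a runtime-chosen list of columns.
    An empty `group_cols` collapses to one overall total (the "all" case)."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[c] for c in group_cols)  # () when no columns selected
        totals[key] += row[value_col]
    return dict(totals)

rows = [
    {"region": "EU", "product": "A", "sales": 10.0},
    {"region": "EU", "product": "B", "sales": 5.0},
    {"region": "US", "product": "A", "sales": 7.0},
]
print(metrics_by(rows, ["region"], "sales"))  # {('EU',): 15.0, ('US',): 7.0}
print(metrics_by(rows, [], "sales"))          # {(): 22.0}
```

Because the grouping key is just a list, the "all vs. specific columns" filter scenario from the thread reduces to choosing which list to pass in.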
databrick_enthu
by New Contributor
  • 77 Views
  • 1 reply
  • 0 kudos

Cannot create streaming table

Hi, while trying to create a streaming table in a SQL notebook, I am getting the error below; please can you assist in fixing it? The operation CREATE is not allowed: Cannot CREATE the Streaming Table `my_catalog`.`test_schema`.`emp` in Serverless Generic Compu...

Latest Reply
NageshPatil
New Contributor III
  • 0 kudos

The error occurs because your Databricks workspace uses Serverless Generic Compute, which requires a specific preview feature for creating streaming tables or materialized views. To resolve this, you must enroll in the "Serverless Generic Compute Mat...

AlexSantiago
by New Contributor II
  • 16998 Views
  • 22 replies
  • 4 kudos

spotify API get token - raw_input was called, but this frontend does not support input requests.

Hello everyone, I'm trying to use Spotify's API to analyse my music data, but I'm receiving an error during authentication, specifically when I try to get the token; my code is below. Is it a Databricks bug? pip install spotipy from spotipy.oauth2 import SpotifyO...

Latest Reply
poja
New Contributor
  • 4 kudos

By including such informative and discussion-based content, your site becomes a valuable resource for developers, tech enthusiasts, and learners who are looking for reliable solutions, ultimately improving engagement and strengthening your authority ...

21 More Replies
muaaz
by New Contributor
  • 174 Views
  • 5 replies
  • 1 kudos

Resolved! Registering Delta tables from external storage GCS , S3 , Azure Blob in Databricks Unity Catalog

Hi everyone, I am currently working on a migration project from Azure Databricks to GCP Databricks, and I need some guidance from the community on best practices around registering external Delta tables into Unity Catalog. Currently I am doing this but...

Latest Reply
muaaz
New Contributor
  • 1 kudos

Hi @Ashwin_DSA, thanks for the reply. The method you proposed sounds fine, but we are dealing with a very large volume of data: around 3 schemas, ~50 tenants, and over 100 tables. Since this data is being migrated from Azure to GCP, we would prefer to ...

4 More Replies
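For a bulk migration like the one described (3 schemas, ~50 tenants, 100+ tables), one common approach is to generate the `CREATE TABLE ... USING DELTA LOCATION` statements programmatically and run each through `spark.sql` on a UC-enabled cluster. A sketch; the catalog, schema, table, and bucket names below are all hypothetical:

```python
def register_statements(catalog, schema_tables, base_uri):
    """Generate Unity Catalog DDL for registering existing Delta directories
    as external tables. `schema_tables` maps schema name -> list of tables;
    the path layout <base_uri>/<schema>/<table> is an assumption."""
    stmts = []
    for schema, tables in schema_tables.items():
        stmts.append(f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema};")
        for table in tables:
            stmts.append(
                f"CREATE TABLE IF NOT EXISTS {catalog}.{schema}.{table} "
                f"USING DELTA LOCATION '{base_uri}/{schema}/{table}';"
            )
    return stmts

ddl = register_statements("main", {"sales": ["orders", "customers"]}, "gs://my-bucket/delta")
for stmt in ddl:
    print(stmt)
```

Driving the list from an inventory table (or from listing the bucket) keeps the registration step repeatable, which matters at the 100+ table scale mentioned in the thread.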
prakharsachan
by New Contributor
  • 92 Views
  • 3 replies
  • 0 kudos

Accessing secrets (secret scope) in a pipeline yml file

How can I access secrets in pipeline yaml or directly in python script file?

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @prakharsachan, in Declarative Automation Bundles YAML (formerly known as Databricks Asset Bundles) you can only define secret scopes. If you want to read secrets from a secret scope, you can use dbutils in a Python script: password = dbutils.secrets.ge...

2 More Replies
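Building on the reply above: `dbutils.secrets.get(...)` only exists when the code runs on Databricks, so a script that should also run locally needs a fallback. A hedged sketch — the scope/key names and the environment-variable naming convention here are assumptions, not a Databricks standard:

```python
import os

def get_secret(scope: str, key: str) -> str:
    """Read a secret via dbutils when running on Databricks; fall back to an
    environment variable named SCOPE_KEY (uppercased) for local runs."""
    try:
        from databricks.sdk.runtime import dbutils  # only usable on Databricks
        return dbutils.secrets.get(scope=scope, key=key)
    except Exception:
        return os.environ[f"{scope}_{key}".upper()]

os.environ["MY_SCOPE_DB_PASSWORD"] = "s3cret"   # stand-in for local testing
print(get_secret("my_scope", "db_password"))    # s3cret
```

On a real workspace the `dbutils` branch runs and the value is redacted in notebook output; locally the environment-variable branch keeps the same call site working.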
200649021
by New Contributor II
  • 327 Views
  • 1 reply
  • 1 kudos

Data System & Architecture - PySpark Assignment

Title: Spark Structured Streaming – Airport Counts by Country. This notebook demonstrates how to set up a Spark Structured Streaming job in Databricks Community Edition. It reads new CSV files from a Unity Catalog volume, processes them to count airport...

Latest Reply
amirabedhiafi
New Contributor
  • 1 kudos

That's cool! Why not git it?

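The per-batch aggregation the notebook describes (count airports by country) boils down to a group-and-count. A plain-Python sketch with made-up column names, independent of Spark; in the streaming job the same logic is expressed as `groupBy("country").count()`:

```python
import csv
import io
from collections import Counter

def airport_counts(csv_text: str) -> Counter:
    """Count airports per country in one CSV batch (header row assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["country"] for row in reader)

batch = "name,country\nHeathrow,GB\nGatwick,GB\nSchiphol,NL\n"
print(airport_counts(batch))  # GB appears twice, NL once
```

Structured Streaming then keeps this count as managed state across micro-batches, which is what the file-per-trigger demo in the assignment exercises.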
prakharsachan
by New Contributor
  • 74 Views
  • 1 reply
  • 1 kudos

pipeline config DAB

I am deploying a DLT pipeline in the dev environment using DABs. The source code is in a Python script file. In the pipeline's yml file the configuration key is set to true (with all correct indentations), yet the pipeline isn't deploying in continuous mode....

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @prakharsachan, continuous must be set inside the pipeline resource definition, not under configuration. The configuration block in an SDP (former DLT) pipeline definition is for Spark/pipeline settings (key-value string pairs passed to the runtime)...

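To make the placement concrete: in a bundle YAML, `continuous` sits at the pipeline-resource level, while the `configuration` block only carries key/value strings for the runtime. A minimal sketch; the resource name, configuration key, and file path are hypothetical:

```yaml
resources:
  pipelines:
    my_pipeline:            # hypothetical resource name
      name: my_pipeline
      continuous: true      # pipeline-level setting: belongs HERE
      configuration:        # key/value strings passed to the runtime;
        my.custom.key: "x"  # NOT the place for `continuous`
      libraries:
        - file:
            path: ../src/pipeline.py
```

Putting `continuous: true` under `configuration` deploys without error but is treated as an opaque runtime string, which matches the symptom in the question.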
tsam
by New Contributor
  • 89 Views
  • 1 reply
  • 0 kudos

Driver memory utilization grows continuously during job

I have a batch job that runs thousands of Deep Clone commands; it uses a ForEach task to run multiple Deep Clones in parallel. It was taking a very long time, and I realized that the Driver was the main culprit, since it was using up all of its memory ...

tsam_2-1776095245905.png
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @tsam, I think your problem might be caused by the fact that each "CREATE OR REPLACE TABLE ... DEEP CLONE" call accumulates state on the driver even though you're not collecting data. The main culprits are: 1. Spark Plan / Query Plan Caching: Every S...

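One way to act on the reply above is to bound how much driver state accumulates at once: run the clones in fixed-size batches rather than one huge ForEach. A sketch with a dummy action standing in for the DEEP CLONE call; on Databricks, `action` would wrap `spark.sql("CREATE OR REPLACE TABLE ... DEEP CLONE ...")`:

```python
from concurrent.futures import ThreadPoolExecutor

def run_in_batches(items, action, batch_size=50, parallelism=8):
    """Process `items` in bounded batches so driver-side state from each
    batch can be released before the next one starts."""
    results = []
    for i in range(0, len(items), batch_size):
        batch = items[i:i + batch_size]
        with ThreadPoolExecutor(max_workers=parallelism) as pool:
            results.extend(pool.map(action, batch))
        # Between batches on a real cluster you might also call
        # spark.catalog.clearCache() (an assumption, not a guaranteed fix).
    return results

print(run_in_batches([1, 2, 3, 4, 5], lambda t: t * 2, batch_size=2))
# [2, 4, 6, 8, 10]
```

Whether this fully stops the growth depends on which of the caches from the reply dominates, but it at least caps the number of concurrently retained query plans per batch.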
ChristianRRL
by Honored Contributor
  • 436 Views
  • 6 replies
  • 2 kudos

Resolved! Get task_run_id that is nested in a job_run task

Hi, I'm wondering if there is an easier way to accomplish this. I can use a Dynamic Value reference to pull the run_id of Parent 1 into Parent 2; however, what I'm looking for is for Child 1's task run_id to be referenced within Parent 2. Currently I am ...

Latest Reply
anuj_lathi
Databricks Employee
  • 2 kudos

Hi @ChristianRRL  you're absolutely right, and I apologize for the earlier suggestion. I've verified that task values from child jobs are not propagated back through run_job tasks. Your instinct about the REST API was correct. Here's the fix: Solutio...

5 More Replies
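To flesh out the REST-API route the reply points at: after calling GET /api/2.1/jobs/runs/get with the triggered child run's run_id, the nested task run IDs can be pulled out of the response with a small helper. The payload below is a hand-made sample shaped like the documented response, not real API output:

```python
def child_task_run_ids(run_payload: dict) -> dict:
    """Map task_key -> run_id from a Jobs API runs/get response."""
    return {t["task_key"]: t["run_id"] for t in run_payload.get("tasks", [])}

sample = {
    "run_id": 1000,  # hypothetical parent run
    "tasks": [
        {"task_key": "child_1", "run_id": 1001},
        {"task_key": "child_2", "run_id": 1002},
    ],
}
print(child_task_run_ids(sample))  # {'child_1': 1001, 'child_2': 1002}
```

Parent 2 can then fetch the payload with any HTTP client (or the Databricks SDK) and look up the task it cares about by key, which sidesteps the fact that task values don't cross the run_job boundary.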
ChristianRRL
by Honored Contributor
  • 190 Views
  • 2 replies
  • 2 kudos

Resolved! Get task_run_id (or job_run_id) of a *launched* job_run task

Hi there, I'm finding this a bit trickier than originally expected and am hoping someone can help me understand if I'm missing something. I have 3 jobs: one orchestrator job (tasks are type run_job) and two "Parent" jobs (tasks are type notebook); parent1 run...

task_run_id-poc-1.png task_run_id-poc-2.png task_run_id-poc-3.png
Latest Reply
emma_s
Databricks Employee
  • 2 kudos

Hi, I ran into the same confusion and did some testing on this. Here's what I found: Task values don't cross the run_job boundary. So even if child1 sets a task value with dbutils.jobs.taskValues.set(), the orchestrator can't read it. But {{tasks.par...

1 More Replies