Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Naveenkumar1811
by New Contributor II
  • 35 Views
  • 2 replies
  • 0 kudos

skipChangeCommits set to True: scenario on data loss possibility

Hi Team, I have the below scenario: I have a Spark Streaming job with a processing-time trigger of 3 secs, running continuously 365 days. We are performing a weekly delete job on the source of this streaming job based on a custom retention policy. It is a D...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

It shouldn't. You have an append-only stream, and skipChangeCommits will ignore any modifications that were applied to already existing files.

1 More Replies
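The reply above can be sketched as a read-stream option. A minimal sketch, assuming a Databricks/PySpark runtime supplies the `spark` session; the table name is a placeholder, not from the thread:

```python
# Minimal sketch (assumes a PySpark runtime provides `spark`).
# skipChangeCommits makes the stream ignore commits that only rewrite or
# delete existing files (e.g. a weekly retention/delete job), so an
# append-only reader does not fail or reprocess data.
def read_append_only_stream(spark, table_name):
    return (
        spark.readStream
        .option("skipChangeCommits", "true")
        .table(table_name)
    )
```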
Han_bbb
by New Contributor
  • 38 Views
  • 1 reply
  • 0 kudos

Need to restore my scripts from the legacy version

Dear support team, the last time I used Databricks was back in 2024, and I have several scripts stored in it. I really need to get access to them now, but I can't log in; the message says "User is not a member of this workspace." Please help. Thanks

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Han_bbb, could you provide more details? Is it a workspace at your work or a private one? Which cloud provider?

Naveenkumar1811
by New Contributor II
  • 37 Views
  • 4 replies
  • 0 kudos

How do I create a workspace object with SP ownership

Hi Team, I have a scenario where a jar file (24 MB) needs to be put in a workspace directory, but the ownership should be associated with the SP, without any individual ID ownership. I tried the Databricks CLI export option, but it has a limitation of 10 MB max. Plea...

Latest Reply
Raman_Unifeye
Contributor III
  • 0 kudos

Reference Link - https://docs.databricks.com/aws/en/volumes/volume-files#upload-files-to-a-volume

3 More Replies
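Following the linked docs, the upload can go through the Files API against a Unity Catalog volume, which is not subject to the 10 MB workspace-import limit. A hypothetical sketch using the databricks-sdk; `files_api` would be `WorkspaceClient().files`, and the paths are placeholders. When the job authenticates as the service principal, the SP becomes the creator of the uploaded file:

```python
# Hypothetical sketch: upload a large jar to a Unity Catalog volume via the
# databricks-sdk Files API. Paths are placeholders, not from the thread.
def upload_jar_to_volume(files_api, local_path, volume_path):
    with open(local_path, "rb") as f:
        files_api.upload(volume_path, f, overwrite=True)
```

Usage would look like `upload_jar_to_volume(WorkspaceClient().files, "./app.jar", "/Volumes/main/default/jars/app.jar")`, run under the SP's credentials.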
Naveenkumar1811
by New Contributor II
  • 68 Views
  • 3 replies
  • 0 kudos

Can we change the ownership of a Databricks-managed secret to an SP in Azure Databricks?

Hi Team, earlier we faced an issue where a jar file (created by an old employee) in a workspace directory is used as a library on a cluster which is run from an SP. Since the employee left the org, the ID got removed, and even though the SP is part of ADMI...

Latest Reply
Coffee77
Contributor III
  • 0 kudos

I think there is no other way. In any case, here is how I usually configure my (all-purpose and jobs compute) clusters to access secrets via environment variables so that you don't have to update all references if some similar issue arises again. The...

2 More Replies
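The pattern Coffee77 describes is supported natively by secret references in cluster environment variables. A sketch with placeholder scope and key names:

```
# Cluster settings → Advanced options → Spark → Environment variables
# (scope and key names are placeholders):
MY_DB_PASSWORD={{secrets/my_scope/my_db_password}}
```

Notebook code then reads `os.environ["MY_DB_PASSWORD"]` and never references the secret path directly, so if scope ownership or membership changes again, only the cluster config needs updating.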
DarioB
by New Contributor III
  • 51 Views
  • 1 reply
  • 1 kudos

Resolved! Issues recreating tables with enableRowTracking on DBR 16.4 and below

We are running a Deep Clone script to copy catalogs between environments; this script is run through a job (run by an SP) with DBR 16.4.12. Some tables are deep cloned and other ones are dropped and recreated to load partial data. The ones dropped are re...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Happy Monday @DarioB , I did some digging and would like to provide you with some helpful hints/tips. Thanks for the detailed context—this is a known rough edge in DBR 16.x when recreating tables that have row tracking materialized. What’s happening ...

Volker
by Contributor
  • 4961 Views
  • 2 replies
  • 0 kudos

Structured Streaming schemaTrackingLocation does not work with starting_version

Hello Community, I came across a strange behaviour when using Structured Streaming on top of a Delta table. I have a stream that I wanted to start from a specific version of a Delta table using the option option("starting_version", x), because I did no...

Data Engineering
Delta Lake
schemaTrackingLocation
starting_version
structured streaming
Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

This issue is related to how Delta Lake’s structured streaming interacts with schema evolution and options like startingVersion and schemaTrackingLocation. The behavior you've observed has been noted by other users, and can be subtle due to how check...

1 More Replies
stevenayers-bge
by Contributor
  • 4124 Views
  • 2 replies
  • 1 kudos

Querying Unity Managed Tables from Redshift

I built a script about 6 months ago to make our Delta tables accessible in Redshift for another team, but it's a bit nasty: generate a Delta Lake manifest each time the Databricks Delta table is updated, then recreate the Redshift external table (in case th...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

There is indeed a better and more integrated way to make Delta Lake tables accessible in Redshift without manually generating manifests and dynamically creating external tables or partitions. Some important points and options: Databricks Delta Lake ...

1 More Replies
Mangeysh
by New Contributor
  • 3752 Views
  • 2 replies
  • 0 kudos

Azure Databricks API for JSON output, displaying on UI

Hello All, I am new to Azure Databricks and trying to show Azure Databricks table data in a UI using React JS. Let's say there are 2 tables, Employee and Salary; I need to join these two tables on empid, generate JSON output, and call an API (end ...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The most effective way to display joined data from Azure Databricks tables (like Employee and Salary) in a React JS UI involves exposing your Databricks data through an API and then consuming that API in your frontend. Flask can work, but there are b...

1 More Replies
rvo19941
by New Contributor II
  • 4333 Views
  • 2 replies
  • 0 kudos

Auto Loader File Notification Mode not working with ADLS Gen2 and files written as a stream

Dear, I am working on a real-time use case and am therefore using Auto Loader with file notification to ingest JSON files from a Gen2 Azure Storage Account in real time. Full refreshes of my table work fine, but I noticed Auto Loader was not picking up...

Data Engineering
ADLS
Auto Loader
Event Subscription
File Notification
Queue Storage
Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Auto Loader file notification in Databricks relies on Azure Event Grid’s BlobCreated event to trigger notifications for newly created files in Azure Data Lake Gen2. The issue you’re experiencing is a known limitation when files are written via certai...

1 More Replies
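For reference, the file-notification mode the thread discusses is switched on through Auto Loader's `cloudFiles` options. A minimal sketch of the relevant options (values are placeholders; the full stream would pass these to `spark.readStream.format("cloudFiles")`):

```python
# Minimal sketch of Auto Loader file-notification options.
# cloudFiles.useNotifications switches Auto Loader from directory listing
# to Event Grid/queue-based notifications; format matches the json files
# described in the post.
autoloader_options = {
    "cloudFiles.format": "json",
    "cloudFiles.useNotifications": "true",
}
```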
achntrl
by New Contributor
  • 4953 Views
  • 1 reply
  • 0 kudos

CI/CD - Databricks Asset Bundles - Deploy/destroy only bundles with changes after Merge Request

Hello everyone, we're in the process of migrating to Databricks and are encountering challenges implementing CI/CD using Databricks Asset Bundles. Our monorepo houses multiple independent bundles within a "dabs" directory, with only one team member wo...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Your challenge—reliably determining the subset of changed Databricks Asset Bundles after a Merge Request (MR) is merged into main for focused deploy/destroy CI/CD actions—is common in complex monorepo, multi-environment setups. Let’s break down the p...

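A common pattern for the "deploy only changed bundles" step is to map the merge diff onto top-level bundle directories. A minimal sketch under an assumed layout (one bundle per immediate subdirectory of `dabs/`); the changed-file list would come from something like `git diff --name-only origin/main...HEAD`:

```python
# Map changed file paths (e.g. from `git diff --name-only`) to the set of
# bundle directories under dabs/ that need a `bundle deploy`/`destroy` pass.
# Layout is an assumption: one bundle per immediate subdirectory of dabs/.
def changed_bundles(changed_files, root="dabs"):
    bundles = set()
    prefix = root + "/"
    for path in changed_files:
        if path.startswith(prefix):
            parts = path[len(prefix):].split("/", 1)
            if len(parts) == 2:  # a file inside dabs/<bundle>/...
                bundles.add(parts[0])
    return sorted(bundles)
```

The CI job can then loop over the returned names and run the CLI only inside those bundle directories.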
alesventus
by Contributor
  • 5439 Views
  • 1 reply
  • 0 kudos

Effectively refresh Power BI report based on Delta Lake

Hi, I have several Power BI reports based on Delta Lake tables that are refreshed every 4 hours. The ETL process in Databricks is much cheaper than the refresh of these Power BI reports. My questions are whether the approach described below is correct and if there i...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Current Approach Assessment Power BI Import Mode: Importing all table data results in full dataset refreshes, driving up compute and data transfer costs during each refresh. Delta Lake as Source: Databricks clusters are used for both ETL and respon...

turtleXturtle
by New Contributor II
  • 4363 Views
  • 1 reply
  • 2 kudos

Delta sharing speed

Hi - I am comparing the performance of Delta Shared tables, and the speed is 10X slower than when querying locally. Scenario: I am using a 2XS serverless SQL warehouse and have a table with 15M rows and 10 columns, using the below query: select date, co...

Latest Reply
mark_ott
Databricks Employee
  • 2 kudos

Yes, the speed difference you are seeing when querying Delta Shared tables versus local Delta tables is expected due to the architectural nature of Delta Sharing and network constraints. Why Delta Sharing Is Slower When you query a standard Delta tab...

mv-rs
by New Contributor
  • 4430 Views
  • 1 reply
  • 0 kudos

Structured streaming not working with Serverless compute

Hi, I have a structured streaming process that works with a normal compute, but when attempting to run it using Serverless, the pipeline fails and I'm met with the error seen in the image below. CONTEXT: I have a Git repo with two folders,...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The core answer is: Many users encounter failures in structured streaming pipelines when switching from Databricks normal (classic) compute to Serverless, especially when using read streams on Unity Catalog Delta tables with Change Data Feed (CDF) en...

Maatari
by New Contributor III
  • 3565 Views
  • 1 reply
  • 0 kudos

Chaining stateful Operator

I would like to do a groupBy followed by a join in Structured Streaming. I would read from two Delta tables in snapshot mode, i.e. the latest snapshot. My question is specifically about chaining the stateful operators. groupBy is update mode; chaining grou...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

When chaining stateful operators like groupBy (aggregation) and join in Spark Structured Streaming, there are specific rules about the output mode required for the overall query and the behavior of each operator. Output Mode Requirements The groupBy...

jmeidam
by New Contributor
  • 4150 Views
  • 2 replies
  • 0 kudos

Displaying job-run progress when submitting jobs via databricks-sdk

When I run notebooks from within a notebook using `dbutils.notebook.run`, I see a nice progress table that updates automatically, showing the execution time, the status, and links to the notebook, and it is seamless. My goal now is to execute many notebook...

Latest Reply
Coffee77
Contributor III
  • 0 kudos

All good in @mark_ott's response. As a potential improvement, instead of using polling, I think it would be better to publish events to a bus (e.g. Azure Event Hub) from notebooks, so that consumers could launch queries when receiving, processing and fi...

1 More Replies
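The polling approach the thread settles on (and whose cost Coffee77's event-bus suggestion avoids) can be sketched generically. A minimal sketch with an injected status function standing in for a databricks-sdk call such as `jobs.get_run()`; the interval and state names follow the Databricks run life cycle, but treat the details as assumptions:

```python
import time

# Poll an injected fetch_status() until the run reaches a terminal state.
# fetch_status stands in for a databricks-sdk call like jobs.get_run();
# poll interval and cap are arbitrary choices for the sketch.
def wait_for_run(fetch_status, poll_seconds=2.0, max_polls=1000):
    terminal = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}
    for _ in range(max_polls):
        state = fetch_status()
        if state in terminal:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError("run did not finish within the polling budget")
```

Per-run progress could then be rendered from the returned states, or, following the reply above, replaced entirely by event publication from the notebooks themselves.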
