Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

fly_high_five
by Contributor
  • 1357 Views
  • 4 replies
  • 2 kudos

Resolved! Exposing Data for Consumers in non-UC ADB

Hi, I want to expose data to consumers from our non-UC ADB. Consumers would be consuming data mainly using a SQL client like DBeaver. I tried the SQL endpoint of an Interactive Cluster and connected via DBeaver, however when I try to fetch/export all rows of t...

Latest Reply
fly_high_five
Contributor

Hi @szymon_dybczak I am using the latest JDBC driver 2.7.3 https://www.databricks.com/spark/jdbc-drivers-archive and my JDBC URL comes from the JDBC endpoint of the Interactive Cluster. jdbc:databricks://adb-{workspace_id}.azuredatabricks.net:443/default;transport...
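The URL in the reply is truncated, but the general shape of a Databricks JDBC connection string can be sketched as below. This is an illustrative helper, not the poster's actual endpoint: the workspace host and HTTP path are made-up placeholders, and the parameter set (transportMode, ssl, httpPath, AuthMech) is the commonly documented one for token authentication.

```python
# Hypothetical helper: assemble a Databricks JDBC URL from its parts.
# The host and httpPath values below are placeholders, not real endpoints.
def databricks_jdbc_url(workspace_host: str, http_path: str, schema: str = "default") -> str:
    """Build a JDBC URL in the shape the Databricks JDBC driver expects."""
    return (
        f"jdbc:databricks://{workspace_host}:443/{schema};"
        "transportMode=http;ssl=1;"
        f"httpPath={http_path};"
        "AuthMech=3"  # AuthMech=3 selects username/personal-access-token auth
    )

url = databricks_jdbc_url(
    "adb-1234567890.1.azuredatabricks.net",
    "sql/protocolv1/o/1234567890/0101-010101-abcdef12",
)
```

The actual values come from the cluster's JDBC/ODBC tab in the workspace UI, as the reply describes.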

3 More Replies
kmodelew
by New Contributor III
  • 3586 Views
  • 10 replies
  • 22 kudos

Unable to read Excel file from Volume

Hi, I'm trying to read an Excel file directly from a Volume (not workspace or filestore) -> all examples on the internet use workspace or filestore. The Volume is an external location so I can read from there, but I would like to read directly from the Volume. I hav...

Latest Reply
BS_THE_ANALYST
Databricks Partner

@ck7007 thanks for the update. Absolutely love that you've tested the solution too! Big props . As you mention, if we keep the community accurate, it'll mean that when someone else searches for the thread, they don't end up using an incorrect solutio...

9 More Replies
Worrachon
by New Contributor
  • 496 Views
  • 1 reply
  • 0 kudos

Databricks cannot run pipeline

I found that when I run the pipeline, it shows the message "'Cannot run pipeline', 'PL_TRNF_CRM_SALESFORCE_TO_BLOB', "HTTPSConnectionPool(host='management.azure.com', port=443)". It doesn't happen on every instance, but I encounter this case often.

Latest Reply
-werners-
Esteemed Contributor III

What exactly does the pipeline do?  Fetch data from a source system? I also see Data Factory as a component?

saicharandeepb
by Contributor
  • 558 Views
  • 1 reply
  • 1 kudos

Impact of Capturing Streaming Metrics to ADLS on Data Load Performance

Hi Community, I'm working on capturing Structured Streaming metrics and persisting them to Azure Data Lake Storage (ADLS) for monitoring and logging. To achieve this, I implemented a custom StreamingQueryListener that writes streaming progress data as...
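The excerpt is cut off, but the persistence step it describes can be sketched with stdlib only. In a real Databricks job this logic would sit inside a subclass of pyspark.sql.streaming.StreamingQueryListener (its onQueryProgress callback) and write to an ADLS path; the class, field names, and temp-file target below are illustrative stand-ins, not the poster's code.

```python
import json
import os
import tempfile
from pathlib import Path

class ProgressSink:
    """Append streaming progress events as JSON lines.

    In a real listener this body would live in
    StreamingQueryListener.onQueryProgress, and the target path would be
    an ADLS location rather than a local temp directory.
    """
    def __init__(self, path: str):
        self.path = Path(path)

    def on_query_progress(self, progress: dict) -> None:
        # One JSON object per line keeps the file trivially appendable.
        with self.path.open("a") as f:
            f.write(json.dumps(progress) + "\n")

# Stand-in for event.progress in a real StreamingQueryProgress callback.
metrics_path = os.path.join(tempfile.mkdtemp(), "stream_metrics.jsonl")
sink = ProgressSink(metrics_path)
sink.on_query_progress({"batchId": 7, "numInputRows": 1200, "inputRowsPerSecond": 240.0})
```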

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @saicharandeepb, The behaviour you're experiencing can happen with coalesce. The thing is, when you use coalesce(1), you're sacrificing parallelism and everything is performed on a single executor. There's even a warning in Apache Spark OSS regardi...

stucas
by New Contributor II
  • 1024 Views
  • 2 replies
  • 0 kudos

DLT Pipeline and Pivot tables

TL;DR: Can DLT determine a dynamic schema, one which is generated from the results of a pivot? Issue: I know you can't use Spark `.pivot` in a DLT pipeline and that if you wish to pivot data you need to do that outside of the DLT-decorated functions. I have...

Latest Reply
stucas
New Contributor II

Thank you for the reply - I have tried this (it was suggested in earlier solutions); but that may well be a side effect of the above function. query = f"""SELECT pivot_key, {select_clause} FROM data...
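The truncated query hints at the usual workaround for pivoting without spark `.pivot`: generate the SQL yourself from a known list of pivot keys, so the output schema is fixed before DLT sees it. A minimal sketch of that generation step, with hypothetical table and column names (`data_raw`, `pivot_key`, `pivot_value`), not the poster's actual query:

```python
def build_pivot_sql(table: str, key_col: str, val_col: str, keys: list) -> str:
    """Emit one MAX(CASE WHEN ...) per known key, so the pivoted schema
    is static - which is what a DLT table definition requires."""
    select_clause = ",\n       ".join(
        f"MAX(CASE WHEN {key_col} = '{k}' THEN {val_col} END) AS {k}"
        for k in keys
    )
    return (
        f"SELECT id,\n       {select_clause}\n"
        f"FROM {table}\nGROUP BY id"
    )

sql = build_pivot_sql("data_raw", "pivot_key", "pivot_value", ["q1", "q2"])
```

The key list must be known (or computed outside DLT) up front; a truly dynamic schema, as the original question asks, is exactly what this pattern avoids.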

1 More Replies
noorbasha534
by Valued Contributor II
  • 878 Views
  • 3 replies
  • 1 kudos

Cost attribution based on table history statistics

Hello all, I have a job that processes 50 tables - 25 belong to finance, 20 to master data, 5 to supply chain data domains. Now, imagine the job ran for 14 hours and cost me 1000 euros in a day. If I'd like to attribute the per-day cost...

Latest Reply
ManojkMohan
Honored Contributor II

Root Cause / Why executionTimeMs isn't ideal: executionTimeMs includes everything the job did - waiting for resources; shuffle, GC, or network latency; contention with other concurrent jobs. Using this to allocate costs can misattribute costs, especially if so...
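The arithmetic being debated is simple proportional allocation. A sketch with hypothetical per-domain times (7 h / 5 h / 2 h of a 14-hour, 1000-euro day - numbers invented for illustration), which also makes the reply's caveat concrete: shared overhead gets smeared across tables in proportion to their time.

```python
def allocate_cost(total_cost: float, table_times_ms: dict) -> dict:
    """Split one day's job cost across tables in proportion to each
    table's executionTimeMs share. Caveat from the reply above: resource
    waits, GC and shuffle time are smeared proportionally too."""
    total_ms = sum(table_times_ms.values())
    return {t: round(total_cost * ms / total_ms, 2)
            for t, ms in table_times_ms.items()}

# Hypothetical split of the 1000-euro day across the three domains.
shares = allocate_cost(1000.0, {
    "finance": 7 * 3_600_000,       # 7 hours, in milliseconds
    "master_data": 5 * 3_600_000,   # 5 hours
    "supply_chain": 2 * 3_600_000,  # 2 hours
})
```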

2 More Replies
ManojkMohan
by Honored Contributor II
  • 4885 Views
  • 15 replies
  • 17 kudos

Resolved! Ingesting 100 TB raw CSV data into the Bronze layer in Parquet + Snappy

Problem I am trying to solve: Bronze is the landing zone for immutable, raw data. At this stage, I am trying to use a columnar format (Parquet or ORC) -> good compression, efficient scans, and then apply lightweight compression (e.g., Snappy) -> balances...

Latest Reply
ManojkMohan
Honored Contributor II

@szymon_dybczak @BS_THE_ANALYST @Coffee77 @TheOC the use case summary is as below. The use case: A telecom operator wants to minimize unnecessary truck rolls (sending technicians to customer sites), which cost $100–$200 per visit. Data sources feeding...

14 More Replies
dbdev
by Databricks Partner
  • 2062 Views
  • 10 replies
  • 4 kudos

Maven libraries in VNet injected, UC enabled workspace on Standard Access Mode Cluster

Hi! As the title suggests, I want to install Maven libraries on my cluster with access mode 'Standard'. Our workspace is VNet-injected and has Unity Catalog enabled. The coordinates have been allowlisted by the account team according to these instructio...

Latest Reply
dbdev
Databricks Partner

We have resolved the Metastore issue, which also seems to have resolved the JAR issue. I don't have a clue why this resolves it. The network people might have used service tags, which also opened the workspace to the ODBC connections?

9 More Replies
seefoods
by Valued Contributor
  • 1769 Views
  • 4 replies
  • 1 kudos

Resolved! Read JSON files on Unity Catalog

Hello guys, I have an issue when I load several JSON files which have the same schema on Databricks. When I do 2025_07_17_19_55_00_2025_07_31_21_55_00_17Q51D_alice_out.json 516.13 KB 2025_07_17_19_55_00_2025_07_31_21_55_00_17Q51D_bob_out.json 516.13 K...

Latest Reply
seefoods
Valued Contributor

Hello @szymon_dybczak, it's OK, I have checked the history of the table. I was confused about the display() command output versus the actual output of the write operation. Thanks

3 More Replies
victorNilsson
by New Contributor II
  • 1446 Views
  • 3 replies
  • 2 kudos

Read polars from recently created csv file

More and more Python packages transition to use polars instead of e.g. pandas. There is a problem with this in Databricks when trying to read a CSV file using pl.read_csv("filename.csv") when the file has been created in the same notebook cel...

Data Engineering
csv
file system
OSError
polars
Latest Reply
Pilsner
Databricks Partner

Hello @victorNilsson, thank you for letting me know how to replicate the issue; I was able to get the same error this time. I've given the problem another go and think I have been able to fix it by specifying the output path as "/tmp/test.csv". By wri...
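The fix described - write to a driver-local path like "/tmp/test.csv" and read it back from the same place - can be sketched as follows. The stdlib csv module stands in for polars here only to keep the sketch dependency-free; in the actual scenario the read-back would be pl.read_csv(path), pointed at the same local path.

```python
import csv
import os
import tempfile

# Write a CSV to a driver-local path (the "/tmp/test.csv" trick from the
# reply), then read it back from that same path.
path = os.path.join(tempfile.gettempdir(), "test.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "value"])
    writer.writerow([1, "a"])

# With polars this would be: rows = pl.read_csv(path)
with open(path, newline="") as f:
    rows = list(csv.reader(f))
```

The point is the filesystem, not the library: a local path avoids the cloud-backed workspace filesystem that the freshly written file had not yet surfaced on.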

2 More Replies
akdav
by Contributor
  • 2803 Views
  • 13 replies
  • 6 kudos

Resolved! Job File Event Trigger not firing for SftpCommit and SftpCreate

Hi there, We are using an Azure Storage Account and its SFTP feature. We have 3rd parties we work with that submit reports to us via SFTP into Azure Blob Storage. We have set up a File Trigger for that external location. Everything works fine if you up...

Latest Reply
akdav
Contributor

Hi Dimitry, You need to go to the external_location, then turn off file events for that external_location. Then you still select File Trigger. It will then evaluate the external_location. It will give you a message that you can only track up to 10k Fi...

12 More Replies
shubham007
by Databricks Partner
  • 1677 Views
  • 9 replies
  • 2 kudos

Databricks Lakebridge: Azure SQL DB to Databricks (Error while import)

Hi community experts, I am getting the error "cannot import name 'recon' from 'databricks.labs.lakebridge.reconcile.execute'" when importing modules, as shown in the attached screenshot. I am following the steps as mentioned in your partner training module "Lakebridge f...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @shubham007, They refactored that module last month, which is why it stopped working. Probably the Lakebridge for SQL Source System Migration module was recorded before that change. And why did they make the change? It is explained here: Split recon...

8 More Replies
Dimitry
by Valued Contributor
  • 2860 Views
  • 11 replies
  • 3 kudos

Resolved! Unreliable file events on Azure Storage (SFTP) for job trigger

Hi all, I got a job triggered by a file event on the external location. The location and job triggers are working fine when uploading files via the Azure Portal. I need an SFTP trigger, so I went into the event grid, found the subscription for the storage account on ...

Latest Reply
Dimitry
Valued Contributor

Update: Appears that even uploading via the UI does not trigger it any more. It did trigger weeks ago. I have just uploaded a file in the UI and saw this message in the storage queue: {"topic":"/subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.Storage/st...

10 More Replies
shubham007
by Databricks Partner
  • 643 Views
  • 1 reply
  • 0 kudos

Databricks Lakebridge: Azure SQL DB to Databricks (Error in Data and Schema Validation)

Hi community experts, I am getting an error during Data and Schema Validation with the Reconciler, as shown in the attached screenshots. Please help resolve this issue.

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @shubham007, As stated in another thread, I think this error could be related to a misconfiguration on your side. Lakebridge is trying to find the following table in your SQL Server instance -> None.SalesLT.customer. But look at which database reconciliat...

Nabbott
by New Contributor
  • 1863 Views
  • 1 reply
  • 2 kudos

Databricks Genie

I have curated silver and gold tables in Advana that feed downstream applications. Other organizations also create tables for their own use. Can Databricks Genie query across tables from different pipelines within the same organization and across mul...

Latest Reply
Louis_Frolio
Databricks Employee

Can you explain the landscape a bit more? The term "pipelines" means something specific in Databricks. You mention "across multiple organizations." What does that mean?  Are you guys using Unity Catalog, are all the tables/data in Unity?  Please elab...
