Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

fly_high_five
by Contributor
  • 1357 Views
  • 4 replies
  • 2 kudos

Resolved! Exposing Data for Consumers in non-UC ADB

Hi, I want to expose data to consumers from our non-UC ADB. Consumers would be consuming data mainly using a SQL client like DBeaver. I tried the SQL endpoint of an Interactive Cluster and connected via DBeaver, however when I try to fetch/export all rows of t...

Latest Reply
fly_high_five
Contributor

Hi @szymon_dybczak I am using the latest JDBC driver 2.7.3 https://www.databricks.com/spark/jdbc-drivers-archive and my JDBC URL comes from the JDBC endpoint of the Interactive Cluster. jdbc:databricks://adb-{workspace_id}.azuredatabricks.net:443/default;transport...
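The URL in the reply is truncated, but the general shape of a Databricks JDBC connection string can be sketched as below. This is an illustrative helper, not the poster's actual endpoint: the workspace host and HTTP path are made-up placeholders, and the parameter set (transportMode, ssl, httpPath, AuthMech) is the commonly documented one for token authentication.

```python
# Hypothetical helper: assemble a Databricks JDBC URL from its parts.
# The host and httpPath values below are placeholders, not real endpoints.
def databricks_jdbc_url(workspace_host: str, http_path: str, schema: str = "default") -> str:
    """Build a JDBC URL in the shape the Databricks JDBC driver expects."""
    return (
        f"jdbc:databricks://{workspace_host}:443/{schema};"
        "transportMode=http;ssl=1;"
        f"httpPath={http_path};"
        "AuthMech=3"  # AuthMech=3 selects username/personal-access-token auth
    )

url = databricks_jdbc_url(
    "adb-1234567890.1.azuredatabricks.net",
    "sql/protocolv1/o/1234567890/0101-010101-abcdef12",
)
```

The actual values come from the cluster's JDBC/ODBC tab in the workspace UI, as the reply describes.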

3 More Replies
kmodelew
by New Contributor III
  • 3586 Views
  • 10 replies
  • 22 kudos

Unable to read Excel file from Volume

Hi, I'm trying to read an Excel file directly from a Volume (not workspace or filestore) -> all examples on the internet use workspace or filestore. The Volume is an external location so I can read from there, but I would like to read directly from the Volume. I hav...

Latest Reply
BS_THE_ANALYST
Databricks Partner

@ck7007 thanks for the update. Absolutely love that you've tested the solution too! Big props . As you mention, if we keep the community accurate, it'll mean that when someone else searches for the thread, they don't end up using an incorrect solutio...

9 More Replies
Worrachon
by New Contributor
  • 496 Views
  • 1 reply
  • 0 kudos

Databricks cannot run pipeline

I found that when I run the pipeline, it shows the message "'Cannot run pipeline', 'PL_TRNF_CRM_SALESFORCE_TO_BLOB', "HTTPSConnectionPool(host='management.azure.com', port=443)". It doesn't happen on every instance, but I encounter this case often.

Latest Reply
-werners-
Esteemed Contributor III

What exactly does the pipeline do?  Fetch data from a source system? I also see Data Factory as a component?

saicharandeepb
by Contributor
  • 558 Views
  • 1 reply
  • 1 kudos

Impact of Capturing Streaming Metrics to ADLS on Data Load Performance

Hi Community, I'm working on capturing Structured Streaming metrics and persisting them to Azure Data Lake Storage (ADLS) for monitoring and logging. To achieve this, I implemented a custom StreamingQueryListener that writes streaming progress data as...
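The excerpt is cut off, but the persistence step it describes can be sketched with stdlib only. In a real Databricks job this logic would sit inside a subclass of pyspark.sql.streaming.StreamingQueryListener (its onQueryProgress callback) and write to an ADLS path; the class, field names, and temp-file target below are illustrative stand-ins, not the poster's code.

```python
import json
import os
import tempfile
from pathlib import Path

class ProgressSink:
    """Append streaming progress events as JSON lines.

    In a real listener this body would live in
    StreamingQueryListener.onQueryProgress, and the target path would be
    an ADLS location rather than a local temp directory.
    """
    def __init__(self, path: str):
        self.path = Path(path)

    def on_query_progress(self, progress: dict) -> None:
        # One JSON object per line keeps the file trivially appendable.
        with self.path.open("a") as f:
            f.write(json.dumps(progress) + "\n")

# Stand-in for event.progress in a real StreamingQueryProgress callback.
metrics_path = os.path.join(tempfile.mkdtemp(), "stream_metrics.jsonl")
sink = ProgressSink(metrics_path)
sink.on_query_progress({"batchId": 7, "numInputRows": 1200, "inputRowsPerSecond": 240.0})
```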

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @saicharandeepb, The behaviour you're experiencing can happen with coalesce. The thing is, when you use coalesce(1), you're sacrificing parallelism and everything is performed on a single executor. There's even a warning in Apache Spark OSS regardi...

stucas
by New Contributor II
  • 1024 Views
  • 2 replies
  • 0 kudos

DLT Pipeline and Pivot tables

TL;DR: Can DLT determine a dynamic schema, one which is generated from the results of a pivot? Issue: I know you can't use Spark `.pivot` in a DLT pipeline and that if you wish to pivot data you need to do that outside of the DLT-decorated functions. I have...

Latest Reply
stucas
New Contributor II

Thank you for the reply - I have tried this (it was suggested in earlier solutions); but that may well be a side effect of the above function. query = f"""SELECT pivot_key, {select_clause} FROM data...
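The truncated query hints at the usual workaround for pivoting without spark `.pivot`: generate the SQL yourself from a known list of pivot keys, so the output schema is fixed before DLT sees it. A minimal sketch of that generation step, with hypothetical table and column names (`data_raw`, `pivot_key`, `pivot_value`), not the poster's actual query:

```python
def build_pivot_sql(table: str, key_col: str, val_col: str, keys: list) -> str:
    """Emit one MAX(CASE WHEN ...) per known key, so the pivoted schema
    is static - which is what a DLT table definition requires."""
    select_clause = ",\n       ".join(
        f"MAX(CASE WHEN {key_col} = '{k}' THEN {val_col} END) AS {k}"
        for k in keys
    )
    return (
        f"SELECT id,\n       {select_clause}\n"
        f"FROM {table}\nGROUP BY id"
    )

sql = build_pivot_sql("data_raw", "pivot_key", "pivot_value", ["q1", "q2"])
```

The key list must be known (or computed outside DLT) up front; a truly dynamic schema, as the original question asks, is exactly what this pattern avoids.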

1 More Replies
noorbasha534
by Valued Contributor II
  • 878 Views
  • 3 replies
  • 1 kudos

Cost attribution based on table history statistics

Hello all, I have a job that processes 50 tables - 25 belong to finance, 20 to master data, 5 to supply chain data domains. Now, imagine the job ran for 14 hours and cost me 1000 euros in a day. If I'd like to attribute the per-day cost...

Latest Reply
ManojkMohan
Honored Contributor II

Root Cause / Why executionTimeMs isn't ideal: executionTimeMs includes everything the job did - waiting for resources; shuffle, GC, or network latency; contention with other concurrent jobs. Using this to allocate costs can misattribute costs, especially if so...
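The arithmetic being debated is simple proportional allocation. A sketch with hypothetical per-domain times (7 h / 5 h / 2 h of a 14-hour, 1000-euro day - numbers invented for illustration), which also makes the reply's caveat concrete: shared overhead gets smeared across tables in proportion to their time.

```python
def allocate_cost(total_cost: float, table_times_ms: dict) -> dict:
    """Split one day's job cost across tables in proportion to each
    table's executionTimeMs share. Caveat from the reply above: resource
    waits, GC and shuffle time are smeared proportionally too."""
    total_ms = sum(table_times_ms.values())
    return {t: round(total_cost * ms / total_ms, 2)
            for t, ms in table_times_ms.items()}

# Hypothetical split of the 1000-euro day across the three domains.
shares = allocate_cost(1000.0, {
    "finance": 7 * 3_600_000,       # 7 hours, in milliseconds
    "master_data": 5 * 3_600_000,   # 5 hours
    "supply_chain": 2 * 3_600_000,  # 2 hours
})
```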

2 More Replies
ManojkMohan
by Honored Contributor II
  • 4885 Views
  • 15 replies
  • 17 kudos

Resolved! Ingesting 100 TB raw CSV data into the Bronze layer in Parquet + Snappy

Problem I am trying to solve: Bronze is the landing zone for immutable, raw data. At this stage, I am trying to use a columnar format (Parquet or ORC) -> good compression, efficient scans, and then apply lightweight compression (e.g., Snappy) -> balances...

Latest Reply
ManojkMohan
Honored Contributor II

@szymon_dybczak @BS_THE_ANALYST @Coffee77 @TheOC the use case summary is as below. The use case: A telecom operator wants to minimize unnecessary truck rolls (sending technicians to customer sites), which cost $100–$200 per visit. Data sources feeding...

14 More Replies
dbdev
by Databricks Partner
  • 2062 Views
  • 10 replies
  • 4 kudos

Maven libraries in VNet injected, UC enabled workspace on Standard Access Mode Cluster

Hi! As the title suggests, I want to install Maven libraries on my cluster with access mode 'Standard'. Our workspace is VNet-injected and has Unity Catalog enabled. The coordinates have been allowlisted by the account team according to these instructio...

Latest Reply
dbdev
Databricks Partner

We have resolved the Metastore issue, which also seems to have resolved the JAR issue. I don't have a clue why this resolves it. The network people might have used service tags, which also opened the workspace to the ODBC connections?

9 More Replies
seefoods
by Valued Contributor
  • 1769 Views
  • 4 replies
  • 1 kudos

Resolved! Read JSON files on Unity Catalog

Hello guys, I have an issue when I load several JSON files which have the same schema on Databricks. When I do 2025_07_17_19_55_00_2025_07_31_21_55_00_17Q51D_alice_out.json 516.13 KB 2025_07_17_19_55_00_2025_07_31_21_55_00_17Q51D_bob_out.json 516.13 K...

Latest Reply
seefoods
Valued Contributor

Hello @szymon_dybczak, it's OK, I have checked the history of the table. I was confused about the display() command output versus the actual output of the write operation. Thanks

3 More Replies
victorNilsson
by New Contributor II
  • 1446 Views
  • 3 replies
  • 2 kudos

Read polars from recently created csv file

More and more Python packages transition to use polars instead of e.g. pandas. There is a problem with this in Databricks when trying to read a CSV file using pl.read_csv("filename.csv") when the file has been created in the same notebook cel...

Data Engineering
csv
file system
OSError
polars
Latest Reply
Pilsner
Databricks Partner

Hello @victorNilsson, thank you for letting me know how to replicate the issue; I was able to get the same error this time. I've given the problem another go and think I have been able to fix it by specifying the output path as "/tmp/test.csv". By wri...
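The fix described - write to a driver-local path like "/tmp/test.csv" and read it back from the same place - can be sketched as follows. The stdlib csv module stands in for polars here only to keep the sketch dependency-free; in the actual scenario the read-back would be pl.read_csv(path), pointed at the same local path.

```python
import csv
import os
import tempfile

# Write a CSV to a driver-local path (the "/tmp/test.csv" trick from the
# reply), then read it back from that same path.
path = os.path.join(tempfile.gettempdir(), "test.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "value"])
    writer.writerow([1, "a"])

# With polars this would be: rows = pl.read_csv(path)
with open(path, newline="") as f:
    rows = list(csv.reader(f))
```

The point is the filesystem, not the library: a local path avoids the cloud-backed workspace filesystem that the freshly written file had not yet surfaced on.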

2 More Replies
akdav
by Contributor
  • 2803 Views
  • 13 replies
  • 6 kudos

Resolved! Job File Event Trigger not firing for SftpCommit and SftpCreate

Hi there, We are using an Azure Storage Account and its SFTP feature. We have 3rd parties we work with that submit reports to us via SFTP into Azure Blob Storage. We have set up a File Trigger for that external location. Everything works fine if you up...

Latest Reply
akdav
Contributor

Hi Dimitry, You need to go to the external_location, then turn off file events for that external_location. Then you still select File Trigger. It will then evaluate the external_location. It will give you a message that you can only track up to 10k Fi...

12 More Replies
shubham007
by Databricks Partner
  • 1677 Views
  • 9 replies
  • 2 kudos

Databricks Lakebridge: Azure SQL DB to Databricks (Error while import)

Hi community experts, I am getting the error "cannot import name 'recon' from 'databricks.labs.lakebridge.reconcile.execute'" when importing modules, as shown in the attached screenshot. I am following the steps as mentioned in your partner training module "Lakebridge f...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @shubham007, They refactored that module last month, which is why it stopped working. Probably the Lakebridge for SQL Source System Migration module was recorded before that change. And why did they make the change? It is explained here: Split recon...

8 More Replies
Dimitry
by Valued Contributor
  • 2860 Views
  • 11 replies
  • 3 kudos

Resolved! Unreliable file events on Azure Storage (SFTP) for job trigger

Hi all, I got a job triggered by a file event on the external location. The location and job triggers are working fine when uploading files via the Azure Portal. I need an SFTP trigger, so I went into the event grid, found the subscription for the storage account on ...

Latest Reply
Dimitry
Valued Contributor

Update: Appears that even uploading via the UI does not trigger it any more. It did trigger weeks ago. I have just uploaded a file in the UI and saw this message in the storage queue: {"topic":"/subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.Storage/st...

10 More Replies
shubham007
by Databricks Partner
  • 643 Views
  • 1 reply
  • 0 kudos

Databricks Lakebridge: Azure SQL DB to Databricks (Error in Data and Schema Validation)

Hi community experts, I am getting an error during Data and Schema Validation with the Reconciler, as shown in the attached screenshots. Please help resolve this issue.

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @shubham007, As stated in another thread, I think this error could be related to a misconfiguration on your side. Lakebridge is trying to find the following table in your SQL Server instance -> None.SalesLT.customer. But look at which database reconciliat...

Nabbott
by New Contributor
  • 1863 Views
  • 1 reply
  • 2 kudos

Databricks Genie

I have curated silver and gold tables in Advana that feed downstream applications. Other organizations also create tables for their own use. Can Databricks Genie query across tables from different pipelines within the same organization and across mul...

Latest Reply
Louis_Frolio
Databricks Employee

Can you explain the landscape a bit more? The term "pipelines" means something specific in Databricks. You mention "across multiple organizations." What does that mean?  Are you guys using Unity Catalog, are all the tables/data in Unity?  Please elab...
