Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

santhiya
by New Contributor
  • 1432 Views
  • 2 replies
  • 0 kudos

CPU usage and idle time metrics from system tables

I need to get my compute metrics, not from the UI...the system tables don't have much information, and node_timeline has per-minute records, so it's difficult to calculate each compute's CPU usage per day. Is there any way we can get the CPU usage, CPU idle time, M...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

To calculate CPU usage, CPU idle time, and memory usage per cluster per day, you can use the system.compute.node_timeline system table. However, since the data in this table is recorded at per-minute granularity, it’s necessary to aggregate the data ...
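The daily rollup the reply describes can be sketched as a SQL string like the one below. Column names (cpu_user_percent, cpu_system_percent, cpu_wait_percent, mem_used_percent) follow the documented node_timeline schema, but verify them in your workspace before relying on this:

```python
# Sketch of a per-cluster, per-day rollup over system.compute.node_timeline.
# The per-minute samples are averaged up to one row per cluster per day.
daily_compute_sql = """
SELECT
  cluster_id,
  DATE(start_time)                           AS usage_date,
  AVG(cpu_user_percent + cpu_system_percent) AS avg_cpu_busy_pct,
  AVG(100 - cpu_user_percent
          - cpu_system_percent
          - cpu_wait_percent)                AS avg_cpu_idle_pct,
  AVG(mem_used_percent)                      AS avg_mem_used_pct
FROM system.compute.node_timeline
GROUP BY cluster_id, DATE(start_time)
"""
# In a notebook: display(spark.sql(daily_compute_sql))
```
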

1 More Reply
fly_high_five
by New Contributor III
  • 1220 Views
  • 5 replies
  • 1 kudos

Resolved! Unable to retrieve all rows of delta table using SQL endpoint of Interactive Cluster

Hi, I am trying to query a table using the JDBC endpoint of an Interactive Cluster. I am connected to the JDBC endpoint using DBeaver. When I export a small subset of data, 2000-8000 rows, it works fine and exports the data. However, when I try to export all rows ...

Latest Reply
WiliamRosa
Contributor III
  • 1 kudos

Hi @fly_high_five, I found these references about this situation; see if they help you: increase the SocketTimeout in JDBC (Databricks KB “Best practices when using JDBC with Databricks SQL” – https://kb.databricks.com/dbsql/job-timeout-when-connectin...
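As a rough illustration of where that property goes, SocketTimeout (in seconds) is appended to the semicolon-delimited JDBC URL. The host and httpPath below are placeholders, not real endpoints:

```python
def with_socket_timeout(jdbc_url: str, seconds: int) -> str:
    """Append a SocketTimeout property to a semicolon-delimited JDBC URL."""
    return f"{jdbc_url};SocketTimeout={seconds}"

# Placeholder URL in the same shape as a Databricks cluster JDBC endpoint:
url = ("jdbc:databricks://adb-0000000000000000.0.azuredatabricks.net:443/default;"
       "transportMode=http;ssl=1;httpPath=sql/protocolv1/o/0/0000-000000-example")
print(with_socket_timeout(url, 600))
```

In DBeaver the same property can be set in the connection's driver properties instead of editing the URL by hand.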

4 More Replies
fly_high_five
by New Contributor III
  • 967 Views
  • 4 replies
  • 1 kudos

Resolved! Exposing Data for Consumers in non-UC ADB

Hi, I want to expose data to consumers from our non-UC ADB. Consumers would be consuming data mainly using a SQL client like DBeaver. I tried the SQL endpoint of an Interactive Cluster and connected via DBeaver; however, when I try to fetch/export all rows of t...

Latest Reply
fly_high_five
New Contributor III
  • 1 kudos

Hi @szymon_dybczak, I am using the latest JDBC driver, 2.7.3 (https://www.databricks.com/spark/jdbc-drivers-archive), and my JDBC URL comes from the JDBC endpoint of the Interactive Cluster: jdbc:databricks://adb-{workspace_id}.azuredatabricks.net:443/default;transport...

3 More Replies
kmodelew
by New Contributor III
  • 1777 Views
  • 10 replies
  • 22 kudos

Unable to read excel file from Volume

Hi, I'm trying to read an Excel file directly from a Volume (not the workspace or FileStore) -> all examples on the internet use the workspace or FileStore. The Volume is an external location, so I can read from there, but I would like to read directly from the Volume. I hav...
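For reference, Unity Catalog Volumes are FUSE-mounted, so single-node libraries can usually open files there by plain path. A minimal sketch, assuming pandas and openpyxl are installed on the cluster (the catalog/schema/volume names are placeholders):

```python
def volume_path(catalog: str, schema: str, volume: str, filename: str) -> str:
    """Build a Unity Catalog Volume path; Volumes are FUSE-mounted,
    so the result can be opened like any POSIX path."""
    return f"/Volumes/{catalog}/{schema}/{volume}/{filename}"

def read_excel_from_volume(path: str):
    # pandas + openpyxl are assumed installed; the driver reads the file
    # locally, and spark.createDataFrame(...) can convert the result
    # if a Spark DataFrame is needed afterwards.
    import pandas as pd
    return pd.read_excel(path, engine="openpyxl")
```
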

Latest Reply
BS_THE_ANALYST
Esteemed Contributor III
  • 22 kudos

@ck7007 thanks for the update. Absolutely love that you've tested the solution too! Big props. As you mention, if we keep the community accurate, it'll mean that when someone else searches for the thread, they don't end up using an incorrect solutio...

9 More Replies
jfvizoso
by New Contributor II
  • 12580 Views
  • 5 replies
  • 0 kudos

Can I pass parameters to a Delta Live Table pipeline at running time?

I need to execute a DLT pipeline from a Job, and I would like to know if there is any way of passing a parameter. I know you can have settings in the pipeline that you use in the DLT notebook, but it seems you can only assign values to them when crea...

Latest Reply
DeepakAI
New Contributor II
  • 0 kudos

Team, is any workaround possible? I have 100+ tables which need to be ingested incrementally. I created a single DLT notebook which I am using inside a pipeline as a task; this pipeline is triggered via a job on a file arrival event. I want to utilize the same...
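One common workaround is to drive the notebook from the pipeline's "configuration" settings rather than per-run parameters: set a key in the pipeline settings and read it with spark.conf.get inside the notebook. A sketch under that assumption (the key name "source.table_list" is illustrative, not an official setting):

```python
def parse_table_list(conf_value: str) -> list:
    """Split a comma-separated configuration value into table names."""
    return [t.strip() for t in conf_value.split(",") if t.strip()]

# Inside the pipeline notebook this would look like:
#   tables = parse_table_list(spark.conf.get("source.table_list"))
#   for t in tables:
#       ...define one table per entry in a loop...
```
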

4 More Replies
Worrachon
by New Contributor
  • 417 Views
  • 1 reply
  • 0 kudos

Databricks cannot run pipeline

I found that when I run the pipeline, it shows the message "'Cannot run pipeline', 'PL_TRNF_CRM_SALESFORCE_TO_BLOB', "HTTPSConnectionPool(host='management.azure.com', port=443)". It doesn't happen on every run, but I encounter this case often.

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

What exactly does the pipeline do? Fetch data from a source system? I also see Data Factory as a component.

saicharandeepb
by New Contributor III
  • 445 Views
  • 1 reply
  • 1 kudos

Impact of Capturing Streaming Metrics to ADLS on Data Load Performance

Hi Community, I'm working on capturing Structured Streaming metrics and persisting them to Azure Data Lake Storage (ADLS) for monitoring and logging. To achieve this, I implemented a custom StreamingQueryListener that writes streaming progress data as...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @saicharandeepb, the behaviour you're experiencing can happen with coalesce. The thing is, when you use coalesce(1), you're sacrificing parallelism and everything is performed on a single executor. There's even a warning in Apache Spark OSS regardi...
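A complementary mitigation, independent of the coalesce question, is to buffer the listener's progress events and flush them in batches, so the listener is not writing one tiny ADLS file per micro-batch. A minimal sketch; the sink callable (e.g. something that writes a batch of JSON payloads to ADLS) is hypothetical:

```python
class ProgressBuffer:
    """Buffer streaming-progress payloads so the listener performs one
    larger write instead of one tiny file per micro-batch."""
    def __init__(self, flush_every, sink):
        self.flush_every = flush_every   # number of events per flush
        self.sink = sink                 # callable taking a list of payloads
        self._events = []

    def add(self, payload):
        self._events.append(payload)
        if len(self._events) >= self.flush_every:
            self.flush()

    def flush(self):
        if self._events:
            self.sink(list(self._events))
            self._events.clear()
```

The add() method would be called from the listener's onQueryProgress, with a final flush() on query termination.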

stucas
by New Contributor II
  • 633 Views
  • 2 replies
  • 0 kudos

DLT Pipeline and Pivot tables

TL;DR: Can DLT determine a dynamic schema, one which is generated from the results of a pivot? Issue: I know you can't use Spark `.pivot` in a DLT pipeline, and that if you wish to pivot data you need to do that outside of the DLT-decorated functions. I have...

Latest Reply
stucas
New Contributor II
  • 0 kudos

Thank you for the reply. I have tried this (it was suggested in earlier solutions), but that may well be a side effect of the above function.

query = f"""
    SELECT pivot_key,
        {select_clause}
    FROM
        data...

1 More Reply
noorbasha534
by Valued Contributor II
  • 569 Views
  • 3 replies
  • 1 kudos

Cost attribution based on table history statistics

Hello all, I have a job that processes 50 tables: 25 belong to finance, 20 belong to master data, and 5 belong to supply chain data domains. Now, imagine the job ran for 14 hours and cost me 1000 euros in a day. If I would like to attribute the per-day cost...

Latest Reply
ManojkMohan
Honored Contributor II
  • 1 kudos

Root cause / why executionTimeMs isn't ideal: executionTimeMs includes everything the job did:
  • Waiting for resources
  • Shuffle, GC, or network latency
  • Contention with other concurrent jobs
Using this to allocate costs can misattribute costs, especially if so...
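Whatever per-domain time measure is chosen, the attribution itself is a simple proportional split. A sketch using the thread's numbers (the 7/5/2 hour split per domain is illustrative, not from the post):

```python
def attribute_cost(total_cost, hours_by_domain):
    """Split a job's daily cost across data domains in proportion
    to the per-domain processing time, however that time is measured."""
    total = sum(hours_by_domain.values())
    return {d: total_cost * h / total for d, h in hours_by_domain.items()}

# e.g. 1000 euros over 14 hours, hypothetically split 7/5/2:
costs = attribute_cost(1000, {"finance": 7, "master_data": 5, "supply_chain": 2})
```
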

2 More Replies
ManojkMohan
by Honored Contributor II
  • 2696 Views
  • 15 replies
  • 17 kudos

Resolved! Ingesting 100 TB raw CSV data into the Bronze layer in Parquet + Snappy

Problem I am trying to solve: Bronze is the landing zone for immutable, raw data. At this stage, I am trying to use a columnar format (Parquet or ORC) → good compression, efficient scans, and then apply lightweight compression (e.g., Snappy) → balances...

Latest Reply
ManojkMohan
Honored Contributor II
  • 17 kudos

@szymon_dybczak @BS_THE_ANALYST @Coffee77 @TheOC the use case summary is as below. The use case: a telecom operator wants to minimize unnecessary truck rolls (sending technicians to customer sites), which cost $100–$200 per visit. Data sources feeding...

14 More Replies
dbdev
by Contributor
  • 1636 Views
  • 10 replies
  • 4 kudos

Maven libraries in VNet injected, UC enabled workspace on Standard Access Mode Cluster

Hi! As the title suggests, I want to install Maven libraries on my cluster with access mode 'Standard'. Our workspace is VNet-injected and has Unity Catalog enabled. The coordinates have been allowlisted by the account team according to these instructio...

Latest Reply
dbdev
Contributor
  • 4 kudos

We have resolved the Metastore issue, which also seems to have resolved the JAR issue. I don't have a clue why this resolves it. The network people might have used service tags, which also opened the workspace to the ODBC connections?

9 More Replies
seefoods
by Valued Contributor
  • 1423 Views
  • 4 replies
  • 1 kudos

Resolved! read json files on unity catalog

Hello guys, I have an issue when I load several JSON files which have the same schema on Databricks, when I do: 2025_07_17_19_55_00_2025_07_31_21_55_00_17Q51D_alice_out.json 516.13 KB 2025_07_17_19_55_00_2025_07_31_21_55_00_17Q51D_bob_out.json 516.13 K...

Latest Reply
seefoods
Valued Contributor
  • 1 kudos

Hello @szymon_dybczak, it's OK, I have checked the history of the table. I was confused about the display() command output versus the actual output of the write operation. Thanks.

3 More Replies
victorNilsson
by New Contributor II
  • 749 Views
  • 3 replies
  • 2 kudos

Read polars from recently created csv file

More and more Python packages are transitioning to Polars instead of, e.g., pandas. There is a problem with this in Databricks when trying to read a CSV file with pl.read_csv("filename.csv") when the file has been created in the same notebook cel...

Data Engineering
csv
file system
OSError
polars
Latest Reply
Pilsner
Contributor III
  • 2 kudos

Hello @victorNilsson, thank you for letting me know how to replicate the issue; I was able to get the same error this time. I've given the problem another go and think I have been able to fix it by specifying the output path as "/tmp/test.csv". By wri...
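The tmp-path fix can be sketched with stdlib I/O: write to a node-local path (/tmp) that plain POSIX file operations see immediately, rather than a DBFS-backed workspace path. Polars itself is assumed installed on the cluster, so the read-back is shown with the csv module here:

```python
import csv

# Write the file to a node-local path instead of a workspace path.
local_path = "/tmp/test.csv"
with open(local_path, "w", newline="") as f:
    csv.writer(f).writerows([["a", "b"], ["1", "2"]])

# On a cluster, pl.read_csv(local_path) would now succeed; the stdlib
# read below just demonstrates the file is immediately visible.
with open(local_path) as f:
    rows = list(csv.reader(f))
```
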

2 More Replies
akdav
by Contributor
  • 1831 Views
  • 13 replies
  • 6 kudos

Resolved! Job File Event Trigger not firing for SftpCommit and SftpCreate

Hi there, we are using an Azure Storage Account and its SFTP feature. We have third parties we work with that submit reports to us via SFTP into Azure Blob Storage. We have set up a File Trigger for that external location. Everything works fine if you up...

Latest Reply
akdav
Contributor
  • 6 kudos

Hi Dimitry, you need to go to the external_location, then turn off file events for that external_location. Then you still select File Trigger. It will then evaluate the external_location. It will give you a message that you can only track up to 10k Fi...

12 More Replies
shubham007
by New Contributor III
  • 1088 Views
  • 9 replies
  • 2 kudos

Databricks Lakebridge: Azure SQL DB to Databricks (Error while import)

Hi community experts, I am getting the error "cannot import name 'recon' from 'databricks.labs.lakebridge.reconcile.execute'" when importing modules, as shown in the attached screenshot. I am following the steps as mentioned in your partner training module "Lakebridge f...

error_recon.png
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @shubham007, they refactored that module last month, which is why it stopped working. The Lakebridge for SQL Source System Migration module was probably recorded before that change. And why did they make the change? It is explained here: Split recon...

8 More Replies
