Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

PabloCSD
by Valued Contributor II
  • 456 Views
  • 1 reply
  • 0 kudos

How to configure a Job-Compute for Unity Catalog Access? (Q/A)

If you need to access tables that are in a Unity Catalog (UC) volume, the following configuration will work:

    targets:
      dev:
        mode: development
        default: true
        workspace:
          host: https://<workspace>.azuredatabricks.net/
        run_as...

Latest Reply
Khaja_Zaffer
Contributor III
  • 0 kudos

Hello @PabloCSD, good day! Are you asking a question, or what are your expectations? In addition to this: you cannot create or register tables (managed or external) with locations pointing to volumes, as this is explicitly not supported; tables must use tabular st...

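
As a companion to the accepted answer, a minimal sketch of the distinction it draws, assuming a UC-enabled cluster and a notebook where `spark` is predefined (catalog, schema, volume, and file names are placeholders):

    # Files in a UC volume are addressed via /Volumes/<catalog>/<schema>/<volume>/...
    df = (
        spark.read.format("csv")
        .option("header", "true")
        .load("/Volumes/main/default/landing/my_file.csv")  # placeholder path
    )

    # Tables must use tabular storage: register them in a schema rather than
    # pointing their location at a volume
    df.write.mode("overwrite").saveAsTable("main.default.my_table")
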
Espenol1
by New Contributor II
  • 12570 Views
  • 5 replies
  • 2 kudos

Resolved! Using managed identities to access SQL server - how?

Hello! My company wants us to only use managed identities for authentication. We have set up Databricks using Terraform, got Unity Catalog and everything, but we're a very small team and I'm struggling to control permissions outside of Unity Catalog....

Latest Reply
vr
Valued Contributor
  • 2 kudos

As of today, you can use Unity Catalog service credentials: https://learn.microsoft.com/en-us/azure/databricks/connect/unity-catalog/cloud-services/service-credentials

4 More Replies
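
For readers who follow that link, a hedged sketch of the pattern it describes: obtaining an Entra ID token for Azure SQL via a UC service credential (the credential name is a placeholder, and the exact API shape should be checked against the linked docs):

    # Assumes a UC service credential named "sql-managed-identity" backed by a
    # managed identity; the provider behaves like an azure.identity credential
    credential = dbutils.credentials.getServiceCredentialsProvider("sql-managed-identity")

    # Standard Azure SQL resource scope for Entra ID tokens
    token = credential.get_token("https://database.windows.net/.default").token
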
santhiya
by New Contributor
  • 1501 Views
  • 2 replies
  • 0 kudos

CPU usage and idle time metrics from system tables

I need to get my compute metrics, not from the UI... the system tables don't have much information, and node_timeline has per-minute records, so it's difficult to calculate each compute's CPU usage per day. Any way we can get the CPU usage, CPU idle time, M...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

To calculate CPU usage, CPU idle time, and memory usage per cluster per day, you can use the system.compute.node_timeline system table. However, since the data in this table is recorded at per-minute granularity, it’s necessary to aggregate the data ...

1 More Replies
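
For concreteness, a sketch of the daily roll-up the reply describes (column names such as cpu_user_percent, cpu_system_percent, and mem_used_percent follow the documented system.compute.node_timeline schema; verify them against your workspace):

    daily = spark.sql("""
        SELECT
            cluster_id,
            DATE(start_time)                            AS usage_date,
            AVG(cpu_user_percent + cpu_system_percent)  AS avg_cpu_busy_pct,
            AVG(100 - cpu_user_percent
                    - cpu_system_percent)               AS avg_cpu_idle_pct,
            AVG(mem_used_percent)                       AS avg_mem_used_pct
        FROM system.compute.node_timeline
        WHERE start_time >= current_date() - INTERVAL 7 DAYS
        GROUP BY cluster_id, DATE(start_time)
    """)
    daily.display()
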
fly_high_five
by New Contributor III
  • 1329 Views
  • 5 replies
  • 1 kudos

Resolved! Unable to retrieve all rows of delta table using SQL endpoint of Interactive Cluster

Hi, I am trying to query a table using the JDBC endpoint of an Interactive Cluster. I am connected to the JDBC endpoint using DBeaver. When I export a small subset of data, 2000-8000 rows, it works fine and exports the data. However, when I try to export all rows ...

Latest Reply
WiliamRosa
Contributor III
  • 1 kudos

Hi @fly_high_five, I found these references about this situation; see if they help you: increase the SocketTimeout in JDBC (Databricks KB “Best practices when using JDBC with Databricks SQL” – https://kb.databricks.com/dbsql/job-timeout-when-connectin...

4 More Replies
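
For quick reference, the KB suggestion boils down to adding a SocketTimeout parameter to the JDBC URL; a hypothetical example (0 is commonly used to disable the client-side socket timeout for long exports):

    # Hypothetical JDBC URL for DBeaver or other JDBC clients; only the trailing
    # SocketTimeout parameter is the change under discussion
    jdbc_url = (
        "jdbc:databricks://adb-<workspace_id>.azuredatabricks.net:443/default;"
        "transportMode=http;ssl=1;httpPath=<http_path>;AuthMech=3;SocketTimeout=0"
    )
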
fly_high_five
by New Contributor III
  • 1023 Views
  • 4 replies
  • 2 kudos

Resolved! Exposing Data for Consumers in non-UC ADB

Hi, I want to expose data to consumers from our non-UC ADB. Consumers would be consuming data mainly using a SQL client like DBeaver. I tried the SQL endpoint of an Interactive Cluster and connected via DBeaver; however, when I try to fetch/export all rows of t...

Latest Reply
fly_high_five
New Contributor III
  • 2 kudos

Hi @szymon_dybczak, I am using the latest JDBC driver, 2.7.3 (https://www.databricks.com/spark/jdbc-drivers-archive), and my JDBC URL comes from the JDBC endpoint of the Interactive Cluster: jdbc:databricks://adb-{workspace_id}.azuredatabricks.net:443/default;transport...

3 More Replies
kmodelew
by New Contributor III
  • 1963 Views
  • 10 replies
  • 22 kudos

Unable to read excel file from Volume

Hi, I'm trying to read an Excel file directly from a Volume (not workspace or filestore); all the examples on the internet use workspace or filestore. The Volume is an external location, so I can read from there, but I would like to read directly from the Volume. I hav...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor III
  • 22 kudos

@ck7007 thanks for the update. Absolutely love that you've tested the solution too! Big props. As you mention, if we keep the community accurate, it'll mean that when someone else searches for the thread, they don't end up using an incorrect solutio...

9 More Replies
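
For later searchers, a minimal sketch of reading an Excel file straight from a volume path with pandas, assuming openpyxl is installed on the cluster (all names are placeholders):

    import pandas as pd

    # /Volumes/... paths are visible to plain Python file APIs on the driver
    pdf = pd.read_excel("/Volumes/main/default/raw/report.xlsx", engine="openpyxl")

    # Hand off to Spark if the data should continue through the usual pipeline
    df = spark.createDataFrame(pdf)
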
jfvizoso
by New Contributor II
  • 12686 Views
  • 5 replies
  • 0 kudos

Can I pass parameters to a Delta Live Table pipeline at running time?

I need to execute a DLT pipeline from a Job, and I would like to know if there is any way of passing a parameter. I know you can have settings in the pipeline that you use in the DLT notebook, but it seems you can only assign values to them when crea...

Latest Reply
DeepakAI
New Contributor II
  • 0 kudos

Team, is any workaround possible? I have 100+ tables which need to be ingested incrementally. I created a single DLT notebook which I am using inside a pipeline as a task; this pipeline is triggered via a job on a file-arrival event. I want to utilize the same...

4 More Replies
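
The workaround discussed in this thread is usually some variant of the sketch below: put values in the pipeline's configuration and read them in the DLT notebook via spark.conf (the configuration key is a placeholder; whether a job can override it per run depends on how the pipeline is updated):

    import dlt

    # Reads a value set under the pipeline's "configuration" settings,
    # e.g. {"mypipeline.source_table": "customers"} (placeholder key)
    source_table = spark.conf.get("mypipeline.source_table", "customers")

    @dlt.table(name=f"bronze_{source_table}")
    def bronze():
        return spark.readStream.table(f"raw.{source_table}")
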
Worrachon
by New Contributor
  • 434 Views
  • 1 reply
  • 0 kudos

Databricks cannot run pipeline

I found that when I run the pipeline, it shows the message "'Cannot run pipeline', 'PL_TRNF_CRM_SALESFORCE_TO_BLOB', HTTPSConnectionPool(host='management.azure.com', port=443)". It doesn't happen on every run, but I encounter this case often.

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

What exactly does the pipeline do?  Fetch data from a source system? I also see Data Factory as a component?

saicharandeepb
by Contributor
  • 465 Views
  • 1 reply
  • 1 kudos

Impact of Capturing Streaming Metrics to ADLS on Data Load Performance

Hi Community, I’m working on capturing Structured Streaming metrics and persisting them to Azure Data Lake Storage (ADLS) for monitoring and logging. To achieve this, I implemented a custom StreamingQueryListener that writes streaming progress data as...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @saicharandeepb, the behaviour you're experiencing can happen with coalesce. The thing is, when you use coalesce(1), you're sacrificing parallelism and everything is performed on a single executor. There's even a warning in Apache Spark OSS regardi...

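
For context, the general shape of such a listener, sketched under the assumption of a Python notebook (the ADLS path is a placeholder; the handler runs on the driver, so it is the small metrics write, not the main data stream, that should be kept on a single partition):

    from pyspark.sql.streaming import StreamingQueryListener

    class ProgressToAdls(StreamingQueryListener):
        def onQueryStarted(self, event):
            pass

        def onQueryProgress(self, event):
            # event.progress.json is the progress report as a JSON string
            row = spark.createDataFrame([(event.progress.json,)], ["progress"])
            row.write.mode("append").text(
                "abfss://logs@<account>.dfs.core.windows.net/streaming_metrics"
            )

        def onQueryTerminated(self, event):
            pass

    spark.streams.addListener(ProgressToAdls())
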
stucas
by New Contributor II
  • 704 Views
  • 2 replies
  • 0 kudos

DLT Pipeline and Pivot tables

TL;DR: Can DLT determine a dynamic schema, one which is generated from the results of a pivot? Issue: I know you can't use Spark `.pivot` in a DLT pipeline, and that if you wish to pivot data you need to do that outside of the DLT-decorated functions. I have...

Latest Reply
stucas
New Contributor II
  • 0 kudos

Thank you for the reply - I have tried this (it was suggested in earlier solutions); but that may well be a side effect of the above function.

    query = f"""
        SELECT pivot_key,
            {select_clause}
        FROM
            data...

1 More Replies
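
For completeness, the usual way around this is to resolve the pivot keys eagerly, before any DLT-decorated function runs, so the schema is fixed by the time the table is declared; a sketch with placeholder table and column names:

    # Collect distinct pivot keys up front (driver-side, outside the DLT functions)
    keys = [
        r["pivot_key"]
        for r in spark.table("source_data").select("pivot_key").distinct().collect()
    ]

    # One aggregate expression per key yields a fixed, known schema
    select_clause = ", ".join(
        f"MAX(CASE WHEN pivot_key = '{k}' THEN value END) AS `{k}`" for k in keys
    )
    pivoted = spark.sql(f"SELECT id, {select_clause} FROM source_data GROUP BY id")
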
noorbasha534
by Valued Contributor II
  • 611 Views
  • 3 replies
  • 1 kudos

Cost attribution based on table history statistics

Hello all, I have a job that processes 50 tables: 25 belong to finance, 20 to master data, and 5 to supply chain data domains. Now, imagine the job ran for 14 hours and cost me 1000 euros in a day. If I'd like to attribute the per-day cost...

Latest Reply
ManojkMohan
Honored Contributor II
  • 1 kudos

Root cause / why executionTimeMs isn't ideal: executionTimeMs includes everything the job did:
  • Waiting for resources
  • Shuffle, GC, or network latency
  • Contention with other concurrent jobs
Using this to allocate costs can misattribute costs, especially if so...

2 More Replies
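
As a starting point for the per-job side of this, a sketch joining billed usage to list prices (system.billing.usage and system.billing.list_prices are documented system tables; exact join conditions can vary by cloud and SKU). A per-table split would then be layered on top of the per-job figure using, e.g., table-history statistics:

    job_cost = spark.sql("""
        SELECT
            u.usage_metadata.job_id                    AS job_id,
            DATE(u.usage_start_time)                   AS usage_date,
            SUM(u.usage_quantity * p.pricing.default)  AS approx_cost
        FROM system.billing.usage u
        JOIN system.billing.list_prices p
          ON u.sku_name = p.sku_name
         AND u.usage_start_time >= p.price_start_time
         AND (p.price_end_time IS NULL OR u.usage_start_time < p.price_end_time)
        WHERE u.usage_metadata.job_id IS NOT NULL
        GROUP BY u.usage_metadata.job_id, DATE(u.usage_start_time)
    """)
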
ManojkMohan
by Honored Contributor II
  • 2912 Views
  • 15 replies
  • 17 kudos

Resolved! Ingesting 100 TB raw CSV data into the Bronze layer in Parquet + Snappy

Problem I am trying to solve: Bronze is the landing zone for immutable, raw data. At this stage, I am trying to use a columnar format (Parquet or ORC) → good compression, efficient scans, and then apply lightweight compression (e.g., Snappy) → balances...

Latest Reply
ManojkMohan
Honored Contributor II
  • 17 kudos

@szymon_dybczak @BS_THE_ANALYST @Coffee77 @TheOC the use case summary is as below. The use case: a telecom operator wants to minimize unnecessary truck rolls (sending technicians to customer sites), which cost $100–$200 per visit. Data sources feeding...

14 More Replies
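
The heart of the accepted approach is small; a sketch assuming an explicit schema (which avoids a full inference pass over 100 TB), with placeholder paths and a hypothetical partition column:

    # bronze_schema: an explicit StructType for the raw CSVs, defined elsewhere
    raw = (
        spark.read.format("csv")
        .option("header", "true")
        .schema(bronze_schema)
        .load("abfss://landing@<account>.dfs.core.windows.net/raw/")
    )

    (
        raw.write.format("parquet")
        .option("compression", "snappy")  # Snappy is also the Parquet default
        .partitionBy("ingest_date")       # hypothetical partition column
        .mode("append")
        .save("abfss://bronze@<account>.dfs.core.windows.net/raw_parquet/")
    )
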
dbdev
by Contributor
  • 1727 Views
  • 10 replies
  • 4 kudos

Maven libraries in VNet injected, UC enabled workspace on Standard Access Mode Cluster

Hi! As the title suggests, I want to install Maven libraries on my cluster with access mode 'Standard'. Our workspace is VNet-injected and has Unity Catalog enabled. The coordinates have been allowlisted by the account team according to these instructio...

Latest Reply
dbdev
Contributor
  • 4 kudos

We have resolved the Metastore issue, which also seems to have resolved the JAR issue. I don't have a clue why this fixed it. The network people might have used service tags, which also opened the workspace to the ODBC connections?

9 More Replies
seefoods
by Valued Contributor
  • 1482 Views
  • 4 replies
  • 1 kudos

Resolved! read json files on unity catalog

Hello guys, I have an issue when I load several JSON files which have the same schema on Databricks. When I do:

    2025_07_17_19_55_00_2025_07_31_21_55_00_17Q51D_alice_out.json  516.13 KB
    2025_07_17_19_55_00_2025_07_31_21_55_00_17Q51D_bob_out.json    516.13 K...

Latest Reply
seefoods
Valued Contributor
  • 1 kudos

Hello @szymon_dybczak, it's OK, I have checked the history of the table. I was confused about the display() command's output versus the actual output of the write operation. Thanks!

3 More Replies
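
For the archives, a minimal sketch of reading many same-schema JSON files from a volume in one pass (placeholder schema and paths; an explicit schema keeps the load consistent across files):

    from pyspark.sql.types import StructType, StructField, StringType

    # Hypothetical schema shared by all the *_out.json files
    schema = StructType([
        StructField("user", StringType()),
        StructField("event", StringType()),
    ])

    df = spark.read.schema(schema).json("/Volumes/main/default/landing/*_out.json")
    df.write.mode("append").saveAsTable("main.default.events")
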
victorNilsson
by New Contributor II
  • 808 Views
  • 3 replies
  • 2 kudos

Read polars from recently created csv file

More and more Python packages are transitioning to use polars instead of, e.g., pandas. There is a problem with this in Databricks when trying to read a CSV file with pl.read_csv("filename.csv") when the file has been created in the same notebook cel...

Labels: Data Engineering, csv, file system, OSError, polars
Latest Reply
Pilsner
Contributor III
  • 2 kudos

Hello @victorNilsson, thank you for letting me know how to replicate the issue; I was able to get the same error this time. I've given the problem another go and think I have been able to fix it by specifying the output path as "/tmp/test.csv". By wri...

2 More Replies
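
The fix described in the reply, condensed into a sketch: write to local driver disk such as /tmp rather than the workspace-backed path, then read it back with polars in the same cell:

    import polars as pl

    df = pl.DataFrame({"a": [1, 2, 3]})
    df.write_csv("/tmp/test.csv")       # /tmp is plain local disk on the driver

    out = pl.read_csv("/tmp/test.csv")  # no OSError: the file is immediately visible
    print(out)
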
