Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

seefoods
by Valued Contributor
  • 2839 Views
  • 2 replies
  • 1 kudos

Resolved! assets bundle

Hello guys, I am working on asset bundles. I want to make them generic for all teams (e.g. analytics, data engineering). Could someone share best practices for this purpose? Cordially, 

Latest Reply
Michał
New Contributor III
  • 1 kudos

Hi seefoods, were you able to achieve that generic asset bundle setup? I've been working on something potentially similar, and I'd be happy to discuss it, hoping to share experiences. While what I have works for a few teams, it is focused on declar...

1 More Replies
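
A generic bundle setup like the one discussed above is often handled with bundle variables and per-team deployment targets. A minimal, hypothetical `databricks.yml` sketch (team names and paths are illustrative, not taken from the thread):

```yaml
# Hypothetical sketch: one shared bundle template, parameterized per team.
bundle:
  name: shared-etl-${var.team}

variables:
  team:
    description: Team that owns the deployed resources
    default: analytics

targets:
  analytics:
    variables:
      team: analytics
    workspace:
      root_path: /Workspace/teams/analytics/.bundle/${bundle.name}
  data_engineering:
    variables:
      team: data_engineering
    workspace:
      root_path: /Workspace/teams/data_engineering/.bundle/${bundle.name}
```

Each team then deploys with `databricks bundle deploy -t <target>`, while job and pipeline definitions stay shared.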
korijn
by New Contributor II
  • 1464 Views
  • 4 replies
  • 0 kudos

Git integration inconsistencies between git folders and job git

It's a little confusing and limiting that the Git integration support is inconsistent between the two options available. Sparse checkout is only supported when using a workspace Git folder, and checking out by commit hash is only supported when using ...

Latest Reply
_J
Databricks Partner
  • 0 kudos

Same here, could be a good improvement for the jobs layer guys!

3 More Replies
IONA
by New Contributor III
  • 1986 Views
  • 6 replies
  • 7 kudos

Resolved! Getting data from the Spark query profiler

When you navigate to Compute > Select Cluster > Spark UI > JDBC/ODBC, you can see grids of session stats and SQL stats. Is there any way to get this data in a query so that I can do some analysis? Thanks

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 7 kudos

Hi @IONA, as @Louis_Frolio correctly suggested, there is no native way to get stats from the JDBC/ODBC Spark UI.
1. You can try the query history system table, but it has a limited number of metrics: %sql SELECT * FROM system.query.history
2. You can use /a...

5 More Replies
Yulei
by New Contributor III
  • 34540 Views
  • 7 replies
  • 1 kudos

Resolved! Could not reach driver of cluster

Hi, recently I am seeing the issue "Could not reach driver of cluster <some_id>" with my Structured Streaming job when migrating to Unity Catalog, and found this when checking the traceback: Traceback (most recent call last): File "/databricks/python_shell/...

Latest Reply
osingh
Contributor
  • 1 kudos

It seems like a temporary connectivity or cluster initialization glitch. So if anyone else runs into this, try re-running the job before diving into deeper troubleshooting - it might just work! Hope this helps someone save time.

6 More Replies
ChristianRRL
by Honored Contributor
  • 1012 Views
  • 1 reply
  • 1 kudos

Resolved! Can schemaHints dynamically handle nested json structures? (Part 2)

Hi there, I'd like to follow up on a prior post: https://community.databricks.com/t5/data-engineering/can-schemahints-dynamically-handle-nested-json-structures/m-p/130209/highlight/true#M48731 Basically I'm wondering what's the best way to set *both* d...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

I am not aware of schemaHints supporting wildcards for now. It would be awesome to have, though, I agree. So I think you are stuck with what was already proposed in your previous post, or with exploding the JSON or other transformations.

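
Since wildcards aren't supported, one workaround (a sketch, with hypothetical field names and types) is to enumerate the known nested paths and generate the hint string programmatically before passing it to Auto Loader:

```python
# Build a cloudFiles.schemaHints string for known nested JSON paths.
# The field paths and types below are hypothetical examples.
nested_fields = {
    "payload.amount": "DOUBLE",
    "payload.meta.created_at": "TIMESTAMP",
    "payload.items": "ARRAY<STRING>",
}

# Auto Loader expects hints as a comma-separated "path TYPE" list.
schema_hints = ", ".join(f"{path} {dtype}" for path, dtype in nested_fields.items())
print(schema_hints)
# The string would then be passed to the reader, e.g.:
# spark.readStream.format("cloudFiles").option("cloudFiles.schemaHints", schema_hints)
```

Keeping the path list in one place makes it easier to extend when new nested fields appear, even though each path still has to be spelled out explicitly.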
minhhung0507
by Valued Contributor
  • 780 Views
  • 1 reply
  • 1 kudos

Could not reach driver of cluster

I am running a pipeline job in Databricks and it failed with the following message: "Run failed with error message Could not reach driver of cluster 5824-145411-p65jt7uo". This message is not very descriptive, and I am not able to identify the root ca...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @minhhung0507, typically this error can appear when there is high load on the driver node. Another reason could be heavy garbage collection on the driver node, as well as high memory and CPU usage, which leads to throttling and prevents the driv...

erigaud
by Honored Contributor
  • 11866 Views
  • 7 replies
  • 6 kudos

Resolved! SFTP Autoloader

Hello, I don't know if it is possible, but I am wondering whether I can ingest files from an SFTP server using Auto Loader, or do I have to first copy the files to DBFS and then use Auto Loader on that location? Thank you!

Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hi @erigaud, we haven't heard from you since the last response from @BriceBuso, and I was checking back to see if their suggestions helped you. Otherwise, if you have a solution, please share it with the community, as it can be helpful to others. Al...

6 More Replies
chiruinfo5262
by New Contributor II
  • 1117 Views
  • 4 replies
  • 0 kudos

Trying to convert oracle sql to databricks sql but not getting the desired output

ORACLE SQL: COUNT( CASE WHEN TRUNC(WORKORDER.REPORTDATE) BETWEEN SELECTED_PERIOD_START_DATE AND SELECTED_PERIOD_END_DATE THEN 1 END ) SELECTED_PERIOD_BM,COUNT( CASE WHEN TRUNC(WORKORDER.REPORTDATE) BETWEEN COMPARISON_PERIOD_START_DATE AND COMPARISON_...

Latest Reply
Granty
New Contributor II
  • 0 kudos

This is a helpful comparison! I've definitely run into similar date formatting issues when migrating queries. The Oracle TRUNC function and Databricks' DATE_FORMAT/CAST combo can be tricky to reconcile. Speaking of needing a break after debugging SQL...

3 More Replies
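
One common gotcha in this kind of migration: Oracle's TRUNC(date) strips the time component, which in Databricks SQL is usually written as CAST(col AS DATE) or DATE_TRUNC('DAY', col). The intended behavior can be illustrated in plain Python (a sketch of the pattern, not the poster's actual query; dates are made up):

```python
from datetime import datetime, date

def trunc_day(ts: datetime) -> date:
    # Rough equivalent of Oracle TRUNC(ts) / Databricks CAST(ts AS DATE):
    # drop the time-of-day portion, keep the calendar date.
    return ts.date()

report_date = datetime(2024, 3, 15, 13, 45, 7)
period_start, period_end = date(2024, 3, 1), date(2024, 3, 31)

# The COUNT(CASE WHEN TRUNC(d) BETWEEN start AND end THEN 1 END)
# pattern boils down to this per-row membership check:
in_period = period_start <= trunc_day(report_date) <= period_end
print(in_period)  # → True
```

If the Databricks side compares a TIMESTAMP column directly against DATE bounds without the cast, rows with a non-midnight time can fall outside the end bound, which is a frequent source of off-by-one-day count mismatches.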
james_
by New Contributor II
  • 882 Views
  • 5 replies
  • 0 kudos

Low worker utilisation in Spatial SQL

I am finding low worker node utilization when using Spatial SQL features. My cluster is DBR 17.1 with 2x workers and Photon enabled. When I view the cluster metrics, they consistently show one worker around 30-50% utilized, the driver around 15-20%, a...

Latest Reply
james_
New Contributor II
  • 0 kudos

Thank you again, @-werners- . I have a lot still to learn about partitioning and managing spatial data. Perhaps I mainly need more patience!

4 More Replies
ScottH
by New Contributor III
  • 976 Views
  • 3 replies
  • 3 kudos

Resolved! Installing Marketplace Listing via Python SDK...

I am trying to use the Databricks Python SDK to install a Databricks Marketplace listing to Unity Catalog. I am getting stuck on how to provide a valid consumer terms version when passing the "accepted_consumer_terms" parameter to the w.consumer_inst...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

Hi @ScottH, it took me about 2 hours to get it right, but here it is. You need to provide a valid date. And you may ask, where does that date come from? It comes from the consumer listing: listings = w.consumer_listings.get(id= 'e913bea3-9a37-446c...

2 More Replies
der
by Contributor III
  • 1690 Views
  • 2 replies
  • 2 kudos

Resolved! DBR 17.1 Spatial SQL Functions and Apache Sedona

I noticed in the DBR 17.1 release notes that ST geospatial functions are now in public preview - great news for us since this means native support in Databricks.https://docs.databricks.com/aws/en/release-notes/runtime/17.1#expanded-spatial-sql-expres...

Latest Reply
mjohns
Databricks Employee
  • 2 kudos

Here are a few answers, feel free to hit me up on LinkedIn (michaeljohns2) if you want to discuss more particulars wrt Databricks geospatial. Looks like Sedona 1.8.0 is the release to watch for with Spark 4.0 support, see https://github.com/apache/se...

1 More Replies
mikvaar
by Databricks Partner
  • 2334 Views
  • 4 replies
  • 1 kudos

Resolved! DLT Pipelines with DABs - Support for tags field?

Hi all, I'm working with DABs and trying to define tags for DLT pipelines in the bundle YAML config. However, adding a `tags:` block under the pipeline results in the following warning: Warning: unknown field: tags This suggests that tags might not be...

Latest Reply
nikhilj0421
Databricks Employee
  • 1 kudos

Hi @mikvaar, tags are not supported yet in DABs, but they are on the roadmap. The ETA for this is around the first or second week of June. 

3 More Replies
DRock
by New Contributor II
  • 5372 Views
  • 7 replies
  • 0 kudos

Resolved! ODBC data source to connect to a Databricks catalog.database via MS Access Not Working

When using an ODBC data source to connect to a Databricks catalog database via Microsoft Access, the tables are not listing/appearing in the MS Access database for selection. However, when using the same ODBC data source to connect to Microsoft Excel,...

Latest Reply
Senefelder
New Contributor II
  • 0 kudos

Why do «Databricks employees» keep answering with the same AI-generated reply when that obviously is not the solution? Has anyone been able to come up with a solution that actually works?

6 More Replies
noorbasha534
by Valued Contributor II
  • 868 Views
  • 2 replies
  • 0 kudos

Databricks job calling DBT - persist job name

Hello all, is it possible to persist the Databricks job name into the Brooklyn audit tables data model when a Databricks job calls a DBT model? Currently, my colleagues persist audit information into fact & dimension tables of the Brooklyn data model....

Latest Reply
Yogesh_Verma_
Contributor II
  • 0 kudos

Yes, it’s possible to include the Databricks job name in your Brooklyn audit tables, but it won’t happen automatically. Right now, only the job run ID is being logged, so you’d need to extend your audit logic a bit. One common approach is to pass the...

1 More Replies
auso
by New Contributor
  • 3567 Views
  • 3 replies
  • 2 kudos

Asset Bundles: Shared libraries and notebooks in monorepo multi-bundle setup

I am part of a small team of Data Engineers which started using Databricks Asset Bundles one year ago. Our code base consists of typical ETL workloads written primarily in Jupyter notebooks (.ipynb) and jobs (.yaml), with our codebase spanning across...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

1. The easiest way to do this is to package your shared libraries into a wheel (assuming you use Python). That way you do not have to mess with the PYTHONPATH, and you can install these libs automatically on any cluster (via policies or DABs or what...

2 More Replies
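
The wheel approach suggested above can be wired into a bundle with an artifacts block. A minimal, hypothetical `databricks.yml` fragment (paths and names are illustrative only):

```yaml
# Hypothetical sketch: build a shared wheel and attach it to a job task.
artifacts:
  shared_lib:
    type: whl
    path: ./libs/shared_lib   # folder containing setup.py or pyproject.toml

resources:
  jobs:
    etl_job:
      name: etl-job
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./notebooks/etl.ipynb
          libraries:
            - whl: ./libs/shared_lib/dist/*.whl
```

With this layout, `databricks bundle deploy` builds the wheel and uploads it alongside the job, so shared code changes ship with each deployment instead of living on the PYTHONPATH.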