Data Engineering

Forum Posts

by Chris_N • Visitor

9 hours ago

19 Views
1 reply
0 kudos

Unable to configure clustering on DLT tables

Hi Team, I have a DLT pipeline with the `cluster_by` property configured for all my tables. The code looks something like this: `@dlt.table(name="flows", cluster_by=["from"]) def flows(): <LOGIC>`. It was all working fine, and in a couple of days, the queries w...

Latest Reply

NandiniN
Databricks Employee

6 hours ago

0 kudos

Hi @Chris_N, you mentioned: "I couldn't find any cluster properties configured." If they existed and were changed, you can use the Delta history command to check whether someone changed the clustering information. It is possible there were ch...
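The history check mentioned in this reply could be sketched as follows. This is a minimal illustration, not the poster's code: `DESCRIBE HISTORY` is the Delta history command, but the table name, the sample rows, and the assumption that clustering changes appear with an operation name containing "CLUSTER" are all hypothetical and should be checked against the real history output.

```python
def describe_history_sql(table_name: str) -> str:
    """Build the Delta history statement for a table."""
    return f"DESCRIBE HISTORY {table_name}"


def clustering_changes(history_rows):
    """Keep history rows whose operation name mentions clustering.

    Assumption: clustering changes are recorded with an operation
    name containing "CLUSTER"; verify this against real output.
    """
    return [
        row for row in history_rows
        if "CLUSTER" in str(row.get("operation", "")).upper()
    ]


# Hypothetical rows, shaped like a subset of the history columns.
sample_history = [
    {"version": 1, "operation": "CREATE TABLE", "userName": "alice@example.com"},
    {"version": 2, "operation": "CLUSTER BY", "userName": "bob@example.com"},
    {"version": 3, "operation": "WRITE", "userName": "alice@example.com"},
]

print(describe_history_sql("my_catalog.my_schema.flows"))
for row in clustering_changes(sample_history):
    print(row["version"], row["operation"], row["userName"])
```

In a notebook, the rows would come from running the generated statement (for example through `spark.sql(...)`) rather than from a hard-coded list.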

by adrianhernandez • New Contributor III

7 hours ago

14 Views
1 reply
0 kudos

Wheel permissions issue

I get the following error: org.apache.spark.SparkSecurityException: [INSUFFICIENT_PERMISSIONS] Insufficient privileges: User does not have permission MODIFY,SELECT on any file. SQLSTATE: 42501 at com.databricks.sql.acl.Unauthorized.throwInsufficientPermissionsError(P...

Latest Reply

NandiniN
Databricks Employee

7 hours ago

0 kudos

Hi @adrianhernandez, the permissions error indicates that you need privileges on "any file". To resolve this, can you try adding the corresponding permissions and see if it works: %sql GRANT SELECT ON ANY FILE TO `username` %sql GRANT MO...
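The grants suggested in this reply could be generated as in the sketch below. It assumes the two privileges named in the error message (`SELECT` and `MODIFY`) and a hypothetical principal `user@example.com`. The helper `build_any_file_grants` is illustrative and not a Databricks API; it only builds the SQL strings, which an administrator would then run in a `%sql` cell.

```python
def build_any_file_grants(principal, privileges=("SELECT", "MODIFY")):
    """Return one GRANT ... ON ANY FILE statement per privilege."""
    if "`" in principal:
        # Backticks would break the quoted identifier below.
        raise ValueError("principal must not contain backticks")
    return [f"GRANT {p} ON ANY FILE TO `{principal}`" for p in privileges]


# Hypothetical principal, for illustration only.
for statement in build_any_file_grants("user@example.com"):
    print(statement)
```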

by Vsleg • Contributor

03-13-2024 7:14:10 AM

3016 Views
4 replies
0 kudos

Enabling enableChangeDataFeed on Streaming Table created in DLT

Hello, can I enable Change Data Feed on Streaming Tables? How should I do this? I couldn't find this in the existing documentation: https://learn.microsoft.com/en-us/azure/databricks/delta/delta-change-data-feed

Latest Reply

john77
New Contributor II

9 hours ago

0 kudos

I have noticed the same issue.

by Michał • New Contributor III

09-03-2025 6:41:10 AM

552 Views
5 replies
2 kudos

how to process a streaming lakeflow declarative pipeline in batches

Hi, I've got a problem and I have run out of ideas as to what else I can try. Maybe you can help? I've got a Delta table with hundreds of millions of records on which I have to perform relatively expensive operations. I'd like to be able to process some...

Latest Reply

mmayorga
Databricks Employee

3 weeks ago

2 kudos

Hi @Michał , One detail/feature to consider when working with Declarative Pipelines is that they manage and auto-tune configuration aspects, including rate limiting (maxBytesPerTrigger or maxFilesPerTrigger). Perhaps that's why you could not see this...

by Data_NXT • New Contributor II

3 weeks ago

392 Views
3 replies
3 kudos

Resolved! To change ownership of a materialized view

We are working in a Unity Catalog-enabled Databricks workspace, and we have several materialized views (MVs) that were created through a Delta Live Tables (DLT) / Lakeflow pipeline. Currently, the original owner of the pipeline has moved out of the project,...

Latest Reply

szymon_dybczak
Esteemed Contributor III

3 weeks ago

3 kudos

Hi @Data_NXT, you can change the owner of a materialized view if you are both a metastore admin and a workspace admin. Use the following steps to change a materialized view's owner: Open the materialized view in Catalog Explorer, then on the Overview ...

by Hritik_Moon • New Contributor

yesterday

58 Views
2 replies
1 kudos

Save as Delta file in catalog

Hello, I have created a DataFrame from a CSV file. When I try to write it as: df_op_clean.write.format("delta").save("/Volumes/optimisation/trial") I get this error: Cannot access the UC Volume path from this location. Path was /Volumes/optimisation/trial/_d...

Latest Reply

-werners-
Esteemed Contributor III

18 hours ago

1 kudos

Also, to add to this: avoid overlap between tables and Volumes. Create separate folders for tables and files. Unity Catalog does this too if you use managed tables/volumes.
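As a side note on the path in the question: Unity Catalog volume paths follow the layout `/Volumes/<catalog>/<schema>/<volume>/<path>`, so a path with only two components after `/Volumes` does not name a volume. The sketch below is a plain-Python check of that layout. The helper `parse_volume_path` and the example names (`main`, `default`, `landing`) are hypothetical, and the check does not by itself explain the truncated error above.

```python
from typing import NamedTuple, Optional


class VolumePath(NamedTuple):
    catalog: str
    schema: str
    volume: str
    relative_path: str


def parse_volume_path(path: str) -> Optional[VolumePath]:
    """Split /Volumes/<catalog>/<schema>/<volume>/<path>; None if incomplete."""
    if not path.startswith("/Volumes/"):
        return None
    parts = [part for part in path.split("/") if part]
    # parts[0] is "Volumes"; catalog, schema and volume must all follow it.
    if len(parts) < 4:
        return None
    catalog, schema, volume = parts[1:4]
    return VolumePath(catalog, schema, volume, "/".join(parts[4:]))


print(parse_volume_path("/Volumes/optimisation/trial"))
print(parse_volume_path("/Volumes/main/default/landing/raw/file.csv"))
```

For tabular data, a common alternative to writing into a volume path is registering a table, for example with `df.write.format("delta").saveAsTable(...)` and a three-level name, which keeps tables and files separate as the reply suggests.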

by mbanxp • New Contributor III

Tuesday

97 Views
2 replies
1 kudos

Most suitable Data Promotion orchestration for multi-tenant data lake in Databricks

Hi there! I would like to find the most suitable orchestration process to promote data between medallion layers. I need to solve the following key architectural decision for scaling my multi-tenant data lake in Databricks. My setup: Independent medal...

Latest Reply

sarahbhord
Databricks Employee

Wednesday

1 kudos

Hey mbanxp! The most scalable and maintainable orchestration pattern for multi-tenant medallion architectures in Databricks is to build independent pipelines per table for all clients, with each pipeline parameterized by client/tenant. Why this appro...

by jeremy98 • Honored Contributor

06-26-2025 10:17:52 AM

697 Views
6 replies
1 kudos

How to reference a workflow to use multiple GIT sources?

Hi community, Is it possible for a workflow to reference multiple Git sources? Specifically, can different tasks within the same workflow point to different Git repositories or types of Git sources? Ty

Latest Reply

mai_luca
New Contributor III

06-27-2025 5:37:05 AM

1 kudos

A workflow can reference multiple Git sources. You can specify the git information for each task. However, I am not sure you can have multiple GitProvider for the same workspace....

by saicharandeepb • New Contributor III

21 hours ago

28 Views
0 replies
0 kudos

Capturing Streaming Metrics in Near Real-Time Using Cluster Logs

Over the past few weeks, I’ve been exploring ways to capture streaming metrics from our data load jobs. The goal is to monitor job performance and behavior in real time, without disrupting our existing data load pipelines. Initial Exploration: Streami...

by EricCournarie • New Contributor III

a week ago

281 Views
8 replies
10 kudos

ResultSet metadata does not return correct type for TIMESTAMP_NTZ

Hello, using the JDBC driver, when I retrieve the metadata of a ResultSet, the type for a TIMESTAMP_NTZ is not correct (it's reported as TIMESTAMP). My SQL is a simple SELECT * on a table that has a TIMESTAMP_NTZ column. This works when retrieving metad...

Latest Reply

Advika
Databricks Employee

yesterday

10 kudos

Hello @EricCournarie! Just to confirm, were you initially using the JDBC driver v2.7.3? According to the release notes, this version adds support for the TIMESTAMP_NTZ data type.

by karuppusamy • New Contributor

yesterday

137 Views
4 replies
5 kudos

Resolved! Getting a warning message in Declarative Pipelines.

Hi Team, While creating a Declarative ETL pipeline in Databricks, I tried to configure a notebook using the "Add existing assets" option by providing the notebook path. However, I received a warning message: "Legacy configuration detected. Use files in...

Latest Reply

karuppusamy
New Contributor

yesterday

5 kudos

Thank you @szymon_dybczak, Now I have a good clarification from my end.

by Raj_DB • Contributor

Wednesday

189 Views
8 replies
5 kudos

Resolved! Streamlining Custom Job Notifications with a Centralized Email List

Hi Everyone, I am working on setting up success/failure notifications for a large number of jobs in our Databricks environment. The manual process of configuring email notifications in the UI for each job individually is not scalable and is becoming ver...

Latest Reply

nayan_wylde
Honored Contributor III

yesterday

5 kudos

@Raj_DB Databricks sends notifications via its internal email service, which often requires the address to be a valid individual mailbox or a distribution list that accepts external mail. If your group email is a Microsoft 365 group, please check whether “Allow...
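One way to avoid configuring each job by hand, as the question describes, is to keep the recipient list in a single place and generate the same notification settings for every job. The sketch below assumes the Jobs API 2.1 `jobs/update` request shape (`job_id` plus `new_settings.email_notifications` with `on_success` and `on_failure`). The job IDs, the address, and the helper `build_update_payloads` are hypothetical, and sending the requests is left out.

```python
import json

# Single source of truth for recipients (hypothetical address).
CENTRAL_RECIPIENTS = {
    "on_success": ["data-team@example.com"],
    "on_failure": ["data-team@example.com"],
}


def build_update_payloads(job_ids, recipients=CENTRAL_RECIPIENTS):
    """Return one jobs/update request body per job ID."""
    return [
        {
            "job_id": job_id,
            "new_settings": {
                "email_notifications": {
                    "on_success": list(recipients["on_success"]),
                    "on_failure": list(recipients["on_failure"]),
                }
            },
        }
        for job_id in job_ids
    ]


# Hypothetical job IDs; each body would be POSTed to the jobs/update endpoint.
for payload in build_update_payloads([101, 102]):
    print(json.dumps(payload))
```

Changing the recipients then means editing one dictionary and re-running the script, rather than opening every job in the UI.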

by EricCournarie • New Contributor III

yesterday

57 Views
2 replies
0 kudos

Filling a STRUCT field with a PreparedStatement in JDBC

Hello, I'm trying to fill a STRUCT field with a PreparedStatement in Java by passing a JSON string to the PreparedStatement. But it complains: Cannot resolve "infos" due to data type mismatch: cannot cast "STRING" to "STRUCT<AGE: BIGINT, NAME: STRING>"....

Latest Reply

szymon_dybczak
Esteemed Contributor III

yesterday

0 kudos

Could you provide a sample of the JSON string along with the code you're using? Otherwise it will be hard for us to help you.

by yit • Contributor

yesterday

94 Views
2 replies
3 kudos

Resolved! Difference between libraries dlt and dp

In all Databricks documentation, the examples use `import dlt` to create streaming tables and views. But when generating sample Python code in an ETL pipeline, the import in the sample is: `from pyspark import pipelines as dp`. Which one is the correct libr...

Latest Reply

nayan_wylde
Honored Contributor III

yesterday

3 kudos

@yit Functionally, they are equivalent concepts (declarative definitions for streaming tables, materialized views, expectations, CDC, etc.). The differences you’ll notice are mostly naming/ergonomics: Module name: Databricks docs & most existing notebo...

by SuMiT1 • New Contributor III

Wednesday

172 Views
8 replies
3 kudos

Flattening the json in databricks

I have chatbot data. I read a JSON file from ADLS in Databricks and stored the output in a DataFrame. In that table, two columns contain JSON data, but their data type is string: 1. content 2. metadata. Now I have to flatten the data, but I am not getting how to do tha...

Latest Reply

SuMiT1
New Contributor III

yesterday

3 kudos

Hi @szymon_dybczak, I gave the wrong content JSON value. Here is the updated one. Could you please tell me the code for this? It would be helpful for me. You gave the code already, but I am getting confused, so please tell me for this: { "activities": [ { "va...
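The general idea of flattening a JSON string column can be illustrated in plain Python. This is a sketch under stated assumptions: the sample document below is made up (only the top-level `activities` array comes from the post, and the keys `id` and `text` are hypothetical), and the helper `flatten` is not the code given earlier in the thread.

```python
import json


def flatten(value, prefix=""):
    """Flatten nested dicts and lists into one dict keyed by dotted paths."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        # Leaf value: record it under the accumulated path.
        flat[prefix] = value
    return flat


# Hypothetical stand-in for the string stored in the `content` column.
content = '{"activities": [{"id": 1, "text": "hello"}, {"id": 2, "text": "bye"}]}'

for path, leaf in flatten(json.loads(content)).items():
    print(path, "=", leaf)
```

On a Spark DataFrame, the same result is usually reached by parsing the string column with `from_json` and a schema, then using `explode` on the array, rather than flattening row by row in Python.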

Databricks Community
