Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

bbastian
by New Contributor
  • 208 Views
  • 1 reply
  • 1 kudos

Resolved! [VARIANT_SIZE_LIMIT] Cannot build variant bigger than 16.0 MiB in parse_json

I have a table coming from PostgreSQL, with one column containing JSON data in string format. We have been using parse_json to convert that to a variant column. But lately it is failing with the SIZE_LIMIT error. When I isolated the row which gave er...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @bbastian, unfortunately, as of now there is a strict limitation regarding size: a variant column cannot contain a value larger than 16 MiB (see Variant support in Delta Lake | Databricks on AWS). And tbh you cannot compare the size of this JSON string to ...

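A minimal sketch of how one might isolate the oversized rows before parsing, assuming a hypothetical table raw_events with columns id and payload; try_parse_json, which returns NULL for values it cannot convert instead of raising an error, stands in for parse_json:

    # Hypothetical table/column names; run in a Databricks notebook.
    # octet_length() measures the string in bytes, the same unit as the 16 MiB variant limit.
    oversized = spark.sql("""
        SELECT id, octet_length(payload) AS payload_bytes
        FROM raw_events
        WHERE octet_length(payload) > 16 * 1024 * 1024
    """)
    oversized.show()

    # try_parse_json yields NULL rather than failing the whole job,
    # so oversized rows surface as NULLs while the pipeline keeps running.
    parsed = spark.sql("SELECT id, try_parse_json(payload) AS v FROM raw_events")
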
sslyle
by New Contributor III
  • 7687 Views
  • 9 replies
  • 5 kudos

Resolved! Combining multiple Academy profiles

I have this profile @gmail.com; my personal professional profile. I also have a @mycompany.com profile. How do I combine both so I can leave my current job for a better life without losing the accolades I've accumulated under my @mycompany.com login giv...

Latest Reply
jChantoHdz
New Contributor II

I have the same question; how can this be done?

8 More Replies
gzr58l
by New Contributor
  • 170 Views
  • 1 reply
  • 0 kudos

How to set up the Lakeflow HTTP connector with M2M authentication

I am getting the following error about content-type, with no option to pick a different content-type when configuring the Lakeflow connector: The OAuth token exchange failed with HTTP status code 415 Unsupported Media Type. The returned server response ...

Latest Reply
saurabh18cs
Honored Contributor II

Hi @gzr58l, are you configuring a custom Lakeflow connector or an external connection in Databricks? Also, consider using a service principal or personal access token (PAT) for authentication as a temporary workaround.

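For what it's worth, a 415 from an OAuth token endpoint usually means the request body was not form-encoded. A minimal sketch of Databricks' documented M2M (client credentials) token exchange in Python; the workspace host and service principal credentials are placeholders:

    import requests

    WORKSPACE = "https://<workspace-host>"          # placeholder
    CLIENT_ID = "<service-principal-client-id>"     # placeholder
    CLIENT_SECRET = "<service-principal-secret>"    # placeholder

    # Token endpoints expect application/x-www-form-urlencoded; requests sends
    # that automatically when data= is a dict. A JSON body is a common cause
    # of 415 Unsupported Media Type.
    resp = requests.post(
        f"{WORKSPACE}/oidc/v1/token",
        auth=(CLIENT_ID, CLIENT_SECRET),
        data={"grant_type": "client_credentials", "scope": "all-apis"},
    )
    resp.raise_for_status()
    token = resp.json()["access_token"]
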
JeffSeaman
by New Contributor III
  • 679 Views
  • 8 replies
  • 1 kudos

Resolved! JDBC error trying a get schemas call.

Hi Community, I have a free demo version and can create a JDBC connection and get metadata (schema, table, and column structure info). Everything works as described in the docs, but when working with someone who has a paid version of Databricks the s...

Latest Reply
Louis_Frolio
Databricks Employee

@JeffSeaman , please let us know if any of my suggestions help get you on the right track. If they do, kindly mark the post as "Accepted Solution" so others can benefit as well. Cheers, Louis.

7 More Replies
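
For anyone comparing metadata behavior across workspace tiers, a minimal sketch of the equivalent call from Python with the databricks-sql-connector package, whose cursor exposes schemas()/tables()/columns() helpers analogous to JDBC's DatabaseMetaData calls; all connection values are placeholders:

    from databricks import sql

    with sql.connect(
        server_hostname="<workspace-host>",      # placeholder
        http_path="<warehouse-http-path>",       # placeholder
        access_token="<personal-access-token>",  # placeholder
    ) as conn:
        with conn.cursor() as cur:
            cur.schemas()                # analogous to JDBC getSchemas()
            for row in cur.fetchall():
                print(row)
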
jakesippy
by New Contributor III
  • 737 Views
  • 7 replies
  • 14 kudos

Resolved! How to get pipeline update duration programmatically

I'm looking to track how much time is being spent running updates for my DLT pipelines. When querying the list pipeline updates REST API endpoint I can see start and end times being returned; however, these fields are not listed in the documentation. ...

Latest Reply
jakesippy
New Contributor III

Originally went with the approach of exporting to and reading from the event log table, which has been helpful for getting other metrics as well. Also found today that there is a new system table in public preview which exposes the durations I was ...

6 More Replies
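
A sketch of reading those timestamps straight from the REST endpoint while they remain undocumented; the host, token, and pipeline ID are placeholders, and the end-time field name is an assumption, so the code guards against its absence:

    import requests

    HOST = "https://<workspace-host>"     # placeholder
    TOKEN = "<personal-access-token>"     # placeholder
    PIPELINE_ID = "<pipeline-id>"         # placeholder

    resp = requests.get(
        f"{HOST}/api/2.0/pipelines/{PIPELINE_ID}/updates",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    resp.raise_for_status()

    for update in resp.json().get("updates", []):
        # creation_time is documented; end_time is the kind of undocumented
        # field discussed above (hypothetical name), so guard against absence.
        start, end = update.get("creation_time"), update.get("end_time")
        if start and end:
            print(update["update_id"], (end - start) / 1000.0, "seconds")
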
VIRALKUMAR
by Contributor II
  • 9641 Views
  • 5 replies
  • 0 kudos

How to Determine the Cost for Each Query Run Against SQL Warehouse Serverless?

Hello everyone. First of all, I would like to thank Databricks for enabling system tables for customers. It does help a lot. I am working on the cost optimization topic, particularly SQL warehouse serverless. I am not sure all of you have tried system...

Latest Reply
skumarraj
New Contributor II

Can you share the query that you used?

4 More Replies
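
For anyone else attempting this, a hedged sketch of the common approximation: prorate each serverless warehouse's daily DBUs from system.billing.usage across queries in system.query.history by execution time. The join and proration are a heuristic, not an official Databricks costing method:

    approx_cost = spark.sql("""
        WITH q AS (
            SELECT statement_id,
                   compute.warehouse_id AS warehouse_id,
                   DATE(start_time)     AS usage_date,
                   total_duration_ms
            FROM system.query.history
        ),
        u AS (
            SELECT usage_metadata.warehouse_id AS warehouse_id,
                   usage_date,
                   SUM(usage_quantity) AS dbus
            FROM system.billing.usage
            WHERE usage_metadata.warehouse_id IS NOT NULL
            GROUP BY 1, 2
        )
        SELECT q.statement_id,
               u.dbus * q.total_duration_ms
                 / SUM(q.total_duration_ms) OVER (PARTITION BY q.warehouse_id, q.usage_date)
                 AS approx_dbus  -- multiply by your DBU rate for a dollar figure
        FROM q JOIN u USING (warehouse_id, usage_date)
    """)
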
aranjan99
by Contributor
  • 555 Views
  • 4 replies
  • 2 kudos

Databricks Pipeline SDK missing fields

Looking at the Databricks Java SDK for pipeline events, I see that the REST API returns a details field that has the same information as the event log details. But this is not surfaced in the SDK; it should be a small change to add it. Is that something which c...

Latest Reply
ManojkMohan
Honored Contributor

The start and end time fields in the Pipeline Updates API are currently present in the Databricks REST API but are not yet supported (i.e., not included or mapped) in the Databricks Java SDK as of September 2025. This means you can see these fields (s...

3 More Replies
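
Until the SDK maps those fields, one workaround is to skip the typed models and read the raw JSON. A sketch using the Python SDK's low-level ApiClient (the Java SDK has a comparable escape hatch); the pipeline ID is a placeholder and the endpoint is the pipeline events API discussed above:

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()  # auth picked up from the environment / config profile

    # .do() returns the parsed JSON as-is, so fields the typed models drop
    # (such as the details field mentioned above) stay accessible.
    raw = w.api_client.do("GET", "/api/2.0/pipelines/<pipeline-id>/events")
    for event in raw.get("events", []):
        print(event.get("details"))
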
manish24101981
by New Contributor
  • 1779 Views
  • 1 reply
  • 1 kudos

Resolved! DLT or Databricks for CDC and NRT

We are currently delivering a large-scale healthcare data migration project involving: a one-time historical migration of approx. 80 TB of data, already completed and loaded into Delta Lake; CDC merge logic already developed and validated using Apache...

Latest Reply
mark_ott
Databricks Employee

For cost-sensitive, large-scale healthcare data streaming scenarios, using Delta Live Tables (DLT) for both CDC and streaming (Option C) is generally the most scalable, manageable, and cost-optimized approach. DLT offers native support for structured...

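A minimal sketch of the DLT CDC path referred to above (APPLY CHANGES); the source table, key, sequencing, and delete-flag columns are hypothetical stand-ins for the already-validated merge logic:

    import dlt
    from pyspark.sql.functions import col

    @dlt.view
    def cdc_feed():
        # Hypothetical bronze CDC feed.
        return spark.readStream.table("bronze.cdc_events")

    dlt.create_streaming_table("patients_silver")

    # APPLY CHANGES orders events via sequence_by and applies upserts/deletes
    # by key, replacing a hand-written MERGE.
    dlt.apply_changes(
        target="patients_silver",
        source="cdc_feed",
        keys=["patient_id"],
        sequence_by=col("event_ts"),
        apply_as_deletes=col("op") == "DELETE",
    )
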
allancruz
by New Contributor
  • 1820 Views
  • 1 reply
  • 1 kudos

Resolved! Embedding Dashboards on Databricks Apps

Hi Team, I recently tried the Hello World template and embedded the <iframe> from the dashboard that I created. It worked fine before I added some code for a login form (I used Dash Plotly to create the login form) before the dashboard a...

Latest Reply
mark_ott
Databricks Employee

In Databricks, I recently tried the Hello World template and embedded the <iframe> from the dashboard that I created. It worked fine before I added some code for a login form (I used Dash Plotly to create the login form) before the...

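A toy sketch of the pattern described in the question: a Dash app that renders the dashboard <iframe> only after a login callback fires. The embed URL is a placeholder and the credential check is deliberately fake; it only illustrates gating the iframe behind a form:

    from dash import Dash, Input, Output, State, dcc, html

    EMBED_URL = "https://<workspace-host>/embed/dashboardsv3/<dashboard-id>"  # placeholder

    app = Dash(__name__)
    app.layout = html.Div([
        dcc.Input(id="user", placeholder="user"),
        dcc.Input(id="pw", type="password", placeholder="password"),
        html.Button("Log in", id="login"),
        html.Div(id="content"),
    ])

    @app.callback(
        Output("content", "children"),
        Input("login", "n_clicks"),
        State("user", "value"),
        State("pw", "value"),
        prevent_initial_call=True,
    )
    def show_dashboard(n_clicks, user, pw):
        if user and pw:  # toy check only; verify credentials properly in a real app
            return html.Iframe(src=EMBED_URL, style={"width": "100%", "height": "600px"})
        return html.Div("Invalid login")

    if __name__ == "__main__":
        app.run(debug=True)
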
databricksdata
by New Contributor
  • 1653 Views
  • 1 reply
  • 1 kudos

Resolved! Assistance Required with Auto Liquid Clustering Implementation Challenges

Hi Databricks Team, we are currently implementing Auto Liquid Clustering (ALC) on our Delta tables as part of our data optimization efforts. During this process, we have encountered several challenges and would appreciate your guidance on best practic...

Latest Reply
mark_ott
Databricks Employee

To implement Auto Liquid Clustering (ALC) on Delta tables in Databricks, especially when transitioning from external partitioned tables to unpartitioned managed tables, a careful and ordered process is crucial to avoid data duplication and ensure con...

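A sketch of one commonly suggested ordering for that migration, assuming a CTAS from the external partitioned table into an unpartitioned managed table with automatic liquid clustering; all table names are placeholders:

    # 1. Copy the external, partitioned table into an unpartitioned managed
    #    table with automatic liquid clustering.
    spark.sql("""
        CREATE TABLE catalog.schema.events_managed
        CLUSTER BY AUTO
        AS SELECT * FROM catalog.schema.events_external
    """)

    # 2. Recluster data already in the table; subsequent writes are
    #    clustered incrementally.
    spark.sql("OPTIMIZE catalog.schema.events_managed FULL")
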
saicharandeepb
by New Contributor III
  • 1901 Views
  • 1 reply
  • 1 kudos

Resolved! Understanding High I/O Wait Despite High CPU Utilization in system.compute Metrics

Hi everyone, I'm working on building a hardware metrics dashboard using the system.compute schema in Databricks, specifically leveraging the cluster, node_type, and node_timeline tables. While analyzing the data, I came across something that seems cont...

Latest Reply
mark_ott
Databricks Employee

Your observation highlights a subtlety in interpreting CPU metrics, especially in distributed environments like Databricks, where cluster and node-level behaviors can diverge from typical single-server intuition. Direct answer: no, seeing both high cp...

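For reference, a small query sketch over system.compute.node_timeline that puts the user, system, and I/O-wait percentages side by side per cluster; the seven-day window is arbitrary:

    cpu_profile = spark.sql("""
        SELECT cluster_id,
               AVG(cpu_user_percent)   AS avg_user,
               AVG(cpu_system_percent) AS avg_system,
               AVG(cpu_wait_percent)   AS avg_iowait  -- time spent blocked on I/O
        FROM system.compute.node_timeline
        WHERE start_time >= current_date() - INTERVAL 7 DAYS
        GROUP BY cluster_id
        ORDER BY avg_iowait DESC
    """)
    cpu_profile.show()
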
piotrsofts
by New Contributor III
  • 1722 Views
  • 1 reply
  • 0 kudos

Resolved! LakeFlow Connect -> GA4 - creation of Liquid Clustered stream table

Hello! While creating a new data ingestion from GA4, can we set up Liquid Clustering (either manual or automatic) on the destination table which will contain the fetched data from GA4?

Latest Reply
mark_ott
Databricks Employee

Yes, in Databricks, it is possible to set up Liquid Clustering—both manual and automatic—on destination tables that store data ingested from Google Analytics 4 (GA4). This feature significantly improves table management and query performance compared...

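If the ingestion destination table already exists, clustering can also be enabled after the fact with a one-line DDL; the table and column names below are hypothetical GA4 examples:

    # Manual keys chosen to match common filters (hypothetical columns)...
    spark.sql("ALTER TABLE catalog.schema.ga4_events CLUSTER BY (event_date, event_name)")

    # ...or let Databricks pick keys automatically.
    spark.sql("ALTER TABLE catalog.schema.ga4_events CLUSTER BY AUTO")

    # Recluster existing rows; newly ingested data is clustered incrementally.
    spark.sql("OPTIMIZE catalog.schema.ga4_events")
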
IONA
by New Contributor III
  • 179 Views
  • 1 reply
  • 0 kudos

DABs (Databricks Asset Bundles)

Hi! I am relatively new to DABs, but getting on quite well. I have managed to deploy both a job that uses a notebook defined in the bundle itself and a job that points to a notebook living in an Azure DevOps git repo. While these are two viable solutio...

Latest Reply
saurabh18cs
Honored Contributor II

Hi @IONA, you need to add a step into your CD pipeline to copy the notebook:

    - checkout: self
    - script: |
        cp path/to/notebook_in_repo/notebook.py .bundle/notebook.py
      displayName: 'Copy notebook into bundle'
    - script: |
        databricks bundle deploy
      displayName: ...

BenLambert
by Contributor
  • 3910 Views
  • 2 replies
  • 0 kudos

How to deal with deleted files in source directory in DLT?

We have a DLT pipeline that uses the autoloader to detect files added to a source storage bucket. It reads these updated files and adds new records to a bronze streaming table. However, we would also like to automatically delete records from the bronz...

Latest Reply
boitumelodikoko
Valued Contributor

I am looking for a solution to use with DLTs

1 More Reply
shashankB
by New Contributor III
  • 329 Views
  • 2 replies
  • 4 kudos

Resolved! How to invoke Databricks AI Assistant from a notebook cell?

Hello Community, I am exploring the Databricks AI Assistant and wondering if there is a way to invoke or interact with it directly from a notebook cell instead of using the workspace sidebar UI. Is there any built-in command (like %assistant) to open o...

Latest Reply
nayan_wylde
Esteemed Contributor

@shashankB No command like %assistant exists today to interact with the Databricks Assistant. As @szymon_dybczak mentioned in their reply, those are the existing modes in which you can interact with the Assistant today. Also, there is no published Assistant-specific ...

1 More Reply
