Data Engineering

Forum Posts

by DaPo • New Contributor III

03-19-2025 10:30:13 AM

4760 Views
2 replies
2 kudos

Resolved! DLT Streaming With Watermark fails, suggesting I should add watermarks

Hi all, I have the following problem: I have two streaming tables containing time-series measurements from different kinds of sensors, each fed by multiple sensors. (Imagine: multiple temperature sensors for the first table, and multiple humidity sensors ...

Latest Reply

mark_ott
Databricks Employee

11-05-2025 4:35:55 AM

2 kudos

To resolve the DLT streaming aggregation error about unsupported output modes and watermarks in Databricks, you need to carefully set watermarks on the original event timestamp rather than on computed columns like "time_window" and carefully consider...
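
To make the point about event timestamps concrete, here is a minimal plain-Python simulation of event-time watermarking. It is not Spark or DLT code, and it simplifies by advancing the watermark per event rather than per micro-batch. The function name, the 10-minute delay, and the sample readings are all hypothetical.

```python
from datetime import datetime, timedelta


def split_late_events(events, delay):
    """Simulate an event-time watermark over events in arrival order.

    events: list of (event_time, value) tuples in the order they arrive.
    delay:  timedelta of lateness tolerated behind the max event time seen.
    Returns (accepted, dropped).
    """
    max_seen = None
    accepted, dropped = [], []
    for event_time, value in events:
        # The watermark trails the largest event time seen so far.
        watermark = max_seen - delay if max_seen is not None else None
        if watermark is not None and event_time < watermark:
            dropped.append((event_time, value))
        else:
            accepted.append((event_time, value))
            if max_seen is None or event_time > max_seen:
                max_seen = event_time
    return accepted, dropped


t0 = datetime(2025, 3, 19, 10, 0)
readings = [
    (t0, 20.5),
    (t0 + timedelta(minutes=30), 21.0),
    (t0 + timedelta(minutes=5), 20.7),   # 25 minutes behind the max: too late
    (t0 + timedelta(minutes=25), 20.9),  # 5 minutes behind the max: accepted
]
accepted, dropped = split_late_events(readings, delay=timedelta(minutes=10))
```

The watermark only makes sense relative to when each measurement actually happened, which is why it is tracked on the raw event time here and not on a derived window column.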

1 More Replies

by Dave_Nithio • Contributor II

03-11-2025 11:40:40 AM

4659 Views
1 reply
0 kudos

Transaction Log Failed Integrity Checks

I have started to receive the following error message - that the transaction log has failed integrity checks - when attempting to optimize and run compaction on a table. It also occurs when I attempt to alter this table. This blocks my pipeline from r...

Latest Reply

mark_ott
Databricks Employee

11-05-2025 4:52:27 AM

0 kudos

Your issue—encountering "the transaction log has failed integrity checks" in Databricks Delta Lake—indicates metadata corruption or an inconsistency in the Delta transaction log (_delta_log). This commonly disrupts DML operations like OPTIMIZE, DELET...
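
One kind of inconsistency that can be checked by hand is a gap in the numbered commit files. The sketch below assumes the _delta_log directory is reachable as a local or mounted path; the function name is hypothetical, and this does not reproduce the integrity check that Databricks itself runs. Missing versions older than the latest checkpoint can be legitimate after log cleanup, so a reported gap is a prompt to investigate, not proof of corruption.

```python
import re
from pathlib import Path

# Delta commit files are named with a 20-digit, zero-padded version number.
COMMIT_FILE = re.compile(r"^(\d{20})\.json$")


def find_missing_versions(log_dir):
    """Return commit versions absent between the lowest and highest commit found."""
    versions = sorted(
        int(m.group(1))
        for p in Path(log_dir).iterdir()
        if (m := COMMIT_FILE.match(p.name))
    )
    if not versions:
        return []
    present = set(versions)
    return [v for v in range(versions[0], versions[-1] + 1) if v not in present]
```

Checkpoint files and other entries in the directory are ignored because they do not match the commit file pattern.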

by OmarE • New Contributor II

02-19-2025 6:03:41 PM

4612 Views
1 reply
2 kudos

Streamlit Databricks App Compute Scaling

I have a Streamlit Databricks app and I’m looking to increase the compute resources. According to the documentation and the current settings, the app is limited to 2 vCPUs and 6 GB of memory. Is there a way to adjust these limits or add more resource...

Latest Reply

mark_ott
Databricks Employee

11-05-2025 4:50:40 AM

2 kudos

You can increase compute resources for your Streamlit Databricks app, but this requires explicitly configuring the compute size in the Databricks app management UI or via deployment configuration—environment variables like DATABRICKS_CLUSTER_ID alone...

by Arunraja • Databricks Partner

02-18-2025 9:14:13 AM

4210 Views
1 reply
0 kudos

AI BI Genie throwing internal error

For any prompt I am getting INTERNAL_ERROR: AI service did not respond with a valid answer

Latest Reply

mark_ott
Databricks Employee

11-05-2025 4:49:21 AM

0 kudos

The "INTERNAL_ERROR: AI service did not respond with a valid answer" in Databricks AI/BI Genie typically means the Genie service failed to process your query, often due to one of a few common issues. This can include problems with the table existence...

by turagittech • Contributor

02-16-2025 10:19:49 PM

4819 Views
1 reply
1 kudos

Finding all folder paths in a blob store connected via UC external connection

Hi all, I need an easy way to find all the paths in a blob store so I can find the files and load them. I have tried using the Azure Blob Storage connection in Python and have a solution that works, but it is very slow. I was speaking to a data engineer, and he suggest...

Latest Reply

mark_ott
Databricks Employee

11-05-2025 4:48:14 AM

1 kudos

The most efficient way to list all file paths in an Azure Blob Storage container from Databricks, especially when Hierarchical Namespace (HNS) is not enabled, is to use Azure SDKs targeting the blob flat namespace directly rather than filesystem prot...
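
In a flat namespace, folders exist only as prefixes of blob names, so the folder paths can be derived from a single listing. The sketch below assumes the names were already collected, for example from the name of each item returned by ContainerClient.list_blobs() in the azure-storage-blob package. The function name and sample names are hypothetical.

```python
from pathlib import PurePosixPath


def folder_paths(blob_names):
    """Derive every virtual folder path implied by a flat list of blob names."""
    folders = set()
    for name in blob_names:
        # Every parent of a blob name is a virtual folder; "." is the container root.
        for parent in PurePosixPath(name).parents:
            if str(parent) != ".":
                folders.add(str(parent))
    return sorted(folders)


names = ["raw/2025/01/a.csv", "raw/2025/02/b.csv", "curated/c.parquet", "readme.txt"]
folders = folder_paths(names)
# folders == ['curated', 'raw', 'raw/2025', 'raw/2025/01', 'raw/2025/02']
```

This touches each blob name once, which avoids walking the container one directory level at a time.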

by Sega2 • New Contributor III

02-21-2025 2:55:11 AM

4491 Views
2 replies
1 kudos

Debugger freezes when calling spark.sql with dbx connect

I have just created a simple bundle with Databricks and am using Databricks Connect to debug locally. This is my script: from pyspark.sql import SparkSession, DataFrame def get_taxis(spark: SparkSession) -> DataFrame: return spark.read.table("samp...

Latest Reply

mark_ott
Databricks Employee

11-05-2025 4:46:52 AM

1 kudos

The issue you're experiencing—where your script freezes in VS Code when running spark.sql locally using Databricks Connect, but works correctly when deployed—can result from several common causes related to Databricks Connect configuration, networkin...

1 More Replies

by akshaym0056 • New Contributor

02-12-2025 6:42:29 AM

4491 Views
2 replies
0 kudos

How to Define Constants at Bundle Level in Databricks Asset Bundles for Use in Notebooks?

I'm working with Databricks Asset Bundles and need to define constants at the bundle level based on the target environment. These constants will be used inside Databricks notebooks. For example, I want a constant gold_catalog to take different values ...

Latest Reply

mark_ott
Databricks Employee

11-05-2025 4:45:39 AM

0 kudos

Yes, you can define environment-specific constants at the bundle level in Databricks Asset Bundles and make them accessible inside Databricks notebooks, without relying on task-level parameters. This can be done using environment variables, bundle co...
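
As a rough sketch of the environment-variable route, a notebook could select its constants from the target name. Everything here is an assumption for illustration: the BUNDLE_TARGET variable name, the catalog values, and the premise that the job or cluster configuration sets that variable for each target.

```python
import os

# Hypothetical per-target values; real names would come from your own bundle.
CONSTANTS_BY_TARGET = {
    "dev": {"gold_catalog": "dev_gold"},
    "prod": {"gold_catalog": "prod_gold"},
}


def load_constants(environ=None, default_target="dev"):
    """Pick the constants for the target named in the BUNDLE_TARGET variable."""
    environ = os.environ if environ is None else environ
    target = environ.get("BUNDLE_TARGET", default_target)
    if target not in CONSTANTS_BY_TARGET:
        raise KeyError(f"unknown bundle target: {target!r}")
    return CONSTANTS_BY_TARGET[target]


gold_catalog = load_constants({"BUNDLE_TARGET": "prod"})["gold_catalog"]
```

Passing the environment in as a parameter keeps the helper easy to test outside a workspace.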

1 More Replies

by Databricks36 • New Contributor

02-27-2025 9:31:40 AM

4678 Views
1 reply
0 kudos

Accessing Databricks Delta table in ADF using system-defined managed identity

I am using a Lookup activity in ADF to read Delta table values from Databricks. I am currently using the system-defined managed identity of the ADF to connect to the Databricks Delta table. I am unable to see my Unity Catalog database names in the look...

Latest Reply

mark_ott
Databricks Employee

11-05-2025 4:43:57 AM

0 kudos

You are experiencing an issue in Azure Data Factory (ADF) where the Lookup activity does not show your Unity Catalog databases in the configuration dropdown, even though connectivity from ADF to Databricks is successful and you have followed all reco...

by jordanpinder • New Contributor

02-28-2025 6:51:13 AM

4819 Views
1 reply
0 kudos

Native geometry Parquet support

Hi there! With the recent GeoParquet 2.0 announcements, I'm curious to understand how this impacts storing geospatial data in Databricks and Delta. For reference: the Parquet specification officially adopting geospatial guidance allowing native storage...

Latest Reply

mark_ott
Databricks Employee

11-05-2025 4:43:03 AM

0 kudos

GeoParquet 2.0’s formalization within the Apache Parquet specification is a significant step for native geospatial data storage across the modern data ecosystem, particularly for platforms like Databricks and Delta Lake. In summary, Delta Lake's reli...

by Dave_Nithio • Contributor II

03-18-2025 1:27:14 PM

3793 Views
1 reply
0 kudos

Preset Partner Connect Schema Changes

When using Partner Connect to connect Serverless Databricks to my BI tool Preset, you must manually define the schemas that Preset has access to. In my case, I individually selected all databases currently in my hive_metastore. The problem is, once cre...

Latest Reply

mark_ott
Databricks Employee

11-05-2025 4:41:42 AM

0 kudos

No, there is currently no simple, direct way to add new schema access to an existing Serverless Databricks SQL warehouse connection through Partner Connect for Preset—neither through Databricks UI, BI tool configuration, nor the Databricks service pr...

by fscaravelli • New Contributor

03-13-2025 11:18:57 AM

4328 Views
1 reply
0 kudos

Ingest files from GCS with Auto Loader in DLT pipeline running on AWS

I have some DLT pipelines working fine ingesting files from S3. Now I'm trying to build a pipeline to ingest files from GCS using Auto Loader. I'm running Databricks on AWS. The code I have: import dlt import json from pyspark.sql.functions import col ...

Latest Reply

mark_ott
Databricks Employee

11-05-2025 4:40:27 AM

0 kudos

Your error is due to how Databricks on AWS is trying to access GCS: it's defaulting to using the GCP metadata server (which only exists on Google Cloud VMs), not the service account key you provided. This is a common issue when connecting GCS from no...
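
One way to point the GCS connector at a service account key, so that it does not fall back to the metadata server, is to pass the key fields as Spark configuration. The sketch below only builds that configuration from the key's JSON text. The function name is hypothetical, the property names are the GCS connector settings as commonly documented for Databricks and should be verified against current documentation, and whether a DLT pipeline accepts them in its configuration is an assumption. The truncated reply may recommend something different.

```python
import json


def gcs_spark_conf(service_account_key_json):
    """Map a GCS service account key (JSON text) to Spark/Hadoop settings."""
    key = json.loads(service_account_key_json)
    return {
        # Property names are assumed from the GCS connector documentation.
        "spark.hadoop.google.cloud.auth.service.account.enable": "true",
        "spark.hadoop.fs.gs.project.id": key["project_id"],
        "spark.hadoop.fs.gs.auth.service.account.email": key["client_email"],
        "spark.hadoop.fs.gs.auth.service.account.private.key": key["private_key"],
        "spark.hadoop.fs.gs.auth.service.account.private.key.id": key["private_key_id"],
    }
```

In practice the private key and its ID belong in a secret scope and should not appear as literal values in pipeline settings.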

by kulasangar • New Contributor II

03-18-2025 11:20:19 AM

3986 Views
1 reply
0 kudos

Permission Denied while trying to update a yaml file within a python project in Databricks

I have a Python project that contains a YAML file. Currently I'm building the project using Poetry and creating an asset bundle to deploy it in Databricks as a workflow job. So when the workflow runs, I do have an __init__.py within my ent...

Latest Reply

mark_ott
Databricks Employee

11-05-2025 4:38:57 AM

0 kudos

The main issue is that Databricks jobs typically run in environments where the file system may be read-only or restricted—especially for files packaged within the asset bundle or inside locations like /databricks/driver, /databricks/conda, or other s...
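
A common workaround for a read-only packaged file is to copy it to a writable location and edit the copy. This is a minimal sketch using only the standard library; the function name and the "config_" prefix are hypothetical, and the truncated reply may suggest a different location such as a volume or workspace path.

```python
import shutil
import tempfile
from pathlib import Path


def writable_copy(packaged_file):
    """Copy a read-only packaged file into a fresh temp directory and return the new path."""
    target_dir = Path(tempfile.mkdtemp(prefix="config_"))
    target = target_dir / Path(packaged_file).name
    # copyfile copies contents only, not permission bits, so the copy is writable.
    shutil.copyfile(packaged_file, target)
    return target
```

The job would then update and read the returned path and leave the file inside the installed package untouched.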

by antoniomf • New Contributor

03-14-2025 6:06:56 AM

4140 Views
1 reply
0 kudos

Bug Delta Live Tables - Checkpoint

Hello, I've encountered an issue with Delta Live Tables in both my Development and Production workspaces. The data is arriving correctly in my Azure Storage Account; however, the checkpoint is being stored in the path dbfs:/. I haven't modified the St...

Latest Reply

mark_ott
Databricks Employee

11-05-2025 4:37:11 AM

0 kudos

There appears to be a recurring issue with Delta Live Table (DLT) pipelines in Databricks where the checkpoint is unexpectedly stored in the dbfs:/ path, rather than in the intended external storage location (such as Azure Blob Storage or ADLS). This...

by max_eg • New Contributor II

11-05-2025 2:06:46 AM

1288 Views
1 reply
1 kudos

Resolved! Bug in Asset Bundle Sync

I think I found a bug in the way asset bundles sync/deploy, or at least I have a question about whether I understood it correctly. My setup: I have an asset bundle consisting of a notebook nb1.py and a utils module utils.py. nb1.py imports functions from utils....

Latest Reply

bianca_unifeye
Databricks MVP

11-05-2025 2:56:20 AM

1 kudos

Hi @max_eg, what you’re seeing is expected with Asset Bundles. The databricks bundle deploy command computes what changed locally and uploads only those files. If you edited nb1.py in the workspace (not locally), the deploy won’t “see” a local delta for that file, ...

by sergecom • New Contributor III

10-28-2025 1:57:33 PM

1087 Views
2 replies
1 kudos

Resolved! Migrating from on-premises HDFS to Unity Catalog - Looking for advice on on-prem options

Hi, we’re currently running a Databricks installation with an on-premises HDFS file system. As we’re looking to adopt Unity Catalog, we’ve realized that our current HDFS setup has limited support and compatibility with Unity Catalog. Our requirement: W...

Latest Reply

sergecom
New Contributor III

11-05-2025 1:01:15 AM

1 kudos

Thanks very much for your detailed response; this is really helpful. You mentioned client cases where organizations have migrated from on-premises HDFS into Databricks Unity Catalog, and I’d love to learn more about those. If possible, could you share...

1 More Replies
