Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Maatari
by New Contributor III
  • 4118 Views
  • 1 reply
  • 0 kudos

Chaining stateful operators

I would like to do a groupBy followed by a join in Structured Streaming. I would read from two Delta tables in snapshot mode, i.e. the latest snapshot. My question is specifically about chaining the stateful operators. groupBy is update mode; chaining grou...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

When chaining stateful operators like groupBy (aggregation) and join in Spark Structured Streaming, there are specific rules about the output mode required for the overall query and the behavior of each operator. Output Mode Requirements The groupBy...

jmeidam
by Databricks Partner
  • 4923 Views
  • 2 replies
  • 0 kudos

Displaying job-run progress when submitting jobs via databricks-sdk

When I run notebooks from within a notebook using `dbutils.notebook.run`, I see a nice progress table that updates automatically, showing the execution time, the status, and links to the notebook; it is seamless. My goal now is to execute many notebook...

Latest Reply
Coffee77
Honored Contributor II
  • 0 kudos

All good in @mark_ott's response. As a potential improvement, instead of using polling, I think it would be better to publish events to a bus (e.g. Azure Event Hubs) from notebooks so that consumers could launch queries when receiving, processing and fi...
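For cases where an event bus is overkill, the polling approach can stay simple. A minimal, SDK-agnostic sketch (the `get_status` callable is a stand-in for something like `WorkspaceClient.jobs.get_run`; the terminal state names follow the Jobs API life-cycle states but are assumptions here):

```python
import time

def wait_for_run(get_status, poll_seconds=10,
                 terminal=("TERMINATED", "SKIPPED", "INTERNAL_ERROR")):
    """Poll a job run until it reaches a terminal state; return the sequence
    of observed state transitions (useful for rendering a progress table)."""
    seen = []
    while True:
        state = get_status()
        if not seen or state != seen[-1]:
            seen.append(state)  # record only transitions, not every poll
        if state in terminal:
            return seen
        time.sleep(poll_seconds)
```

Injecting `get_status` keeps the helper testable and lets the same loop drive either the SDK or a REST call.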

1 More Reply
Maatari
by New Contributor III
  • 4483 Views
  • 1 reply
  • 0 kudos

Reading a partitioned Table in Spark Structured Streaming

Does the pre-partitioning of a Delta table have an influence on the number of "default" partitions of a DataFrame when reading the data? Put differently, using Spark Structured Streaming, when reading from a Delta table, is the number of DataFrame par...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Pre-partitioning of a Delta Table does not strictly determine the number of "default" DataFrame partitions when reading data with Spark Structured Streaming. Unlike Kafka, where each DataFrame partition maps one-to-one to a Kafka partition, Delta Lak...
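The knobs that do control read parallelism can be set when building the reader. A hedged sketch (`maxFilesPerTrigger` is a documented Delta streaming option; the helper and table name are illustrative):

```python
def build_delta_stream(spark, table_name, max_files_per_trigger=16):
    """Build a Delta streaming reader whose micro-batch size is bounded by
    maxFilesPerTrigger. Downstream shuffle parallelism is governed by
    spark.sql.shuffle.partitions, not by the table's partition columns."""
    return (spark.readStream
            .option("maxFilesPerTrigger", max_files_per_trigger)
            .table(table_name))
```

Passing `spark` in keeps the helper free of session setup; on Databricks the ambient session would be used directly.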

c-thiel
by New Contributor
  • 4324 Views
  • 1 reply
  • 0 kudos

APPLY INTO Highdate instead of NULL for __END_AT

I really like the APPLY INTO function to keep track of changes and historize them in SCD2. However, I am a bit confused that current records get an __END_AT of NULL. Typically, __END_AT should be a high date (e.g. 9999-12-31) or similar, so that a poin...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The APPLY INTO function for SCD2 historization typically sets the __END_AT field of current records to NULL rather than a high date like 9999-12-31. This is by design and reflects that the record is still current and has no defined end date yet. Cur...
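If downstream point-in-time joins need a closed interval, one common workaround is to substitute the high date in a view (table and column names below are illustrative):

```sql
CREATE OR REPLACE VIEW dim_customer_scd2_closed AS
SELECT
  * EXCEPT (__END_AT),
  COALESCE(__END_AT, TIMESTAMP '9999-12-31') AS __END_AT
FROM dim_customer_scd2;
```

A `BETWEEN __START_AT AND __END_AT` predicate then works against the view without any NULL handling.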

NiraliGandhi
by New Contributor
  • 4972 Views
  • 1 reply
  • 0 kudos

Pyspark - alias is not applied in pivot if only one aggregation

This is not consistent with aggregation on multiple columns, and it hinders metadata-driven transformation because of the inconsistency. How can we request Databricks/PySpark to include this? And is there any known work arou...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

When using PySpark or Databricks to perform a pivot operation with only a single aggregation, you may notice that the alias is not applied as expected, leading to inconsistencies, especially when trying to automate or apply metadata-driven frameworks...
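One workaround is to re-apply the alias as a suffix after the pivot, so single- and multi-aggregation pivots produce consistent column names. A sketch (the helper name and group-column handling are illustrative; `toDF` is the standard DataFrame rename method):

```python
def suffix_pivot_columns(df, group_cols, suffix):
    """After df.groupBy(...).pivot(...).agg(single_agg), PySpark names the
    output columns after the pivot values alone, dropping the alias. This
    re-attaches the alias as a suffix to every non-grouping column."""
    return df.toDF(*[c if c in group_cols else f"{c}_{suffix}"
                     for c in df.columns])
```

In a metadata-driven framework, the same suffix map can be generated from the aggregation spec, so the downstream schema no longer depends on how many aggregations the pivot happened to use.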

novytskyi
by New Contributor
  • 4364 Views
  • 1 reply
  • 0 kudos

Timeout for dbutils.jobs.taskValues.set(key, value)

I have a job that calls a notebook with the dbutils.jobs.taskValues.set(key, value) method and assigns around 20 parameters. When I run it, it works. But when I try to call 2 or more copies of the job with different parameters, it fails with an error on differen...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The error you are encountering when running multiple simultaneous Databricks jobs using dbutils.jobs.taskValues.set(key, value) indicates a connection timeout issue to the Databricks backend API (connect timed out at ...us-central1.gcp.databricks.com...
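A pragmatic mitigation for transient control-plane timeouts is to retry with exponential backoff plus jitter, so the concurrent job copies do not all hit the API at the same instant. A sketch with the call injected (wrapping e.g. `dbutils.jobs.taskValues.set`; the helper itself is an assumption, not Databricks API):

```python
import random
import time

def set_task_value_with_retry(set_fn, key, value, attempts=5, base_delay=1.0):
    """Retry a flaky control-plane call with exponential backoff and jitter.
    set_fn is the real setter, e.g. dbutils.jobs.taskValues.set."""
    for attempt in range(attempts):
        try:
            return set_fn(key, value)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

The jitter term is what desynchronizes the 20 near-simultaneous `set` calls across job copies.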

SebastianCar28
by New Contributor
  • 4584 Views
  • 1 reply
  • 0 kudos

How to implement Lifecycle of Data When Use ADLS

Hello everyone, nice to greet you. I have a question about the data lifecycle in ADLS. I know ADLS has its own rules, but they aren't working properly because I have two ADLS accounts: one for hot data and another for cool storage where the informati...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Yes, you can move data from your HOT ADLS account to a COOL ADLS account while handling Delta Lake log issues, but this requires special techniques due to the nature of Delta Lake’s transaction log. The problem stems from Delta tables’ dependency on ...
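One workable pattern is a Delta `DEEP CLONE` into a table whose location sits on the cool account, since a deep clone copies the data files and writes a fresh transaction log at the target (catalog, schema, and path below are illustrative):

```sql
CREATE OR REPLACE TABLE archive.sales.events_2023
DEEP CLONE prod.sales.events_2023
LOCATION 'abfss://archive@coolstorageacct.dfs.core.windows.net/sales/events_2023';
```

Because the clone gets its own `_delta_log`, the copy has no dependency on the hot account, and the original table can then be dropped or vacuumed on its own schedule.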

SrinuM
by New Contributor III
  • 4548 Views
  • 1 reply
  • 0 kudos

Workspace Client dbutils issue

host = "https://adb-xxxxxx.xx.azuredatabricks.net"; token = "dapxxxxxxx". We are using Databricks Connect: from databricks.sdk import WorkspaceClient; dbutil = WorkspaceClient(host=host, token=token).dbutils; files = dbutil.fs.ls("abfss://container-name@storag...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The error where files and directories can be read at the root ADLS level but not at the blob/subdirectory level, combined with a "No file or directory exists on path" message, is frequently due to permission configuration, incorrect path usage, or ne...

Anshul_DBX
by New Contributor
  • 4507 Views
  • 1 reply
  • 0 kudos

Executing Stored Procedures/update in Federated SQL Server

I have a federated Azure SQL DB in my DBX workspace, but I am not able to run update commands or execute a stored procedure. Is this still not supported?

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Federated connections from Azure Databricks to Azure SQL DB via Lakehouse Federation currently only support read-only queries—meaning running update commands or executing stored procedures directly through the federated Unity Catalog interface is not...
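Until write-through federation is available, a common workaround is to open a direct DB-API connection from the notebook (for example with `pyodbc` and the SQL Server ODBC driver installed on the cluster, both assumptions here) and run the procedure there. A minimal sketch with the connection factory injected:

```python
def exec_stored_procedure(connect_fn, conn_str, call_sql, params=()):
    """Run an UPDATE or stored procedure over a direct database connection,
    bypassing the read-only federated catalog. connect_fn is a DB-API
    connect function such as pyodbc.connect."""
    conn = connect_fn(conn_str)
    try:
        cur = conn.cursor()
        cur.execute(call_sql, params)
        conn.commit()  # DML/procedures need an explicit commit over DB-API
    finally:
        conn.close()
```

Usage would look like `exec_stored_procedure(pyodbc.connect, "<connection string>", "EXEC dbo.refresh_stats ?", ("2024",))`, where the procedure name and parameter are hypothetical.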

vamsi_simbus
by Databricks Partner
  • 1012 Views
  • 1 reply
  • 0 kudos

System tables for DLT Expectations Quality Metrics

Hi everyone, I'm working with Delta Live Tables (DLT) and using Expectations to track data quality, but I'm having trouble finding where the expectation quality metrics are stored in the DLT system tables. My questions are: Which specific system table(s...

Latest Reply
ManojkMohan
Honored Contributor II
  • 0 kudos

@vamsi_simbus DLT captures data quality metrics in specialized system tables known as "event" and "metrics" tables. Specifically, look in the following tables: LIVE.DLT_EVENT_LOG or LIVE.DLT_METRICS. These tables contain granular event logs and metric...
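For reference, the documented pattern on recent Databricks releases reads expectation metrics out of the pipeline event log via the `event_log` table-valued function, where they live under `details:flow_progress` (the table name below is illustrative):

```sql
SELECT
  timestamp,
  details:flow_progress.data_quality.expectations AS expectations
FROM event_log(TABLE(my_catalog.my_schema.my_streaming_table))
WHERE event_type = 'flow_progress'
  AND details:flow_progress.data_quality IS NOT NULL;
```

Each row's `expectations` field is a JSON array with per-expectation `passed_records` and `failed_records` counts that can be exploded for dashboards.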

Suheb
by Contributor
  • 661 Views
  • 1 reply
  • 0 kudos

Resolved! What are best practices for designing a large-scale data engineering pipeline on Databricks for real

How do you design a scalable, reliable pipeline that handles both fast/continuous data and slower bulk data in the same system?

Latest Reply
Coffee77
Honored Contributor II
  • 0 kudos

Very generic question. Here are general rules and best practices related to the Databricks well-architected framework: https://docs.databricks.com/aws/en/lakehouse-architecture/well-architected Take a deeper look at operational excellence, reliability an...

aravind-ey
by Databricks Partner
  • 26271 Views
  • 23 replies
  • 6 kudos

vocareum lab access

Hi, I am doing a data engineering course in Databricks (Partner labs) and would like to have access to the Vocareum workspace to practice using the demo sessions. Can you please help me get access to this workspace? Regards, Aravind

Latest Reply
Eicke
Databricks Partner
  • 6 kudos

You can log into databricks, search for "Canada Sales" in the Marketplace and find "Simulated Canada Sales and Opportunities Data". Get free instant access, wait a few seconds for the warehouse to be built for you et voila: the tables for building th...

22 More Replies
sunnyday
by New Contributor
  • 6308 Views
  • 1 reply
  • 0 kudos

Naming jobs in the Spark UI in Databricks Runtime 15.4

I am asking almost the same question as: https://community.databricks.com/t5/data-engineering/how-to-improve-spark-ui-job-description-for-pyspark/td-p/48959 .  I would like to know how to improve the readability of the Spark UI by naming jobs.   I am...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

You are correct—on Databricks Runtime 15.4 and with shared clusters (or clusters enabled with Unity Catalog), you will see the [JVM_ATTRIBUTE_NOT_SUPPORTED] error when trying to directly access sparkContext attributes that are only available in singl...
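On clusters where `sparkContext` is still exposed (classic / single-user), the labeling itself is a single call. A small sketch with the context injected so the reset-on-exit pattern is clear (the helper is an assumption, not a Databricks API):

```python
def run_labeled(sc, description, fn):
    """Run fn() with its Spark jobs labeled in the Spark UI via
    setJobDescription, resetting the label afterwards. On shared/UC
    clusters, accessing sparkContext raises JVM_ATTRIBUTE_NOT_SUPPORTED,
    so callers there must fall back to other naming mechanisms."""
    sc.setJobDescription(description)
    try:
        return fn()
    finally:
        sc.setJobDescription(None)  # avoid leaking the label to later jobs
```

On a classic cluster this would be called as `run_labeled(spark.sparkContext, "nightly load", lambda: df.count())`.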

Vishnu_9959
by New Contributor
  • 4253 Views
  • 1 reply
  • 0 kudos

Can we develop a connector that integrates Nintex and the Databricks Community edition

Can we develop a connector that integrates Nintex and Databricks with the Community edition

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

It is technically possible to develop a connector that integrates Nintex with Databricks, but there are important limitations when trying to achieve this with Databricks Community Edition. Connector Integration Overview: Nintex can be integrated with...

rpilli
by New Contributor
  • 5124 Views
  • 1 reply
  • 0 kudos

Conditional Execution in DLT Pipeline based on the output

Hello, I'm working on a Delta Live Tables (DLT) pipeline where I need to implement a conditional step that only triggers under specific conditions. Here's the challenge I'm facing: I have a function that checks if the data meets certain thresholds. If...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

In Delta Live Tables (DLT), native conditional or branch-based control flow is limited; all table/stream definitions declared in your pipeline will execute, and dependencies are handled via @dlt.table or @dlt.view decorators. You can't dynamically sk...
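Since the graph is fixed once built, the usual pattern is to decide at pipeline-definition time with ordinary Python, driven by a pipeline configuration value. A pipeline-definition sketch (`dlt` and `spark` exist only inside a DLT pipeline; the flag, table, and column names are illustrative):

```python
import dlt
from pyspark.sql import functions as F

# Read the switch from the pipeline's configuration (set in pipeline settings).
RUN_ENRICHMENT = spark.conf.get("pipeline.run_enrichment", "false") == "true"

@dlt.table
def base():
    return spark.read.table("source_db.events")

# Plain Python at graph-build time: the table below is only registered
# (and therefore only runs) when the flag is set for this update.
if RUN_ENRICHMENT:
    @dlt.table
    def enriched():
        return dlt.read("base").withColumn("checked", F.lit(True))
```

Row-level conditions (thresholds on the data itself) are handled differently, via expectations or filters inside the table definitions, since they cannot alter which tables exist.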
