Databricks Community

dbhavesh · ‎02-10-2025

Hi all,

How can I use ROW_NUMBER in DLT, or what is the alternative to the ROW_NUMBER function in DLT?

We are looking for the same functionality that ROW_NUMBER provides.

Thanks in advance.

Takuya-Omi · ‎02-11-2025

In DLT, you can use the ROW_NUMBER() window function directly in your pipeline code, written in either PySpark or SQL. For example, in SQL:

CREATE MATERIALIZED VIEW bronze_dlt AS
SELECT
  *,
  ROW_NUMBER() OVER (ORDER BY column1) AS row_number
FROM
  test_wk.default.source_table

--------------------------
Takuya Omi (尾美拓哉)

dbhavesh · ‎02-11-2025

Hi TakuyaOmi, thanks for your response.

I tried that, but I am receiving the error shown in the image below:

Please let me know your thoughts.

Thanks in advance!

Takuya-Omi · ‎02-15-2025

@dbhavesh

I apologize for the lack of explanation.

The ROW_NUMBER function requires ordering over the entire dataset, which makes it a non-time-based window function. Streaming queries do not support such windows, so applying it to streaming data results in the "NON_TIME_WINDOW_NOT_SUPPORTED_IN_STREAMING" error.

This issue occurs specifically in DLT streaming tables because they continuously process incoming data. However, in the case of materialized views, data is processed as a snapshot at a given point in time, allowing ordering without triggering this error.
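To make the difference concrete, here is a minimal sketch of the two definitions side by side. The table names events_numbered_st and events_numbered_mv and the alias row_num are hypothetical; the source table and column1 are taken from the earlier example. The streaming variant is commented out because it is expected to fail.

```sql
-- Streaming table: reads the source incrementally via STREAM(...).
-- Uncommenting this is expected to reproduce the
-- NON_TIME_WINDOW_NOT_SUPPORTED_IN_STREAMING error, because ROW_NUMBER()
-- needs an ordering over the whole dataset.
--
-- CREATE OR REFRESH STREAMING TABLE events_numbered_st AS
-- SELECT
--   *,
--   ROW_NUMBER() OVER (ORDER BY column1) AS row_num
-- FROM STREAM(test_wk.default.source_table);

-- Materialized view: reads the source as a snapshot (no STREAM),
-- so the same window function can be evaluated.
CREATE OR REFRESH MATERIALIZED VIEW events_numbered_mv AS
SELECT
  *,
  ROW_NUMBER() OVER (ORDER BY column1) AS row_num
FROM test_wk.default.source_table;
```

Note that the numbers in the materialized view are recalculated from the current snapshot, so a given row's number may change between updates as the source data changes.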

If you need to generate sequential numbers, consider either:

Using a materialized view instead of a streaming table, or
Defining an IDENTITY column in the table schema, which automatically assigns unique, increasing numbers upon data insertion (the values are not guaranteed to be consecutive).*

* Databricks Documentation – Identity Columns in Delta Lake
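For the second option, a rough sketch of a streaming table with an identity column might look like the following. The table name bronze_with_id, the column name id, and the STRING type for column1 are assumptions for illustration; please adjust them to your actual schema and check the identity column documentation for the limitations that apply to your pipeline.

```sql
-- Hypothetical streaming table with a generated identity column.
-- The id column is declared in the schema and is NOT selected in the query;
-- its values are assigned automatically as rows are inserted.
CREATE OR REFRESH STREAMING TABLE bronze_with_id (
  id BIGINT GENERATED ALWAYS AS IDENTITY,
  column1 STRING  -- assumed type; match your source column
) AS
SELECT
  column1
FROM STREAM(test_wk.default.source_table);
```

Unlike ROW_NUMBER(), the identity values do not follow any ORDER BY you specify, and they may contain gaps. As far as I know, identity columns are best used with streaming tables, since values on a materialized view may be recomputed when the view is updated.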
