Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Delta Live Tables streaming

Rishitha
New Contributor III

I'm trying to add a monotonicallyIncreasingId() column to a streaming table, and I see the following error:

Failed to start stream [table_name] in either append mode or complete mode.
Append mode error: Expression(s): monotonically_increasing_id() is not supported with streaming DataFrames/Datasets;

I'm streaming from S3 buckets to Databricks.

Can someone please help?

4 REPLIES

Kaniz_Fatma
Community Manager

Hi @Rishitha, monotonicallyIncreasingId() is not supported on streaming DataFrames or Datasets, and neither is row_number() over a non-time-based window. Inside foreachBatch, however, each micro-batch is a regular (non-streaming) DataFrame, so you can use the row_number() window function there to generate an increasing id column.

Here's how you can do it in a regular Structured Streaming query:

from pyspark.sql import functions as F
from pyspark.sql import Window

# Assuming `streaming_df` is your streaming DataFrame
def add_id(batch_df, batch_id):
    # batch_df is a static DataFrame, so row_number() and
    # monotonically_increasing_id() are both allowed here
    batch_with_id = batch_df.withColumn("id", F.row_number().over(Window.orderBy(F.monotonically_increasing_id())))
    batch_with_id.show()  # replace with a write to your target table

# Start your streaming query...
query = streaming_df.writeStream.trigger(processingTime="10 seconds").foreachBatch(add_id).start()

In this example, the row_number() window function adds an id column to each micro-batch. The orderBy() method uses monotonically_increasing_id() as the ordering criterion, so every row in a micro-batch gets a unique, increasing id.

Note that the numbering restarts at 1 in every micro-batch, so the id is unique only within a batch, not across the whole stream.
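
If the ids need to keep increasing across micro-batches, one possible approach is to number each micro-batch inside foreachBatch and offset it by the current maximum id in the target table. The following is only a minimal sketch under stated assumptions: it targets a plain Structured Streaming query (it is not confirmed to work inside a Delta Live Tables pipeline), and the table name `events_with_id`, the ordering column `event_time`, and both function names are hypothetical.

```python
def next_start_id(current_max):
    """First id for the next micro-batch: 1 for an empty target, else max + 1."""
    return 1 if current_max is None else current_max + 1


def write_batch_with_ids(batch_df, batch_id,
                         target_table="events_with_id",  # hypothetical table name
                         order_col="event_time"):        # hypothetical column name
    """foreachBatch handler: number the rows of one micro-batch and append them."""
    # Imported lazily so next_start_id() stays usable without a Spark installation.
    from pyspark.sql import Window, functions as F

    spark = batch_df.sparkSession
    current_max = None
    if spark.catalog.tableExists(target_table):
        # max() over an empty table yields NULL, which arrives here as None.
        current_max = spark.table(target_table).agg(F.max("id")).first()[0]
    start = next_start_id(current_max)

    # batch_df is a static DataFrame here, so row_number() is allowed.
    # An un-partitioned window sorts the whole micro-batch in a single task.
    numbered = batch_df.withColumn(
        "id",
        F.row_number().over(Window.orderBy(order_col)).cast("long") + F.lit(start - 1),
    )
    numbered.write.mode("append").saveAsTable(target_table)


# Usage, assuming `streaming_df` is your streaming DataFrame:
# streaming_df.writeStream.foreachBatch(write_batch_with_ids).start()
```

Two caveats with this sketch: the single-task sort only suits modest batch sizes, and the write is not idempotent, so a micro-batch that is replayed after a failure would be appended again with new ids.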

jose_gonzalez
Moderator

Hi @Rishitha,

Just a friendly follow-up. Have you had a chance to review my colleague's response to your inquiry? Did it prove helpful, or are you still in need of assistance? Your response would be greatly appreciated.

MuthuLakshmi
New Contributor III

The error "Failed to start stream [table_name] in either append mode or complete mode. Append mode error: Expression(s): monotonically_increasing_id() is not supported with streaming DataFrames/Datasets" occurs because the stream is started in append mode with an operation that Structured Streaming does not support. You can't use row_number() on a streaming DataFrame either.

You can resolve this issue by applying SQL window functions that are time-based, such as window() on an event-time column.

If you are performing an aggregation, you must apply a watermark to the DataFrame if you want to use append mode. The aggregation must have an event-time column, or a window on the event-time column.
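
The watermark requirement above could look roughly like this in PySpark. This is a sketch, not a definitive implementation: the event-time column `event_time`, the temp view name `events_with_watermark`, the window and delay durations, and the function name `windowed_counts` are all assumptions made for illustration.

```python
# Time-window aggregation expressed in SQL over a watermarked temp view.
WINDOWED_COUNT_SQL = """
SELECT window(event_time, '5 minutes') AS time_window, count(*) AS events
FROM events_with_watermark
GROUP BY window(event_time, '5 minutes')
"""


def windowed_counts(streaming_df, delay="10 minutes"):
    """Return a streaming aggregation that can be written in append mode."""
    # The watermark tells Spark how late events may arrive, so it can
    # finalize and emit a window once the watermark passes the window's end.
    streaming_df.withWatermark("event_time", delay) \
        .createOrReplaceTempView("events_with_watermark")
    return streaming_df.sparkSession.sql(WINDOWED_COUNT_SQL)


# Usage, assuming `streaming_df` has a timestamp column named event_time:
# windowed_counts(streaming_df).writeStream.outputMode("append").format("console").start()
```

In append mode, each window's count is emitted only once, after the watermark has moved past the end of that window.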

Niro
New Contributor II

Are aggregations with row_number(), combined with a SQL window function and a watermark, still supported in Databricks 14.3?
