
Delta Live Tables streaming

Rishitha
New Contributor III

I'm trying to add a monotonicallyIncreasingId() column to a streaming table and I see the following error:

Failed to start stream [table_name] in either append mode or complete mode.
Append mode error: Expression(s): monotonically_increasing_id() is not supported with streaming DataFrames/Datasets;

I'm streaming from S3 buckets to Databricks.

Can someone please help?

4 REPLIES

Kaniz
Community Manager

Hi @Rishitha, if you've encountered an error indicating that the monotonicallyIncreasingId() function is not supported for streaming DataFrames or Datasets, you can use the row_number() window function to generate a monotonically increasing id column for your streaming tables.

Here's how you can do it:

from pyspark.sql import functions as F
from pyspark.sql import Window

# Assuming `streaming_df` is your streaming DataFrame
# Here's how to add a monotonically increasing id column
streaming_df_with_id = streaming_df.withColumn("id", F.row_number().over(Window.orderBy(F.monotonically_increasing_id())))

# Start your streaming query...
console_output_stream = streaming_df_with_id.writeStream.trigger(processingTime="10 seconds").format("console").start()

In this example, the row_number() window function is used to add a monotonically increasing id column to your streaming_df. The orderBy() method utilizes monotonically_increasing_id() as the ordering criteria, creating a unique and increasing id for each row in your streaming DataFrame.

After adding the id column, you can proceed to start your streaming query as usual, and the row_number() function will generate a monotonically increasing id for each row in your streaming DataFrame.

jose_gonzalez
Moderator

Hi @Rishitha,

Just a friendly follow-up. Have you had a chance to review my colleague's response to your inquiry? Did it prove helpful, or are you still in need of assistance? Your response would be greatly appreciated.

MuthuLakshmi
New Contributor III

The error "Failed to start stream [table_name] in either append mode or complete mode. Append mode error: Expression(s): monotonically_increasing_id() is not supported with streaming DataFrames/Datasets" occurs when the stream is started in append mode and the query performs an operation that Structured Streaming does not support. row_number() cannot be used on a streaming DataFrame either.

You can resolve this issue by applying SQL window functions that are based on time, that is, a window on an event-time column.

If you are performing an aggregation, you must apply a watermark to the DataFrame if you want to use append mode. The aggregation must have an event-time column, or a window on the event-time column.
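
As a rough illustration of the watermark requirement described above, here is a minimal sketch of an event-time aggregation that can run in append mode. The column name event_time, the 10-minute watermark, and the 5-minute window are assumptions chosen for the example and do not come from this thread; replace them with values that match your data. Note that this counts rows per time window; it does not produce a per-row id.

```python
# Illustrative values only; none of these come from the original thread.
EVENT_TIME_COL = "event_time"    # assumed name of the event-time column
WATERMARK_DELAY = "10 minutes"   # how late data may arrive before it is dropped
WINDOW_DURATION = "5 minutes"    # size of each tumbling event-time window


def windowed_counts(streaming_df):
    """Count rows per event-time window so the result can be written in append mode."""
    # Imported lazily so this module can be loaded without a Spark installation.
    from pyspark.sql import functions as F

    return (
        streaming_df
        # The watermark lets Spark finalize old windows, which append mode requires.
        .withWatermark(EVENT_TIME_COL, WATERMARK_DELAY)
        # Group by a time window on the same event-time column.
        .groupBy(F.window(F.col(EVENT_TIME_COL), WINDOW_DURATION))
        .count()
    )
```

Assuming streaming_df is your streaming DataFrame, the result of windowed_counts(streaming_df) could then be written with writeStream.outputMode("append"), or returned from the function that defines your streaming table.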

Niro
New Contributor II

Are aggregations with row_number(), combined with a SQL window function and a watermark, still supported in Databricks 14.3?
