SchemaEvolutionMode exception in Databricks 14.2

Dikshant
New Contributor

I am unable to display the below stream after reading it.

df = spark.readStream.format("cloudFiles")\
.option("cloudFiles.format", "csv")\
.option("header", "true")\
.option("delimiter", "\t")\
.option("inferSchema", "true")\
.option("cloudFiles.connectionString", connection_string)\
.option("cloudFiles.resourceGroup", resource_group)\
.option("cloudFiles.subscriptionId", subscription_id)\
.option("cloudFiles.tenantId", tenant_id)\
.option("cloudFiles.clientId", client_id)\
.option("cloudFiles.clientSecret", client_secret)\
.option("cloudFiles.schemaLocation", schema_folder)\
.option("cloudFiles.schemaEvolutionMode", "addNewColumns")\
.load(input_folder_path)

df.display()

Below is the exception that I am getting.

com.databricks.sql.cloudfiles.errors.CloudFilesSchemaEvolutionException: Stateful streaming queries do not support schema evolution. Please set the option "cloudFiles.schemaEvolutionMode" to "rescue" or "none".

I am not doing any aggregation while reading the stream, so a stateful stream should not have been created. Is there anything I am missing here?

Secondly, is there a command that I can run to find out whether a stream is stateful or stateless?

I am running the above command on Databricks Runtime 14.2.

 

1 REPLY

Kaniz
Community Manager

Hi @Dikshant

  • Unfortunately, stateful streaming queries do not support schema evolution. The schema a stateful query starts with cannot change, even across query restarts.
  • To resolve this issue, you can set the cloudFiles.schemaEvolutionMode option to either "rescue" or "none":
    • "rescue": The schema is never evolved and the stream does not fail on schema changes. Data in new columns is captured in the rescued data column instead.
    • "none": The schema is not evolved and new columns are ignored. Their data is not rescued unless the rescuedDataColumn option is set.
  • To check whether a stream is stateful or stateless, you can examine the nature of your query:
    • Stateful streaming:
      • Requires incremental updates to intermediate state information.
      • Typically used for operations that maintain state across batches, such as aggregations, stream-stream joins, and deduplication.
    • Stateless streaming:
      • Only tracks which rows have been processed from source to sink.
      • Suited to simpler operations that need no intermediate state.
  • When working with stateful Structured Streaming queries, consider the following recommendations:
    • Use compute-optimized instances as workers.
    • Set the number of shuffle partitions to 1-2 times the number of cores in the cluster.
    • Set spark.sql.streaming.noDataMicroBatches.enabled to false to prevent processing empty micro-batches.
    • Consider using RocksDB with changelog checkpointing for managing state.
    • Note that changing the state management scheme between query restarts requires starting the query from scratch with a new checkpoint location.
  • In summary, review your schema evolution settings and verify whether your query truly needs stateful processing. If it does not, consider switching to stateless streaming. If you have further questions or need additional assistance, feel free to ask! 😊
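
As a rough illustration of the two points above, here is a minimal Python sketch. The helper names auto_loader_options and is_stateful are hypothetical, not part of any Databricks or Spark API. The first builds a reader options dict with an explicit schema evolution mode. The second inspects a progress report of the kind returned by StreamingQuery.lastProgress, assuming it is a plain dict (as in PySpark 3.x) whose "stateOperators" list is empty for a stateless query.

```python
from typing import Optional

# Values documented for the Auto Loader option cloudFiles.schemaEvolutionMode.
SCHEMA_EVOLUTION_MODES = {"addNewColumns", "rescue", "failOnNewColumns", "none"}


def auto_loader_options(schema_location: str, evolution_mode: str = "rescue") -> dict:
    """Hypothetical helper: build reader options with an explicit evolution mode."""
    if evolution_mode not in SCHEMA_EVOLUTION_MODES:
        raise ValueError(f"unknown schemaEvolutionMode: {evolution_mode!r}")
    return {
        "cloudFiles.format": "csv",
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.schemaEvolutionMode": evolution_mode,
        "header": "true",
        "delimiter": "\t",
    }


def is_stateful(progress: Optional[dict]) -> bool:
    """Hypothetical helper: True if the progress report lists any state operator."""
    if not progress:
        # lastProgress is None until the first micro-batch has been reported.
        return False
    return len(progress.get("stateOperators", [])) > 0


# Intended use on a cluster (not executed here; names come from the question):
#   df = (spark.readStream.format("cloudFiles")
#         .options(**auto_loader_options(schema_folder, "rescue"))
#         .load(input_folder_path))
#   query = df.writeStream.format("memory").queryName("probe").start()
#   is_stateful(query.lastProgress)
```

The stateful check only works once the query is running and has reported at least one micro-batch, so it cannot be used before the stream starts.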
