@Pranathi Girish :
The error message you're seeing suggests that the schema of the Delta table you're trying to read hasn't been set yet. This can happen if no data has been written to the table location, or if the table wasn't created with an explicit schema via the CREATE TABLE command.
To address the issue, you can explicitly set the schema for the Delta table with CREATE TABLE before starting your readStream query. Note that these are two separate statements: readStream is called on the SparkSession itself, not on the result of spark.sql(). Also, initialPosition is a Kinesis option, not a Delta one; for a Delta source use startingVersion. For example:
# One-time setup: create the table with an explicit schema
spark.sql("CREATE TABLE IF NOT EXISTS delta_table_name (col1 STRING, col2 INT) USING DELTA LOCATION '/mnt/tmp/silver'")
# Then start the stream against the source path
(spark.readStream
    .format("delta")
    .option("ignoreChanges", "true")
    .option("startingVersion", "latest")
    .load("/mnt/tmp/bronze"))
Regarding the question of waiting for data to be available in the Delta table: a streaming query always waits for new data by design. The failOnDataLoss option controls something different — setting it to false tells Spark to continue the stream instead of throwing an error when data it expected is no longer available (for example, files removed by VACUUM or a retention policy). For example:
(spark
.readStream
.format("delta")
.option("ignoreChanges", "true")
.option("failOnDataLoss", "false")
.load("/mnt/tmp/bronze")
)
With this option set, the stream will skip over the missing data and keep running instead of failing. Be aware that this can silently drop records, so only disable failOnDataLoss when that trade-off is acceptable.
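Putting it together: the read side only defines the source, and the query actually starts (and then keeps waiting for new data) once you attach a writeStream with a checkpoint. A minimal sketch — the function name and checkpoint path are hypothetical, and spark is the Databricks-provided SparkSession:

```python
def bronze_to_silver(spark, checkpoint="/mnt/tmp/checkpoints/bronze_to_silver"):
    """Continuously copy new rows from the bronze Delta path to silver."""
    df = (spark.readStream
          .format("delta")
          .option("ignoreChanges", "true")
          .option("failOnDataLoss", "false")
          .load("/mnt/tmp/bronze"))
    # The query keeps running and picks up new files as they arrive;
    # the checkpoint lets it resume where it left off after a restart.
    return (df.writeStream
            .format("delta")
            .option("checkpointLocation", checkpoint)
            .start("/mnt/tmp/silver"))
```

In a notebook you would run query = bronze_to_silver(spark) and optionally query.awaitTermination() to block until the stream stops.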
Hope this helps you find a solution!