Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Delta adds a new partition making the old partition unreadable

User16826994223
Honored Contributor III

In my notebook, my code reads and writes data to a Delta table partitioned by calendar_date. After the initial load I can read the Delta table and inspect the data just fine. But after a second load of six months of data, the previous partitions no longer load via the Delta format. Reading my source Delta table like this throws an error saying the file doesn't exist:

    spark.read.format("delta").load("/mnt/kgaurav/table/calendar_date=2018-10-04/")

However, reading it as Parquet (after disabling the Delta format check) works just fine. Any idea what could be wrong?

    spark.conf.set("spark.databricks.delta.formatCheck.enabled", "false")
    spark.read.format("parquet").load("/mnt/kgaurav/table/calendar_date=2018-

1 ACCEPTED SOLUTION

Accepted Solutions

User16826994223
Honored Contributor III

I think you are writing the data in overwrite mode. When a partition is overwritten, Delta does not immediately delete the old data files; it keeps them for versioning (time travel), and the current table version only references the most recent files. So through the Delta reader you can query only the latest data.

But when you read the path with format("parquet"), Spark bypasses the Delta transaction log and picks up the logically deleted files as well, which is why that read appears to work.

# Look up the previous table version from the Delta history,
# then read the table as of that version (time travel).
v = spark.sql(f"DESCRIBE HISTORY delta.`{path}` LIMIT 2")
version = v.take(2)[1][0]  # version number of the second-newest entry
df = spark.read.format("delta").option("versionAsOf", version).load(path)


