11-24-2022 11:30 PM
I would need some suggestions from the Databricks folks. As per the documentation on schema evolution, for drop and rename the data is overwritten. Does that mean we lose the data (I read that the data is not deleted but kind of staged)? Is it possible to query the old data using history and restore?
11-25-2022 01:34 AM
If you want to rename columns in your data, you can use the withColumn() and withColumnRenamed() functions.
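For example, a minimal sketch in PySpark (the table and column names here are hypothetical):
from pyspark.sql.functions import col

df = spark.read.table("my_table")
# Rename an existing column; the values are unchanged
df = df.withColumnRenamed("old_name", "new_name")
# Or derive a replacement column with withColumn, e.g. a type change
df = df.withColumn("amount", col("amount").cast("double"))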
Is it possible to query old data using history and restore --> Yes, we can query old data using Delta's time travel capabilities. If you write into a Delta table or directory, every operation is automatically versioned. You can access the different versions of the data in two different ways:
1) Using a timestamp
Scala syntax:
You can provide the timestamp or date string as an option to the DataFrame reader:
val df = spark.read
.format("delta")
.option("timestampAsOf", "2019-01-01")
.load("/path/to/my/table")
In Python:
df = spark.read \
.format("delta") \
.option("timestampAsOf", "2019-01-01") \
.load("/path/to/my/table")
SQL syntax:
SELECT count(*) FROM my_table TIMESTAMP AS OF "2019-01-01"
SELECT count(*) FROM my_table TIMESTAMP AS OF date_sub(current_date(), 1)
SELECT count(*) FROM my_table TIMESTAMP AS OF "2019-01-01 01:30:00.000"
2) Using a version number
In Delta, every write has a version number, and you can use the version number to travel back in time as well.
Scala syntax:
val df = spark.read
.format("delta")
.option("versionAsOf", "5238")
.load("/path/to/my/table")
val df = spark.read
.format("delta")
.load("/path/to/my/table@v5238")
Python syntax:
df = spark.read \
.format("delta") \
.option("versionAsOf", "5238") \
.load("/path/to/my/table")
df = spark.read \
.format("delta") \
.load("/path/to/my/table@v5238")
SQL syntax:
SELECT count(*) FROM my_table VERSION AS OF 5238
SELECT count(*) FROM my_table@v5238
SELECT count(*) FROM delta.`/path/to/my/table@v5238`
11-26-2022 02:16 AM
Thanks for the response. I know we can do this, but my question was: when we overwrite data with a new schema, will the old data still be available?
11-26-2022 03:28 AM
@Yogita Chavan
Thanks for the response. I am aware that I can fetch history using a timestamp or version, but I am asking about the case where I overwrite data after dropping columns or changing types, as in the code below:
from pyspark.sql.functions import col

(spark.read.table(...)
  .withColumn("birthDate", col("birthDate").cast("date"))  # type change
  .write
  .mode("overwrite")
  .option("overwriteSchema", "true")  # allow the table schema to be replaced
  .saveAsTable(...)
)
But my question is: could it be automatically detected and handled in the three cases below?
- Null values
- Type changes
- Dropped columns
11-25-2022 04:40 AM
Can you please elaborate on what you are trying to do, especially the drop and rename part?
As for querying old data using history and restore, you can make use of Delta time travel if you are storing that data in Delta format. The answer above already has the querying commands.
If you want the timestamp/version you need to restore to, you can simply run DESCRIBE HISTORY <deltatable> for all the details.
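For example, a minimal sketch in PySpark (the table name my_table is hypothetical):
# List the version, timestamp, and operation of each commit on the table
spark.sql("DESCRIBE HISTORY my_table").select("version", "timestamp", "operation").show()

# Once you have picked a version, you can restore the table to it
# (version 5 here is just an example)
spark.sql("RESTORE TABLE my_table TO VERSION AS OF 5")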
Cheers..
11-26-2022 02:15 AM
Basically, I am creating a function that takes care of schema changes (drop, type change, and null values). As per the Delta table documentation, these changes only work when we select the overwrite option while writing. My concern is that if I overwrite, my previous data will be lost. Is there a way in Delta tables to back up the old data before overwriting? How do I take care of the old data when overwriting with a new schema?
11-29-2022 11:31 AM
The overwrite option will overwrite your data. If you want to change a column name, you can first ALTER the Delta table as needed and then append the new data. That way you can resolve both problems.
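For example, a minimal sketch of that approach in PySpark (the table and column names are hypothetical; note that renaming a column in place requires Delta column mapping, which must be enabled first):
# Enable column mapping so columns can be renamed without rewriting data files
spark.sql("""
ALTER TABLE my_table SET TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name',
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5'
)
""")

# Rename the column in place; the existing data is preserved
spark.sql("ALTER TABLE my_table RENAME COLUMN old_name TO new_name")

# new_df is a placeholder for a DataFrame holding the new rows
new_df.write.mode("append").saveAsTable("my_table")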