<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: What is the best way to take care of Drop and Rename a column in Schema evaluation. in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/what-is-the-best-way-to-take-care-of-drop-and-rename-a-column-in/m-p/20632#M13944</link>
    <description>&lt;P&gt;@Yogita Chavan&lt;/P&gt;&lt;P&gt;Thanks for the response. I am aware that I can fetch history using a timestamp or version, but I am asking about the case where I overwrite the data after dropping a column or changing a type, as in the code below:&lt;/P&gt;&lt;P&gt;from pyspark.sql.functions import col&lt;/P&gt;&lt;P&gt;(spark.read.table(...)&lt;/P&gt;&lt;P&gt;  .withColumn("birthDate", col("birthDate").cast("date"))&lt;/P&gt;&lt;P&gt;  .write&lt;/P&gt;&lt;P&gt;  .mode("overwrite")&lt;/P&gt;&lt;P&gt;  .option("overwriteSchema", "true")&lt;/P&gt;&lt;P&gt;  .saveAsTable(...)&lt;/P&gt;&lt;P&gt;)&lt;/P&gt;&lt;P&gt;My question is whether this could be detected and handled automatically in the following three cases:&lt;/P&gt;&lt;P&gt;- Null values&lt;/P&gt;&lt;P&gt;- Type changes&lt;/P&gt;&lt;P&gt;- Dropped columns&lt;/P&gt;</description>
    <pubDate>Sat, 26 Nov 2022 11:28:15 GMT</pubDate>
    <dc:creator>mickniz</dc:creator>
    <dc:date>2022-11-26T11:28:15Z</dc:date>
    <item>
      <title>What is the best way to take care of Drop and Rename a column in Schema evaluation.</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-best-way-to-take-care-of-drop-and-rename-a-column-in/m-p/20627#M13939</link>
      <description>&lt;P&gt;I would like some suggestions from the Databricks folks. According to the documentation on schema evolution, dropping or renaming a column means the data is overwritten. Does that mean we lose data? (I have read that the data is not deleted but is, in a sense, staged.) Is it possible to query the old data using the table history and restore it?&lt;/P&gt;</description>
      <pubDate>Fri, 25 Nov 2022 07:30:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-best-way-to-take-care-of-drop-and-rename-a-column-in/m-p/20627#M13939</guid>
      <dc:creator>mickniz</dc:creator>
      <dc:date>2022-11-25T07:30:37Z</dc:date>
    </item>
    <item>
      <title>Re: What is the best way to take care of Drop and Rename a column in Schema evaluation.</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-best-way-to-take-care-of-drop-and-rename-a-column-in/m-p/20628#M13940</link>
      <description>&lt;P&gt;If you want to rename columns in your data, you can use the functions withColumn() and withColumnRenamed().&lt;/P&gt;&lt;P&gt;"Is it possible to query old data using history and restore?" Yes, you can query old data using Delta’s time travel capabilities. If you write into a Delta table or directory, every operation is automatically versioned. You can access the different versions of the data in two ways:&lt;/P&gt;&lt;P&gt;1)&lt;B&gt; Using a timestamp&lt;/B&gt;&lt;/P&gt;&lt;P&gt;You can provide the timestamp or date string as an option to the DataFrame reader.&lt;/P&gt;&lt;P&gt;Scala syntax:&lt;/P&gt;&lt;P&gt;val df = spark.read&lt;/P&gt;&lt;P&gt;  .format("delta")&lt;/P&gt;&lt;P&gt;  .option("timestampAsOf", "2019-01-01")&lt;/P&gt;&lt;P&gt;  .load("/path/to/my/table")&lt;/P&gt;&lt;P&gt;Python syntax:&lt;/P&gt;&lt;P&gt;df = spark.read \&lt;/P&gt;&lt;P&gt;  .format("delta") \&lt;/P&gt;&lt;P&gt;  .option("timestampAsOf", "2019-01-01") \&lt;/P&gt;&lt;P&gt;  .load("/path/to/my/table")&lt;/P&gt;&lt;P&gt;SQL syntax:&lt;/P&gt;&lt;P&gt;SELECT count(*) FROM my_table TIMESTAMP AS OF '2019-01-01'&lt;/P&gt;&lt;P&gt;SELECT count(*) FROM my_table TIMESTAMP AS OF date_sub(current_date(), 1)&lt;/P&gt;&lt;P&gt;SELECT count(*) FROM my_table TIMESTAMP AS OF '2019-01-01 01:30:00.000'&lt;/P&gt;&lt;P&gt;2)&lt;B&gt; Using a version number&lt;/B&gt;&lt;/P&gt;&lt;P&gt;In Delta, every write has a version number, and you can use the version number to travel back in time as well.&lt;/P&gt;&lt;P&gt;Scala syntax:&lt;/P&gt;&lt;P&gt;val df = spark.read&lt;/P&gt;&lt;P&gt;  .format("delta")&lt;/P&gt;&lt;P&gt;  .option("versionAsOf", "5238")&lt;/P&gt;&lt;P&gt;  .load("/path/to/my/table")&lt;/P&gt;&lt;P&gt;Alternatively, append the version to the path:&lt;/P&gt;&lt;P&gt;val df = spark.read&lt;/P&gt;&lt;P&gt;  .format("delta")&lt;/P&gt;&lt;P&gt;  .load("/path/to/my/table@v5238")&lt;/P&gt;&lt;P&gt;Python syntax:&lt;/P&gt;&lt;P&gt;df = spark.read \&lt;/P&gt;&lt;P&gt;  .format("delta") \&lt;/P&gt;&lt;P&gt;  .option("versionAsOf", "5238") \&lt;/P&gt;&lt;P&gt;  .load("/path/to/my/table")&lt;/P&gt;&lt;P&gt;Alternatively, append the version to the path:&lt;/P&gt;&lt;P&gt;df = spark.read \&lt;/P&gt;&lt;P&gt;  .format("delta") \&lt;/P&gt;&lt;P&gt;  .load("/path/to/my/table@v5238")&lt;/P&gt;&lt;P&gt;SQL syntax:&lt;/P&gt;&lt;P&gt;SELECT count(*) FROM my_table VERSION AS OF 5238&lt;/P&gt;&lt;P&gt;SELECT count(*) FROM my_table@v5238&lt;/P&gt;&lt;P&gt;SELECT count(*) FROM delta.`/path/to/my/table@v5238`&lt;/P&gt;</description>
      <pubDate>Fri, 25 Nov 2022 09:34:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-best-way-to-take-care-of-drop-and-rename-a-column-in/m-p/20628#M13940</guid>
      <dc:creator>yogu</dc:creator>
      <dc:date>2022-11-25T09:34:39Z</dc:date>
    </item>
    <item>
      <title>Re: What is the best way to take care of Drop and Rename a column in Schema evaluation.</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-best-way-to-take-care-of-drop-and-rename-a-column-in/m-p/20629#M13941</link>
      <description>&lt;P&gt;Can you please elaborate on what you are trying to do, especially with the drop and rename part?&lt;/P&gt;&lt;P&gt;As for querying old data using history and restoring it, you can make use of Delta time travel if you are storing the data in Delta format. The answer above already has the query commands.&lt;/P&gt;&lt;P&gt;If you need the timestamp or version to restore to, you can simply run DESCRIBE HISTORY &amp;lt;deltatable&amp;gt; to see all the details.&lt;/P&gt;&lt;P&gt;Cheers..&lt;/P&gt;</description>
      <pubDate>Fri, 25 Nov 2022 12:40:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-best-way-to-take-care-of-drop-and-rename-a-column-in/m-p/20629#M13941</guid>
      <dc:creator>UmaMahesh1</dc:creator>
      <dc:date>2022-11-25T12:40:38Z</dc:date>
    </item>
    <item>
      <title>Re: What is the best way to take care of Drop and Rename a column in Schema evaluation.</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-best-way-to-take-care-of-drop-and-rename-a-column-in/m-p/20630#M13942</link>
      <description>&lt;P&gt;Basically, I am creating a function that takes care of schema changes (dropped columns, type changes, and null values). As per the Delta table documentation, these changes only work when we select the overwrite option while writing. My concern is that if I overwrite, my previous data will be lost. Is there a way in Delta tables to back up the old data before overwriting? How should the old data be handled when overwriting with a new schema?&lt;/P&gt;</description>
      <pubDate>Sat, 26 Nov 2022 10:15:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-best-way-to-take-care-of-drop-and-rename-a-column-in/m-p/20630#M13942</guid>
      <dc:creator>mickniz</dc:creator>
      <dc:date>2022-11-26T10:15:11Z</dc:date>
    </item>
    <item>
      <title>Re: What is the best way to take care of Drop and Rename a column in Schema evaluation.</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-best-way-to-take-care-of-drop-and-rename-a-column-in/m-p/20631#M13943</link>
      <description>&lt;P&gt;Thanks for the response. I know we can do this, but my question was: when we overwrite the data with a new schema, will the old data still be available?&lt;/P&gt;</description>
      <pubDate>Sat, 26 Nov 2022 10:16:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-best-way-to-take-care-of-drop-and-rename-a-column-in/m-p/20631#M13943</guid>
      <dc:creator>mickniz</dc:creator>
      <dc:date>2022-11-26T10:16:57Z</dc:date>
    </item>
    <item>
      <title>Re: What is the best way to take care of Drop and Rename a column in Schema evaluation.</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-best-way-to-take-care-of-drop-and-rename-a-column-in/m-p/20632#M13944</link>
      <description>&lt;P&gt;@Yogita Chavan&lt;/P&gt;&lt;P&gt;Thanks for the response. I am aware that I can fetch history using a timestamp or version, but I am asking about the case where I overwrite the data after dropping a column or changing a type, as in the code below:&lt;/P&gt;&lt;P&gt;from pyspark.sql.functions import col&lt;/P&gt;&lt;P&gt;(spark.read.table(...)&lt;/P&gt;&lt;P&gt;  .withColumn("birthDate", col("birthDate").cast("date"))&lt;/P&gt;&lt;P&gt;  .write&lt;/P&gt;&lt;P&gt;  .mode("overwrite")&lt;/P&gt;&lt;P&gt;  .option("overwriteSchema", "true")&lt;/P&gt;&lt;P&gt;  .saveAsTable(...)&lt;/P&gt;&lt;P&gt;)&lt;/P&gt;&lt;P&gt;My question is whether this could be detected and handled automatically in the following three cases:&lt;/P&gt;&lt;P&gt;- Null values&lt;/P&gt;&lt;P&gt;- Type changes&lt;/P&gt;&lt;P&gt;- Dropped columns&lt;/P&gt;</description>
      <pubDate>Sat, 26 Nov 2022 11:28:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-best-way-to-take-care-of-drop-and-rename-a-column-in/m-p/20632#M13944</guid>
      <dc:creator>mickniz</dc:creator>
      <dc:date>2022-11-26T11:28:15Z</dc:date>
    </item>
    <item>
      <title>Re: What is the best way to take care of Drop and Rename a column in Schema evaluation.</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-best-way-to-take-care-of-drop-and-rename-a-column-in/m-p/20633#M13945</link>
      <description>&lt;P&gt;The overwrite option will overwrite your data (earlier versions remain queryable through time travel until the old files are removed by VACUUM). If you want to change a column name, you can first alter the Delta table as needed and then append the new data. That way you can resolve both problems.&lt;/P&gt;</description>
      <pubDate>Tue, 29 Nov 2022 19:31:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-best-way-to-take-care-of-drop-and-rename-a-column-in/m-p/20633#M13945</guid>
      <dc:creator>SS2</dc:creator>
      <dc:date>2022-11-29T19:31:31Z</dc:date>
    </item>
  </channel>
</rss>

