<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Managed table overwrites existing location for delta but not for oth in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/managed-table-overwrites-existing-location-for-delta-but-not-for/m-p/59720#M31499</link>
    <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/99104"&gt;@Red_blue_green&lt;/a&gt;, I think I get it. The 'delta' format does not need to delete files to overwrite data. It can keep all files, even the pre-existing ones, while maintaining history. Am I right?&lt;/P&gt;</description>
    <pubDate>Thu, 08 Feb 2024 18:47:32 GMT</pubDate>
    <dc:creator>Dhruv-22</dc:creator>
    <dc:date>2024-02-08T18:47:32Z</dc:date>
    <item>
      <title>Managed table overwrites existing location for delta but not for oth</title>
      <link>https://community.databricks.com/t5/data-engineering/managed-table-overwrites-existing-location-for-delta-but-not-for/m-p/59668#M31469</link>
      <description>&lt;P&gt;I am working on Azure Databricks with Databricks Runtime&amp;nbsp;&lt;FONT color="#993300"&gt;14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12)&lt;/FONT&gt;, and I am facing the following issue.&lt;/P&gt;&lt;P&gt;Suppose I have a view named v1 and a database f1_processed created with the following command.&lt;/P&gt;&lt;PRE&gt;CREATE DATABASE IF NOT EXISTS f1_processed
LOCATION "abfss://processed@formula1dl679student.dfs.core.windows.net/"&lt;/PRE&gt;&lt;P&gt;This creates the database in the container named&amp;nbsp;&lt;FONT color="#993300"&gt;processed&lt;/FONT&gt;. Suppose a folder named&amp;nbsp;&lt;FONT color="#993300"&gt;circuits&lt;/FONT&gt;&amp;nbsp;already exists in that container.&lt;/P&gt;&lt;P&gt;I then try to create a&amp;nbsp;&lt;EM&gt;managed table&lt;/EM&gt;&amp;nbsp;in parquet format from a dataframe at that location, using the command below.&lt;/P&gt;&lt;PRE&gt;circuits_final_df.write.mode("overwrite").format("parquet").saveAsTable("f1_processed.circuits")&lt;/PRE&gt;&lt;P&gt;It fails with the following error.&lt;/P&gt;&lt;PRE&gt;SparkRuntimeException: [LOCATION_ALREADY_EXISTS] Cannot name the managed table as
`spark_catalog`.`f1_processed`.`circuits`, as its associated location
'abfss://processed@formula1dl679student.dfs.core.windows.net/circuits' already exists.
Please pick a different table name, or remove the existing location first. SQLSTATE: 42710
&lt;/PRE&gt;&lt;P&gt;However, if I try the same thing in delta format, it runs fine. The following code succeeds.&lt;/P&gt;&lt;PRE&gt;circuits_final_df.write.mode("overwrite").format("delta").saveAsTable("f1_processed.circuits")&lt;/PRE&gt;&lt;P&gt;Also, creating this delta table doesn't remove any files from the folder. It just adds the new files.&lt;/P&gt;&lt;P&gt;Since the result mixes the existing data with the new data, this looks like a bug and should not happen. Any help is appreciated.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Feb 2024 09:35:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/managed-table-overwrites-existing-location-for-delta-but-not-for/m-p/59668#M31469</guid>
      <dc:creator>Dhruv-22</dc:creator>
      <dc:date>2024-02-08T09:35:38Z</dc:date>
    </item>
    <item>
      <title>Re: Managed table overwrites existing location for delta but not for oth</title>
      <link>https://community.databricks.com/t5/data-engineering/managed-table-overwrites-existing-location-for-delta-but-not-for/m-p/59671#M31472</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;this is how the delta format works. With overwrite, you are not deleting or replacing the files in the folder. Delta writes new files with the overwritten schema and data. This way you are also able to return to former versions of the delta table.&lt;/P&gt;&lt;P&gt;I believe you have a table in delta format at that path, and it's not possible to overwrite it in parquet format. You need to delete the folder in the abfss location. Then you can create a parquet table there again.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Feb 2024 10:26:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/managed-table-overwrites-existing-location-for-delta-but-not-for/m-p/59671#M31472</guid>
      <dc:creator>Red_blue_green</dc:creator>
      <dc:date>2024-02-08T10:26:50Z</dc:date>
    </item>
    <item>
      <title>Re: Managed table overwrites existing location for delta but not for oth</title>
      <link>https://community.databricks.com/t5/data-engineering/managed-table-overwrites-existing-location-for-delta-but-not-for/m-p/59680#M31476</link>
      <description>&lt;P&gt;Hey, my main question was this: a managed table cannot be created at a location that already exists, so the error should be raised for all formats (delta, parquet, ...). The code gives an error for parquet but not for delta. Why does this happen? Why is delta able to create a managed table at a non-empty location?&lt;/P&gt;&lt;P&gt;Also, to check for the error, I had just put a random csv file in a folder named circuits.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Feb 2024 11:33:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/managed-table-overwrites-existing-location-for-delta-but-not-for/m-p/59680#M31476</guid>
      <dc:creator>Dhruv-22</dc:creator>
      <dc:date>2024-02-08T11:33:05Z</dc:date>
    </item>
    <item>
      <title>Re: Managed table overwrites existing location for delta but not for oth</title>
      <link>https://community.databricks.com/t5/data-engineering/managed-table-overwrites-existing-location-for-delta-but-not-for/m-p/59720#M31499</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/99104"&gt;@Red_blue_green&lt;/a&gt;, I think I get it. The 'delta' format does not need to delete files to overwrite data. It can keep all files, even the pre-existing ones, while maintaining history. Am I right?&lt;/P&gt;</description>
      <pubDate>Thu, 08 Feb 2024 18:47:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/managed-table-overwrites-existing-location-for-delta-but-not-for/m-p/59720#M31499</guid>
      <dc:creator>Dhruv-22</dc:creator>
      <dc:date>2024-02-08T18:47:32Z</dc:date>
    </item>
    <item>
      <title>Re: Managed table overwrites existing location for delta but not for oth</title>
      <link>https://community.databricks.com/t5/data-engineering/managed-table-overwrites-existing-location-for-delta-but-not-for/m-p/59906#M31529</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/99515"&gt;@Dhruv-22&lt;/a&gt;, yes, this is how I understand your problem. You have a folder, perhaps with delta files, and you are trying to overwrite it with parquet files? I'm not familiar with parquet, but I believe you need to delete the parquet files each time before you create the table in the same location. With delta, like you said, that's not necessary anymore. Delta keeps each file, and when you perform an action like overwrite, merge, or just adding data to the table, it creates a new file in order to maintain history. Then you can use the time travel feature to move back to a prior version of the table.&lt;/P&gt;</description>
      <pubDate>Mon, 12 Feb 2024 09:15:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/managed-table-overwrites-existing-location-for-delta-but-not-for/m-p/59906#M31529</guid>
      <dc:creator>Red_blue_green</dc:creator>
      <dc:date>2024-02-12T09:15:51Z</dc:date>
    </item>
  </channel>
</rss>

