<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Use Unity External Location with full paths in delta_log in Administration &amp; Architecture</title>
    <link>https://community.databricks.com/t5/administration-architecture/use-unity-external-location-with-full-paths-in-delta-log/m-p/64636#M1008</link>
    <description>&lt;P&gt;Besides what has already been mentioned, it is best to let the Delta writer handle the location of the _delta_log and the parquet files; they belong together.&lt;/P&gt;</description>
    <pubDate>Tue, 26 Mar 2024 12:43:24 GMT</pubDate>
    <dc:creator>-werners-</dc:creator>
    <dc:date>2024-03-26T12:43:24Z</dc:date>
    <item>
      <title>Use Unity External Location with full paths in delta_log</title>
      <link>https://community.databricks.com/t5/administration-architecture/use-unity-external-location-with-full-paths-in-delta-log/m-p/64429#M997</link>
      <description>&lt;P&gt;I have an external Delta table in Unity Catalog (let's call it mycatalog.myschema.mytable) that consists only of a `_delta_log` directory, which I create semi-manually along with the JSON files that define the table.&lt;/P&gt;&lt;P&gt;The JSON files point to parquet files that are not in the same directory as the `_delta_log` but in a different one (possibly even in a different Azure storage account; I am on Azure Databricks).&lt;/P&gt;&lt;P&gt;For example, the JSON could look like this:&lt;/P&gt;&lt;LI-CODE lang="javascript"&gt;{
    "add": {
        "dataChange": true,
        "modificationTime": 1710850923000,
        "partitionValues": {},
        "path": "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/somepath/somefile.snappy.parquet",
        "size": 12345,
        "stats": "{\"numRecords\":123}",
        "tags": {
            "INSERTION_TIME": "1710850923000000",
            "MAX_INSERTION_TIME": "1710850923000000",
            "MIN_INSERTION_TIME": "1710850923000000",
            "OPTIMIZE_TARGET_SIZE": "268435456"
        }
    }
}&lt;/LI-CODE&gt;&lt;P&gt;When I try to read my Delta table using &lt;FONT face="courier new,courier"&gt;spark.sql("SELECT * FROM mycatalog.myschema.mytable")&lt;/FONT&gt;, I get the following error:&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;RuntimeException: Couldn't initialize file system for path abfss://mycontainer@mystorageaccount.dfs.core.windows.net/somepath/somefile.snappy.parquet&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;This means Databricks is trying to access that file with the storage account key rather than through the Unity Catalog external location.&lt;/P&gt;&lt;P&gt;The path is declared in an external location, and I can access it normally with UC credentials using&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&lt;STRONG&gt;spark.read.load("abfss://mycontainer@mystorageaccount.dfs.core.windows.net/somepath/", format="delta")&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;Is there a way to use UC external locations with a Delta table that uses absolute paths in the _delta_log? For security reasons, I don't want to add the storage account key to the cluster using &lt;FONT face="courier new,courier"&gt;spark.conf "fs.azure.account.key.mystorageaccount.dfs.core.windows.net"&lt;/FONT&gt;.&lt;/P&gt;</description>
      <pubDate>Sat, 23 Mar 2024 00:06:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/use-unity-external-location-with-full-paths-in-delta-log/m-p/64429#M997</guid>
      <dc:creator>migq2</dc:creator>
      <dc:date>2024-03-23T00:06:21Z</dc:date>
    </item>
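In a real `_delta_log` commit file, each action is stored as one JSON object per line (newline-delimited JSON), so the pretty-printed `add` action in the question would be written on a single line. The Delta protocol allows `path` to be either relative to the table root or an absolute URI, as in the question. Below is a minimal Python sketch, assuming you generate commits yourself; the helper names `make_add_action` and `to_commit_lines` are hypothetical, and a complete first commit would also need `protocol` and `metaData` actions, which are omitted here.

```python
import json


def make_add_action(path, size, modification_time_ms, num_records):
    """Build a Delta 'add' action (hypothetical helper). `path` may be an absolute URI."""
    return {
        "add": {
            "path": path,
            "partitionValues": {},
            "size": size,
            "modificationTime": modification_time_ms,
            "dataChange": True,
            # The stats field is itself a JSON-encoded string, not a nested object.
            "stats": json.dumps({"numRecords": num_records}, separators=(",", ":")),
        }
    }


def to_commit_lines(actions):
    """Serialize actions as newline-delimited JSON: one compact action per line."""
    return "\n".join(json.dumps(a, separators=(",", ":")) for a in actions) + "\n"


action = make_add_action(
    "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/somepath/somefile.snappy.parquet",
    size=12345,
    modification_time_ms=1710850923000,
    num_records=123,
)
print(to_commit_lines([action]), end="")
```

The `tags` map from the question is left out because it is optional in the `add` action; it can be added to the dictionary unchanged if your tooling relies on it.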
    <item>
      <title>Re: Use Unity External Location with full paths in delta_log</title>
      <link>https://community.databricks.com/t5/administration-architecture/use-unity-external-location-with-full-paths-in-delta-log/m-p/64636#M1008</link>
      <description>&lt;P&gt;Besides what has already been mentioned, it is best to let the Delta writer handle the location of the _delta_log and the parquet files; they belong together.&lt;/P&gt;</description>
      <pubDate>Tue, 26 Mar 2024 12:43:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/use-unity-external-location-with-full-paths-in-delta-log/m-p/64636#M1008</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2024-03-26T12:43:24Z</dc:date>
    </item>
    <item>
      <title>Re: Use Unity External Location with full paths in delta_log</title>
      <link>https://community.databricks.com/t5/administration-architecture/use-unity-external-location-with-full-paths-in-delta-log/m-p/64640#M1009</link>
      <description>&lt;P&gt;Thanks for your reply, Kaniz.&lt;/P&gt;&lt;P&gt;I understand your points, but I cannot use relative paths in my _delta_log because the files I need for my Delta table are not all in the same path (they might not even be in the same storage account).&lt;/P&gt;&lt;P&gt;Copying them is not an option either, because I am doing this at scale for many tables and many files.&lt;/P&gt;</description>
      <pubDate>Tue, 26 Mar 2024 12:50:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/use-unity-external-location-with-full-paths-in-delta-log/m-p/64640#M1009</guid>
      <dc:creator>migq2</dc:creator>
      <dc:date>2024-03-26T12:50:50Z</dc:date>
    </item>
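When a hand-built log points into several containers or storage accounts, it can help to list exactly which storage roots a commit file reaches into, for example to compare them against the external locations you have defined. A minimal Python sketch, assuming you already have the text of one commit file; the function name `referenced_roots` and the second account `otheraccount` are hypothetical. Paths without a URI scheme are treated as relative to the table root and skipped.

```python
import json
from urllib.parse import urlparse


def referenced_roots(commit_text):
    """Return the distinct scheme://authority roots of absolute 'add' paths
    in one Delta commit file (newline-delimited JSON actions)."""
    roots = set()
    for line in commit_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        add = json.loads(line).get("add")
        if add is None:
            continue  # not an 'add' action (e.g. commitInfo or metaData)
        parsed = urlparse(add["path"])
        if parsed.scheme:  # relative paths have no scheme and resolve under the table root
            roots.add(f"{parsed.scheme}://{parsed.netloc}")
    return roots


# Hypothetical commit mixing a relative path with files in two storage accounts.
commit = "\n".join([
    '{"commitInfo":{"operation":"WRITE"}}',
    '{"add":{"path":"part-00000.snappy.parquet","size":1}}',
    '{"add":{"path":"abfss://mycontainer@mystorageaccount.dfs.core.windows.net/somepath/somefile.snappy.parquet","size":2}}',
    '{"add":{"path":"abfss://other@otheraccount.dfs.core.windows.net/x/y.parquet","size":3}}',
])
for root in sorted(referenced_roots(commit)):
    print(root)
```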
    <item>
      <title>Re: Use Unity External Location with full paths in delta_log</title>
      <link>https://community.databricks.com/t5/administration-architecture/use-unity-external-location-with-full-paths-in-delta-log/m-p/64644#M1011</link>
      <description>&lt;P&gt;Thank you. However, in my specific case the parquet files are not written by Spark or Databricks but by another external tool.&lt;BR /&gt;&lt;BR /&gt;Also, some files are shared by multiple tables, or a table can have files in different storage accounts.&lt;BR /&gt;&lt;BR /&gt;This makes it infeasible to keep them in the same location where a normal Spark writer would create them.&lt;/P&gt;</description>
      <pubDate>Tue, 26 Mar 2024 13:00:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/use-unity-external-location-with-full-paths-in-delta-log/m-p/64644#M1011</guid>
      <dc:creator>migq2</dc:creator>
      <dc:date>2024-03-26T13:00:40Z</dc:date>
    </item>
    <item>
      <title>Re: Use Unity External Location with full paths in delta_log</title>
      <link>https://community.databricks.com/t5/administration-architecture/use-unity-external-location-with-full-paths-in-delta-log/m-p/64646#M1012</link>
      <description>&lt;P&gt;I suggest you look at something other than UC for such cases. I also wonder whether Delta Lake is the right format.&lt;/P&gt;</description>
      <pubDate>Tue, 26 Mar 2024 13:05:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/use-unity-external-location-with-full-paths-in-delta-log/m-p/64646#M1012</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2024-03-26T13:05:03Z</dc:date>
    </item>
  </channel>
</rss>

