<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Unity Catalog Lineage Not Working on GCP in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/unity-catalog-lineage-not-working-on-gcp/m-p/69854#M33918</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;We have set up a lakehouse in Databricks for one of our clients. One of the features our client would like to use is the Unity Catalog data lineage view. This is a handy feature that we have used with other clients (in both AWS and Azure) without issue.&lt;/P&gt;&lt;P&gt;We noticed that lineage data is not being populated at all for the UC tables in our GCP workspaces. Even after running through the UC sample notebook, we do not see any lineage data. Looking at the logs, we saw warnings like the one below, which made us think the issue might be with the log4j config:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;2024-05-07 13:08:02,895 Thread-168 WARN RollingFileAppender 'com.databricks.LineageLogging.appender': The bufferSize is set to 128000 but bufferedIO is not true&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;After modifying the log4j properties named in the warning, we no longer see these log messages. However, the lineage service still does not appear to be working. Our GCP workspaces are allowed outbound access to the internet via our NAT gateways, and traffic does not pass through any in-line firewalls.&lt;/P&gt;&lt;P&gt;Has anyone run into this issue in GCP, and if so, does anyone know how to resolve it?&lt;/P&gt;&lt;P&gt;---&lt;/P&gt;&lt;P&gt;As an aside, updating the log4j properties was not as straightforward as described here:&lt;BR /&gt;&lt;A href="https://kb.databricks.com/clusters/overwrite-log4j-logs" target="_blank" rel="noopener"&gt;https://kb.databricks.com/clusters/overwrite-log4j-logs&lt;/A&gt;&lt;/P&gt;&lt;P&gt;The file specified in the above KB article does not exist on the clusters we tested in GCP (single-node,&amp;nbsp;13.3.x-scala2.12). The log4j file we had to modify is located at:&amp;nbsp;&lt;STRONG&gt;/databricks/spark/dbconf/log4j/driver/log4j2.xml&lt;/STRONG&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 19 May 2024 20:35:09 GMT</pubDate>
    <dc:creator>4kb_nick</dc:creator>
    <dc:date>2024-05-19T20:35:09Z</dc:date>
    <item>
      <title>Unity Catalog Lineage Not Working on GCP</title>
      <link>https://community.databricks.com/t5/data-engineering/unity-catalog-lineage-not-working-on-gcp/m-p/69854#M33918</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;We have set up a lakehouse in Databricks for one of our clients. One of the features our client would like to use is the Unity Catalog data lineage view. This is a handy feature that we have used with other clients (in both AWS and Azure) without issue.&lt;/P&gt;&lt;P&gt;We noticed that lineage data is not being populated at all for the UC tables in our GCP workspaces. Even after running through the UC sample notebook, we do not see any lineage data. Looking at the logs, we saw warnings like the one below, which made us think the issue might be with the log4j config:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;2024-05-07 13:08:02,895 Thread-168 WARN RollingFileAppender 'com.databricks.LineageLogging.appender': The bufferSize is set to 128000 but bufferedIO is not true&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;After modifying the log4j properties named in the warning, we no longer see these log messages. However, the lineage service still does not appear to be working. Our GCP workspaces are allowed outbound access to the internet via our NAT gateways, and traffic does not pass through any in-line firewalls.&lt;/P&gt;&lt;P&gt;Has anyone run into this issue in GCP, and if so, does anyone know how to resolve it?&lt;/P&gt;&lt;P&gt;---&lt;/P&gt;&lt;P&gt;As an aside, updating the log4j properties was not as straightforward as described here:&lt;BR /&gt;&lt;A href="https://kb.databricks.com/clusters/overwrite-log4j-logs" target="_blank" rel="noopener"&gt;https://kb.databricks.com/clusters/overwrite-log4j-logs&lt;/A&gt;&lt;/P&gt;&lt;P&gt;The file specified in the above KB article does not exist on the clusters we tested in GCP (single-node,&amp;nbsp;13.3.x-scala2.12). The log4j file we had to modify is located at:&amp;nbsp;&lt;STRONG&gt;/databricks/spark/dbconf/log4j/driver/log4j2.xml&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 19 May 2024 20:35:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unity-catalog-lineage-not-working-on-gcp/m-p/69854#M33918</guid>
      <dc:creator>4kb_nick</dc:creator>
      <dc:date>2024-05-19T20:35:09Z</dc:date>
    </item>
    <item>
      <title>Re: Unity Catalog Lineage Not Working on GCP</title>
      <link>https://community.databricks.com/t5/data-engineering/unity-catalog-lineage-not-working-on-gcp/m-p/69884#M33921</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/36104"&gt;@4kb_nick&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Could you please check the requirements for the lineage feature and its limitations here:&amp;nbsp;&lt;A href="https://docs.gcp.databricks.com/en/data-governance/unity-catalog/data-lineage.html#requirements" target="_blank"&gt;https://docs.gcp.databricks.com/en/data-governance/unity-catalog/data-lineage.html#requirements&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Kind regards,&lt;/P&gt;
&lt;P&gt;Yesh&lt;/P&gt;</description>
      <pubDate>Mon, 20 May 2024 04:23:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unity-catalog-lineage-not-working-on-gcp/m-p/69884#M33921</guid>
      <dc:creator>Yeshwanth</dc:creator>
      <dc:date>2024-05-20T04:23:13Z</dc:date>
    </item>
    <item>
      <title>Re: Unity Catalog Lineage Not Working on GCP</title>
      <link>https://community.databricks.com/t5/data-engineering/unity-catalog-lineage-not-working-on-gcp/m-p/69892#M33923</link>
      <description>&lt;P&gt;Sure - I've checked the requirements:&lt;/P&gt;&lt;UL class=""&gt;&lt;LI&gt;&lt;P&gt;The workspace must have&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="" href="https://docs.gcp.databricks.com/en/data-governance/unity-catalog/enable-workspaces.html" target="_blank" rel="noopener"&gt;&lt;SPAN class=""&gt;Unity Catalog enabled&lt;/SPAN&gt;&lt;/A&gt;.&amp;nbsp;&lt;STRONG&gt;It's enabled.&lt;/STRONG&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Tables must be registered in a Unity Catalog metastore.&lt;STRONG&gt; They are. I'm just using the sample Unity Catalog lineage notebook located here:&amp;nbsp;&lt;A href="https://notebooks.databricks.com/demos/uc-03-data-lineage/index.html" target="_self"&gt;https://notebooks.databricks.com/demos/uc-03-data-lineage/index.html&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Queries must use the Spark DataFrame (for example, Spark SQL functions that return a DataFrame) or Databricks SQL interfaces. For examples of Databricks SQL and PySpark queries, see&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="" href="https://docs.gcp.databricks.com/en/data-governance/unity-catalog/data-lineage.html#lineage-examples" target="_blank" rel="noopener"&gt;&lt;SPAN class=""&gt;Examples&lt;/SPAN&gt;&lt;/A&gt;.&amp;nbsp;&lt;STRONG&gt;They are - I'm using the sample notebook, which is interfacing with UC via Databricks SQL.&lt;/STRONG&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;To view the lineage of a table or view, users must have at least the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class=""&gt;BROWSE&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;privilege on the table’s or view’s parent catalog.&amp;nbsp;&lt;STRONG&gt;I am the owner of the catalog and have ALL PRIVILEGES on it as well.&lt;/STRONG&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;To view lineage information for notebooks, workflows, or dashboards, users must have permissions on these 
objects as defined by the access control settings in the workspace. See&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="" href="https://docs.gcp.databricks.com/en/data-governance/unity-catalog/data-lineage.html#permissions" target="_blank" rel="noopener"&gt;&lt;SPAN class=""&gt;Lineage permissions&lt;/SPAN&gt;&lt;/A&gt;.&amp;nbsp;&lt;STRONG&gt;I have permissions to all of the objects in the loop - the notebook, as well as the catalog.&lt;/STRONG&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;To view lineage for a&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="" href="https://docs.gcp.databricks.com/en/delta-live-tables/unity-catalog.html" target="_blank" rel="noopener"&gt;&lt;SPAN class=""&gt;Unity Catalog-enabled pipeline&lt;/SPAN&gt;&lt;/A&gt;, you must have&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class=""&gt;CAN_VIEW&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;permissions on the pipeline.&amp;nbsp;&lt;STRONG&gt;I'm not using a pipeline in my testing.&lt;/STRONG&gt;&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Mon, 20 May 2024 05:27:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unity-catalog-lineage-not-working-on-gcp/m-p/69892#M33923</guid>
      <dc:creator>4kb_nick</dc:creator>
      <dc:date>2024-05-20T05:27:09Z</dc:date>
    </item>
    <item>
      <title>Re: Unity Catalog Lineage Not Working on GCP</title>
      <link>https://community.databricks.com/t5/data-engineering/unity-catalog-lineage-not-working-on-gcp/m-p/83159#M36857</link>
      <description>&lt;P&gt;Hello,&lt;BR /&gt;&lt;BR /&gt;It's been a few months since this exchange. No such feature limitation is documented anywhere; the documentation implies that lineage should be working in GCP:&lt;BR /&gt;&lt;A href="https://docs.gcp.databricks.com/en/data-governance/unity-catalog/data-lineage.html" target="_blank"&gt;https://docs.gcp.databricks.com/en/data-governance/unity-catalog/data-lineage.html&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Is this feature just off the table for us? Is it not working as intended in Google Cloud? Is it unavailable in the northamerica-northeast1 region specifically?&lt;/P&gt;</description>
      <pubDate>Fri, 16 Aug 2024 03:33:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unity-catalog-lineage-not-working-on-gcp/m-p/83159#M36857</guid>
      <dc:creator>4kb_nick</dc:creator>
      <dc:date>2024-08-16T03:33:02Z</dc:date>
    </item>
  </channel>
</rss>

