
Unity Catalog Lineage Not Working on GCP

4kb_nick
New Contributor III

Hello,

We have set up a lakehouse in Databricks for one of our clients. One of the features our client would like to use is the Unity Catalog data lineage view. This is a handy feature that we have used with other clients (in both AWS and Azure) without issue.

We noticed that lineage data is not being populated at all for the UC tables in our GCP workspaces. Even when we simply run through the UC sample notebook, no lineage data appears. Looking at the logs, we saw warnings like the one below, which made us think the issue might be with the log4j config:

2024-05-07 13:08:02,895 Thread-168 WARN RollingFileAppender 'com.databricks.LineageLogging.appender': The bufferSize is set to 128000 but bufferedIO is not true

After modifying the log4j properties specified in the error message, we no longer see the log messages. However, the lineage service still does not appear to be working. Our GCP workspaces are allowed outbound access to the internet via our NAT gateways, and are not passing through any in-line firewalls. 
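
For anyone wanting to try the same kind of change, here is a minimal sketch of what the warning points at: setting bufferedIO to true on the appender it names. This is an illustration only, not the exact edit we made. It assumes the appender is declared in log4j2.xml as an XML element with name and bufferSize attributes; the sample XML in the usage lines is hypothetical and is not the real Databricks file, and enable_buffered_io is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Appender name copied from the warning message in the driver log.
APPENDER_NAME = "com.databricks.LineageLogging.appender"


def enable_buffered_io(xml_text: str, appender_name: str = APPENDER_NAME) -> str:
    """Return the config with bufferedIO="true" set on the named appender.

    Assumption: the appender is an element carrying `name` and `bufferSize`
    attributes. Elements that do not match are left unchanged.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.get("name") == appender_name and elem.get("bufferSize") is not None:
            elem.set("bufferedIO", "true")
    return ET.tostring(root, encoding="unicode")


# Hypothetical fragment, only to show the shape of the change.
sample = (
    "<Configuration><Appenders>"
    '<RollingFile name="com.databricks.LineageLogging.appender" bufferSize="128000"/>'
    "</Appenders></Configuration>"
)
patched = enable_buffered_io(sample)
```

Note that ElementTree drops XML comments and the XML declaration when it re-serializes, so back up the original file before writing anything back. As described above, this only silences the warning; it did not make lineage start working for us.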

Has anyone run into this issue in GCP, and does anyone know how to resolve it if so?

---

As an aside, updating the log4j properties was not as straightforward as mentioned here:
https://kb.databricks.com/clusters/overwrite-log4j-logs

The file specified in the above KB article does not exist on the clusters we tested in GCP (single-node, 13.3.x-scala2.12). The log4j file we had to modify is located at: /databricks/spark/dbconf/log4j/driver/log4j2.xml
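
Since the file location seems to vary by runtime, a quick way to see which log4j config files actually exist on a given cluster is to list them from a notebook cell. This is a minimal sketch: the root directory is taken from the path above and may differ on other runtimes or clouds, and find_log4j_configs is a hypothetical helper name.

```python
from pathlib import Path

# Parent directory of the file we ended up modifying (see the path above).
# Assumption: other runtimes may keep their log4j config somewhere else.
LOG4J_ROOT = "/databricks/spark/dbconf/log4j"


def find_log4j_configs(root: str = LOG4J_ROOT) -> list:
    """Return sorted paths of files under `root` whose names start with 'log4j'."""
    root_path = Path(root)
    if not root_path.is_dir():
        # Directory is missing on this machine, so there is nothing to list.
        return []
    return sorted(str(p) for p in root_path.rglob("log4j*") if p.is_file())


for config_path in find_log4j_configs():
    print(config_path)
```

On a machine where the directory does not exist, the loop simply prints nothing.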

 

3 REPLIES

Yeshwanth
Databricks Employee

@4kb_nick 

Could you please check the requirements for the lineage feature and its limitations here: https://docs.gcp.databricks.com/en/data-governance/unity-catalog/data-lineage.html?_ga=2.115379718.1...

Kind regards,

Yesh

4kb_nick
New Contributor III

Sure - I've checked the requirements:

  • The workspace must have Unity Catalog enabled. It's enabled.

  • Tables must be registered in a Unity Catalog metastore. They are. I'm just using the sample Unity Catalog lineage notebook located here: https://notebooks.databricks.com/demos/uc-03-data-lineage/index.html

  • Queries must use the Spark DataFrame (for example, Spark SQL functions that return a DataFrame) or Databricks SQL interfaces. For examples of Databricks SQL and PySpark queries, see Examples. They are - I'm using the sample notebook, which is interfacing with UC via Databricks SQL.

  • To view the lineage of a table or view, users must have at least the BROWSE privilege on the table’s or view’s parent catalog. I am the owner of the catalog and have ALL PRIVILEGES on it as well.

  • To view lineage information for notebooks, workflows, or dashboards, users must have permissions on these objects as defined by the access control settings in the workspace. See Lineage permissions. I have permissions to all of the objects in the loop - the notebook, as well as the catalog.

  • To view lineage for a Unity Catalog-enabled pipeline, you must have CAN_VIEW permissions on the pipeline. I'm not using a pipeline in my testing.

4kb_nick
New Contributor III

Hello,

It's been a few months since this exchange. No limitation of this kind is documented anywhere - the documentation implies that lineage should be working in GCP:
https://docs.gcp.databricks.com/en/data-governance/unity-catalog/data-lineage.html

Is this feature just off the table for us? Is it not working as intended in Google Cloud? Is it not available in the northamerica-northeast1 region specifically?
