To import Databricks metadata, including lineage, from Unity Catalog into GCP Dataplex, you can extract it with the Unity Catalog APIs and then push it into Dataplex through the Dataplex APIs.
Here are the steps you can follow:
- Extract Metadata and Lineage from Unity Catalog:
  - Use the Unity Catalog REST APIs to retrieve metadata and lineage information. You can find the API documentation and endpoints for Unity Catalog in the Databricks documentation. The relevant APIs include those for listing catalogs, schemas, tables, and retrieving lineage information.
- Transform the Data:
  - Transform the extracted metadata and lineage information into the format required by GCP Dataplex. This may involve converting the data into JSON or another suitable format that Dataplex can ingest.
- Push Data to GCP Dataplex:
  - Use the Dataplex APIs to push the transformed metadata and lineage information into Dataplex. The Dataplex documentation provides details on how to use their APIs for metadata management.
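
As a rough illustration of the extract and transform steps, here is a minimal Python sketch that uses only the standard library. Several things in it are assumptions rather than anything confirmed above: the endpoint path `/api/2.1/unity-catalog/tables`, its `catalog_name`, `schema_name`, and `page_token` parameters, and the response fields `tables`, `next_page_token`, `full_name`, `table_type`, `comment`, and `columns` should all be checked against the current Databricks REST API reference. The record produced by `to_neutral_record` is a hypothetical intermediate shape, not a Dataplex format. The push step is left out on purpose, because the request body depends on which Dataplex API you target.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# Assumed Unity Catalog endpoint path; verify it in the Databricks REST API reference.
UC_TABLES_PATH = "/api/2.1/unity-catalog/tables"


def http_get_json(url: str, token: str) -> dict:
    """Send a GET request with a bearer token and decode the JSON response."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_tables(
    host: str,
    token: str,
    catalog: str,
    schema: str,
    get_json: Callable[[str, str], dict] = http_get_json,
) -> Iterator[dict]:
    """Yield every table object in one schema, following page tokens until none is returned."""
    page_token = None
    while True:
        params = {"catalog_name": catalog, "schema_name": schema}
        if page_token:
            params["page_token"] = page_token
        url = f"{host}{UC_TABLES_PATH}?{urllib.parse.urlencode(params)}"
        payload = get_json(url, token)
        yield from payload.get("tables", [])
        page_token = payload.get("next_page_token")
        if not page_token:
            break


def to_neutral_record(table: dict) -> dict:
    """Map a Unity Catalog table object onto a hypothetical intermediate record.

    A later step would translate this record into whatever request body
    the chosen Dataplex API expects.
    """
    return {
        "fully_qualified_name": table["full_name"],
        "table_type": table.get("table_type"),
        "description": table.get("comment", ""),
        "columns": [
            {
                "name": column["name"],
                "type": column.get("type_text"),
                "description": column.get("comment", ""),
            }
            for column in table.get("columns", [])
        ],
    }
```

With a workspace URL and a personal access token, `[to_neutral_record(t) for t in list_tables(host, token, "main", "sales")]` would give you one record per table to feed into the push step. The `get_json` parameter exists so that the HTTP call can be swapped for a stub when testing the pagination and mapping logic offline.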
Currently, there is no direct integration between Unity Catalog and GCP Dataplex comparable to the one between Microsoft Purview and Unity Catalog. You will therefore need to build the integration yourself using the APIs of both platforms.