Databricks Community

ramankr48 · ‎10-18-2022

Let's say a database db contains 700 tables, and we need to find the names of all tables that contain the column "project_id".

This is just an example to help explain the question.

Anonymous · ‎10-18-2022

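databaseName = "db"
desiredColumn = "project_id"
# List every table in the database (assumes the `spark` session of a Databricks notebook)
tables = spark.sql(f"SHOW TABLES IN {databaseName}").collect()
tablenames = []
for row in tables:
  # Session temp views also show up in the listing but do not belong to the database
  if row.isTemporary:
    continue
  # Qualify the name so the lookup does not depend on the current database
  cols = spark.table(f"{databaseName}.{row.tableName}").columns
  if desiredColumn in cols:
    tablenames.append(row.tableName)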

Something close to that should work.

ramankr48 · ‎10-18-2022

Thanks josephk, it worked.

Hubert-Dudek · ‎10-18-2022

Other possible solutions:

use the new Databricks search,
for those who have migrated, use lineage in Unity Catalog,
use lineage with Purview (it integrates with the Hive metastore).

My blog: https://databrickster.medium.com/

Being-UK · ‎02-27-2023

Thanks @Joseph Kambourakis

but the code seems to be throwing errors on my end:

com.immuta.spark.exceptions.NoSuchDataSourceException: A data source with the table name '`v_table_name`' does not exist, is not in the current project, or is not accessible by the current user.

I've changed the original table name to 'v_table_name'. As it happens, that table is the first table in the schema, so the code seems to be searching through the schema but failing on the first table for some reason. Thanks
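
One way to keep the loop from stopping at a single unreadable table, as with the Immuta error above, is to catch the exception per table and carry on. The following is only a sketch, not a confirmed fix for that error. The helper names `find_tables_with_column` and `find_in_database` are hypothetical, and the case-insensitive column match is an assumption that may not suit every setup. The `spark` object is assumed to be the session a Databricks notebook provides.

```python
def find_tables_with_column(table_names, get_columns, wanted):
    """Return (matches, failures) for the given table names.

    get_columns is any callable that maps a table name to its column names;
    it may raise for tables the current user cannot read.
    """
    matches, failures = [], {}
    wanted = wanted.lower()  # assumption: compare column names case-insensitively
    for name in table_names:
        try:
            columns = get_columns(name)
        except Exception as exc:  # e.g. an access-control error on one table
            # Record the problem and move on instead of aborting the whole scan.
            failures[name] = f"{type(exc).__name__}: {exc}"
            continue
        if wanted in (c.lower() for c in columns):
            matches.append(name)
    return matches, failures


def find_in_database(spark, database_name, wanted):
    """Wire the helper to Spark: list the tables, then read each schema."""
    rows = spark.sql(f"SHOW TABLES IN {database_name}").collect()
    # Skip session-scoped temp views, which do not belong to the database.
    names = [f"{database_name}.{r.tableName}" for r in rows if not r.isTemporary]
    return find_tables_with_column(names, lambda n: spark.table(n).columns, wanted)
```

In a notebook this would be called as `matches, failures = find_in_database(spark, "db", "project_id")`. Printing `failures` afterwards shows which tables were skipped and why, which may help narrow down whether the problem is permissions on one table or something broader.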