I've been working on extracting DDL at the schema level from the Hive Metastore in GCP-hosted Databricks. I've written a Python script that generates a SQL file in the /tmp directory on DBFS. However, when I run it, I get a "file path not found" error. Strangely, the same code works fine on AWS-hosted Databricks accounts. Could anyone shed light on why this discrepancy exists on GCP?
I also tried retrieving the results through the user interface, but it didn't return the complete DDL; many tables were missing due to UI limitations.
Are there any potential workarounds or suggestions for addressing this issue?
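In case it helps narrow things down, the error seems to come from the file-write step rather than from the DDL queries themselves. Here is a minimal sketch that isolates just that step (same /tmp folder as in the full script below); I haven't confirmed this is the exact failure point on GCP, and the test file name is just a placeholder:

# minimal write test: create a folder on DBFS and write a file through the /dbfs local mount
folder_for_script = "/tmp/"
dbutils.fs.mkdirs(folder_for_script)               # DBFS path, i.e. dbfs:/tmp/
test_path = folder_for_script + "write_test.txt"   # placeholder test file name
with open("/dbfs" + test_path, "w") as f:          # local-file view of DBFS
    f.write("hello from GCP\n")
print(dbutils.fs.head(test_path))                  # read it back via dbutils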
Here is the Python code:
# set your catalog name
catalog = "your_catalog_name"
# comma-separated list of schemas, or a single schema name
schemas = "schema_name".split(",")
spark.catalog.setCurrentCatalog(catalog)

# prepare the output file
folder_for_script = "/tmp/"
# create the folder on DBFS if it does not exist
dbutils.fs.mkdirs(folder_for_script)
file_path = "{}{}_ddl.sql".format(folder_for_script, catalog)

# create and open the file for writing through the /dbfs local-file mount
f = open("/dbfs" + file_path, "w")
f.truncate()

for schema in schemas:
    allTables = spark.catalog.listTables(schema)
    f.write("-- {}".format(schema))
    f.write("\n")
    for t in allTables:
        # skip temporary tables and views
        if not t.isTemporary:
            try:
                ddl = spark.sql("SHOW CREATE TABLE {}.{}".format(schema, t.name))
                f.write(ddl.first()[0] + ";")
                f.write("\n")
            except Exception as error:
                f.write("\n --- START ERROR --- \n /*\n")
                f.write("name: {}.{},\ntableType: {}\n".format(schema, t.name, t.tableType))
                f.write("Unknown exception: {}".format(error))
                f.write("*/\n --- END ERROR --- \n")
f.close()

# console output: read the generated file back from DBFS and print it
script = spark.sparkContext.textFile(file_path)
file_data = script.collect()
for line in file_data:
    print(line)
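One workaround I was wondering about is avoiding the /dbfs local-file mount entirely and writing the script through dbutils.fs.put instead, then reading it back as before. A rough sketch of what I mean (untested on GCP, and it buffers the whole script in memory as a single string):

# build the DDL script in memory, then write it to DBFS with dbutils.fs.put
lines = []
for schema in schemas:
    lines.append("-- {}".format(schema))
    for t in spark.catalog.listTables(schema):
        if not t.isTemporary:
            try:
                ddl = spark.sql("SHOW CREATE TABLE {}.{}".format(schema, t.name))
                lines.append(ddl.first()[0] + ";")
            except Exception as error:
                lines.append("-- ERROR for {}.{}: {}".format(schema, t.name, error))

# overwrite=True replaces any existing file at dbfs:/tmp/<catalog>_ddl.sql
dbutils.fs.put(file_path, "\n".join(lines), True)

Would something like this be a reasonable workaround, or would it hit the same limitation on GCP?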
Thanks