
GCP hosted Databricks - DBFS temp files - Not Found

MetaMaestro
New Contributor III

I've been working on extracting DDL at the schema level from the Hive metastore in GCP-hosted Databricks. I've written Python code that generates a SQL file in the DBFS /tmp directory. However, when running the code I get a "file path not found" error. Strangely, the same code works correctly on AWS-hosted Databricks accounts. Could anyone shed light on why this discrepancy might exist on GCP?
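My current suspicion is that the /dbfs FUSE mount behaves differently on GCP. A quick probe from a notebook cell shows whether the mount is visible from Python at all (nothing Databricks-specific here, just checking the driver's local filesystem):

import os

# probe whether the /dbfs FUSE mount is visible from the driver;
# if these print False, open("/dbfs/...", "w") will fail with a "not found" error
print(os.path.exists("/dbfs"))
print(os.path.exists("/dbfs/tmp"))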

Additionally, I attempted to retrieve the DDL through the user interface, but it didn't return the complete output; many tables were missing due to UI limitations.

Are there any potential workarounds or suggestions for addressing this issue?

Here is the Python code:

# set your catalog name
catalog = "your_catalog_name"

# a comma-separated list of schemas, or a single schema name
schemas = "schema_name".split(",")

spark.catalog.setCurrentCatalog(catalog)

# prepare the output file on DBFS
folder_for_script = "/tmp/"

# create the folder if it does not exist
dbutils.fs.mkdirs(folder_for_script)

file_path = "{}{}_ddl.sql".format(folder_for_script, catalog)

# create and open the file for writing via the /dbfs FUSE mount
f = open("/dbfs" + file_path, "w")
f.truncate()

for schema in schemas:
    allTables = spark.catalog.listTables(schema)
    f.write("-- {}".format(schema))
    f.write("\n")
    for t in allTables:
        # skip temporary tables
        if not t.isTemporary:
            try:
                ddl = spark.sql("SHOW CREATE TABLE {}.{};".format(schema, t.name))
                f.write(ddl.first()[0] + ";")
                f.write("\n")
            except Exception as error:
                f.write("\n --- START ERROR --- \n /*\n")
                f.write("name: {}.{},\ntableType: {} \n".format(t.namespace, t.name, t.tableType))
                f.write("Unknown exception: {}".format(error))
                f.write("*/\n --- END ERROR --- \n")

f.close()

# console output
script = spark.sparkContext.textFile(file_path)
file_data = script.collect()
for line in file_data:
    print(line)
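One workaround I'm considering, in case the /dbfs FUSE mount really is the problem on GCP, is to write the file to the driver's local disk first and then copy it into DBFS with dbutils.fs.cp. A minimal sketch (untested on GCP; the placeholder write stands in for the schema/table loop above):

import os

# write to the driver's local filesystem instead of the /dbfs FUSE mount
local_path = "/tmp/{}_ddl.sql".format(catalog)

with open(local_path, "w") as f:
    # the same schema/table loop as above would write here
    f.write("-- placeholder\n")

# "file:" prefix = driver-local filesystem; the destination is a DBFS path
dbutils.fs.cp("file:" + local_path, "dbfs:/tmp/{}_ddl.sql".format(catalog))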

Thanks

1 REPLY

Debayan
Databricks Employee

Hi, sharing the full error message along with the failing code snippet would help determine the issue.
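For example, wrapping the failing open() call like this would capture the complete traceback (a minimal sketch; the path is just an example):

import traceback

try:
    f = open("/dbfs/tmp/your_catalog_name_ddl.sql", "w")
except Exception:
    # print the full stack trace so the root cause is visible
    print(traceback.format_exc())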
