Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

I cannot see the Hive databases or tables once I terminate the cluster and use another cluster.

lnsnarayanan
New Contributor II

I am using Databricks Community Edition for learning purposes.

I created some Hive-managed tables through Spark SQL as well as with the df.write.saveAsTable option.
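For context, the tables were created along these lines (an illustrative sketch; mydb and the table names are placeholders, not the actual names I used):

# Illustrative only: the two ways the tables were created
spark.sql("CREATE DATABASE IF NOT EXISTS mydb")
spark.sql("CREATE TABLE mydb.t1 AS SELECT 1 AS id")  # via Spark SQL
df = spark.range(10)
df.write.saveAsTable("mydb.t2")                      # via the DataFrame writer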

But when I connect to a new cluster,

SHOW DATABASES

only returns the default database.

The databases and tables I created with the previous cluster are not shown.

However when I run

spark.conf.get("spark.sql.warehouse.dir")

it returns

'/user/hive/warehouse'

and when I check the DBFS directory

%fs ls dbfs:/user/hive/warehouse

I see the databases and tables I created in other sessions. Am I missing something, or does Spark SQL not read the Hive-managed tables created in other clusters in Community Edition?

ACCEPTED SOLUTION

Prabakar
Esteemed Contributor III

This is a limitation in CE.


8 REPLIES

-werners-
Esteemed Contributor III

Can you see the database and tables in the web UI, under "Data" in the menu on the left?

lnsnarayanan
New Contributor II

I cannot see databases and tables when no cluster is attached, and when I attach a new cluster I see only the default database.

But I can see previously created databases and tables using the

%fs ls /user/hive/warehouse

command. They do not show up in the Data menu or through Spark SQL.

-werners-
Esteemed Contributor III

Yes, but you mentioned you checked this by running the SHOW DATABASES command in a notebook.

I am asking you to check using the Databricks web UI.

It could be that Community Edition does not have this functionality, though.

dududu
New Contributor II

I have the same problem, so is it a feature of CE?

Srihasa_Akepati
Contributor

Hello,

Yes, the databases created from one cluster cannot be accessed from another in CE. Even a restart of the same cluster would not show the databases/tables created before the restart.
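In other words, the metastore entries are scoped to the cluster even though the underlying files persist. A quick way to see this on a freshly attached cluster (illustrative commands):

# Metastore vs. files on DBFS after attaching a new cluster
spark.sql("SHOW DATABASES").show()                    # only `default` is listed
display(dbutils.fs.ls("dbfs:/user/hive/warehouse"))   # old *.db directories are still there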

Prabakar
Esteemed Contributor III

This is a limitation in CE.

dez
New Contributor II

This "feature" in the Community edition is based on the fact that I cannot restart a cluster. So, in the morning I create a cluster for studies purposes and in the afternoon I have to recreate the cluster.

If there are any objects that depend on previous labs, I have to rerun the notebooks and recreate them, but before I do that I have to drop all the files and directories (tables) I had previously created.

Here's a tip if anyone needs it:

1. Find the database directory (usually it will be dbfs:/user/hive/warehouse/):

%sql
DESCRIBE DATABASE default

2. Get the file location from the previous output and list the files:

f = dbutils.fs.ls('dbfs:/user/hive/warehouse')
display(f)

3. Remove the files and directories so that you can rerun your notebook to recreate the dependent tables and continue your labs:

%fs rm -r dbfs:/user/hive/warehouse/<replace_with_a_file_or_directory>/

Now you're good to go on recreating the tables and views.
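If you'd rather script step 3, here's a small sketch based on the commands above; it deletes everything under the warehouse root, so use it with care:

# Remove every database/table directory under the warehouse root,
# i.e. step 3 in a loop (equivalent to running %fs rm -r per entry)
warehouse_root = "dbfs:/user/hive/warehouse"
for entry in dbutils.fs.ls(warehouse_root):
    dbutils.fs.rm(entry.path, recurse=True)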

dhpaulino
New Contributor II

As the files are still in DBFS, you can just recreate the references to your tables and continue the work with something like this:

db_name = "mydb"
from pathlib import Path
path_db = f"dbfs:/user/hive/warehouse/{db_name}.db/"
tables_dirs = dbutils.fs.ls(path_db)
for d in tables_dirs:
  table_name = Path(d.path).name  
  spark.sql(f"""CREATE TABLE IF NOT EXISTS {table_name}
            LOCATION '{d.path}'
            """)

