Data Engineering
I cannot see the Hive databases or tables once I terminate the cluster and use another cluster.

lnsnarayanan
New Contributor II

I am using Databricks community edition for learning purposes.

I created some Hive-managed tables through Spark SQL as well as with df.write.saveAsTable.

But when I connect to a new cluster,

SHOW DATABASES

only returns the default database.

The database and tables I created with the previous cluster are not shown.

However when I run

spark.conf.get("spark.sql.warehouse.dir")

it returns

'/user/hive/warehouse'

and when I check the DBFS directory

%fs ls dbfs:/user/hive/warehouse

I see the databases and tables I created in other sessions. Am I missing anything, or does Spark SQL not read the Hive-managed tables created in other clusters in Community Edition?
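This behaviour is consistent with the metastore being cluster-local while the warehouse files live in DBFS: terminating the cluster discards the metastore entries but not the files. A toy Python model of that split (purely illustrative — the dict stand-in and paths below are assumptions, not Databricks code):

```python
import tempfile
from pathlib import Path

# Persistent storage: survives cluster termination
# (standing in for dbfs:/user/hive/warehouse)
warehouse = Path(tempfile.mkdtemp())
(warehouse / "mydb.db" / "orders").mkdir(parents=True)

# Cluster-local metastore: a toy stand-in for the Derby metastore
metastore = {"mydb": ["orders"]}

# "Terminate" the cluster: the metastore is lost, the files are not
metastore = {}

show_databases = sorted(metastore) or ["default"]
files = sorted(p.name for p in warehouse.iterdir())

print(show_databases)  # ['default'] — only the default database is visible
print(files)           # ['mydb.db'] — but the table files are still there
```

This is why `%fs ls` still shows the directories even though `SHOW DATABASES` comes back empty on a fresh cluster.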

1 ACCEPTED SOLUTION


Prabakar
Esteemed Contributor III

This is a limitation in CE.


7 REPLIES

-werners-
Esteemed Contributor III

Can you see the database and tables in the web UI ("Data" in the menu on the left)?

I cannot see databases and tables when no cluster is attached, and when I attach a new cluster I see only the default database.

But I can see previously created databases and tables using the

%fs ls /user/hive/warehouse

command. They do not show up in the Data menu or via Spark SQL.

-werners-
Esteemed Contributor III

Yes, but you checked this by running the "show databases" command in a notebook.

I asked you to check using the Databricks web UI.

It could be that Community Edition does not have this functionality though.

dududu
New Contributor II

I have the same problem, so it's a feature of CE?

Srihasa_Akepati
New Contributor III

Hello,

Yes, the databases created from one cluster cannot be accessed from another in CE. Even a restart of the same cluster would not show the databases/tables created before the restart.

Prabakar
Esteemed Contributor III

This is a limitation in CE.

dez
New Contributor II

This "feature" of the Community Edition stems from the fact that I cannot restart a cluster. So in the morning I create a cluster for study purposes, and in the afternoon I have to recreate it.

If there are any objects that depend on previous labs, I have to rerun the notebooks and recreate them. But before I do, I have to drop all the files and directories (tables) I previously created.

Here's a tip if anyone needs it:

1. Describe the database to find its location (usually it will be dbfs:/user/hive/warehouse/):

%sql
DESCRIBE DATABASE default

2. Take the Location from the previous output and list the files:

f = dbutils.fs.ls('dbfs:/user/hive/warehouse')
display(f)

3. Remove the files and directories so that you can rerun your notebook and recreate the dependent tables to continue your labs:

%fs rm -r dbfs:/user/hive/warehouse/<replace_with_a_file_or_directory>/
 
Now you're good to go on recreating the tables and views.
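Outside a Databricks notebook, the same list-then-remove cleanup can be sketched with plain Python against a local directory standing in for the warehouse path (the helper and paths below are illustrative assumptions, not Databricks APIs):

```python
import shutil
import tempfile
from pathlib import Path

def clear_warehouse(warehouse: Path) -> list:
    """List every top-level entry (database/table directory) and remove it,
    mirroring the %fs ls followed by %fs rm -r steps in the tip above."""
    removed = []
    for entry in sorted(warehouse.iterdir()):
        if entry.is_dir():
            # like %fs rm -r dbfs:/user/hive/warehouse/<dir>/
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed.append(entry.name)
    return removed

# Demo against a throwaway directory standing in for dbfs:/user/hive/warehouse
warehouse = Path(tempfile.mkdtemp()) / "user" / "hive" / "warehouse"
(warehouse / "mydb.db" / "sales").mkdir(parents=True)
removed = clear_warehouse(warehouse)
print(removed)  # ['mydb.db']
```

In a real notebook you would use dbutils.fs.rm(path, recurse=True) instead of shutil, since DBFS is not a local filesystem.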