08-22-2021 12:05 AM
I am using Databricks Community Edition for learning purposes.
I created some Hive-managed tables through Spark SQL as well as with the df.saveAsTable option.
But when I connect to a new cluster, "SHOW DATABASES" only returns the default database.
The databases and tables I created with the previous cluster are not shown. However, when I run
spark.conf.get("spark.sql.warehouse.dir")
it returns
'/user/hive/warehouse'
and when I check the DBFS directory with
%fs ls dbfs:/user/hive/warehouse
I see the databases and tables I created in other sessions. Am I missing anything, or does Spark SQL not read the Hive-managed tables created in other clusters in Community Edition?
Labels: Spark SQL
08-23-2021 02:09 AM
Can you see the databases and tables in the web UI, under "Data" in the menu on the left?
08-23-2021 02:21 AM
I cannot see databases and tables when no cluster is attached, and when I attach a new cluster I see only the default database.
But I can see the previously created databases and tables using the
%fs ls /user/hive/warehouse
command. They do not show up in the Data menu or through Spark SQL.
08-23-2021 02:38 AM
Yes, but you mentioned you did this by using the "SHOW DATABASES" command in a notebook.
I am asking you to check in the Databricks web UI. It could be that Community Edition does not have this functionality, though.
09-24-2021 07:21 PM
I have the same problem, so is this a feature of CE?
10-07-2021 03:59 AM
Hello,
Yes, the databases created from one cluster cannot be accessed from another in CE. Even a restart of the same cluster would not show the databases/tables created before the restart.
10-12-2021 12:04 AM (Accepted Solution)
This is a limitation in CE.
01-05-2024 02:33 AM
This "feature" in the Community edition is based on the fact that I cannot restart a cluster. So, in the morning I create a cluster for studies purposes and in the afternoon I have to recreate the cluster.
If there's any dependent objects from previous labs I have to rerun the Notebooks and recreate them, but before I do it I have to drop all files and directories (tables) I had previously created.
Here's a tip if anyone needs:
1. List the Database directory (usually it will be dbfs:/user/hive/datawarehouse/)
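A minimal sketch of that first step, assuming the default warehouse path mentioned earlier in the thread; the database name "mydb" is only a placeholder:
# List the warehouse directory to see the database folders left over from a previous cluster
display(dbutils.fs.ls("dbfs:/user/hive/warehouse/"))
# List the table directories inside one database folder ("mydb" is a placeholder)
display(dbutils.fs.ls("dbfs:/user/hive/warehouse/mydb.db/"))
# To start clean instead, remove the old table data (True = recursive delete)
# dbutils.fs.rm("dbfs:/user/hive/warehouse/mydb.db/", True)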
06-03-2024 11:24 AM
As the files are still in DBFS, you can just recreate the references to your tables and continue the work with something like this:
from pathlib import Path

db_name = "mydb"
path_db = f"dbfs:/user/hive/warehouse/{db_name}.db/"

# Recreate the database so the tables can be registered under it again
spark.sql(f"CREATE DATABASE IF NOT EXISTS {db_name}")

# Each subdirectory of the database folder corresponds to one table
tables_dirs = dbutils.fs.ls(path_db)
for d in tables_dirs:
    table_name = Path(d.path).name
    # Point the table definition at the existing data files
    spark.sql(f"""CREATE TABLE IF NOT EXISTS {db_name}.{table_name}
                  LOCATION '{d.path}'""")

