08-22-2021 12:05 AM
I am using Databricks Community Edition for learning purposes.
I created some Hive-managed tables through Spark SQL as well as with the df.saveAsTable option.
But when I connect to a new cluster, "SHOW DATABASES" only returns the default database.
The databases and tables I created with the previous cluster are not shown. However, when I run
spark.conf.get("spark.sql.warehouse.dir")
it returns
'/user/hive/warehouse'
and when I check the DBFS directory with
%fs ls dbfs:/user/hive/warehouse
I see the databases and tables I created in other sessions. Am I missing anything, or does Spark SQL not read the Hive-managed tables created on other clusters in Community Edition?
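For anyone comparing the two views, here is a minimal sketch of the checks involved: what the cluster-local metastore reports versus what actually sits in DBFS. The warehouse path is the default one from above.

# Metastore view: what SHOW DATABASES reads (lost when a CE cluster goes away)
print([db.name for db in spark.catalog.listDatabases()])

# Filesystem view: what persists in DBFS across clusters
for entry in dbutils.fs.ls("dbfs:/user/hive/warehouse"):
    print(entry.path)

# In CE the second listing can contain .db directories the first one no longer mentions.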
08-23-2021 02:09 AM
Can you see the databases and tables in the web UI, under "Data" in the menu on the left?
08-23-2021 02:21 AM
I cannot see databases and tables when no cluster is attached, and when I attach a new cluster, I see only the default database.
But I can see the previously created databases and tables using the
%fs ls /user/hive/warehouse
command. They do not show up in the Data menu or through Spark SQL.
08-23-2021 02:38 AM
Yes, but you mentioned you did this by using the "show databases" command in a notebook.
I am asking you to check using the Databricks web UI. It could be that Community Edition does not have this functionality, though.
09-24-2021 07:21 PM
I have the same problem. So is this a "feature" of CE?
10-07-2021 03:59 AM
Hello,
Yes, the databases created from one cluster cannot be accessed from another in CE. Even a restart of the same cluster would not show the databases/tables created before the restart.
10-12-2021 12:04 AM
This is a limitation in CE.
01-05-2024 02:33 AM
This "feature" in the Community edition is based on the fact that I cannot restart a cluster. So, in the morning I create a cluster for studies purposes and in the afternoon I have to recreate the cluster.
If there's any dependent objects from previous labs I have to rerun the Notebooks and recreate them, but before I do it I have to drop all files and directories (tables) I had previously created.
Here's a tip if anyone needs:
1. List the Database directory (usually it will be dbfs:/user/hive/datawarehouse/)
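A minimal sketch of that cleanup, assuming the default warehouse path from earlier in the thread and that everything under it is safe to delete (check the listing before removing anything):

# List what previous clusters left behind in the warehouse directory
warehouse = "dbfs:/user/hive/warehouse/"
for entry in dbutils.fs.ls(warehouse):
    print(entry.path)

# Once you are sure it is no longer needed, remove a database directory recursively.
# "mydb.db" is a placeholder for one of the directories listed above.
dbutils.fs.rm(warehouse + "mydb.db/", recurse=True)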
06-03-2024 11:24 AM
As the files are still in DBFS, you can just recreate the references to your tables and continue working, with something like this (assuming the tables are Delta, the Databricks default, so the schema can be read from the data files themselves):

from pathlib import Path

db_name = "mydb"
path_db = f"dbfs:/user/hive/warehouse/{db_name}.db/"

# Recreate the database first, otherwise the tables land in `default`
spark.sql(f"CREATE DATABASE IF NOT EXISTS {db_name}")

# Each subdirectory of the .db directory holds one table's data files
tables_dirs = dbutils.fs.ls(path_db)
for d in tables_dirs:
    table_name = Path(d.path).name  # directory name, trailing slash stripped
    # Point a new table definition at the existing data files
    spark.sql(f"""CREATE TABLE IF NOT EXISTS {db_name}.{table_name}
                  LOCATION '{d.path}'""")