08-22-2021 12:05 AM
I am using Databricks Community Edition for learning purposes.
I created some Hive-managed tables through Spark SQL as well as with the df.saveAsTable option.
But when I connect to a new cluster, "SHOW DATABASES" only returns the default database.
The databases and tables I created with the previous cluster are not shown. However, when I run
spark.conf.get("spark.sql.warehouse.dir")
it returns
'/user/hive/warehouse'
and when I check the DBFS directory with
%fs ls dbfs:/user/hive/warehouse
I see the databases and tables I created in other sessions. Am I missing anything, or does Spark SQL not read the Hive-managed tables created in other clusters in Community Edition?
Labels: Spark SQL
08-23-2021 02:09 AM
Can you see the databases and tables in the web UI, under "Data" in the menu on the left?
08-23-2021 02:21 AM
I cannot see databases and tables when no cluster is attached, and when I attach a new cluster I see only the default database.
But I can see the previously created databases and tables using the
%fs ls /user/hive/warehouse
command. They do not show up in the Data menu or through Spark SQL.
08-23-2021 02:38 AM
Yes, but you mentioned you did this by using the "SHOW DATABASES" command in a notebook.
I am asking you to check in the Databricks web UI. It could be that Community Edition does not have this functionality, though.
09-24-2021 07:21 PM
I have the same problem, so is this a feature of CE?
10-07-2021 03:59 AM
Hello,
Yes, the databases created from one cluster cannot be accessed from another in CE. Even a restart of the same cluster would not show the databases/tables created before the restart.
10-12-2021 12:04 AM (Accepted Solution)
This is a limitation in CE.
01-05-2024 02:33 AM
This "feature" in the Community edition is based on the fact that I cannot restart a cluster. So, in the morning I create a cluster for studies purposes and in the afternoon I have to recreate the cluster.
If there's any dependent objects from previous labs I have to rerun the Notebooks and recreate them, but before I do it I have to drop all files and directories (tables) I had previously created.
Here's a tip if anyone needs:
1. List the Database directory (usually it will be dbfs:/user/hive/datawarehouse/)
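A minimal sketch of that first step, assuming the default warehouse path mentioned earlier in the thread; the database name "mydb" is only a placeholder:
# List the warehouse directory to see the database folders left over from a previous cluster
display(dbutils.fs.ls("dbfs:/user/hive/warehouse/"))
# List the table directories inside one database folder ("mydb" is a placeholder)
display(dbutils.fs.ls("dbfs:/user/hive/warehouse/mydb.db/"))
# To start clean instead, remove the old table data (True = recursive delete)
# dbutils.fs.rm("dbfs:/user/hive/warehouse/mydb.db/", True)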
06-03-2024 11:24 AM
As the files are still in DBFS, you can just recreate the references to your tables and continue the work with something like this:
from pathlib import Path

db_name = "mydb"
path_db = f"dbfs:/user/hive/warehouse/{db_name}.db/"

# Recreate the database so the tables can be registered under it again
spark.sql(f"CREATE DATABASE IF NOT EXISTS {db_name}")

# Each subdirectory of the database folder corresponds to one table
tables_dirs = dbutils.fs.ls(path_db)
for d in tables_dirs:
    table_name = Path(d.path).name
    # Point the table definition at the existing data files
    spark.sql(f"""CREATE TABLE IF NOT EXISTS {db_name}.{table_name}
                  LOCATION '{d.path}'""")

