Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to preserve my database when the cluster is terminated?

daindana
New Contributor III

Whenever my cluster is terminated, I lose my whole database (I'm not sure if it's related, but I created those databases in Delta format). And since the cluster is terminated after 2 hours of inactivity, I wake up with no database every morning.

I don't want to run code every morning to rebuild the whole database.

Is there any way that I can preserve my database?

I tried to clone the cluster, but it didn't bring my database back. I also tried to restart the cluster, but it couldn't be restarted.

1 ACCEPTED SOLUTION

-werners-
Esteemed Contributor III

Do you happen to use the Community Edition? Apparently there are limitations concerning your own databases.

(https://community.databricks.com/s/feed/0D53f00001HKI7ACAX)


8 REPLIES

Hubert-Dudek
Esteemed Contributor III

Please check where on DBFS the database/tables are created, and check in the file system whether the files are still there.

Sharing the code you use to create the database and tables could also be useful.
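That check can be sketched like this (assuming a Databricks notebook with the default Hive warehouse location; database_name is a placeholder matching the snippets in this thread):

```python
# Sketch: the default Hive metastore path for a database on DBFS.
# database_name is a placeholder, not a real database.
db_name = "database_name"
warehouse_path = f"dbfs:/user/hive/warehouse/{db_name}.db/"
print(warehouse_path)  # dbfs:/user/hive/warehouse/database_name.db/
# In a Databricks notebook, list the surviving files with:
# dbutils.fs.ls(warehouse_path)
```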

daindana
New Contributor III

Hello, Hubert-Dudek!

Thank you for your help and advice!

This is where I think my database/tables are located:

dbfs:/user/hive/warehouse/db_name.db/table_name/

This is the code that I use to create the database:

%sql
CREATE DATABASE IF NOT EXISTS database_name;
USE database_name;

And this is the code that I use to create table:

(df.write
        .format('delta')
        .mode('overwrite')
        .saveAsTable(table_name))


daindana
New Contributor III

Ahhh yes! I am using the Community Edition! Now I figured out that was the reason! Thank you for helping me!

Doris
New Contributor II

So how do I work around this? I am a student working on an assignment and I need to finish it, but two hours is not enough time!

-werners-
Esteemed Contributor III

OK, how about this: download your files from DBFS to your computer:

https://stackoverflow.com/questions/66685638/databricks-download-a-dbfs-filestore-file-to-my-local-m...

This is not ideal, but at least you do not lose your data. When you want to work further on the downloaded files, you can upload them again using the UI.

When you are finished, download them again, and so on.

Create a table on the files (which is very easy) and you are good to go.
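That last step can be sketched roughly like this (the table name and upload path below are hypothetical examples; in a Databricks notebook you would run the generated statement with spark.sql):

```python
# Sketch: build the DDL to register a table on re-uploaded Delta files.
# Table name and path are hypothetical examples, not from this thread.
table_name = "my_table"
upload_path = "dbfs:/FileStore/tables/my_table"

ddl = (
    f"CREATE TABLE IF NOT EXISTS {table_name} "
    f"USING DELTA LOCATION '{upload_path}'"
)
print(ddl)
# In a Databricks notebook: spark.sql(ddl)
```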

Priyag1
Honored Contributor II

Once the cluster gets terminated, the info will be lost.

dhpaulino
New Contributor II

As the files are still in DBFS, you can just recreate the references to your tables and continue the work, with something like this:

from pathlib import Path

db_name = "mydb"
path_db = f"dbfs:/user/hive/warehouse/{db_name}.db/"

# Each table lives in its own directory under the database path
tables_dirs = dbutils.fs.ls(path_db)

for d in tables_dirs:
    # The directory name is the table name
    table_name = Path(d.path).name
    # Recreate the table metadata pointing at the existing Delta files
    spark.sql(f"""CREATE TABLE IF NOT EXISTS {table_name}
                  LOCATION '{d.path}'""")
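For anyone unsure why Path(d.path).name yields the table name: dbutils.fs.ls returns directory paths with a trailing slash, and pathlib still extracts the last component. A minimal sketch with hypothetical paths:

```python
from pathlib import Path

# Hypothetical listing results, shaped like dbutils.fs.ls(...) path values
paths = [
    "dbfs:/user/hive/warehouse/mydb.db/sales/",
    "dbfs:/user/hive/warehouse/mydb.db/customers/",
]

# Path(...).name ignores the trailing slash and returns the last component
table_names = [Path(p).name for p in paths]
print(table_names)  # ['sales', 'customers']
```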
