cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Create Databricks tables dynamically

labromb
Contributor

Hi,

I would like to be able to do something like this...

create table if not exists table1

using parquet

location = '/mnt/somelocation

some location needs to be a concatenation of static and code generated string. Documentation suggests that location only accepts a string literal and based on my attempts to do this, seems to be the case. Is there any way round this?

Thanks in advance

1 ACCEPTED SOLUTION

Accepted Solutions

Pat
Honored Contributor III

Hi @Brian Labrom​ ,

One you can do it with Spark.sql

%python
tables = ["table1", "table2"]
for table in tables:
  spark.sql(f"""
  create table if not exists {table}
  using parquet
  location = '/mnt/{table}'""")

or maybe with sql:

create table if not exists table1
using parquet
location = '/mnt/${somelocation}'

You can use widget for example to get somelocation:

https://docs.databricks.com/notebooks/widgets.html

View solution in original post

6 REPLIES 6

Pat
Honored Contributor III

Hi @Brian Labrom​ ,

One you can do it with Spark.sql

%python
tables = ["table1", "table2"]
for table in tables:
  spark.sql(f"""
  create table if not exists {table}
  using parquet
  location = '/mnt/{table}'""")

or maybe with sql:

create table if not exists table1
using parquet
location = '/mnt/${somelocation}'

You can use widget for example to get somelocation:

https://docs.databricks.com/notebooks/widgets.html

PrasanthM
New Contributor III

FString Python can be used. example > spark.sql(f"CREATE TABLE {table_name} (id INT, name STRING, value DOUBLE, state STRING)")

Kaniz
Community Manager
Community Manager

Hi @Brian Labrom​ ​, We haven’t heard from you since the last response from @Prasanth Mathesh​ and @Pat Sienkiewicz​, and I was checking back to see if you have a resolution yet.

If you have any solution, please share it with the community, as it can be helpful to others. Otherwise, we will respond with more details and try to help.

Also, Please don't forget to click on the "Select As Best" button whenever the information provided helps resolve your question.

Hi, I will check the suggestions this morning and share the results. Thanks @Pat Sienkiewicz​ and @Prasanth Mathesh​ for the suggestions

Kaniz
Community Manager
Community Manager

Thank you for your quick response @Brian Labrom​ !

Please keep us posted. We would love for you to have the best answer as soon as possible.

The end solution looked like this

first compute the variable part of the location (one month back from today in my case)

#first compute the archive date (or whatever else you need to be dynamic)
 
import datetime
 
from dateutil.relativedelta import relativedelta
 
start_date = datetime.date.today()
 
archive_date = start_date - relativedelta(months=1)
 
archive_date = archive_date.strftime('%Y-%m')
 
#build the table using the previously computed variable 
 
import pyspark.sql.session
 
folder = 'container/folder'+archive_date+'.parquet'
 
spark.sql(f"CREATE TABLE Archive using parquet location '/mnt/{folder}'")

couldn't get the SQL approach to work for some reason, but this is absolutely fine for what I need.

Many thanks for your help, this will prove v useful in the future.

As an observation might it be worth updating Databricks documentation as it currently states location must be a string literal? Just a thought.

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.