cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Create Databricks tables dynamically

labromb
Contributor

Hi,

I would like to be able to do something like this...

create table if not exists table1

using parquet

location = '/mnt/somelocation

some location needs to be a concatenation of static and code generated string. Documentation suggests that location only accepts a string literal and based on my attempts to do this, seems to be the case. Is there any way round this?

Thanks in advance

1 ACCEPTED SOLUTION

Accepted Solutions

Pat
Honored Contributor III

Hi @Brian Labromโ€‹ ,

One you can do it with Spark.sql

%python
tables = ["table1", "table2"]
for table in tables:
  spark.sql(f"""
  create table if not exists {table}
  using parquet
  location = '/mnt/{table}'""")

or maybe with sql:

create table if not exists table1
using parquet
location = '/mnt/${somelocation}'

You can use widget for example to get somelocation:

https://docs.databricks.com/notebooks/widgets.html

View solution in original post

4 REPLIES 4

Pat
Honored Contributor III

Hi @Brian Labromโ€‹ ,

One you can do it with Spark.sql

%python
tables = ["table1", "table2"]
for table in tables:
  spark.sql(f"""
  create table if not exists {table}
  using parquet
  location = '/mnt/{table}'""")

or maybe with sql:

create table if not exists table1
using parquet
location = '/mnt/${somelocation}'

You can use widget for example to get somelocation:

https://docs.databricks.com/notebooks/widgets.html

PrasanthM
New Contributor III

FString Python can be used. example > spark.sql(f"CREATE TABLE {table_name} (id INT, name STRING, value DOUBLE, state STRING)")

Hi, I will check the suggestions this morning and share the results. Thanks @Pat Sienkiewiczโ€‹ and @Prasanth Matheshโ€‹ for the suggestions

The end solution looked like this

first compute the variable part of the location (one month back from today in my case)

#first compute the archive date (or whatever else you need to be dynamic)
 
import datetime
 
from dateutil.relativedelta import relativedelta
 
start_date = datetime.date.today()
 
archive_date = start_date - relativedelta(months=1)
 
archive_date = archive_date.strftime('%Y-%m')
 
#build the table using the previously computed variable 
 
import pyspark.sql.session
 
folder = 'container/folder'+archive_date+'.parquet'
 
spark.sql(f"CREATE TABLE Archive using parquet location '/mnt/{folder}'")

couldn't get the SQL approach to work for some reason, but this is absolutely fine for what I need.

Many thanks for your help, this will prove v useful in the future.

As an observation might it be worth updating Databricks documentation as it currently states location must be a string literal? Just a thought.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group