Create Databricks tables dynamically

labromb
Databricks Partner

Hi,

I would like to be able to do something like this...

create table if not exists table1

using parquet

location = '/mnt/somelocation

some location needs to be a concatenation of static and code generated string. Documentation suggests that location only accepts a string literal and based on my attempts to do this, seems to be the case. Is there any way round this?

Thanks in advance

Pat
Esteemed Contributor

Hi @Brian Labrom​ ,

One you can do it with Spark.sql

%python
tables = ["table1", "table2"]
for table in tables:
  spark.sql(f"""
  create table if not exists {table}
  using parquet
  location = '/mnt/{table}'""")

or maybe with sql:

create table if not exists table1
using parquet
location = '/mnt/${somelocation}'

You can use widget for example to get somelocation:

https://docs.databricks.com/notebooks/widgets.html

View solution in original post

PrasanthM
New Contributor III

FString Python can be used. example > spark.sql(f"CREATE TABLE {table_name} (id INT, name STRING, value DOUBLE, state STRING)")

labromb
Databricks Partner

Hi, I will check the suggestions this morning and share the results. Thanks @Pat Sienkiewicz​ and @Prasanth Mathesh​ for the suggestions

labromb
Databricks Partner

The end solution looked like this

first compute the variable part of the location (one month back from today in my case)

#first compute the archive date (or whatever else you need to be dynamic)
 
import datetime
 
from dateutil.relativedelta import relativedelta
 
start_date = datetime.date.today()
 
archive_date = start_date - relativedelta(months=1)
 
archive_date = archive_date.strftime('%Y-%m')
 
#build the table using the previously computed variable 
 
import pyspark.sql.session
 
folder = 'container/folder'+archive_date+'.parquet'
 
spark.sql(f"CREATE TABLE Archive using parquet location '/mnt/{folder}'")

couldn't get the SQL approach to work for some reason, but this is absolutely fine for what I need.

Many thanks for your help, this will prove v useful in the future.

As an observation might it be worth updating Databricks documentation as it currently states location must be a string literal? Just a thought.