topic Re: Create Databricks tables dynamically in Data Engineering

Create Databricks tables dynamically

labromb — Thu, 03 Nov 2022 15:17:54 GMT

Hi,

I would like to be able to do something like this...

create table if not exists table1

using parquet

location = '/mnt/somelocation

some location needs to be a concatenation of static and code generated string. Documentation suggests that location only accepts a string literal and based on my attempts to do this, seems to be the case. Is there any way round this?

Thanks in advance

Re: Create Databricks tables dynamically

Pat — Thu, 03 Nov 2022 21:09:46 GMT

Hi @Brian Labrom ,

One you can do it with Spark.sql

%python
tables = ["table1", "table2"]
for table in tables:
  spark.sql(f"""
  create table if not exists {table}
  using parquet
  location = '/mnt/{table}'""")

or maybe with sql:

create table if not exists table1
using parquet
location = '/mnt/${somelocation}'

You can use widget for example to get somelocation:

https://docs.databricks.com/notebooks/widgets.html

Re: Create Databricks tables dynamically

PrasanthM — Fri, 04 Nov 2022 02:03:13 GMT

FString Python can be used. example > spark.sql(f"CREATE TABLE {table_name} (id INT, name STRING, value DOUBLE, state STRING)")

Re: Create Databricks tables dynamically

labromb — Fri, 04 Nov 2022 09:53:41 GMT

Hi, I will check the suggestions this morning and share the results. Thanks @Pat Sienkiewicz and @Prasanth Mathesh for the suggestions

Re: Create Databricks tables dynamically

labromb — Mon, 07 Nov 2022 11:56:09 GMT

The end solution looked like this

first compute the variable part of the location (one month back from today in my case)

#first compute the archive date (or whatever else you need to be dynamic)
 
import datetime
 
from dateutil.relativedelta import relativedelta
 
start_date = datetime.date.today()
 
archive_date = start_date - relativedelta(months=1)
 
archive_date = archive_date.strftime('%Y-%m')
 
#build the table using the previously computed variable 
 
import pyspark.sql.session
 
folder = 'container/folder'+archive_date+'.parquet'
 
spark.sql(f"CREATE TABLE Archive using parquet location '/mnt/{folder}'")

couldn't get the SQL approach to work for some reason, but this is absolutely fine for what I need.

Many thanks for your help, this will prove v useful in the future.

As an observation might it be worth updating Databricks documentation as it currently states location must be a string literal? Just a thought.