
How to set properties for a delta table when I want to write a DataFrame?

Mado
Valued Contributor II

Hi,

I have a PySpark DataFrame with 11 million records, created on a cluster. It is not saved on DBFS or in a storage account.

import pyspark.sql.functions as F
from pyspark.sql.functions import col, when, floor, expr, hour, minute, to_timestamp, explode, sequence

# Define start and end datetimes for the intervals
start_datetime = "1990-01-01 00:00:00"
end_datetime = "2099-12-31 23:55:00"
interval_minutes = 5  # minutes

# Create a DataFrame with a sequence of timestamps at 5-minute steps
df = spark.range(0, 1).select(
    expr(f"sequence(to_timestamp('{start_datetime}'), to_timestamp('{end_datetime}'), interval {interval_minutes} minutes)").alias("time")
)

# Explode the sequence column to get one row per interval
df = df.select(explode(col("time")).alias("time"))

# Add the interval period and date columns
# (`interval_period` is a column expression defined elsewhere; not shown here)
df = df.withColumn('Interval_Period', interval_period)
df = df.withColumn('Interval_Date', F.to_date(F.col("time")))


When I try to write it to the storage account as a Delta table by:

df.write \
    .format("delta") \
    .mode("overwrite") \
    .option("path", "table_path") \
    .saveAsTable("my_table")  # External table

And I get the error:

AnalysisException: Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema. 
Please upgrade your Delta table to reader version 2 and writer version 5
and change the column mapping mode to 'name' mapping. You can use the following command:
 
ALTER TABLE  SET TBLPROPERTIES (
   'delta.columnMapping.mode' = 'name',
   'delta.minReaderVersion' = '2',
   'delta.minWriterVersion' = '5')
Refer to table versioning at https://docs.microsoft.com/azure/databricks/delta/versioning

My question is how I can set those properties. Note that I don't have a Delta table; what I have is a DataFrame created on a cluster.

I get the error when I want to write files to the table.


4 REPLIES

Lakshay
Esteemed Contributor

Hi @Mohammad Saber, are you getting the error while writing the file to the table, or before that?

Mado
Valued Contributor II

Hi @Lakshay Goel​ 

I get the error when I want to write files to the table.

Lakshay
Esteemed Contributor
ACCEPTED SOLUTION

Hi @Mohammad Saber, as I understand it, the error is complaining that some of the column names in the Delta table (where you are writing to) contain characters from ' ,;{}()\n\t='. You need to change the reader and writer versions of the Delta table to 2 and 5 respectively, using the command below:

ALTER TABLE <table_name> SET TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name',
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5')
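
If it helps, here is a minimal sketch of running that command from a notebook with spark.sql, assuming a Delta table named my_table already exists in the metastore (the table name is an assumption; substitute your own):

# Minimal sketch: upgrade an existing Delta table's protocol and
# switch column mapping to 'name' mode.
# Assumes a table named `my_table` already exists in the metastore.
spark.sql("""
    ALTER TABLE my_table SET TBLPROPERTIES (
        'delta.columnMapping.mode' = 'name',
        'delta.minReaderVersion' = '2',
        'delta.minWriterVersion' = '5'
    )
""")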

Mado
Valued Contributor II

Thanks,

I found that I should first create a table with the above properties, and then write the DataFrame to the table.
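
For anyone who finds this later, a minimal sketch of that approach, assuming the table name and path from the original post (the Interval_Period type is a guess, since its defining expression isn't shown in the thread):

# Sketch: create the external Delta table with the required
# properties first, then write the DataFrame into it.
# `my_table`, `table_path`, and the column types are assumptions
# based on the original post.
spark.sql("""
    CREATE TABLE IF NOT EXISTS my_table (
        time TIMESTAMP,
        Interval_Period STRING,
        Interval_Date DATE
    )
    USING DELTA
    LOCATION 'table_path'
    TBLPROPERTIES (
        'delta.columnMapping.mode' = 'name',
        'delta.minReaderVersion' = '2',
        'delta.minWriterVersion' = '5'
    )
""")

df.write.format("delta").mode("append").saveAsTable("my_table")

If I read the Delta docs correctly, you can also set session-level defaults for new tables, e.g. spark.conf.set("spark.databricks.delta.properties.defaults.columnMapping.mode", "name"), which applies to every table created in that session.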
