Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to set properties for a Delta table when I want to write a DataFrame?

Mado
Valued Contributor II

Hi,

I have a PySpark DataFrame with 11 million records, created on a cluster. It is not saved on DBFS or in a storage account.

import pyspark.sql.functions as F
from pyspark.sql.functions import col, when, floor, expr, hour, minute, to_timestamp, explode, sequence  
 
 
# Define start and end datetime for intervals
start_datetime = "1990-01-01 00:00:00"
end_datetime = "2099-12-31 23:55:00"
interval_minutes = 5   # minutes
 
 
# Create a DataFrame with sequence of intervals
df = spark.range(0, 1).select(
    expr(f"sequence(to_timestamp('{start_datetime}'), to_timestamp('{end_datetime}'), interval {interval_minutes} minutes)").alias("time")
)
 
 
# Explode the sequence column to create individual intervals
df = df.select(explode(col("time")).alias("time"))
 
 
# Add a new column with the interval period
# (interval_period is not defined in the post; assumed here to be the
# index of the 5-minute slot within each day)
interval_period = floor((hour(col("time")) * 60 + minute(col("time"))) / interval_minutes)
df = df.withColumn('Interval_Period', interval_period)
df = df.withColumn('Interval_Date', F.to_date(F.col("time")))
 
 
 

When I try to write it to the storage account as a Delta table with:

df.write \
    .format("delta") \
    .mode("overwrite") \
    .option("path", "table_path") \
    .saveAsTable("my_table")  # External table

I get the error:

AnalysisException: Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema. 
Please upgrade your Delta table to reader version 2 and writer version 5
and change the column mapping mode to 'name' mapping. You can use the following command:
 
ALTER TABLE  SET TBLPROPERTIES (
   'delta.columnMapping.mode' = 'name',
   'delta.minReaderVersion' = '2',
   'delta.minWriterVersion' = '5')
Refer to table versioning at https://docs.microsoft.com/azure/databricks/delta/versioning

My question is how I can set those properties. Note that I don't have a Delta table; what I have is a DataFrame created on a cluster.

I get the error when I try to write files to the table.
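For reference, a sketch of one way the properties might be supplied without a pre-existing table, assuming the Delta session-default mechanism (Spark confs prefixed with spark.databricks.delta.properties.defaults.) is available on the runtime: set them as session defaults so they apply when saveAsTable first creates the table.

# Untested sketch: default these Delta table properties for tables created
# in this session (note the "delta." prefix is dropped in the conf names)
spark.conf.set("spark.databricks.delta.properties.defaults.columnMapping.mode", "name")
spark.conf.set("spark.databricks.delta.properties.defaults.minReaderVersion", "2")
spark.conf.set("spark.databricks.delta.properties.defaults.minWriterVersion", "5")

df.write \
    .format("delta") \
    .mode("overwrite") \
    .option("path", "table_path") \
    .saveAsTable("my_table")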


4 REPLIES

Lakshay
Databricks Employee

Hi @Mohammad Saber, are you getting the error while writing the file to the table, or before that?

Mado
Valued Contributor II

Hi @Lakshay Goel,

I get the error when I want to write files to the table.

Lakshay
Databricks Employee

Hi @Mohammad Saber, as I understand it, the error is complaining that some of the column names in the Delta table (the one you are writing to) contain characters from ' ,;{}()\n\t='. You need to upgrade the reader and writer versions of the Delta table to 2 and 5 respectively, using the command below:

ALTER TABLE <table_name> SET TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name',
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5')
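From a notebook, the same command can be run through spark.sql; the reply omits the table name, so my_table below is a stand-in for the actual table:

spark.sql("""
    ALTER TABLE my_table SET TBLPROPERTIES (
        'delta.columnMapping.mode' = 'name',
        'delta.minReaderVersion' = '2',
        'delta.minWriterVersion' = '5')
""")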

Mado
Valued Contributor II

Thanks,

I found that I should create the table first with the above properties, and then write the DataFrame to the table.
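In code, that workflow might look roughly like the sketch below, reusing the path and table name from the question (the column types are assumed from the snippet above):

# Untested sketch: create the Delta table with the required properties first
spark.sql("""
    CREATE TABLE my_table (
        time TIMESTAMP,
        Interval_Period BIGINT,
        Interval_Date DATE)
    USING DELTA
    LOCATION 'table_path'
    TBLPROPERTIES (
        'delta.columnMapping.mode' = 'name',
        'delta.minReaderVersion' = '2',
        'delta.minWriterVersion' = '5')
""")

# ...then write the DataFrame into the pre-created table
df.write \
    .format("delta") \
    .mode("append") \
    .saveAsTable("my_table")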
