Dataframe write to SQL Server table containing Always autogenerate column fails. I am using Apache Spark Connector for SQL Server and Azure SQL. When autogenerate field are not included in dataframe, I encountered - "No key found " error If auto-generate columns are included in dataframe, I encountered "Cannot insert an explicit value into a GENERATED ALWAYS column in table error."
Azure Databricks- 7.6 runtime Azure SQL database Language - PySpark
Exception encountered
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 25, 10.139.64.4, executor 1): com.microsoft.sqlserver.jdbc.SQLServerException: Cannot insert an explicit value into a GENERATED ALWAYS column in table '<
'. Use INSERT with a column list to exclude the GENERATED ALWAYS column, or insert a DEFAULT into GENERATED ALWAYS column.
PySpark code df = read parquet file
df.write \
.format("com.microsoft.sqlserver.jdbc.spark") \
.mode("append") \
.option("url", url) \
.option("dbtable", "TEMPORAL_TABLE") \
.option("user", _username) \
.option("password", _password) \
.option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")\
.option("schemaCheckEnabled", False)\
.save()
Azure SQL Temporal Table Definition:
CREATE TABLE DBO.TEMPORAL_TABLE(
[UUID] [varchar](255) NOT NULL,
[SERVICE_ID] [bigint] NULL,
[START_DATE] [datetime2](7) NULL,
[END_DATE] [datetime2](7) NULL,
[CHANGED_ON] [datetime2](7) NULL,
[operation] [char](1) NULL,
[SysStartTime] [datetime2](7) GENERATED ALWAYS AS ROW START NOT NULL,
[SysEndTime] [datetime2](7) GENERATED ALWAYS AS ROW END NOT NULL,
CONSTRAINT [PK_TEMPORAL] PRIMARY KEY CLUSTERED
(
[UUID] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY],
PERIOD FOR SYSTEM_TIME ([SysStartTime], [SysEndTime])
) ON [PRIMARY]
WITH
(
SYSTEM_VERSIONING = ON ( HISTORY_TABLE = [history].[TEMPORAL_TABLE_HISTORY] )
)
GO
ALTER TABLE DBO.TEMPORAL_TABLE ADD DEFAULT (newid()) FOR [UUID]
GO
ALTER TABLE DBO.TEMPORAL_TABLEADD DEFAULT (getutcdate()) FOR [SysStartTime]
GO
ALTER TABLE DBO.TEMPORAL_TABLE ADD DEFAULT (CONVERT([datetime2],'9999-12-31 23:59:59.9999999')) FOR [SysEndTime]
spark sql connector - https://github.com/microsoft/sql-spark-connector