Data Engineering
AnalysisException when attempting to save a Spark DataFrame as a Delta table

kll
New Contributor III

I get an

`AnalysisException: Failed to merge incompatible data types LongType and StringType`

when attempting to run the command below:

`df.write.format("delta").saveAsTable("schema.k_adhoc.df", mode="overwrite")`

I am casting the column before saving:

```
from pyspark.sql.types import StringType

# Cast h_id to string
df = df.withColumn("h_id", df["h_id"].cast(StringType()))
df.show()
```

The schema of the `df` dataframe is: 

```
|-- id: string (nullable = false)
|-- year: integer (nullable = true)
|-- month: integer (nullable = true)
|-- month_year: string (nullable = true)
|-- mm-yyyy: date (nullable = true)
|-- h_id: string (nullable = true)
```

 

```

---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
File <command-3186243284832184>:2
      1 # Save Table
----> 2 df.write.format("delta").saveAsTable("schema.k_adhoc.df", mode="overwrite")

File /databricks/spark/python/pyspark/instrumentation_utils.py:48, in _wrap_function.<locals>.wrapper(*args, **kwargs)
     46 start = time.perf_counter()
     47 try:
---> 48     res = func(*args, **kwargs)
     49 logger.log_success(
     50     module_name, class_name, function_name, time.perf_counter() - start, signature
     51 )
     52 return res

File /databricks/spark/python/pyspark/sql/readwriter.py:1520, in DataFrameWriter.saveAsTable(self, name, format, mode, partitionBy, **options)
   1518 if format is not None:
   1519     self.format(format)
-> 1520 self._jwrite.saveAsTable(name)

File /databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py:1321, in JavaMember.__call__(self, *args)
   1315 command = proto.CALL_COMMAND_NAME +\
   1316     self.command_header +\
   1317     args_command +\
   1318     proto.END_COMMAND_PART
   1320 answer = self.gateway_client.send_command(command)
-> 1321 return_value = get_return_value(
   1322     answer, self.gateway_client, self.target_id, self.name)
   1324 for temp_arg in temp_args:
   1325     temp_arg._detach()

File /databricks/spark/python/pyspark/errors/exceptions.py:234, in capture_sql_exception.<locals>.deco(*a, **kw)
    230 converted = convert_exception(e.java_exception)
    231 if not isinstance(converted, UnknownException):
    232     # Hide where the exception came from that shows a non-Pythonic
    233     # JVM exception message.
--> 234     raise converted from None
    235 else:
    236     raise

AnalysisException: Failed to merge fields 'h_id' and 'h_id'. Failed to merge incompatible data types LongType and StringType

```
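The last line of the traceback shows the merge is between a StringType and a LongType `h_id`, which suggests the conflicting type lives in the table that already exists, not in `df` itself. A quick way to confirm that on both sides before writing (a sketch, assuming the usual `spark` session and the table name used above):

```
# Schema of the DataFrame about to be written (h_id should now be string)
df.printSchema()

# Schema of the existing Delta table that the write is validated against
spark.table("schema.k_adhoc.df").printSchema()
```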

5 REPLIES

Priyanka_Biswas
Valued Contributor

Hi @kll

This error comes from schema validation. Delta validates the schema of every column on write: the data types of the source DataFrame columns must match the data types of the corresponding columns in the target table. If they don't match, an exception is raised.
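If the table should keep its existing LongType column, one way to satisfy that validation is to cast the source column back to LongType before writing, rather than changing the table's schema. A rough sketch, assuming the stored column is indeed LongType and the `h_id` strings are numeric:

```
from pyspark.sql.functions import col
from pyspark.sql.types import LongType

# Align the source column with the type already stored in the target table;
# note that non-numeric strings become null under this cast
df_aligned = df.withColumn("h_id", col("h_id").cast(LongType()))
df_aligned.write.format("delta").mode("overwrite").saveAsTable("schema.k_adhoc.df")
```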

For reference-

https://docs.databricks.com/delta/delta-batch.html#schema-validation-1


Could you please refer to the two articles below and see if they help?
  1. https://community.databricks.com/s/question/0D58Y00008k0YO8SAM/failed-to-merge-incompatible-data-typ...
  2. https://kb.databricks.com/en_US/data/wrong-schema-in-files

Ajay-Pandey
Esteemed Contributor III

Hi @kll 

First, drop your table and delete the underlying files using dbutils.fs.rm('path'), then re-run the write command; it should work.
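A sketch of that reset, assuming a hypothetical external-table location `/mnt/adhoc/k_adhoc_df` (for a managed table, dropping it already removes the underlying files):

```
# Drop the existing table so its old LongType schema no longer applies
spark.sql("DROP TABLE IF EXISTS schema.k_adhoc.df")

# For an external table, also remove the underlying files (hypothetical path)
dbutils.fs.rm("/mnt/adhoc/k_adhoc_df", True)

# Re-create the table; it picks up the DataFrame's current schema
df.write.format("delta").mode("overwrite").saveAsTable("schema.k_adhoc.df")
```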

Anonymous
Not applicable

Hi @kll 

Hope everything is going great.

Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you.

Cheers!

dream
New Contributor III

Use the `overwriteSchema` option, which lets the overwrite replace the table's existing schema with the DataFrame's schema:

```
(df.write
    .mode('overwrite')
    .format('delta')
    .option('overwriteSchema', 'true')
    .saveAsTable('schema.k_adhoc.df'))
```

 

Lakshay
Esteemed Contributor

The issue seems to be that the job is trying to merge columns with different schemas. Could you please make sure the column data types match between the DataFrame and the target table?
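To check that programmatically, the two schemas can be diffed field by field; a small sketch using the DataFrame and table names from this thread:

```
# Diff column types between the source DataFrame and the target Delta table
target = spark.table("schema.k_adhoc.df")
source_types = {f.name: f.dataType for f in df.schema.fields}
target_types = {f.name: f.dataType for f in target.schema.fields}

for name in sorted(source_types.keys() & target_types.keys()):
    if source_types[name] != target_types[name]:
        print(f"{name}: source={source_types[name]}, target={target_types[name]}")
```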
