cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

AnalysisException : when attempting to save a spark DataFrame as delta table

kll
New Contributor III

I get an, 

`AnalysisException Failed to merge incompatible data types LongType and StringType

when attempting to run the below command, 

`df.write.format("delta").saveAsTable("schema.k_adhoc.df", mode="overwrite")` 

I am casting the column before saving: 

```

# Cast datatype

df = df.withColumn("h_id", df4["h_id"].cast(StringType()))
df.show()
```

The schema of the `df` dataframe is: 

```

|-- id: string (nullable = false)

|-- year: integer (nullable = true)

|-- month: integer (nullable = true)

|-- month_year: string (nullable = true)

|-- mm-yyyy: date (nullable = true)

|-- h_id: string (nullable = true)

```

 

```

--------------------------------------------------------------------------- AnalysisException Traceback (most recent call last) File <command-3186243284832184>:2 1 # Save Table ----> 2 df.write.format("delta").saveAsTable("schema.k_adhoc.df", mode="overwrite") File /databricks/spark/python/pyspark/instrumentation_utils.py:48, in _wrap_function.<locals>.wrapper(*args, **kwargs) 46 start = time.perf_counter() 47 try: ---> 48 res = func(*args, **kwargs) 49 logger.log_success( 50 module_name, class_name, function_name, time.perf_counter() - start, signature 51 ) 52 return res File /databricks/spark/python/pyspark/sql/readwriter.py:1520, in DataFrameWriter.saveAsTable(self, name, format, mode, partitionBy, **options) 1518 if format is not None: 1519 self.format(format) -> 1520 self._jwrite.saveAsTable(name) File /databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py:1321, in JavaMember.__call__(self, *args) 1315 command = proto.CALL_COMMAND_NAME +\ 1316 self.command_header +\ 1317 args_command +\ 1318 proto.END_COMMAND_PART 1320 answer = self.gateway_client.send_command(command) -> 1321 return_value = get_return_value( 1322 answer, self.gateway_client, self.target_id, self.name) 1324 for temp_arg in temp_args: 1325 temp_arg._detach() File /databricks/spark/python/pyspark/errors/exceptions.py:234, in capture_sql_exception.<locals>.deco(*a, **kw) 230 converted = convert_exception(e.java_exception) 231 if not isinstance(converted, UnknownException): 232 # Hide where the exception came from that shows a non-Pythonic 233 # JVM exception message. --> 234 raise converted from None 235 else: 236 raise AnalysisException: Failed to merge fields 'h_id' and 'h_id'. Failed to merge incompatible data types LongType and StringType

```

5 REPLIES 5

Priyanka_Biswas
Valued Contributor
Valued Contributor

Hi @kll

This error is because of schema validation. The delta table performs schema validation of every column, and the source dataframe column data types must match the column data types in the target table. If they don’t match, an exception is raised.

For reference-

https://docs.databricks.com/delta/delta-batch.html#schema-validation-1


Could you please refer below 2 article and see if it helps ?
  1. https://community.databricks.com/s/question/0D58Y00008k0YO8SAM/failed-to-merge-incompatible-data-typ...
  2. https://kb.databricks.com/en_US/data/wrong-schema-in-files

Ajay-Pandey
Esteemed Contributor III

Hi @kll 

First, drop your table and delete the underlying files by using dbutils.fs.rm('path') and then try to run the write command then it will work.

Ajay Kumar Pandey

Anonymous
Not applicable

Hi @kll 

Hope everything is going great.

Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you. 

Cheers!

dream
Contributor

Use the overwrite schema option:

(df.write
    .mode('overwrite')
    .format('delta')
    .option('overwriteSchema', 'true')
    .saveAsTable('schema.k_adhoc.df'))

 

Lakshay
Esteemed Contributor
Esteemed Contributor

The issue seems to be because the job is trying to merge columns with different schema. Could you please make sure that the schema matches for the columns.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group