Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

cannot convert Parquet type INT64 to Photon type string

JLSy
New Contributor III

I am receiving an error similar to the one described in this post: https://community.databricks.com/s/question/0D58Y00009d8h4tSAA/cannot-convert-parquet-type-int64-to-...

However, instead of type double, my error message states that the type cannot be converted to string.

In short, I am trying to load data from our S3 bucket into a Databricks workspace via the spark.read and spark.write methods, but I encounter the error message "Error while reading file: Schema conversion error: cannot convert Parquet type INT64 to Photon type string".

I have tried the Spark cluster configuration stated in that post, but it does not solve my issue. Is a similar configuration needed (perhaps one with only a small edit to the previous solution), or is there some other solution available?
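(For context, a hedged sketch of the failing flow — the bucket path is a placeholder, and the conf shown is a general Spark fallback, not necessarily the exact configuration from the linked post:)

python

# A general Spark fallback (an assumption, not necessarily the exact conf
# from the linked post): disable the vectorized Parquet reader so the scan
# falls back to the row-based reader.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")

# Placeholder path; the real bucket/prefix differs
df = spark.read.parquet("s3://my-bucket/raw/")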

5 REPLIES

pvignesh92
Honored Contributor

Hi @John Laurence Sy, could you clarify whether the Parquet files you are reading have different data types for the same column? I'm wondering why Spark is trying to convert the schema from INT to string.
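(As a hedged diagnostic sketch — file paths are hypothetical — one way to check for per-file schema drift outside of Spark is to read each file's footer schema with PyArrow:)

python

import pyarrow.parquet as pq

# Print the footer schema of each file; a column whose physical type
# differs across files (e.g. INT64 in one, BYTE_ARRAY/string in another)
# would show up here.
for path in ["part-00000.parquet", "part-00001.parquet"]:
    print(path)
    print(pq.read_schema(path))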

JLSy
New Contributor III

Hello @Vigneshraja Palaniraj, I have verified that all the columns are assigned only one of the following types: string, date, double, bigint, decimal(20,2), int.

Anonymous
Not applicable

@John Laurence Sy:

It sounds like you are encountering a schema conversion error when trying to read in a Parquet file that contains an INT64 column that cannot be converted to a string type. This error can occur when the Parquet file has a schema that is incompatible with the expected schema of the Spark DataFrame.

One possible solution is to explicitly specify the schema of the Parquet file when reading it into a Spark DataFrame by setting it on the DataFrameReader with the schema() method before calling parquet(). This ensures the Parquet file is read with the correct schema and avoids type conversion errors. For example:

python

from pyspark.sql.types import StructType, StructField, LongType, StringType

# Define the expected schema of the Parquet file
schema = StructType([
    StructField("int_column", LongType(), True),
    StructField("string_column", StringType(), True)
])

# Read in the Parquet file with the specified schema
# (in PySpark the schema is set on the DataFrameReader, not passed
# as a keyword argument to parquet())
df = spark.read.schema(schema).parquet("s3://path/to/parquet/file")

In this example, the schema of the Parquet file contains an INT64 column and a string column, which are explicitly defined using the StructType and StructField classes. The LongType() and StringType() constructors define the data types of the columns.

Alternatively, you can try converting the INT64 column to a string column in the Parquet file itself before reading it into Spark. This can be done using tools like Apache Arrow or Pandas. Once the column is converted, the Parquet file can be read in normally without encountering any schema conversion errors.
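(A minimal sketch of that rewrite using pandas — the file paths and column name are assumptions:)

python

import pandas as pd

# Hypothetical paths and column name
df = pd.read_parquet("input.parquet")

# Cast the INT64 column to string before writing the file back out
df["int_column"] = df["int_column"].astype("string")
df.to_parquet("output.parquet", index=False)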

I hope this helps! Let me know if you have any further questions.

JLSy
New Contributor III

Hello @Suteja Kanuri,

I'll go ahead and implement this method, thanks! I'll update this thread if there are any issues.

JLSy
New Contributor III

I have tried specifying the schema and assigning the following mapping to each column type:

  • string - StringType()
  • date - DateType()
  • double - DoubleType()
  • bigint - LongType()
  • int - LongType()
  • decimal(20,2) - LongType()

I have also tried using other Spark types for the decimal(20,2), int, bigint, and double columns; however, the error still persists.
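(For reference, a hedged sketch of a schema that mirrors the declared column types one-to-one — DecimalType(20, 2) for decimal(20,2) and IntegerType for int — with hypothetical column names:)

python

from pyspark.sql.types import (
    StructType, StructField, StringType, DateType,
    DoubleType, LongType, IntegerType, DecimalType
)

# One field per declared type; real column names will differ
schema = StructType([
    StructField("string_col", StringType(), True),
    StructField("date_col", DateType(), True),
    StructField("double_col", DoubleType(), True),
    StructField("bigint_col", LongType(), True),
    StructField("int_col", IntegerType(), True),
    StructField("decimal_col", DecimalType(20, 2), True),
])

df = spark.read.schema(schema).parquet("s3://path/to/parquet/file")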
