
Running a query against multiple Parquet files from a folder

Shaimaa
New Contributor II

I am running a query against multiple Parquet files:

SELECT
SUM(CASE WHEN match_result.year_incorporated IS NOT NULL AND match_result.year_incorporated != '' THEN 1 ELSE 0 END)
FROM 
parquet.`s3://folder_path/*`

For some files, the field `year_incorporated` holds a string value; for others, the entire field is null. I get this error for a file in which every value is null:

Error while reading file s3://file_path.PARQUET. Schema conversion error: cannot convert Parquet type INT32 to Photon type string(0)

How can I fix this issue?

1 REPLY

daniel_sahal
Esteemed Contributor

@Shaimaa 
A column type mismatch between the files is the likely cause. The error message shows that the all-null file stores `year_incorporated` as INT32, while the other files store it as a string.
For example, if column 'xyz' is an INTEGER in one file and a STRING in another, Spark raises a schema conversion error when it reads them together.
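To confirm which files disagree, one option is to read each file on its own and record the type Spark reports for the field. The sketch below is only an illustration: `field_types_by_file` is a hypothetical helper name, `spark` is assumed to be an active SparkSession (as in a Databricks notebook), and `paths` is a list of individual file paths that you supply yourself.

```python
def field_types_by_file(spark, paths, field="match_result.year_incorporated"):
    """Return {path: Spark type string} for one field, reading each file alone.

    Hypothetical helper: `spark` is an active SparkSession and `paths` is a
    list of individual Parquet file paths (both supplied by the caller).
    """
    types = {}
    for path in paths:
        # Reading a single file means Spark uses that file's own schema.
        df = spark.read.parquet(path)
        # dtypes is a list of (column name, type string) pairs.
        types[path] = df.select(field).dtypes[0][1]
    return types
```

If the mismatch is what the error suggests, the result should report an integer type for the all-null files and a string type for the rest.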
The article linked below explains the issue in more detail, but the best solution is to fix the column types in the source files or to change the file format.


https://medium.com/data-arena/merging-different-schemas-in-apache-spark-2a9caca2c5ce
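If rewriting the source files is not practical, a possible workaround is to read each file separately, cast the field to a string, and union the results before querying. This is a minimal sketch under assumptions, not a tested fix for this dataset: `read_field_as_string` is a hypothetical helper, `spark` is assumed to be an active SparkSession, and `paths` is a list of individual file paths that you supply.

```python
from functools import reduce


def read_field_as_string(spark, paths,
                         field="match_result.year_incorporated",
                         alias="year_incorporated"):
    """Read each Parquet file on its own and cast one field to string.

    Hypothetical helper. Reading file by file lets Spark use each file's
    own physical type (INT32 in the all-null files, string elsewhere)
    before the cast is applied.
    """
    if not paths:
        raise ValueError("paths must contain at least one file")
    frames = []
    for path in paths:
        df = spark.read.parquet(path)
        # Keep only the field of interest, as a string column named `alias`.
        frames.append(df.select(df[field].cast("string").alias(alias)))
    # Every frame now has the same single string column, so the union is safe.
    return reduce(lambda left, right: left.unionByName(right), frames)
```

The returned DataFrame can be registered with `createOrReplaceTempView` and queried with the same `SUM(CASE WHEN ...)` expression, referring to `year_incorporated` directly. Reading files one at a time is slower than a single wildcard read, so this suits a modest number of files.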
