cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Issue with reading exported tables stored in parquet

shiva12494
New Contributor II

Hi All, I am exported all tables from postgres snapshot into S3 in parquet format. I am trying to read the table using databricks and i am unable to do so. I get the following error: "Unable to infer schema for Parquet. It must be specified manually." I tried specifying the schema it still wont work. I dint need to specify schema to read parquet files before this so wondering whats different with this, i also tried to copy the parquet file to local and got an error relating to ciphertext.I have attached the error and file name screenshots.Any help is appreciated.

2 REPLIES 2

Anonymous
Not applicable

@shiva charan velichala​ :

It's possible that the parquet files that you exported from postgres snapshot were encrypted or compressed. If that's the case, you'll need to decrypt and/or decompress the files before you can read them with Databricks.

Additionally, if the schema is not being inferred correctly, you can specify the schema manually using the schema parameter of the read function in Databricks. For example:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType
 
my_schema = StructType([
  StructField("column1", StringType(), True),
  StructField("column2", IntegerType(), True),
  ...
])
 
df = spark.read.schema(my_schema).parquet("/path/to/parquet/files")

Replace column1, column2, etc. with the actual column names in your schema.

If you're still having issues, you may want to try opening the parquet files in another program (such as Apache Arrow) to see if you're able to access them there.

Anonymous
Not applicable

Hi @shiva charan velichala​ 

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.