Databricks Community

shiva12494 · ‎03-14-2023

Hi all, I exported all tables from a Postgres snapshot into S3 in Parquet format. I am trying to read a table using Databricks and am unable to do so. I get the following error: "Unable to infer schema for Parquet. It must be specified manually." I tried specifying the schema, but it still won't work. I didn't need to specify a schema to read Parquet files before this, so I'm wondering what's different here. I also tried copying the Parquet file to my local machine and got an error relating to ciphertext. I have attached screenshots of the error and the file names. Any help is appreciated.

Anonymous · ‎03-24-2023

@shiva charan velichala :

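It's possible that the Parquet files you exported from the Postgres snapshot are encrypted, which would be consistent with the ciphertext error you saw when copying a file locally. If that's the case, you'll need to decrypt the files before you can read them with Databricks. Compression on its own is unlikely to be the cause, since Spark reads Parquet's standard compression codecs transparently.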

Additionally, if the schema is not being inferred correctly, you can specify it manually by calling schema() on spark.read before loading the files. For example:

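from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Placeholder schema: each StructField is (name, type, nullable).
my_schema = StructType([
  StructField("column1", StringType(), True),
  StructField("column2", IntegerType(), True),
  # ... add one StructField per remaining column
])

# `spark` is the SparkSession that Databricks notebooks provide.
df = spark.read.schema(my_schema).parquet("/path/to/parquet/files")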

Replace column1, column2, etc. with the actual column names in your schema.

If you're still having issues, you may want to try opening the Parquet files with another tool (such as a library built on Apache Arrow) to see whether they are readable there.
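Before reaching for another reader, a quick local check can tell you whether a downloaded file is plain Parquet at all. The sketch below relies on the Parquet file format's magic bytes: a regular file starts and ends with PAR1, and a file written with Parquet modular encryption and an encrypted footer uses PARE instead. The function name classify_parquet_file and the file path are made up for illustration, and this only inspects the first and last four bytes, so treat it as a rough diagnostic rather than a full validation.

```python
from pathlib import Path

PARQUET_MAGIC = b"PAR1"            # regular Parquet file (plaintext footer)
PARQUET_ENCRYPTED_MAGIC = b"PARE"  # Parquet modular encryption, encrypted footer


def classify_parquet_file(path):
    """Hypothetical helper: classify a local file by its leading and trailing magic bytes.

    Returns "parquet", "parquet-encrypted-footer", or "not-parquet".
    """
    data_path = Path(path)
    # Too small to hold both a leading and a trailing 4-byte magic number.
    if data_path.stat().st_size < 8:
        return "not-parquet"
    with data_path.open("rb") as f:
        head = f.read(4)
        f.seek(-4, 2)  # move to 4 bytes before the end of the file
        tail = f.read(4)
    if head == PARQUET_MAGIC and tail == PARQUET_MAGIC:
        return "parquet"
    if head == PARQUET_ENCRYPTED_MAGIC and tail == PARQUET_ENCRYPTED_MAGIC:
        return "parquet-encrypted-footer"
    # Anything else: the bytes on disk are not Parquet as-is, for example
    # because the whole object was encrypted or the download is incomplete.
    return "not-parquet"
```

For example, classify_parquet_file("/tmp/part-00000.parquet") (an assumed local path) returning "not-parquet" would suggest that the bytes you downloaded are not readable Parquet in their current form, which is worth ruling out before debugging the schema.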

Anonymous · ‎03-25-2023

Hi @shiva charan velichala

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!

Issue with reading exported tables stored in parquet
