<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Issue with reading exported tables stored in parquet in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/issue-with-reading-exported-tables-stored-in-parquet/m-p/7747#M3523</link>
    <description>&lt;P&gt;@shiva charan velichala&amp;nbsp;:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It's possible that the Parquet files you exported from the Postgres snapshot were encrypted or compressed. If so, you'll need to decrypt and/or decompress the files before you can read them with Databricks.&lt;/P&gt;&lt;P&gt;Additionally, if the schema is not being inferred correctly, you can specify it manually using the schema parameter of the read function in Databricks. For example:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;from pyspark.sql.types import StructType, StructField, StringType, IntegerType
&amp;nbsp;
my_schema = StructType([
  StructField("column1", StringType(), True),
  StructField("column2", IntegerType(), True),
  # ... add further StructFields for the remaining columns
])
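# One quick sanity check for the "ciphertext" error mentioned above: a
# plaintext Parquet file ends with the 4-byte magic b"PAR1", while a file
# written with Parquet modular encryption in encrypted-footer mode ends
# with b"PARE". (looks_like_parquet is a hypothetical helper, shown here
# only for illustration; it is not part of Databricks or PySpark.)
def looks_like_parquet(path):
    with open(path, "rb") as f:
        f.seek(-4, 2)  # seek 4 bytes back from the end of the file
        return f.read() == b"PAR1"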
&amp;nbsp;
df = spark.read.schema(my_schema).parquet("/path/to/parquet/files")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Replace column1, column2, etc. with the actual column names and types in your schema.&lt;/P&gt;&lt;P&gt;If you're still having issues, you may want to try opening the Parquet files with another tool (such as Apache Arrow's pyarrow library) to see whether you can access them there.&lt;/P&gt;</description>
    <pubDate>Sat, 25 Mar 2023 06:43:37 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2023-03-25T06:43:37Z</dc:date>
    <item>
      <title>Issue with reading exported tables stored in parquet</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-with-reading-exported-tables-stored-in-parquet/m-p/7746#M3522</link>
      <description>&lt;P&gt;Hi all, I exported all tables from a Postgres snapshot into S3 in Parquet format. When I try to read the tables using Databricks, I get the following error: "Unable to infer schema for Parquet. It must be specified manually." I tried specifying the schema, but it still won't work. I didn't need to specify a schema to read Parquet files before this, so I'm wondering what's different here. I also tried to copy a Parquet file to local storage and got an error relating to ciphertext. I have attached screenshots of the error and the file name. Any help is appreciated.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Mar 2023 15:44:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-with-reading-exported-tables-stored-in-parquet/m-p/7746#M3522</guid>
      <dc:creator>shiva12494</dc:creator>
      <dc:date>2023-03-14T15:44:46Z</dc:date>
    </item>
    <item>
      <title>Re: Issue with reading exported tables stored in parquet</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-with-reading-exported-tables-stored-in-parquet/m-p/7747#M3523</link>
      <description>&lt;P&gt;@shiva charan velichala&amp;nbsp;:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It's possible that the Parquet files you exported from the Postgres snapshot were encrypted or compressed. If so, you'll need to decrypt and/or decompress the files before you can read them with Databricks.&lt;/P&gt;&lt;P&gt;Additionally, if the schema is not being inferred correctly, you can specify it manually using the schema parameter of the read function in Databricks. For example:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;from pyspark.sql.types import StructType, StructField, StringType, IntegerType
&amp;nbsp;
my_schema = StructType([
  StructField("column1", StringType(), True),
  StructField("column2", IntegerType(), True),
  # ... add further StructFields for the remaining columns
])
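# One quick sanity check for the "ciphertext" error mentioned above: a
# plaintext Parquet file ends with the 4-byte magic b"PAR1", while a file
# written with Parquet modular encryption in encrypted-footer mode ends
# with b"PARE". (looks_like_parquet is a hypothetical helper, shown here
# only for illustration; it is not part of Databricks or PySpark.)
def looks_like_parquet(path):
    with open(path, "rb") as f:
        f.seek(-4, 2)  # seek 4 bytes back from the end of the file
        return f.read() == b"PAR1"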
&amp;nbsp;
df = spark.read.schema(my_schema).parquet("/path/to/parquet/files")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Replace column1, column2, etc. with the actual column names and types in your schema.&lt;/P&gt;&lt;P&gt;If you're still having issues, you may want to try opening the Parquet files with another tool (such as Apache Arrow's pyarrow library) to see whether you can access them there.&lt;/P&gt;</description>
      <pubDate>Sat, 25 Mar 2023 06:43:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-with-reading-exported-tables-stored-in-parquet/m-p/7747#M3523</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-03-25T06:43:37Z</dc:date>
    </item>
    <item>
      <title>Re: Issue with reading exported tables stored in parquet</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-with-reading-exported-tables-stored-in-parquet/m-p/7748#M3524</link>
      <description>&lt;P&gt;Hi @shiva charan velichala​&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you for posting your question in our community! We are happy to assist you.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 25 Mar 2023 10:43:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-with-reading-exported-tables-stored-in-parquet/m-p/7748#M3524</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-03-25T10:43:08Z</dc:date>
    </item>
  </channel>
</rss>

