<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Issue with reading exported tables stored in parquet in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/issue-with-reading-exported-tables-stored-in-parquet/m-p/7747#M3523</link>
    <description>&lt;P&gt;@shiva charan velichala&amp;nbsp;:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It's possible that the Parquet files you exported from the Postgres snapshot were encrypted or compressed. If so, you'll need to decrypt and/or decompress the files before you can read them with Databricks.&lt;/P&gt;&lt;P&gt;Additionally, if the schema is not being inferred correctly, you can specify it manually using the schema parameter of the read function in Databricks. For example:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;from pyspark.sql.types import StructType, StructField, StringType, IntegerType
&amp;nbsp;
my_schema = StructType([
  StructField("column1", StringType(), True),
  StructField("column2", IntegerType(), True),
  # ... add further StructFields for the remaining columns
])
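# One quick sanity check for the "ciphertext" error mentioned above: a
# plaintext Parquet file ends with the 4-byte magic b"PAR1", while a file
# written with Parquet modular encryption in encrypted-footer mode ends
# with b"PARE". (looks_like_parquet is a hypothetical helper, shown here
# only for illustration; it is not part of Databricks or PySpark.)
def looks_like_parquet(path):
    with open(path, "rb") as f:
        f.seek(-4, 2)  # seek 4 bytes back from the end of the file
        return f.read() == b"PAR1"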
&amp;nbsp;
df = spark.read.schema(my_schema).parquet("/path/to/parquet/files")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Replace column1, column2, etc. with the actual column names and types in your schema.&lt;/P&gt;&lt;P&gt;If you're still having issues, you may want to try opening the Parquet files with another tool (such as Apache Arrow's pyarrow library) to see whether you can access them there.&lt;/P&gt;</description>
    <pubDate>Sat, 25 Mar 2023 06:43:37 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2023-03-25T06:43:37Z</dc:date>
    <item>
      <title>Issue with reading exported tables stored in parquet</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-with-reading-exported-tables-stored-in-parquet/m-p/7746#M3522</link>
      <description>&lt;P&gt;Hi all, I exported all tables from a Postgres snapshot into S3 in Parquet format. When I try to read the tables using Databricks, I get the following error: "Unable to infer schema for Parquet. It must be specified manually." I tried specifying the schema, but it still won't work. I didn't need to specify a schema to read Parquet files before this, so I'm wondering what's different here. I also tried to copy a Parquet file to local storage and got an error relating to ciphertext. I have attached screenshots of the error and the file name. Any help is appreciated.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Mar 2023 15:44:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-with-reading-exported-tables-stored-in-parquet/m-p/7746#M3522</guid>
      <dc:creator>shiva12494</dc:creator>
      <dc:date>2023-03-14T15:44:46Z</dc:date>
    </item>
    <item>
      <title>Re: Issue with reading exported tables stored in parquet</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-with-reading-exported-tables-stored-in-parquet/m-p/7747#M3523</link>
      <description>&lt;P&gt;@shiva charan velichala&amp;nbsp;:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It's possible that the Parquet files you exported from the Postgres snapshot were encrypted or compressed. If so, you'll need to decrypt and/or decompress the files before you can read them with Databricks.&lt;/P&gt;&lt;P&gt;Additionally, if the schema is not being inferred correctly, you can specify it manually using the schema parameter of the read function in Databricks. For example:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;from pyspark.sql.types import StructType, StructField, StringType, IntegerType
&amp;nbsp;
my_schema = StructType([
  StructField("column1", StringType(), True),
  StructField("column2", IntegerType(), True),
  # ... add further StructFields for the remaining columns
])
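# One quick sanity check for the "ciphertext" error mentioned above: a
# plaintext Parquet file ends with the 4-byte magic b"PAR1", while a file
# written with Parquet modular encryption in encrypted-footer mode ends
# with b"PARE". (looks_like_parquet is a hypothetical helper, shown here
# only for illustration; it is not part of Databricks or PySpark.)
def looks_like_parquet(path):
    with open(path, "rb") as f:
        f.seek(-4, 2)  # seek 4 bytes back from the end of the file
        return f.read() == b"PAR1"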
&amp;nbsp;
df = spark.read.schema(my_schema).parquet("/path/to/parquet/files")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Replace column1, column2, etc. with the actual column names and types in your schema.&lt;/P&gt;&lt;P&gt;If you're still having issues, you may want to try opening the Parquet files with another tool (such as Apache Arrow's pyarrow library) to see whether you can access them there.&lt;/P&gt;</description>
      <pubDate>Sat, 25 Mar 2023 06:43:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-with-reading-exported-tables-stored-in-parquet/m-p/7747#M3523</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-03-25T06:43:37Z</dc:date>
    </item>
    <item>
      <title>Re: Issue with reading exported tables stored in parquet</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-with-reading-exported-tables-stored-in-parquet/m-p/7748#M3524</link>
      <description>&lt;P&gt;Hi @shiva charan velichala​&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you for posting your question in our community! We are happy to assist you.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 25 Mar 2023 10:43:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-with-reading-exported-tables-stored-in-parquet/m-p/7748#M3524</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-03-25T10:43:08Z</dc:date>
    </item>
  </channel>
</rss>

