<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Pandas finds parquet file, Spark does not in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/pandas-finds-parquet-file-spark-does-not/m-p/56391#M30537</link>
    <description>&lt;P&gt;Are you getting any error messages? What happens when you run "ls /dbfs/"? Are you able to list all the parquet files?&lt;/P&gt;</description>
    <pubDate>Wed, 03 Jan 2024 22:30:23 GMT</pubDate>
    <dc:creator>jose_gonzalez</dc:creator>
    <dc:date>2024-01-03T22:30:23Z</dc:date>
    <item>
      <title>Pandas finds parquet file, Spark does not</title>
      <link>https://community.databricks.com/t5/data-engineering/pandas-finds-parquet-file-spark-does-not/m-p/55936#M30470</link>
      <description>&lt;P&gt;I am having an issue with Databricks (Community Edition): I can use Pandas to read a Parquet file into a DataFrame, but when I use Spark it says the path doesn't exist. I have tried reformatting the file path for Spark, but I can't find a format that it will accept.&lt;/P&gt;&lt;P&gt;Any ideas?&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Pandas:&lt;/STRONG&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import pandas as pd
parquet_file_path = "/dbfs/green_tripdata_2022-02.parquet"
df = pd.read_parquet(parquet_file_path, engine='pyarrow')
display(df)&lt;/LI-CODE&gt;&lt;P&gt;Result:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="JonW_1-1703880035484.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/5685iFA1926E6AD07E00B/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="JonW_1-1703880035484.png" alt="JonW_1-1703880035484.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Spark:&lt;/STRONG&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;parquet_file_path = "/dbfs/green_tripdata_2022-02.parquet"
df = spark.read.parquet(parquet_file_path)
df.show()&lt;/LI-CODE&gt;&lt;DIV&gt;&lt;SPAN&gt;&lt;STRONG&gt;Result:&lt;BR /&gt;AnalysisException: [&lt;A href="https://docs.databricks.com/error-messages/error-classes.html#path_not_found" target="_blank" rel="noopener noreferrer"&gt;PATH_NOT_FOUND&lt;/A&gt;] Path does not exist: dbfs:/dbfs/green_tripdata_2022-02.parquet.&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 29 Dec 2023 20:02:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pandas-finds-parquet-file-spark-does-not/m-p/55936#M30470</guid>
      <dc:creator>JonW</dc:creator>
      <dc:date>2023-12-29T20:02:57Z</dc:date>
    </item>
    <item>
      <title>Re: Pandas finds parquet file, Spark does not</title>
      <link>https://community.databricks.com/t5/data-engineering/pandas-finds-parquet-file-spark-does-not/m-p/55939#M30471</link>
      <description>&lt;P&gt;Can you try these 3 options? I don't remember which one will work and can't test it now, but I am sure one or two of them will work &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;parquet_file_path = "/green_tripdata_2022-02.parquet"&lt;/P&gt;&lt;P&gt;parquet_file_path = "green_tripdata_2022-02.parquet"&lt;/P&gt;&lt;P&gt;parquet_file_path = "dbfs:/green_tripdata_2022-02.parquet"&lt;/P&gt;</description>
      <pubDate>Fri, 29 Dec 2023 23:02:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pandas-finds-parquet-file-spark-does-not/m-p/55939#M30471</guid>
      <dc:creator>Wojciech_BUK</dc:creator>
      <dc:date>2023-12-29T23:02:43Z</dc:date>
    </item>
    <item>
      <title>Re: Pandas finds parquet file, Spark does not</title>
      <link>https://community.databricks.com/t5/data-engineering/pandas-finds-parquet-file-spark-does-not/m-p/56391#M30537</link>
      <description>&lt;P&gt;Are you getting any error messages? What happens when you run "ls /dbfs/"? Are you able to list all the parquet files?&lt;/P&gt;</description>
      <pubDate>Wed, 03 Jan 2024 22:30:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pandas-finds-parquet-file-spark-does-not/m-p/56391#M30537</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2024-01-03T22:30:23Z</dc:date>
    </item>
  </channel>
</rss>

