<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Problems with pandas.read_parquet() and path in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/19664#M13210</link>
    <description>&lt;P&gt;Hey @S S,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I understand your issue.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;To solve it, import that DBC file. Instead of question one, there will be a folder with all the solutions; explore solution one and it will work.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please upvote if you got some hint from my answer.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Aviral Bhardwaj&lt;/P&gt;</description>
    <pubDate>Fri, 16 Dec 2022 15:04:30 GMT</pubDate>
    <dc:creator>Aviral-Bhardwaj</dc:creator>
    <dc:date>2022-12-16T15:04:30Z</dc:date>
    <item>
      <title>Problems with pandas.read_parquet() and path</title>
      <link>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/19655#M13201</link>
      <description>&lt;P&gt;I am doing the "&lt;A href="https://partner-academy.databricks.com/learn/course/62/data-engineering-with-databricks-v2" alt="https://partner-academy.databricks.com/learn/course/62/data-engineering-with-databricks-v2" target="_blank"&gt;Data Engineering with Databricks V2&lt;/A&gt;" learning path.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I cannot run "DE 4.2 - Providing Options for External Sources", as the first code cell does not run successfully:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;%run ../Includes/Classroom-Setup-04.2&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Screenshot 1:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="MicrosoftTeams-image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1084i6FADA41CC9F66387/image-size/large?v=v2&amp;amp;px=999" role="button" title="MicrosoftTeams-image" alt="MicrosoftTeams-image" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Inside the setup notebook, the code crashes at the following command (see screenshot 2):&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df = pd.read_parquet(path = datasource_path.replace("dbfs:/", '/dbfs/'))&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The error message is:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/mnt/dbacademy-datasets/data-engineering-with-databricks/v02/ecommerce/raw/users-historical'&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Screenshot 2:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="MicrosoftTeams-image (1)"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1088iEB1623D784E23451/image-size/large?v=v2&amp;amp;px=999" role="button" title="MicrosoftTeams-image (1)" alt="MicrosoftTeams-image (1)" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;There seems to be an issue with the path, even though it actually exists:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Screenshot 3:&lt;/P&gt;&lt;P&gt;&lt;span 
class="lia-inline-image-display-wrapper" image-alt="Capture"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1098i4C32101A252293C6/image-size/large?v=v2&amp;amp;px=999" role="button" title="Capture" alt="Capture" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I played around a little with the path specification, but nothing helped:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Screenshot 4:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="Capture_2"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1092i1506C93EAB84BCC8/image-size/large?v=v2&amp;amp;px=999" role="button" title="Capture_2" alt="Capture_2" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 30 Nov 2022 19:20:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/19655#M13201</guid>
      <dc:creator>johnb1</dc:creator>
      <dc:date>2022-11-30T19:20:52Z</dc:date>
    </item>
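<!--
Editorial sketch, not part of the thread: the failing command in the question maps a "dbfs:/" URI to a "/dbfs/" local path before handing it to pandas. The snippet below reproduces that mapping and adds an existence check, so a missing local mount shows up as a readable message instead of a FileNotFoundError inside pd.read_parquet. The helper names to_local_dbfs_path and describe_path are hypothetical and do not come from the course notebooks; the sketch assumes the cluster is expected to expose DBFS under /dbfs on the driver, which may not hold on every cluster type.

```python
import os


def to_local_dbfs_path(uri: str) -> str:
    """Map a 'dbfs:/...' URI to the '/dbfs/...' local path that pandas expects."""
    prefix = "dbfs:/"
    if uri.startswith(prefix):
        # Replace only the leading scheme, not later occurrences inside the path.
        return "/dbfs/" + uri[len(prefix):]
    return uri


def describe_path(uri: str) -> str:
    """Report whether the local-filesystem view of a DBFS URI is visible to Python."""
    local = to_local_dbfs_path(uri)
    if os.path.exists(local):
        return f"{local} exists"
    return f"{local} is not visible on the local filesystem"
```

In a notebook, print(describe_path(datasource_path)) would show whether plain Python file access can see the dataset. If the path is listable through Spark but not visible here, reading through Spark may be the more reliable route.
-->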
    <item>
      <title>Re: Problems with pandas.read_parquet() and path</title>
      <link>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/19656#M13202</link>
      <description>&lt;P&gt;Hi @John B&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can you please try removing the dbfs prefix and starting the path with /mnt only?&lt;/P&gt;&lt;P&gt;Also, if this does not work, can you please upload that notebook's DBC archive so that I can check the details?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Cheers..&lt;/P&gt;</description>
      <pubDate>Wed, 30 Nov 2022 19:42:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/19656#M13202</guid>
      <dc:creator>UmaMahesh1</dc:creator>
      <dc:date>2022-11-30T19:42:30Z</dc:date>
    </item>
    <item>
      <title>Re: Problems with pandas.read_parquet() and path</title>
      <link>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/19657#M13203</link>
      <description>&lt;P&gt;Also @John B&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Assuming this is an old training course, try the same on a community cluster with a DBR version lower than 7. The mount points of some old training courses are disabled in DBR 7+.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Cheers...&lt;/P&gt;</description>
      <pubDate>Wed, 30 Nov 2022 19:54:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/19657#M13203</guid>
      <dc:creator>UmaMahesh1</dc:creator>
      <dc:date>2022-11-30T19:54:46Z</dc:date>
    </item>
    <item>
      <title>Re: Problems with pandas.read_parquet() and path</title>
      <link>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/19658#M13204</link>
      <description>&lt;P&gt;@John B&amp;nbsp;&lt;/P&gt;&lt;P&gt;Did your issue get resolved?&lt;/P&gt;&lt;P&gt;If you resolved it some other way than the methods above, please post the fix you used.&lt;/P&gt;&lt;P&gt;Cheers..&lt;/P&gt;</description>
      <pubDate>Sat, 03 Dec 2022 08:22:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/19658#M13204</guid>
      <dc:creator>UmaMahesh1</dc:creator>
      <dc:date>2022-12-03T08:22:16Z</dc:date>
    </item>
    <item>
      <title>Re: Problems with pandas.read_parquet() and path</title>
      <link>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/19659#M13205</link>
      <description>&lt;P&gt;Can you try it like this?&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark.read.parquet("dbfs:/mnt/.......")&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sun, 04 Dec 2022 03:14:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/19659#M13205</guid>
      <dc:creator>SS2</dc:creator>
      <dc:date>2022-12-04T03:14:32Z</dc:date>
    </item>
    <item>
      <title>Re: Problems with pandas.read_parquet() and path</title>
      <link>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/19660#M13206</link>
      <description>&lt;P&gt;Hi!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I can only select Runtime 7.3, 9.1, ..., 12.0; the minimum is 7.3. I am using the Databricks Community Edition.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Br.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Dec 2022 11:35:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/19660#M13206</guid>
      <dc:creator>johnb1</dc:creator>
      <dc:date>2022-12-16T11:35:43Z</dc:date>
    </item>
    <item>
      <title>Re: Problems with pandas.read_parquet() and path</title>
      <link>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/19661#M13207</link>
      <description>&lt;P&gt;Hi @Uma Maheswara Rao Desula​&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Removing the dbfs and starting with /mnt only does not help.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="Capture_3"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1086i0972C9DC3470FB67/image-size/large?v=v2&amp;amp;px=999" role="button" title="Capture_3" alt="Capture_3" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Br.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Dec 2022 11:47:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/19661#M13207</guid>
      <dc:creator>johnb1</dc:creator>
      <dc:date>2022-12-16T11:47:49Z</dc:date>
    </item>
    <item>
      <title>Re: Problems with pandas.read_parquet() and path</title>
      <link>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/19662#M13208</link>
      <description>&lt;P&gt;Hi @S S​&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Reading in the file was successful. However, I got a pyspark.sql.dataframe.DataFrame object. This is not the same as a pandas DataFrame, right?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Br.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Dec 2022 11:58:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/19662#M13208</guid>
      <dc:creator>johnb1</dc:creator>
      <dc:date>2022-12-16T11:58:18Z</dc:date>
    </item>
    <item>
      <title>Re: Problems with pandas.read_parquet() and path</title>
      <link>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/19663#M13209</link>
      <description>&lt;P&gt;@Uma Maheswara Rao Desula&amp;nbsp;I solved the issue using SS2's suggestion (see below). After reading the data in as a Spark DataFrame, I converted it into a pandas DataFrame using the toPandas() method.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Dec 2022 12:33:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/19663#M13209</guid>
      <dc:creator>johnb1</dc:creator>
      <dc:date>2022-12-16T12:33:49Z</dc:date>
    </item>
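<!--
Editorial sketch, not part of the thread: the fix described above reads the parquet data through Spark and then converts the result to pandas. A minimal version of that pattern is shown below. The function name read_parquet_as_pandas is hypothetical; the spark argument is assumed to be the SparkSession that Databricks notebooks provide, and the "dbfs:/" URI is passed to Spark unchanged rather than rewritten to "/dbfs/".

```python
def read_parquet_as_pandas(spark, dbfs_uri: str):
    """Read a parquet dataset through Spark and return it as a pandas DataFrame."""
    # Spark resolves 'dbfs:/...' URIs itself, so no local '/dbfs/' path is needed.
    spark_df = spark.read.parquet(dbfs_uri)
    # toPandas() collects every row onto the driver, so this suits small datasets only.
    return spark_df.toPandas()
```

In the setup notebook this would stand in for the pd.read_parquet call, for example df = read_parquet_as_pandas(spark, datasource_path). Because the whole dataset is collected on the driver, it is only a reasonable substitute for data that fits comfortably in driver memory.
-->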
    <item>
      <title>Re: Problems with pandas.read_parquet() and path</title>
      <link>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/19664#M13210</link>
      <description>&lt;P&gt;Hey @S S,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I understand your issue.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;To solve it, import that DBC file. Instead of question one, there will be a folder with all the solutions; explore solution one and it will work.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please upvote if you got some hint from my answer.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Aviral Bhardwaj&lt;/P&gt;</description>
      <pubDate>Fri, 16 Dec 2022 15:04:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/19664#M13210</guid>
      <dc:creator>Aviral-Bhardwaj</dc:creator>
      <dc:date>2022-12-16T15:04:30Z</dc:date>
    </item>
    <item>
      <title>Re: Problems with pandas.read_parquet() and path</title>
      <link>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/19665#M13211</link>
      <description>&lt;P&gt;Hello All,&lt;/P&gt;&lt;P&gt;I am getting the exact issue mentioned in the first post here. I have tried all the solutions listed:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Changing DBR to 7.3: This gave other errors related to libraries not present in that DBR version.&lt;/LI&gt;&lt;LI&gt;Using spark.read.parquet: This gives the error "&lt;I&gt;AnalysisException: Unable to infer schema for Parquet. It must be specified manually.&lt;/I&gt;" I have checked that the parquet files exist in that location and that they are not empty.&lt;/LI&gt;&lt;LI&gt;Exploring the solutions folder: It gives the same errors.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Any ideas what else I can try, please?&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Wed, 29 Mar 2023 15:11:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/19665#M13211</guid>
      <dc:creator>smkazim</dc:creator>
      <dc:date>2023-03-29T15:11:08Z</dc:date>
    </item>
    <item>
      <title>Re: Problems with pandas.read_parquet() and path</title>
      <link>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/56420#M30549</link>
      <description>&lt;P&gt;I used spark.read.parquet and then converted the result to a pandas DataFrame, and it worked for me.&lt;/P&gt;&lt;P&gt;Upvote if it helped you.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="vijaykumar99535_0-1704360883621.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/5702iD448EF5DDEAF4E6E/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="vijaykumar99535_0-1704360883621.png" alt="vijaykumar99535_0-1704360883621.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jan 2024 09:35:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/56420#M30549</guid>
      <dc:creator>vijaykumar99535</dc:creator>
      <dc:date>2024-01-04T09:35:45Z</dc:date>
    </item>
    <item>
      <title>Re: Problems with pandas.read_parquet() and path</title>
      <link>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/59060#M31326</link>
      <description>&lt;P&gt;Thanks for sharing, this helped me too&amp;nbsp; &lt;span class="lia-unicode-emoji" title=":robot_face:"&gt;🤖&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 02 Feb 2024 14:34:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/59060#M31326</guid>
      <dc:creator>jonathanchcc</dc:creator>
      <dc:date>2024-02-02T14:34:03Z</dc:date>
    </item>
    <item>
      <title>Re: Problems with pandas.read_parquet() and path</title>
      <link>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/76104#M35144</link>
      <description>&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="mrb_cookiebaker_0-1719585418924.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/9016i7808E8206408BC87/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="mrb_cookiebaker_0-1719585418924.png" alt="mrb_cookiebaker_0-1719585418924.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Worked for me too! Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 28 Jun 2024 14:37:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/76104#M35144</guid>
      <dc:creator>mrb_cookiebaker</dc:creator>
      <dc:date>2024-06-28T14:37:20Z</dc:date>
    </item>
    <item>
      <title>Re: Problems with pandas.read_parquet() and path</title>
      <link>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/86717#M37331</link>
      <description>&lt;P&gt;Thanks, it helped.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Aug 2024 01:38:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/86717#M37331</guid>
      <dc:creator>Dibs</dc:creator>
      <dc:date>2024-08-30T01:38:38Z</dc:date>
    </item>
    <item>
      <title>Re: Problems with pandas.read_parquet() and path</title>
      <link>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/94311#M38865</link>
      <description>&lt;P&gt;The Spark solution worked.&lt;/P&gt;&lt;P&gt;Instead of&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df = pd.read_parquet(path = datasource_path.replace("dbfs:/", '/dbfs/'))&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I used&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark_df = spark.read.parquet("dbfs:/mnt/dbacademy-datasets/data-engineer-learning-path/v04/ecommerce/raw/users-historical/")
# Convert to pandas DataFrame
df = spark_df.toPandas()&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 16 Oct 2024 15:35:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/94311#M38865</guid>
      <dc:creator>vigneshmayil</dc:creator>
      <dc:date>2024-10-16T15:35:13Z</dc:date>
    </item>
    <item>
      <title>Re: Problems with pandas.read_parquet() and path</title>
      <link>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/96080#M39210</link>
      <description>&lt;P&gt;Thanks for sharing, bro. It really helped.&lt;/P&gt;</description>
      <pubDate>Fri, 25 Oct 2024 06:45:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problems-with-pandas-read-parquet-and-path/m-p/96080#M39210</guid>
      <dc:creator>hebied</dc:creator>
      <dc:date>2024-10-25T06:45:53Z</dc:date>
    </item>
  </channel>
</rss>

