<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Read file from dbfs with pd.read_csv() using databricks-connect in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16378#M10572</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;After some research, I found that pandas does not know about DBFS: it resolves paths against the local filesystem of the machine it runs on. This means that even if a read_csv command works in the Databricks notebook environment, it will not work when using databricks-connect. In a notebook, pandas runs on the cluster and reads locally from there, whereas with databricks-connect it runs on your own machine.&lt;/P&gt;&lt;P&gt;A workaround is to read the remote files with the PySpark spark.read.format('csv') API and append ".toPandas()" at the end to get a pandas DataFrame.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df_pandas = spark.read.format('csv').options(header='true').load('path/in/the/remote/dbfs/filesystem/').toPandas()&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Thu, 25 Nov 2021 08:18:03 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2021-11-25T08:18:03Z</dc:date>
    <item>
      <title>Read file from dbfs with pd.read_csv() using databricks-connect</title>
      <link>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16359#M10553</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Hello all,&lt;/P&gt;
&lt;P&gt;As described in the title, here's my problem:&lt;/P&gt;
&lt;P&gt;1. I'm using databricks-connect to send jobs to a Databricks cluster.&lt;/P&gt;
&lt;P&gt;2. The "local" environment is an AWS EC2 instance.&lt;/P&gt;
&lt;P&gt;3. I want to read a CSV file that is in DBFS (Databricks) with &lt;CODE&gt;pd.read_csv()&lt;/CODE&gt;. The reason is that the file is too big for &lt;CODE&gt;spark.read.csv()&lt;/CODE&gt; followed by &lt;CODE&gt;.toPandas()&lt;/CODE&gt; (it crashes every time).&lt;/P&gt;
&lt;P&gt;4. When I run &lt;CODE&gt;pd.read_csv("/dbfs/FileStore/some_file")&lt;/CODE&gt; I get a &lt;CODE&gt;FileNotFoundError&lt;/CODE&gt; because the path is resolved against the local EC2 filesystem rather than DBFS. Is there a way to do what I want (e.g. change where pandas looks for files with some option)?&lt;/P&gt;
&lt;P&gt;Thanks a lot in advance!&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 18 Aug 2021 16:11:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16359#M10553</guid>
      <dc:creator>hamzatazib96</dc:creator>
      <dc:date>2021-08-18T16:11:46Z</dc:date>
    </item>
    <item>
      <title>Re: Read file from dbfs with pd.read_csv() using databricks-connect</title>
      <link>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16361#M10555</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;What happens if you change it to the following?&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;pd.read_csv("file:/dbfs/FileStore/some_file")&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 29 Sep 2021 11:09:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16361#M10555</guid>
      <dc:creator>User16763506586</dc:creator>
      <dc:date>2021-09-29T11:09:16Z</dc:date>
    </item>
    <item>
      <title>Re: Read file from dbfs with pd.read_csv() using databricks-connect</title>
      <link>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16362#M10556</link>
      <description>&lt;P&gt;I am having a similar issue:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;I am running databricks-connect from within a Docker container.&lt;/LI&gt;&lt;LI&gt;I have an .xls file stored in Azure File storage, which is mounted to DBFS.&lt;/LI&gt;&lt;LI&gt;I would like to read this Excel file with:&lt;/LI&gt;&lt;/UL&gt;&lt;PRE&gt;&lt;CODE&gt;pd.read_excel("dbfs:/mnt/path/to/file.xls")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Has a solution been found for this?&lt;/P&gt;</description>
      <pubDate>Thu, 28 Oct 2021 09:38:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16362#M10556</guid>
      <dc:creator>venter2021</dc:creator>
      <dc:date>2021-10-28T09:38:01Z</dc:date>
    </item>
    <item>
      <title>Re: Read file from dbfs with pd.read_csv() using databricks-connect</title>
      <link>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16363#M10557</link>
      <description>&lt;P&gt;Trying it with pd.read_excel does not help. &lt;/P&gt;</description>
      <pubDate>Thu, 28 Oct 2021 09:38:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16363#M10557</guid>
      <dc:creator>venter2021</dc:creator>
      <dc:date>2021-10-28T09:38:29Z</dc:date>
    </item>
    <item>
      <title>Re: Read file from dbfs with pd.read_csv() using databricks-connect</title>
      <link>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16365#M10559</link>
      <description>&lt;P&gt;I've tried that, and it doesn't work.&lt;/P&gt;</description>
      <pubDate>Wed, 24 Nov 2021 01:42:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16365#M10559</guid>
      <dc:creator>Student185</dc:creator>
      <dc:date>2021-11-24T01:42:02Z</dc:date>
    </item>
    <item>
      <title>Re: Read file from dbfs with pd.read_csv() using databricks-connect</title>
      <link>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16367#M10561</link>
      <description>&lt;P&gt;Hi @Kaniz Fatma,&lt;/P&gt;&lt;P&gt;I am having similar issues when using databricks-connect with Azure. I cannot read data that is already mounted to DBFS (from a Data Lake Gen2). The data is readable within the Azure Databricks notebook environment but not from databricks-connect.&lt;/P&gt;</description>
      <pubDate>Wed, 24 Nov 2021 14:54:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16367#M10561</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-11-24T14:54:18Z</dc:date>
    </item>
    <item>
      <title>Re: Read file from dbfs with pd.read_csv() using databricks-connect</title>
      <link>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16369#M10563</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;My DBR: &lt;/P&gt;&lt;P&gt;9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12)&lt;/P&gt;</description>
      <pubDate>Wed, 24 Nov 2021 14:58:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16369#M10563</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-11-24T14:58:49Z</dc:date>
    </item>
    <item>
      <title>Re: Read file from dbfs with pd.read_csv() using databricks-connect</title>
      <link>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16370#M10564</link>
      <description>&lt;P&gt;@Kaniz Fatma,&lt;/P&gt;&lt;P&gt;All tests in databricks-connect pass. I am also able to run the examples provided in the documentation (which do not read data from DBFS).&lt;/P&gt;</description>
      <pubDate>Wed, 24 Nov 2021 15:03:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16370#M10564</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-11-24T15:03:40Z</dc:date>
    </item>
    <item>
      <title>Re: Read file from dbfs with pd.read_csv() using databricks-connect</title>
      <link>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16372#M10566</link>
      <description>&lt;P&gt;Hi @Kaniz Fatma,&lt;/P&gt;&lt;P&gt;No, I still haven't found a solution, and I can't read from DBFS (not with pandas.read_csv).&lt;/P&gt;&lt;P&gt;I meant to say that the setup tests pass, so the issue is not in the setup.&lt;/P&gt;</description>
      <pubDate>Wed, 24 Nov 2021 15:49:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16372#M10566</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-11-24T15:49:15Z</dc:date>
    </item>
    <item>
      <title>Re: Read file from dbfs with pd.read_csv() using databricks-connect</title>
      <link>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16374#M10568</link>
      <description>&lt;P&gt;Hi @Kaniz Fatma,&lt;/P&gt;&lt;P&gt;I will try that and report back!&lt;/P&gt;</description>
      <pubDate>Wed, 24 Nov 2021 15:58:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16374#M10568</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-11-24T15:58:38Z</dc:date>
    </item>
    <item>
      <title>Re: Read file from dbfs with pd.read_csv() using databricks-connect</title>
      <link>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16376#M10570</link>
      <description>&lt;P&gt;Hi @Kaniz Fatma,&lt;/P&gt;&lt;P&gt;I can confirm that after downgrading to DBR 6.4 and passing all the tests in:&lt;/P&gt;&lt;P&gt;databricks-connect test&lt;/P&gt;&lt;P&gt;I am still getting a FileNotFoundError when trying to use:&lt;/P&gt;&lt;P&gt;pd.read_csv('/dbfs/mnt/datalake_gen2_data/some.csv')&lt;/P&gt;</description>
      <pubDate>Wed, 24 Nov 2021 16:25:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16376#M10570</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-11-24T16:25:59Z</dc:date>
    </item>
    <item>
      <title>Re: Read file from dbfs with pd.read_csv() using databricks-connect</title>
      <link>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16377#M10571</link>
      <description>&lt;P&gt;Hi Fatma,&lt;/P&gt;&lt;P&gt;Thanks for asking.&lt;/P&gt;&lt;P&gt;I've tried 10.1 ML (includes Apache Spark 3.2.0, Scala 2.12) and 9.1 LTS (Scala 2.12, Spark 3.1.2). Neither works.&lt;/P&gt;&lt;P&gt;However, it works when I read the file via Spark. I also used display(dbutils.fs.ls("dbfs:/FileStore/tables/")) to check that my file path (dbfs:/FileStore/tables/POS_CASH_balance.csv) exists, so I don't think the problem is the path or my pandas code. My guess is that the free version doesn't support reading CSV or other files from DBFS directly with pandas. Is that right?&lt;/P&gt;&lt;P&gt;Here is the change I made to my code, which works:&lt;/P&gt;&lt;P&gt;pd.read_csv('dbfs:/FileStore/tables/POS_CASH_balance.csv') --&amp;gt; spark.read.csv('dbfs:/FileStore/tables/POS_CASH_balance.csv')&lt;/P&gt;&lt;P&gt;I hope my experience helps others.&lt;/P&gt;&lt;P&gt;Cheers&lt;/P&gt;</description>
      <pubDate>Wed, 24 Nov 2021 19:39:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16377#M10571</guid>
      <dc:creator>Student185</dc:creator>
      <dc:date>2021-11-24T19:39:52Z</dc:date>
    </item>
    <item>
      <title>Re: Read file from dbfs with pd.read_csv() using databricks-connect</title>
      <link>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16378#M10572</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;After some research, I found that pandas does not know about DBFS: it resolves paths against the local filesystem of the machine it runs on. This means that even if a read_csv command works in the Databricks notebook environment, it will not work when using databricks-connect. In a notebook, pandas runs on the cluster and reads locally from there, whereas with databricks-connect it runs on your own machine.&lt;/P&gt;&lt;P&gt;A workaround is to read the remote files with the PySpark spark.read.format('csv') API and append ".toPandas()" at the end to get a pandas DataFrame.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df_pandas = spark.read.format('csv').options(header='true').load('path/in/the/remote/dbfs/filesystem/').toPandas()&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 25 Nov 2021 08:18:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16378#M10572</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-11-25T08:18:03Z</dc:date>
    </item>
    <item>
      <title>Re: Read file from dbfs with pd.read_csv() using databricks-connect</title>
      <link>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16379#M10573</link>
      <description>&lt;P&gt;Hi Arturooa,&lt;/P&gt;&lt;P&gt;It seems we have reached a similar conclusion. Just a quick question: what do you mean by 'local files'? I've uploaded my files to DBFS; aren't they local files after that?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 25 Nov 2021 18:09:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16379#M10573</guid>
      <dc:creator>Student185</dc:creator>
      <dc:date>2021-11-25T18:09:06Z</dc:date>
    </item>
    <item>
      <title>Re: Read file from dbfs with pd.read_csv() using databricks-connect</title>
      <link>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16380#M10574</link>
      <description>&lt;P&gt;Hi @Yuanyue Liu,&lt;/P&gt;&lt;P&gt;The Spark engine is connected to the (remote) workers on Databricks. This is why you can read data from DBFS using:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark.read.format('csv').options(header='true').load('path/in/the/remote/dbfs/filesystem/')&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The same applies to dbutils. For example, you can list files in DBFS with:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;dbutils.fs.ls(files_path)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Pandas does not connect directly to the remote filesystem (DBFS). That is why you first have to read the remote data with Spark and then convert it to an in-memory pandas DataFrame.&lt;/P&gt;&lt;P&gt;I am using pandas_profiling. After I generate an HTML report, which is written to the local driver (since pandas_profiling does not connect to the remote filesystem either), I use dbutils to upload the report to my mnt drive in DBFS (which comes from a Data Lake Gen2).&lt;/P&gt;&lt;P&gt;I hope this helps!&lt;/P&gt;</description>
      <pubDate>Fri, 26 Nov 2021 11:40:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16380#M10574</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-11-26T11:40:44Z</dc:date>
    </item>
    <item>
      <title>Re: Read file from dbfs with pd.read_csv() using databricks-connect</title>
      <link>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16381#M10575</link>
      <description>&lt;P&gt;@Arturo Amador​&amp;nbsp;- Would you be happy to mark your answer as best if the issue has been resolved by what you found? That will help others find your answer more quickly in the future.&lt;/P&gt;</description>
      <pubDate>Mon, 06 Dec 2021 17:10:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16381#M10575</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-12-06T17:10:50Z</dc:date>
    </item>
    <item>
      <title>Re: Read file from dbfs with pd.read_csv() using databricks-connect</title>
      <link>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16382#M10576</link>
      <description>&lt;P&gt;Hi @Piper Wilson,&lt;/P&gt;&lt;P&gt;It is actually @hamzatazib96 who needs to mark the answer as best 🙂&lt;/P&gt;</description>
      <pubDate>Wed, 15 Dec 2021 10:14:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16382#M10576</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-12-15T10:14:09Z</dc:date>
    </item>
    <item>
      <title>Re: Read file from dbfs with pd.read_csv() using databricks-connect</title>
      <link>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16383#M10577</link>
      <description>&lt;P&gt;WHOOPS! Thank you, @Arturo Amador​!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;@hamzatazib96​&amp;nbsp;- If any of the answers solved the issue, would you be happy to mark it as best? &lt;/P&gt;</description>
      <pubDate>Wed, 15 Dec 2021 18:47:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16383#M10577</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-12-15T18:47:24Z</dc:date>
    </item>
    <item>
      <title>Re: Read file from dbfs with pd.read_csv() using databricks-connect</title>
      <link>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16384#M10578</link>
      <description>&lt;P&gt;Done! Thanks all for the answers and help!&lt;/P&gt;&lt;P&gt;Best way I found around this was to simply do an SCP transfer using the databricks exe from DBFS to an S3 bucket. The flow was:&lt;/P&gt;&lt;P&gt;DBFS -&amp;gt; EC2 Local -&amp;gt; S3 bucket&lt;/P&gt;</description>
      <pubDate>Wed, 15 Dec 2021 19:42:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16384#M10578</guid>
      <dc:creator>hamzatazib96</dc:creator>
      <dc:date>2021-12-15T19:42:58Z</dc:date>
    </item>
    <item>
      <title>Re: Read file from dbfs with pd.read_csv() using databricks-connect</title>
      <link>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16385#M10579</link>
      <description>&lt;P&gt;Databricks Community Edition 10.4 LTS ML (Apache Spark 3.2.1, Scala 2.12) has the same problem with &lt;I&gt;pd.read_csv&lt;/I&gt;.&lt;/P&gt;&lt;P&gt;The &lt;I&gt;spark.read&lt;/I&gt; statement replaces the original column names with (_c0, _c1, …) unless &lt;I&gt;.option("header", "true")&lt;/I&gt; is used.&lt;/P&gt;&lt;P&gt;The following forms should work:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;path = 'dbfs:/FileStore/tables/POS_CASH_balance.csv'&lt;/CODE&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;CODE&gt;# Parentheses let the method chain span several lines in Python.
df = (spark.read
      .option("header", "true")
      .csv(path))&lt;/CODE&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;CODE&gt;df = (spark.read
      .format("csv")
      .option("header", "true")
      .load(path))&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 04 Jan 2023 21:04:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-file-from-dbfs-with-pd-read-csv-using-databricks-connect/m-p/16385#M10579</guid>
      <dc:creator>martud</dc:creator>
      <dc:date>2023-01-04T21:04:13Z</dc:date>
    </item>
  </channel>
</rss>

