<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Load multiple csv files into a dataframe in order in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/load-multiple-csv-files-into-a-dataframe-in-order/m-p/28430#M20226</link>
    <description>&lt;PRE&gt;&lt;CODE&gt;val diamonds = spark.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/FileStore/tables/11.csv","/FileStore/tables/12.csv","/FileStore/tables/13.csv")  // DataFrameReader.load accepts several paths (String*)

display(diamonds)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This works for me, @Shridhar.&lt;/P&gt;</description>
    <pubDate>Wed, 12 Jan 2022 12:43:10 GMT</pubDate>
    <dc:creator>Jaswanth_Saniko</dc:creator>
    <dc:date>2022-01-12T12:43:10Z</dc:date>
    <item>
      <title>Load multiple csv files into a dataframe in order</title>
      <link>https://community.databricks.com/t5/data-engineering/load-multiple-csv-files-into-a-dataframe-in-order/m-p/28428#M20224</link>
      <description>&lt;P&gt;I can load multiple CSV files with something like:&lt;/P&gt;
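&lt;PRE&gt;&lt;CODE&gt;paths = ["file_1", "file_2", "file_3"]
# Wrapping the chain in parentheses lets it span several lines in Python.
df = (spark.read
      .format("csv")  # built-in CSV source since Spark 2.0 (replaces com.databricks.spark.csv)
      .option("header", "true")
      .load(paths))&lt;/CODE&gt;&lt;/PRE&gt;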
&lt;P&gt;But this doesn't seem to preserve the order of the files in &lt;CODE&gt;paths&lt;/CODE&gt;.&lt;/P&gt;
&lt;P&gt;In particular, I want a monotonically increasing ID that runs across the rows of all files, following that order.&lt;/P&gt;
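&lt;P&gt;To pin down the behaviour being asked for, here is a minimal plain-Python sketch (standard library only, not Spark code): it visits the files strictly in the order of &lt;CODE&gt;paths&lt;/CODE&gt; and gives every row a running &lt;CODE&gt;id&lt;/CODE&gt; that keeps counting across file boundaries. The function name &lt;CODE&gt;read_csvs_in_order&lt;/CODE&gt;, the &lt;CODE&gt;id&lt;/CODE&gt; column, and the two demo files are hypothetical.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;
```python
import csv
import os
import tempfile
from typing import Iterable, Iterator


def read_csvs_in_order(paths: Iterable[str]) -> Iterator[dict]:
    """Yield the rows of each CSV file in the order given by `paths`,
    tagging every row with an "id" that keeps counting across files.
    (Hypothetical helper for illustration; not a Spark API.)"""
    next_id = 0
    for path in paths:  # files are read one at a time, in list order
        with open(path, newline="") as f:
            for row in csv.DictReader(f):  # the header row supplies the keys
                row["id"] = next_id
                next_id += 1
                yield row


# Demo: write two small hypothetical files to a temporary directory.
tmp_dir = tempfile.mkdtemp()
paths = []
for name, body in [("file_1.csv", "x\na\nb\n"), ("file_2.csv", "x\nc\n")]:
    path = os.path.join(tmp_dir, name)
    with open(path, "w", newline="") as f:
        f.write(body)
    paths.append(path)

rows = list(read_csvs_in_order(paths))
print([(r["id"], r["x"]) for r in rows])  # [(0, 'a'), (1, 'b'), (2, 'c')]
```
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;In this model, reordering &lt;CODE&gt;paths&lt;/CODE&gt; reorders the ids. In Spark, &lt;CODE&gt;monotonically_increasing_id()&lt;/CODE&gt; is documented as increasing and unique but not consecutive, and a multi-path &lt;CODE&gt;load&lt;/CODE&gt; does not appear to document any ordering of its input files, so one possible approach (an assumption, not verified here) is to read each path separately and tag its rows with the file's position in &lt;CODE&gt;paths&lt;/CODE&gt;, making the file order explicit instead of leaving it to the reader.&lt;/P&gt;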
</description>
      <pubDate>Thu, 18 Oct 2018 01:24:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/load-multiple-csv-files-into-a-dataframe-in-order/m-p/28428#M20224</guid>
      <dc:creator>Shridhar</dc:creator>
      <dc:date>2018-10-18T01:24:35Z</dc:date>
    </item>
    <item>
      <title>Re: Load multiple csv files into a dataframe in order</title>
      <link>https://community.databricks.com/t5/data-engineering/load-multiple-csv-files-into-a-dataframe-in-order/m-p/28429#M20225</link>
      <description>&lt;P&gt;@Shridhar, have you found an alternative way to achieve this? I have the same problem.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Nov 2019 03:50:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/load-multiple-csv-files-into-a-dataframe-in-order/m-p/28429#M20225</guid>
      <dc:creator>JayaKommuru</dc:creator>
      <dc:date>2019-11-20T03:50:40Z</dc:date>
    </item>
    <item>
      <title>Re: Load multiple csv files into a dataframe in order</title>
      <link>https://community.databricks.com/t5/data-engineering/load-multiple-csv-files-into-a-dataframe-in-order/m-p/28430#M20226</link>
      <description>&lt;PRE&gt;&lt;CODE&gt;val diamonds = spark.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/FileStore/tables/11.csv","/FileStore/tables/12.csv","/FileStore/tables/13.csv")  // DataFrameReader.load accepts several paths (String*)

display(diamonds)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This works for me, @Shridhar.&lt;/P&gt;</description>
      <pubDate>Wed, 12 Jan 2022 12:43:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/load-multiple-csv-files-into-a-dataframe-in-order/m-p/28430#M20226</guid>
      <dc:creator>Jaswanth_Saniko</dc:creator>
      <dc:date>2022-01-12T12:43:10Z</dc:date>
    </item>
  </channel>
</rss>

