<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: How to create a dataframe with the files from S3 bucket in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-create-a-dataframe-with-the-files-from-s3-bucket/m-p/27814#M19662</link>
    <description>&lt;P&gt;I have already checked this, but I still can't see any data.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;df = spark.read.text("mnt/S3_Connection/Details.csv")&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;After running this read, the dataframe still shows no data.&lt;/P&gt;</description>
    <pubDate>Thu, 19 Sep 2019 07:43:15 GMT</pubDate>
    <dc:creator>akj2784</dc:creator>
    <dc:date>2019-09-19T07:43:15Z</dc:date>
    <item>
      <title>How to create a dataframe with the files from S3 bucket</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-create-a-dataframe-with-the-files-from-s3-bucket/m-p/27812#M19660</link>
      <description>&lt;P&gt;I have connected my S3 bucket to Databricks using the following commands:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import urllib
import urllib.parse

ACCESS_KEY = "Test"
SECRET_KEY = "Test"
ENCODED_SECRET_KEY = urllib.parse.quote(SECRET_KEY, "")
AWS_BUCKET_NAME = "Test"
MOUNT_NAME = "S3_Connection_details"
dbutils.fs.mount("s3n://%s:%s@%s" % (ACCESS_KEY, ENCODED_SECRET_KEY, AWS_BUCKET_NAME), "/mnt/%s" % MOUNT_NAME)&lt;/CODE&gt;&lt;/PRE&gt;
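[Editor's note] Why the secret key goes through urllib.parse.quote with an empty "safe" string can be sketched locally; the key below is a made-up placeholder, not a real credential:

```python
from urllib.parse import quote

# Hypothetical secret containing characters that are special in URLs.
SECRET_KEY = "abc/def+ghi"

# safe="" (the empty second argument in the post above) forces every
# reserved character, including "/", to be percent-encoded, so the key
# cannot be mistaken for part of the s3n:// URL path.
ENCODED_SECRET_KEY = quote(SECRET_KEY, safe="")
print(ENCODED_SECRET_KEY)  # abc%2Fdef%2Bghi
```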
&lt;P&gt;Now when I run the command below, I get the list of CSV files present in the bucket.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;display(dbutils.fs.ls("/mnt/S3_Connection"))&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If there are 10 files, I want to create 10 different tables in PostgreSQL after reading the CSV files. I don't need any transformation. Is that feasible?&lt;/P&gt;
&lt;P&gt;First of all, how do I create a dataframe from one of the CSV files? I would appreciate help with the syntax.&lt;/P&gt;
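[Editor's note] A minimal sketch of the loop the question describes. The mount path comes from the post; the JDBC URL, credentials, and helper names are hypothetical placeholders, and the Spark calls themselves need a cluster with the PostgreSQL JDBC driver attached:

```python
import os
import re

def table_name_for(path):
    """Derive a PostgreSQL-friendly table name from a CSV file path."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return re.sub(r"[^a-z0-9_]+", "_", stem.lower()).strip("_")

def copy_csv_to_postgres(spark, csv_path, jdbc_url, props):
    # Read the CSV with a header row and inferred column types, then
    # write it unchanged (no transformation) to a table named after the file.
    df = spark.read.csv(csv_path, header=True, inferSchema=True)
    df.write.jdbc(url=jdbc_url, table=table_name_for(csv_path),
                  mode="overwrite", properties=props)

# Hypothetical usage on Databricks (jdbc_url and props are placeholders):
# props = {"user": "...", "password": "...", "driver": "org.postgresql.Driver"}
# for f in dbutils.fs.ls("/mnt/S3_Connection"):
#     if f.path.endswith(".csv"):
#         copy_csv_to_postgres(spark, f.path, "jdbc:postgresql://host:5432/db", props)
```

One table per file then falls out of the listing loop, with no intermediate step needed.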
&lt;P&gt;Regards,&lt;/P&gt;
&lt;P&gt;Akash&lt;/P&gt; 
</description>
      <pubDate>Thu, 19 Sep 2019 07:05:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-create-a-dataframe-with-the-files-from-s3-bucket/m-p/27812#M19660</guid>
      <dc:creator>akj2784</dc:creator>
      <dc:date>2019-09-19T07:05:10Z</dc:date>
    </item>
    <item>
      <title>Re: How to create a dataframe with the files from S3 bucket</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-create-a-dataframe-with-the-files-from-s3-bucket/m-p/27813#M19661</link>
      <description>&lt;P&gt;Hi @akj2784,&lt;/P&gt;&lt;P&gt;Please go through the Databricks documentation on working with files in S3:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/spark/latest/data-sources/aws/amazon-s3.html#mount-s3-buckets-with-dbfs" target="_blank"&gt;https://docs.databricks.com/spark/latest/data-sources/aws/amazon-s3.html#mount-s3-buckets-with-dbfs&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 19 Sep 2019 07:13:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-create-a-dataframe-with-the-files-from-s3-bucket/m-p/27813#M19661</guid>
      <dc:creator>shyam_9</dc:creator>
      <dc:date>2019-09-19T07:13:35Z</dc:date>
    </item>
    <item>
      <title>Re: How to create a dataframe with the files from S3 bucket</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-create-a-dataframe-with-the-files-from-s3-bucket/m-p/27814#M19662</link>
      <description>&lt;P&gt;I have already checked this, but I still can't see any data.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;df = spark.read.text("mnt/S3_Connection/Details.csv")&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;After running this read, the dataframe still shows no data.&lt;/P&gt;
</description>
      <pubDate>Thu, 19 Sep 2019 07:43:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-create-a-dataframe-with-the-files-from-s3-bucket/m-p/27814#M19662</guid>
      <dc:creator>akj2784</dc:creator>
      <dc:date>2019-09-19T07:43:15Z</dc:date>
    </item>
    <item>
      <title>Re: How to create a dataframe with the files from S3 bucket</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-create-a-dataframe-with-the-files-from-s3-bucket/m-p/27815#M19663</link>
      <description>&lt;P&gt;Try reading with one of the methods below:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df = spark.read.text("/mnt/%s/...." % MOUNT_NAME)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;or&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df = sc.textFile("s3a://%s:%s@%s/.../..." % (ACCESS_KEY, ENCODED_SECRET_KEY, BUCKET_NAME))&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 19 Sep 2019 07:53:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-create-a-dataframe-with-the-files-from-s3-bucket/m-p/27815#M19663</guid>
      <dc:creator>shyam_9</dc:creator>
      <dc:date>2019-09-19T07:53:03Z</dc:date>
    </item>
    <item>
      <title>Re: How to create a dataframe with the files from S3 bucket</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-create-a-dataframe-with-the-files-from-s3-bucket/m-p/27816#M19664</link>
      <description>&lt;P&gt;I am able to create the dataframe, but when I run df.head() I see only the column names. I want to see the data as well.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Sep 2019 08:15:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-create-a-dataframe-with-the-files-from-s3-bucket/m-p/27816#M19664</guid>
      <dc:creator>akj2784</dc:creator>
      <dc:date>2019-09-19T08:15:59Z</dc:date>
    </item>
    <item>
      <title>Re: How to create a dataframe with the files from S3 bucket</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-create-a-dataframe-with-the-files-from-s3-bucket/m-p/27817#M19665</link>
      <description>&lt;P&gt;Please take a look at the documentation: df.head() returns only the first row by default, but you can pass an integer n to return the first n rows: &lt;A href="https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.head" target="_blank"&gt;https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.head&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Also look at other display methods, such as df.show() or the Databricks-specific display(df) function.&lt;/P&gt;
</description>
      <pubDate>Thu, 19 Sep 2019 15:14:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-create-a-dataframe-with-the-files-from-s3-bucket/m-p/27817#M19665</guid>
      <dc:creator>lee</dc:creator>
      <dc:date>2019-09-19T15:14:55Z</dc:date>
    </item>
  </channel>
</rss>

