<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: create a dataframe with all the responses from the api requests within foreachPartition in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/create-a-dataframe-with-all-the-responses-from-the-api-requests/m-p/26177#M18290</link>
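    <description>&lt;P&gt;This can be achieved with mapPartitions. Unlike foreachPartition, which returns nothing, mapPartitions returns a new Dataset, so the responses stay distributed and can be turned into a DataFrame:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;// mapPartitions needs an implicit Encoder[String] (import spark.implicits._)
val df_response = df.mapPartitions(iterator =&amp;gt; {
  // Build one S3 client per partition rather than one per row
  val api_connect = new s3clientBuild()
  val s3client = api_connect.s3connection(AccessKey, SecretKey)
  // iterator.map is lazy: each object is fetched as the partition is consumed
  iterator.map(row =&amp;gt; getS3(row.getString(0), s3client))
}).toDF("value")&lt;/CODE&gt;&lt;/PRE&gt;</description>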
    <pubDate>Tue, 08 Mar 2022 21:39:59 GMT</pubDate>
    <dc:creator>Sandesh87</dc:creator>
    <dc:date>2022-03-08T21:39:59Z</dc:date>
    <item>
      <title>create a dataframe with all the responses from the api requests within foreachPartition</title>
      <link>https://community.databricks.com/t5/data-engineering/create-a-dataframe-with-all-the-responses-from-the-api-requests/m-p/26176#M18289</link>
      <description>&lt;P&gt;I am trying to make API calls to fetch objects (JSON) from Amazon S3, and I am using foreachPartition to execute multiple calls in parallel:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df.rdd.foreachPartition(partition =&amp;gt; {
  // Initialize list buffer
  val buffer_accounts1 = new ListBuffer[String]()

  // Initialize connection to Amazon S3
  val s3 = s3clientConnection()

  partition.foreach(fun =&amp;gt; {
    // API call to get the object from the S3 bucket;
    // the first column of each row contains the S3 object name
    val obj = getS3Object(s3, "my_bucket", fun.getString(0)).getContent
    val objString = IOUtils.toString(obj, "UTF-8")
    buffer_accounts1 += objString
  })
  buffer_accounts1.toList.toDF("Object").write.parquet("dbfs:/mnt/test")
 })&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I would like to store the string responses from all of the API calls made in foreachPartition in a single DataFrame. For example, if foreachPartition makes 100 API calls in total, I would like one DataFrame that contains all 100 responses.&lt;/P&gt;&lt;P&gt;To do this, I am creating a mutable list and trying to convert it to a DataFrame within foreachPartition, but a DataFrame can only be created on the driver, not on the executors.&lt;/P&gt;&lt;P&gt;I want a DataFrame with the responses from all of the API calls made within foreachPartition so that I can apply further transformations. How can this be achieved?&lt;/P&gt;&lt;P&gt;Note: I could write every response to disk as JSON and read the files back in, but that degrades performance because of the large amount of disk I/O.&lt;/P&gt;</description>
      <pubDate>Tue, 08 Mar 2022 17:53:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/create-a-dataframe-with-all-the-responses-from-the-api-requests/m-p/26176#M18289</guid>
      <dc:creator>Sandesh87</dc:creator>
      <dc:date>2022-03-08T17:53:56Z</dc:date>
    </item>
    <item>
      <title>Re: create a dataframe with all the responses from the api requests within foreachPartition</title>
      <link>https://community.databricks.com/t5/data-engineering/create-a-dataframe-with-all-the-responses-from-the-api-requests/m-p/26177#M18290</link>
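      <description>&lt;P&gt;This can be achieved with mapPartitions. Unlike foreachPartition, which returns nothing, mapPartitions returns a new Dataset, so the responses stay distributed and can be turned into a DataFrame:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;// mapPartitions needs an implicit Encoder[String] (import spark.implicits._)
val df_response = df.mapPartitions(iterator =&amp;gt; {
  // Build one S3 client per partition rather than one per row
  val api_connect = new s3clientBuild()
  val s3client = api_connect.s3connection(AccessKey, SecretKey)
  // iterator.map is lazy: each object is fetched as the partition is consumed
  iterator.map(row =&amp;gt; getS3(row.getString(0), s3client))
}).toDF("value")&lt;/CODE&gt;&lt;/PRE&gt;</description>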
      <pubDate>Tue, 08 Mar 2022 21:39:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/create-a-dataframe-with-all-the-responses-from-the-api-requests/m-p/26177#M18290</guid>
      <dc:creator>Sandesh87</dc:creator>
      <dc:date>2022-03-08T21:39:59Z</dc:date>
    </item>
    <item>
      <title>Re: create a dataframe with all the responses from the api requests within foreachPartition</title>
      <link>https://community.databricks.com/t5/data-engineering/create-a-dataframe-with-all-the-responses-from-the-api-requests/m-p/26178#M18291</link>
      <description>&lt;P&gt;Hi @Sandesh Puligundla,&lt;/P&gt;&lt;P&gt;Thank you for sharing the solution. We will mark it as the "best" response so that, if another user has the same question in the future, they will be able to find the solution right away.&lt;/P&gt;</description>
      <pubDate>Mon, 11 Apr 2022 21:08:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/create-a-dataframe-with-all-the-responses-from-the-api-requests/m-p/26178#M18291</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2022-04-11T21:08:20Z</dc:date>
    </item>
  </channel>
</rss>

