<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Create a in-memory table in Spark and insert data into it in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/create-a-in-memory-table-in-spark-and-insert-data-into-it/m-p/29736#M21443</link>
    <description>&lt;P&gt;Vida,&lt;/P&gt;
&lt;P&gt;Thank you very much for your help.&lt;/P&gt;
&lt;P&gt;That works well, but the problem is that I have to insert data from multiple queries. I need to declare a collection of DataFrames to store the data from each query, so that at the end I can union all the DataFrames and insert the result into a Hive table.&lt;/P&gt;
&lt;P&gt;I tried to create a collection of DataFrames in Scala, but I am new to Scala and still struggling.&lt;/P&gt;
&lt;P&gt;Please let me know the syntax for declaring a collection/array of DataFrames.&lt;/P&gt;
&lt;P&gt;Regards,&lt;/P&gt;
&lt;P&gt;~Sri&lt;/P&gt;</description>
    <pubDate>Tue, 12 Apr 2016 14:21:18 GMT</pubDate>
    <dc:creator>Sri1</dc:creator>
    <dc:date>2016-04-12T14:21:18Z</dc:date>
    <item>
      <title>Create a in-memory table in Spark and insert data into it</title>
      <link>https://community.databricks.com/t5/data-engineering/create-a-in-memory-table-in-spark-and-insert-data-into-it/m-p/29732#M21439</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;My requirement is that I need to create a Spark in-memory table (not a Hive table pushed into memory), insert data into it, and finally write it back to a Hive table.&lt;/P&gt;
&lt;P&gt;The idea is to avoid disk I/O while writing into the target Hive table. There are a lot of insert statements, but I want to write back to the Hive table only after all of them have executed.&lt;/P&gt;
&lt;P&gt;Could you please let me know whether that is possible? Please also let me know if you have a better solution.&lt;/P&gt;
&lt;P&gt;Thanks &amp;amp; Regards,&lt;/P&gt;
&lt;P&gt;~Sri&lt;/P&gt;</description>
      <pubDate>Fri, 08 Apr 2016 16:57:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/create-a-in-memory-table-in-spark-and-insert-data-into-it/m-p/29732#M21439</guid>
      <dc:creator>Sri1</dc:creator>
      <dc:date>2016-04-08T16:57:39Z</dc:date>
    </item>
    <item>
      <title>Re: Create a in-memory table in Spark and insert data into it</title>
      <link>https://community.databricks.com/t5/data-engineering/create-a-in-memory-table-in-spark-and-insert-data-into-it/m-p/29733#M21440</link>
      <description>&lt;P&gt;1) Use sc.parallelize to create the table.&lt;/P&gt;
&lt;P&gt;2) Register it as a temporary table.&lt;/P&gt;
&lt;P&gt;3) You can keep adding insert statements into this table. Note that Spark SQL supports inserting from other tables, so you might need to create further temporary tables to insert into the first temporary table.&lt;/P&gt;
&lt;P&gt;This table should not be written out to disk until you run "saveAsTable" or a similar command.&lt;/P&gt;
&lt;P&gt;-Vida&lt;/P&gt;</description>
      <pubDate>Fri, 08 Apr 2016 17:02:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/create-a-in-memory-table-in-spark-and-insert-data-into-it/m-p/29733#M21440</guid>
      <dc:creator>vida</dc:creator>
      <dc:date>2016-04-08T17:02:24Z</dc:date>
    </item>
    <item>
      <title>Re: Create a in-memory table in Spark and insert data into it</title>
      <link>https://community.databricks.com/t5/data-engineering/create-a-in-memory-table-in-spark-and-insert-data-into-it/m-p/29734#M21441</link>
      <description>&lt;P&gt;Hi Vida,&lt;/P&gt;
&lt;P&gt;Sorry for the late reply.&lt;/P&gt;
&lt;P&gt;I tried creating two similar temp tables in Spark based on a Hive table; one had data and the other was empty.&lt;/P&gt;
&lt;P&gt;When I try to insert into the empty table, I get the error below:&lt;/P&gt;
&lt;P&gt;&lt;B&gt;org.apache.spark.sql.AnalysisException: Inserting into an RDD-based table is not allowed.;&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;Please correct me if there are any issues with this approach. I did exactly what you described, except that I created the temp tables from a Hive table.&lt;/P&gt;
&lt;P&gt;Regards,&lt;/P&gt;
&lt;P&gt;~Sri&lt;/P&gt;</description>
      <pubDate>Tue, 12 Apr 2016 03:27:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/create-a-in-memory-table-in-spark-and-insert-data-into-it/m-p/29734#M21441</guid>
      <dc:creator>Sri1</dc:creator>
      <dc:date>2016-04-12T03:27:39Z</dc:date>
    </item>
    <item>
      <title>Re: Create a in-memory table in Spark and insert data into it</title>
      <link>https://community.databricks.com/t5/data-engineering/create-a-in-memory-table-in-spark-and-insert-data-into-it/m-p/29735#M21442</link>
      <description>&lt;P&gt;Got it. How about using unionAll? I believe this code snippet does what you want:&lt;/P&gt;
&lt;P&gt;from pyspark.sql import Row&lt;/P&gt;
&lt;P&gt;array = [Row(value=1), Row(value=2), Row(value=3)]&lt;/P&gt;
&lt;P&gt;df = sqlContext.createDataFrame(sc.parallelize(array))&lt;/P&gt;
&lt;P&gt;array2 = [Row(value=4), Row(value=5), Row(value=6)]&lt;/P&gt;
&lt;P&gt;df2 = sqlContext.createDataFrame(sc.parallelize(array2))&lt;/P&gt;
&lt;P&gt;two_tables = df.unionAll(df2)&lt;/P&gt;
&lt;P&gt;two_tables.collect()&lt;/P&gt;
&lt;P&gt;&amp;gt;&amp;gt; Out[17]: [Row(value=1), Row(value=2), Row(value=3), Row(value=4), Row(value=5), Row(value=6)]&lt;/P&gt;</description>
      <pubDate>Tue, 12 Apr 2016 13:31:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/create-a-in-memory-table-in-spark-and-insert-data-into-it/m-p/29735#M21442</guid>
      <dc:creator>vida</dc:creator>
      <dc:date>2016-04-12T13:31:08Z</dc:date>
    </item>
    <item>
      <title>Re: Create a in-memory table in Spark and insert data into it</title>
      <link>https://community.databricks.com/t5/data-engineering/create-a-in-memory-table-in-spark-and-insert-data-into-it/m-p/29736#M21443</link>
      <description>&lt;P&gt;Vida,&lt;/P&gt;
&lt;P&gt;Thank you very much for your help.&lt;/P&gt;
&lt;P&gt;That works well, but the problem is that I have to insert data from multiple queries. I need to declare a collection of DataFrames to store the data from each query, so that at the end I can union all the DataFrames and insert the result into a Hive table.&lt;/P&gt;
&lt;P&gt;I tried to create a collection of DataFrames in Scala, but I am new to Scala and still struggling.&lt;/P&gt;
&lt;P&gt;Please let me know the syntax for declaring a collection/array of DataFrames.&lt;/P&gt;
&lt;P&gt;Regards,&lt;/P&gt;
&lt;P&gt;~Sri&lt;/P&gt;</description>
      <pubDate>Tue, 12 Apr 2016 14:21:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/create-a-in-memory-table-in-spark-and-insert-data-into-it/m-p/29736#M21443</guid>
      <dc:creator>Sri1</dc:creator>
      <dc:date>2016-04-12T14:21:18Z</dc:date>
    </item>
    <item>
      <title>Re: Create a in-memory table in Spark and insert data into it</title>
      <link>https://community.databricks.com/t5/data-engineering/create-a-in-memory-table-in-spark-and-insert-data-into-it/m-p/29737#M21444</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Hi Sri,&lt;/P&gt;
&lt;P&gt;It's probably worth your time to go through some DataFrame tutorials. Here's a good one from us on the basics of DataFrames. This material should help you get a sense of how you might create a collection of DataFrames and learn a bit more about the nuances of Scala!&lt;/P&gt;
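&lt;P&gt;For the collection itself, here is a minimal sketch in Python (to match the snippet earlier in this thread). It assumes every query returns the same columns in the same order, since unionAll matches columns by position, and that appending to the target table is what you want. The names union_all, run_queries_and_save, queries and target_table are hypothetical, made up for illustration. In Scala the same shape would likely be a Seq[DataFrame] combined with reduce(_ unionAll _).&lt;/P&gt;

```python
from functools import reduce


def union_all(frames):
    """Fold a non-empty list of DataFrames into one using pairwise unionAll."""
    if not frames:
        raise ValueError("need at least one DataFrame to union")
    return reduce(lambda left, right: left.unionAll(right), frames)


def run_queries_and_save(sqlContext, queries, target_table):
    """Run each query, collect the results in a list, and write once at the end.

    `queries` (a list of SQL strings) and `target_table` are hypothetical
    inputs used only for illustration.
    """
    # One DataFrame per query, held in an ordinary Python list.
    frames = [sqlContext.sql(query) for query in queries]
    combined = union_all(frames)
    # The table is written only here; "append" mode is an assumption
    # and adds the rows to an existing table.
    combined.write.mode("append").saveAsTable(target_table)
    return combined
```

&lt;P&gt;You would call run_queries_and_save with your sqlContext, a list of SQL strings and the target table name. The list is just a plain collection, so you can also build it up step by step with append before passing it to union_all.&lt;/P&gt;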
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 12 Apr 2016 15:49:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/create-a-in-memory-table-in-spark-and-insert-data-into-it/m-p/29737#M21444</guid>
      <dc:creator>Bill_Chambers</dc:creator>
      <dc:date>2016-04-12T15:49:43Z</dc:date>
    </item>
  </channel>
</rss>

