<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: spark/databricks temporary views and uuid in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/spark-databricks-temporary-views-and-uuid/m-p/69966#M33946</link>
    <description>&lt;P&gt;Thanks,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;. I suspected that, but could not find any links to confirm it.&lt;/P&gt;</description>
    <pubDate>Mon, 20 May 2024 12:03:44 GMT</pubDate>
    <dc:creator>shadowinc</dc:creator>
    <dc:date>2024-05-20T12:03:44Z</dc:date>
    <item>
      <title>spark/databricks temporary views and uuid</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-databricks-temporary-views-and-uuid/m-p/69709#M33908</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;We have a table with an id column generated by uuid(). For ETL we use Databricks/Spark SQL temporary views. We observed different behavior between a Databricks SQL temp view (&lt;EM&gt;create or replace temporary view&lt;/EM&gt;) and a Spark SQL temp view (&lt;EM&gt;df.createOrReplaceTempView()&lt;/EM&gt;).&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Spark SQL&lt;/STRONG&gt; - uuid() was evaluated every time, and after joining with another table the result was inconsistent: a uuid generated for one primary key was associated with another, somehow resulting in duplicate uuid() values.&lt;/P&gt;&lt;P&gt;df = spark.sql("select *, uuid() as id from source_table")&lt;/P&gt;&lt;P&gt;df.createOrReplaceTempView("readData")&lt;/P&gt;&lt;P&gt;df = spark.sql("select * from readData join target_table on primary_key")&lt;/P&gt;&lt;P&gt;df.createOrReplaceTempView("mergePrep")&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Databricks SQL&lt;/STRONG&gt; - with the same process, the uuid() values stayed fixed once generated, and everything was fine after the join as well.&lt;BR /&gt;readData = create or replace temp view readData as select *, uuid() as id from source_table&lt;BR /&gt;mergePrep = create or replace temp view mergePrep as select * from readData join target_table on primary_key&lt;BR /&gt;&lt;BR /&gt;Using Databricks SQL resolves my issue; however, I want to know how the two approaches differ while performing the same operations. From my research, I found that a Spark SQL DataFrame is evaluated every time we select from it. Does that mean that even after creating a temp view, it re-evaluates the underlying nondeterministic functions (like uuid()), and that the same does not happen with the Databricks SQL method?&lt;BR /&gt;&lt;BR /&gt;I appreciate your support on this. Please point me to the right resources. Thanks&lt;/P&gt;</description>
      <pubDate>Sat, 18 May 2024 15:21:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-databricks-temporary-views-and-uuid/m-p/69709#M33908</guid>
      <dc:creator>shadowinc</dc:creator>
      <dc:date>2024-05-18T15:21:16Z</dc:date>
    </item>
    <item>
      <title>Re: spark/databricks temporary views and uuid</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-databricks-temporary-views-and-uuid/m-p/69966#M33946</link>
      <description>&lt;P&gt;Thanks,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;. I suspected that, but could not find any links to confirm it.&lt;/P&gt;</description>
      <pubDate>Mon, 20 May 2024 12:03:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-databricks-temporary-views-and-uuid/m-p/69966#M33946</guid>
      <dc:creator>shadowinc</dc:creator>
      <dc:date>2024-05-20T12:03:44Z</dc:date>
    </item>
  </channel>
</rss>

