<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: I keep getting dataset from spark.table command (instead of dataframe) in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/i-keep-getting-dataset-from-spark-table-command-instead-of/m-p/74952#M34826</link>
    <description>&lt;P&gt;I only just noticed you are using DLT. My bad.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;The&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class=""&gt;@dlt.table&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;decorator tells DLT to create a table containing the result of the DataFrame your function returns&lt;/SPAN&gt;&lt;SPAN&gt;.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;Basically, you can't operate on the function's return value the way you're used to operating on a DataFrame; you need to operate on the DLT table it created, using&amp;nbsp;&lt;SPAN class=""&gt;dlt.read(&amp;lt;table_name&amp;gt;)&lt;/SPAN&gt;. If you want to run DataFrame operations on the table you've created, use, for example,&amp;nbsp;&lt;SPAN class=""&gt;dlt.read(&amp;lt;table_name&amp;gt;).count()&lt;/SPAN&gt;.&lt;/P&gt;&lt;P&gt;Example:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;@dlt.table
def test():
  if dlt.read("today_latest_execution").count() &amp;gt; 0:
    return dlt.read("today_latest_execution")&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;DLT works quite differently from what you're used to with function return values.&lt;/P&gt;&lt;P&gt;Hope this helps!&lt;/P&gt;</description>
    <pubDate>Wed, 19 Jun 2024 09:37:59 GMT</pubDate>
    <dc:creator>jacovangelder</dc:creator>
    <dc:date>2024-06-19T09:37:59Z</dc:date>
    <item>
      <title>I keep getting dataset from spark.table command (instead of dataframe)</title>
      <link>https://community.databricks.com/t5/data-engineering/i-keep-getting-dataset-from-spark-table-command-instead-of/m-p/74871#M34801</link>
      <description>&lt;P&gt;I am trying to create a simple DLT pipeline:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;@dlt.table
def today_latest_execution():
  return spark.sql("SELECT * FROM LIVE.last_execution")

@on_event_hook
def write_events_to_x(event):
  if (
    today_latest_execution().count() == 0
  ):
    try:
      ...&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And I am getting an error:&lt;/P&gt;&lt;P&gt;'Dataset' object has no attribute 'count'&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What I have tried: conversion to pandas (via toPandas() or to_pandas_on_spark) doesn't work, koalas doesn't work, using different functions (not spark.sql) doesn't work... I am stuck &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;&lt;P&gt;How can I make my function return a DataFrame instead of a Dataset?&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jun 2024 08:26:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-keep-getting-dataset-from-spark-table-command-instead-of/m-p/74871#M34801</guid>
      <dc:creator>Nastia</dc:creator>
      <dc:date>2024-06-19T08:26:49Z</dc:date>
    </item>
    <item>
      <title>Re: I keep getting dataset from spark.table command (instead of dataframe)</title>
      <link>https://community.databricks.com/t5/data-engineering/i-keep-getting-dataset-from-spark-table-command-instead-of/m-p/74874#M34803</link>
      <description>&lt;P&gt;Can you try count() (with brackets) instead of count (without brackets)?&lt;/P&gt;&lt;P&gt;PS: a DataFrame is a Dataset of type Row.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Jun 2024 14:10:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-keep-getting-dataset-from-spark-table-command-instead-of/m-p/74874#M34803</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2024-06-18T14:10:31Z</dc:date>
    </item>
    <item>
      <title>Re: I keep getting dataset from spark.table command (instead of dataframe)</title>
      <link>https://community.databricks.com/t5/data-engineering/i-keep-getting-dataset-from-spark-table-command-instead-of/m-p/74905#M34810</link>
      <description>&lt;P&gt;You're missing the parentheses: count&lt;STRONG&gt;()&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 18 Jun 2024 18:01:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-keep-getting-dataset-from-spark-table-command-instead-of/m-p/74905#M34810</guid>
      <dc:creator>jacovangelder</dc:creator>
      <dc:date>2024-06-18T18:01:41Z</dc:date>
    </item>
    <item>
      <title>Re: I keep getting dataset from spark.table command (instead of dataframe)</title>
      <link>https://community.databricks.com/t5/data-engineering/i-keep-getting-dataset-from-spark-table-command-instead-of/m-p/74939#M34819</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/102253"&gt;@jacovangelder&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/14792"&gt;@-werners-&lt;/a&gt;&amp;nbsp;, yes yes, it does have () there; sorry, I copied the code incorrectly.&lt;/P&gt;&lt;P&gt;The error is still the same though &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jun 2024 08:28:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-keep-getting-dataset-from-spark-table-command-instead-of/m-p/74939#M34819</guid>
      <dc:creator>Nastia</dc:creator>
      <dc:date>2024-06-19T08:28:19Z</dc:date>
    </item>
    <item>
      <title>Re: I keep getting dataset from spark.table command (instead of dataframe)</title>
      <link>https://community.databricks.com/t5/data-engineering/i-keep-getting-dataset-from-spark-table-command-instead-of/m-p/74944#M34823</link>
      <description>&lt;P&gt;what if you do:&lt;BR /&gt;return spark.sql("SELECT * FROM LIVE.last_execution")&lt;STRONG&gt;.toDF()&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jun 2024 09:14:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-keep-getting-dataset-from-spark-table-command-instead-of/m-p/74944#M34823</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2024-06-19T09:14:32Z</dc:date>
    </item>
    <item>
      <title>Re: I keep getting dataset from spark.table command (instead of dataframe)</title>
      <link>https://community.databricks.com/t5/data-engineering/i-keep-getting-dataset-from-spark-table-command-instead-of/m-p/74952#M34826</link>
      <description>&lt;P&gt;I only just noticed you are using DLT. My bad.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;The&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class=""&gt;@dlt.table&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;decorator tells DLT to create a table containing the result of the DataFrame your function returns&lt;/SPAN&gt;&lt;SPAN&gt;.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;Basically, you can't operate on the function's return value the way you're used to operating on a DataFrame; you need to operate on the DLT table it created, using&amp;nbsp;&lt;SPAN class=""&gt;dlt.read(&amp;lt;table_name&amp;gt;)&lt;/SPAN&gt;. If you want to run DataFrame operations on the table you've created, use, for example,&amp;nbsp;&lt;SPAN class=""&gt;dlt.read(&amp;lt;table_name&amp;gt;).count()&lt;/SPAN&gt;.&lt;/P&gt;&lt;P&gt;Example:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;@dlt.table
def test():
  if dlt.read("today_latest_execution").count() &amp;gt; 0:
    return dlt.read("today_latest_execution")&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;DLT works quite differently from what you're used to with function return values.&lt;/P&gt;&lt;P&gt;Hope this helps!&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jun 2024 09:37:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-keep-getting-dataset-from-spark-table-command-instead-of/m-p/74952#M34826</guid>
      <dc:creator>jacovangelder</dc:creator>
      <dc:date>2024-06-19T09:37:59Z</dc:date>
    </item>
    <item>
      <title>Re: I keep getting dataset from spark.table command (instead of dataframe)</title>
      <link>https://community.databricks.com/t5/data-engineering/i-keep-getting-dataset-from-spark-table-command-instead-of/m-p/74954#M34827</link>
      <description>&lt;P&gt;Glad I work in Scala and do not have to deal with DLT &lt;span class="lia-unicode-emoji" title=":grinning_face_with_smiling_eyes:"&gt;😄&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jun 2024 09:42:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-keep-getting-dataset-from-spark-table-command-instead-of/m-p/74954#M34827</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2024-06-19T09:42:03Z</dc:date>
    </item>
    <item>
      <title>Re: I keep getting dataset from spark.table command (instead of dataframe)</title>
      <link>https://community.databricks.com/t5/data-engineering/i-keep-getting-dataset-from-spark-table-command-instead-of/m-p/74955#M34828</link>
      <description>&lt;P&gt;Not a fan myself either! It seems DLT is getting a big rebrand with LakeFlow around the corner. In my experience DLT was never that widely adopted.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jun 2024 09:44:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-keep-getting-dataset-from-spark-table-command-instead-of/m-p/74955#M34828</guid>
      <dc:creator>jacovangelder</dc:creator>
      <dc:date>2024-06-19T09:44:39Z</dc:date>
    </item>
  </channel>
</rss>

