Data Engineering

I keep getting dataset from spark.table command (instead of dataframe)

Nastia
New Contributor III

I am trying to create a simple dlt pipeline:

 

@dlt.table
def today_latest_execution():
  return spark.sql("SELECT * FROM LIVE.last_execution")
 
@on_event_hook
def write_events_to_x(event):
  if (
     today_latest_execution().count() == 0
  ):
    try:
       ...
 
And I am getting an error:
'Dataset' object has no attribute 'count'
 
What I have tried: converting to pandas (via toPandas() or to_pandas_on_spark) doesn't work, koalas doesn't work, using different functions (instead of spark.sql) doesn't work... I am stuck :(
How do I make my function return a DataFrame instead of a Dataset?
Accepted Solution

jacovangelder
Honored Contributor

I only just noticed you are using DLT. My bad.

The @dlt.table decorator tells DLT to create a table that contains the result of the DataFrame your function returns.

Basically, you can't operate on the return value of the decorated function the way you would on a DataFrame. Instead, you need to operate on the DLT table it created, which you read with dlt.read(<table_name>). So if you want to run DataFrame operations on that table, use for example dlt.read(<table_name>).count()

Example:

 

import dlt

@dlt.table
def test():
  # dlt.read(...) returns a DataFrame, so count() works (note: >= 0 is always true)
  if dlt.read("today_latest_execution").count() >= 0:
    return dlt.read("today_latest_execution")

 

DLT works quite differently from what you're used to when working with function return values.
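To make that difference concrete, here is a toy model in plain Python. It is not the real dlt module and I am not claiming this is how DLT is implemented; ToyPipeline, its table/read methods, and the placeholder Dataset class are all hypothetical names made up for illustration. It only shows the general idea: a decorator can register your query function and hand back a declaration object, so calling the decorated name does not give you the query result.

```python
# Toy model only: NOT the real dlt module. Every name here is hypothetical.

class Dataset:
    """Placeholder for a declared table; it deliberately has no DataFrame methods."""

    def __init__(self, name, query_fn):
        self.name = name
        self.query_fn = query_fn


class ToyPipeline:
    def __init__(self):
        self._datasets = {}

    def table(self, fn):
        """Decorator: register the query function instead of running it."""
        dataset = Dataset(fn.__name__, fn)
        self._datasets[fn.__name__] = dataset
        # The decorated name now yields the declaration, not the query result.
        return lambda: dataset

    def read(self, name):
        """Look up a registered table by name and evaluate its query."""
        return self._datasets[name].query_fn()


pipeline = ToyPipeline()


@pipeline.table
def today_latest_execution():
    # A plain list stands in for a Spark DataFrame in this sketch.
    return [{"job": "a"}, {"job": "b"}]


declared = today_latest_execution()
print(type(declared).__name__)                        # Dataset
print(hasattr(declared, "count"))                     # False
print(len(pipeline.read("today_latest_execution")))   # 2
```

In this sketch, calling the decorated function gives you a Dataset placeholder with no count attribute, while reading the table by name gives you the actual rows. That mirrors the error in the question and why reading by table name is the way to go.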

Hope this helps! 

Edit: argh, the forum keeps turning "@dlt" into a tag for a user called Dlt, haha. The decorator is lowercase: @dlt.table. I think you get the point!


7 Replies

-werners-
Esteemed Contributor III

can you try count() instead of count (without brackets)?

PS. In Spark, a DataFrame is a Dataset of type Row.

jacovangelder
Honored Contributor

You're missing the parentheses: count()

Nastia
New Contributor III

@jacovangelder @-werners- , yes yes, it has () there, sorry, copied the code wrongly 

error is still the same though :(


-werners-
Esteemed Contributor III

glad I work in scala and do not have to deal with DLT :D

Not a fan myself either! It seems DLT is getting a big rebrand with LakeFlow around the corner. In my experience DLT was never that widely adopted. 

-werners-
Esteemed Contributor III

what if you do:
return spark.sql("SELECT * FROM LIVE.last_execution").toDF()
