06-18-2024 06:50 AM - edited 06-19-2024 01:26 AM
I am trying to create a simple dlt pipeline:
06-19-2024 02:36 AM - edited 06-19-2024 02:37 AM
I only just noticed you are using DLT. My bad.
The @dlt.table decorator tells DLT to create a table from the DataFrame that the decorated function returns.
Basically, you can't operate on the function's return value the way you're used to operating on a DataFrame. Instead, you operate on the DLT table it created, using dlt.read(<table_name>). For example, to count the rows of a table you've created, use dlt.read(<table_name>).count()
Example:
import dlt

@dlt.table
def test():
    # Read the upstream DLT table by name, not by calling its function
    if dlt.read("today_latest_execution").count() >= 0:
        return dlt.read("today_latest_execution")
DLT works quite differently from what you're used to when working with function return values.
Hope this helps!
Edit: argh, somehow my post keeps tagging user Dlt haha but I think you get the point!
06-18-2024 07:10 AM
can you try count() instead of count (without brackets)?
PS. A DataFrame is a Dataset of type Row.
06-18-2024 11:01 AM
You're missing the parentheses: count()
06-19-2024 01:27 AM - edited 06-19-2024 01:28 AM
@jacovangelder @-werners- , yes yes, it has () there, sorry, copied the code wrongly
error is still the same though 😞
06-19-2024 02:42 AM
Glad I work in Scala and do not have to deal with DLT 😄
06-19-2024 02:44 AM
Not a fan myself either! It seems DLT is getting a big rebrand with LakeFlow around the corner. In my experience DLT was never that widely adopted.
06-19-2024 02:14 AM
what if you do:
return spark.sql("SELECT * FROM LIVE.last_execution").toDF()