Data Engineering

GroupBy in delta live tables fails with error "RuntimeError: Query function must return either a Spark or Koalas DataFrame"

PrebenOlsen
New Contributor III

I have a Delta Live Table that I'm trying to run a groupBy on, but I'm getting an error: "RuntimeError: Query function must return either a Spark or Koalas DataFrame".

Here is my code:

@dlt.table
def groups_hierarchy():
 
     df = dlt.read_stream("groups_hierarchy_vw")
     return(df
        .select("id","name",split("path","/").alias("groups_in_path"),posexplode(split("path","/")).alias("pos","value"))
        .drop("val")
        .select("id","name",concat(lit("group"),"pos").alias("group_name"),expr("groups_in_path[pos]").alias("val"))
        .groupBy([df.id, df.name])
     )

 Edit:

Something as simple as the following works just fine (you will notice I am now reading a regular table and not a stream, just for testing purposes):

@dlt.table
def groups_hierarchy():
     return dlt.read("streaming_silver").groupBy("id").count()

It also works fine when I apply my select statements and transformations, but the final .groupBy() seems to turn the result into something that is not a Spark or Koalas DataFrame:

@dlt.table
def groups_hierarchy():
     return dlt.read("streaming_silver").select("id","name",split("path","/").alias("groups_in_path"),posexplode(split("path","/")).alias("pos","value")).drop("val").select("id","name",concat(lit("group"),"pos").alias("group_name"),expr("groups_in_path[pos]").alias("val")).groupBy("id")

4 REPLIES

Debayan
Esteemed Contributor III

Are you trying to join two live tables using Python?

Please refer to the code here: https://stackoverflow.com/questions/73112299/is-there-a-way-to-join-two-live-tables-on-delta-live-ta...

Please let us know if this helps or if you need any further clarification.

PrebenOlsen
New Contributor III

No, there is no join in my code above. The problem occurs at the very last line, when trying to use a .groupBy.

This works fine when reading a non-streaming view, but fails as soon as it is a stream. Is groupBy not supported yet for streams?

Debayan
Esteemed Contributor III

Could you please try updating your Spark version? For a few use cases, Spark 2.3.0 threw the same error, which was fixed in 2.4.0.

Please refer:

https://issues.apache.org/jira/browse/SPARK-24156

Vidula
Honored Contributor

Hi @Preben Olsen,

Does @Debayan Mukherjee's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly?

We'd love to hear from you.

Thanks!
