cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

Qarol
by New Contributor
  • 1195 Views
  • 2 replies
  • 0 kudos

Work around for The method `pd.groupby.GroupBy.prod()` is not implemented yet.

0I have a database with two columns: name (str) and probability (float).I am running this command:df[['name','probability']].groupby('name').prod()on a Databricks (runtime 7.3) notebook and df is a koalas dataframe.The error I get is:PandasNotImpleme...

  • 1195 Views
  • 2 replies
  • 0 kudos
Latest Reply
alenka
New Contributor III
  • 0 kudos

The same trouble  

  • 0 kudos
1 More Replies
PrebenOlsen
by New Contributor III
  • 2548 Views
  • 4 replies
  • 1 kudos

GroupBy in delta live tables fails with error "RuntimeError: Query function must return either a Spark or Koalas DataFrame"

I have a delta live table that I'm trying to run GroupBy on, but getting an error: "RuntimeError: Query function must return either a Spark or Koalas DataFrame". Here is my code:@dlt.table def groups_hierarchy():   df = dlt.read_stream("groups_h...

  • 2548 Views
  • 4 replies
  • 1 kudos
Latest Reply
Vidula
Honored Contributor
  • 1 kudos

Hi @Preben Olsen​ Does @Debayan Mukherjee​  response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly?We'd love to hear from you.Thanks!

  • 1 kudos
3 More Replies
Anonymous
by Not applicable
  • 961 Views
  • 1 replies
  • 0 kudos

Resolved! Converting between Pandas to Koalas

When and why should I convert b/w a Pandas to Koalas dataframe? What are the implications?

  • 961 Views
  • 1 replies
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 0 kudos

Koalas is distributed on a Databricks cluster similar to how Spark dataframes are also distributed. Pandas dataframes only live on the spark driver in memory. If you are a pandas user and are using a multi-node cluster then you should use koalas to p...

  • 0 kudos
Labels