Data Engineering

Forum Posts

by Thefan (New Contributor II)
  • 525 Views
  • 0 replies
  • 1 kudos

Koalas dropna in DLT

Greetings! I've been trying out DLT for a few days, but I'm running into an unexpected issue when trying to use Koalas dropna in my pipeline. My goal is to drop all columns that contain only null/NA values before writing the table out. Current code is this: @dlt...
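The DLT code in the post is cut off, so it is not reproduced here. As a minimal sketch of the stated goal, assuming plain pandas semantics: dropna(axis=1, how="all") removes only the columns in which every value is null/NA. Koalas mirrors the pandas dropna signature, but whether axis=1 is supported depends on the Koalas (or pandas API on Spark) version you run, so treat that as an assumption to verify. The helper drop_all_null_columns is hypothetical.

```python
import numpy as np
import pandas as pd


def drop_all_null_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without the columns whose values are all null/NA."""
    # axis=1 targets columns instead of rows; how="all" drops a column only
    # when every one of its values is missing (one non-null value keeps it).
    return df.dropna(axis=1, how="all")


df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "empty": [None, None, None],    # all missing -> dropped
        "partial": [1.0, np.nan, 3.0],  # some values present -> kept
    }
)
print(list(drop_all_null_columns(df).columns))  # ['id', 'partial']
```

With the default how="any", the partial column would be dropped as well, which is usually not what you want when cleaning out empty columns.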

by cconnell (Contributor II)
  • 299 Views
  • 0 replies
  • 1 kudos

I wrote a review of Koalas by porting an existing pandas program. Comments welcome. https://medium.com/@chuck.connell.3/pandas-on-databricks-via-koalas-a-review-9876b0a92541

by User16869510359 (Esteemed Contributor)
  • 747 Views
  • 1 replies
  • 0 kudos
Latest Reply
amr
New Contributor III
  • 0 kudos

Koalas lets you scale your pandas code, which typically runs on one node, to a cluster of multiple nodes. All you need to do is change the Python import from pandas to Koalas, and your code runs on multiple nodes ...
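A minimal sketch of the idea, assuming the code sticks to calls that both pandas and Koalas implement: the same function works on a local pandas DataFrame or on a distributed Koalas one. The function total_by_group and the column names are hypothetical; the Koalas lines are commented out because they need a Spark cluster with Koalas installed (databricks.koalas, ks.from_pandas and to_pandas are Koalas calls).

```python
import pandas as pd


def total_by_group(df):
    """Sum 'value' per 'group' using only calls that pandas and Koalas share."""
    return df.groupby("group")["value"].sum().sort_index()


# Single node: plain pandas, runs entirely on one machine (the driver on Databricks).
pdf = pd.DataFrame({"group": ["a", "b", "a"], "value": [1, 2, 3]})
print(total_by_group(pdf).tolist())  # [4, 2] for groups 'a' and 'b'

# Distributed: only the import and the DataFrame construction change.
# Requires a Spark cluster with Koalas installed, so it is not executed here.
# import databricks.koalas as ks
# kdf = ks.from_pandas(pdf)
# print(total_by_group(kdf).to_pandas().tolist())
```

Keeping the logic in a function that takes the DataFrame as an argument makes it easy to test on small pandas data before running it on the cluster.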

by User16826994223 (Honored Contributor III)
  • 757 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Different projects have different focuses. Spark is already deployed in virtually every organization, and often is the primary interface to the massive amount of data stored in data lakes. Koalas was inspired by Dask, and aims to make the transition ...

by User16783853906 (Contributor III)
  • 1039 Views
  • 3 replies
  • 0 kudos

Resolved! How to reuse pandas code in PySpark?

I have single-threaded pandas code that is neither supported by Koalas yet nor easy to reimplement in PySpark. I would like to distribute this workload using Spark without rewriting all my pandas code. Is this possible?

Latest Reply
User16783853906
Contributor III
  • 0 kudos

This applies to the specific scenario where the code is not yet supported by Koalas. One approach to consider is using a pandas UDF and splitting the data into pieces (for example, groups) that your existing pandas code can process independently. This notebook is a great example of taking ...
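A minimal sketch of that approach, assuming Spark 3.0 or later: keep the existing pandas logic in a function that receives one group's rows, check it locally with plain pandas, then hand the same function to groupBy(...).applyInPandas so Spark runs it on each group in parallel. The function demean, the key/value columns and the output schema are hypothetical; the Spark lines are commented out because they need an active SparkSession.

```python
import pandas as pd


def demean(pdf: pd.DataFrame) -> pd.DataFrame:
    """Existing single-node pandas logic, applied to the rows of one group."""
    # Centre each group's values on zero by subtracting the group mean.
    return pdf.assign(value=pdf["value"] - pdf["value"].mean())


# Local check with plain pandas: call the function once per group and stitch
# the pieces back together, conceptually what Spark does across executors.
local = pd.DataFrame({"key": ["a", "a", "b"], "value": [1.0, 3.0, 5.0]})
result = pd.concat([demean(group) for _, group in local.groupby("key")])
print(result["value"].tolist())  # [-1.0, 1.0, 0.0]

# Distributed version (Spark 3.0+, needs an active `spark` session; not run here):
# sdf = spark.createDataFrame(local)
# out = sdf.groupBy("key").applyInPandas(demean, schema="key string, value double")
```

Each group must fit in the memory of a single executor, so this works best when the data splits into many moderately sized groups.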

by User16826994223 (Honored Contributor III)
  • 1054 Views
  • 2 replies
  • 0 kudos

Requirements to run Koalas

Hi, I am planning to run Koalas in a Databricks environment. What are the requirements for running Koalas there?

Latest Reply
tj-cycyota
New Contributor III
  • 0 kudos

Koalas is great! It really eases the transition from pandas to Spark, because you can use the same pandas functions and classes through the Koalas API while everything runs on Spark in the background.
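The thread does not list concrete requirements. As a minimal sketch, assuming the usual Koalas dependencies are pandas and PyArrow plus PySpark (which Databricks clusters already provide), you can check from a notebook which of these packages are importable before trying Koalas. This only checks presence, not version compatibility; module_available is a hypothetical helper.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be imported in the current environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec imports parent packages first, so "databricks.koalas"
        # raises here when the "databricks" package itself is missing.
        return False


# Packages assumed to matter for Koalas; this checks presence, not versions.
for name in ("pandas", "pyarrow", "pyspark", "databricks.koalas"):
    print(f"{name}: {'available' if module_available(name) else 'missing'}")
```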

by User16830818524 (New Contributor II)
  • 672 Views
  • 1 replies
  • 0 kudos

Is it possible to read a Delta table directly using Koalas?

Can I read a Delta table directly using Koalas or do I need to read using Spark and then convert the Spark dataframe to a Koalas dataframe?

Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 0 kudos

Yes, you can use the "read_delta" function (databricks.koalas.read_delta). See the Koalas documentation.
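A minimal sketch of both routes from the question, assuming Koalas is installed on a cluster with Delta Lake support: databricks.koalas.read_delta reads the table directly, while the two-step route reads with Spark and calls to_koalas(), which Koalas attaches to Spark DataFrames once databricks.koalas is imported. The function names are hypothetical; the imports sit inside the functions so the definitions load even without a cluster.

```python
from typing import Optional


def read_delta_direct(path: str, version: Optional[str] = None):
    """Read a Delta table straight into a Koalas DataFrame."""
    import databricks.koalas as ks  # needs Koalas and an active Spark session

    # version enables Delta time travel; None reads the latest snapshot.
    return ks.read_delta(path, version=version)


def read_delta_via_spark(spark, path: str):
    """Two-step route: read with Spark, then convert to Koalas."""
    import databricks.koalas  # noqa: F401  (importing Koalas adds DataFrame.to_koalas)

    return spark.read.format("delta").load(path).to_koalas()
```

Both return the same kind of distributed Koalas DataFrame, so the direct call mainly saves a step.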

by j_weaver (New Contributor III)
  • 722 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16752246141
New Contributor III
  • 0 kudos

pandas works for single-machine computations, so any pandas code you write on Databricks runs only on the driver of the cluster. PySpark and Koalas are both distributed frameworks for when you have large datasets. You can use PySpark and Koalas inte...
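A minimal sketch of moving one dataset between the three APIs, assuming a Databricks cluster with Koalas installed and an active spark session: only the pandas steps run purely on the driver, and toPandas() collects the whole distributed dataset back to the driver, so use it only on small results. The helper roundtrip and the value column are hypothetical; to_koalas() and to_spark() are the Koalas conversion methods.

```python
import pandas as pd


def roundtrip(spark, pdf: pd.DataFrame) -> pd.DataFrame:
    """Move a DataFrame pandas -> Spark -> Koalas -> Spark -> pandas."""
    import databricks.koalas  # noqa: F401  (adds to_koalas() to Spark DataFrames)

    sdf = spark.createDataFrame(pdf)  # pandas on the driver -> distributed Spark DataFrame
    kdf = sdf.to_koalas()             # Spark -> Koalas: pandas-style API, same distributed data
    kdf = kdf[kdf["value"] > 1]       # pandas-style filtering, executed by Spark
    sdf2 = kdf.to_spark()             # Koalas -> Spark, e.g. for PySpark-only functions
    return sdf2.toPandas()            # back to pandas: collects every row to the driver
```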
