Data Engineering

Forum Posts

by Thefan • New Contributor II

04-28-2022 2:44:41 AM

1058 Views
0 replies
1 kudos

Koalas dropna in DLT

Greetings! I've been trying out DLT for a few days, but I'm running into an unexpected issue when trying to use Koalas dropna in my pipeline. My goal is to drop all columns that contain only null/NA values before writing it. Current code is this: @dlt...
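A minimal sketch of the stated goal, assuming a Koalas (or pandas API on Spark) DataFrame. It avoids dropna(axis=1, ...), whose support in Koalas may depend on the version you run. Instead, it counts the non-null cells per column with count() and keeps only the columns whose count is above zero. The helper name drop_all_null_columns and the demo data are hypothetical.

```python
import pandas as pd


def drop_all_null_columns(df):
    """Return df without the columns whose values are all null/NA.

    Assumption: df is a pandas DataFrame or a Koalas / pandas-on-Spark
    DataFrame; all of them provide count(), which returns the number of
    non-null cells per column.
    """
    counts = df.count()
    # Koalas / pandas-on-Spark return a distributed Series here; it holds one
    # value per column, so it is small enough to bring to the driver.
    if hasattr(counts, "to_pandas"):
        counts = counts.to_pandas()
    keep = [col for col, n in counts.items() if n > 0]
    return df[keep]


if __name__ == "__main__":
    # Illustrative data: column "b" contains only nulls and should be dropped.
    demo = pd.DataFrame({"a": [1.0, None], "b": [None, None], "c": ["x", None]})
    print(list(drop_all_null_columns(demo).columns))  # ['a', 'c']
```

On plain pandas the same result is available directly as df.dropna(axis=1, how="all"). DLT table functions are expected to return a Spark DataFrame, so inside a pipeline you would presumably call the helper on the Koalas version of your data and return result.to_spark(). Note that count() runs a Spark job over the whole table.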

by cconnell • Contributor II

09-05-2021 7:17:40 AM

650 Views
0 replies
1 kudos

medium.com

I wrote a review of Koalas by porting an existing pandas program. Comments welcome. https://medium.com/@chuck.connell.3/pandas-on-databricks-via-koalas-a-review-9876b0a92541

by brickster_2018 • Databricks Employee

06-25-2021 3:55:10 PM

1706 Views
1 replies
0 kudos

What are the advantages of using Koalas?

Latest Reply

amr
Databricks Employee

06-28-2021 10:43:21 AM

0 kudos

Koalas lets you scale your pandas code, which typically runs on one node, to a cluster of multiple nodes. All you need to do is change the Python import from pandas to Koalas, and you will have code that runs on multiple nodes ...
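Koalas implements the pandas DataFrame API, so in practice the import swap is between a pandas import and a Koalas import. The sketch below writes code once against that API. It assumes that at least one of pyspark.pandas (the module Koalas became in Spark 3.2 and later), databricks.koalas, or plain pandas is installed. The loader falls back to single-node pandas, so the same file also runs without Spark. The names load_dataframe_api and total_by_key are hypothetical.

```python
import importlib


def load_dataframe_api():
    """Return the first pandas-compatible module that can be imported.

    Order (an assumption, adjust as needed): pandas API on Spark,
    standalone Koalas, then plain single-node pandas as a fallback.
    """
    for name in ("pyspark.pandas", "databricks.koalas", "pandas"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("no pandas-compatible DataFrame library is installed")


def total_by_key(xp, data):
    """Pandas-style code that does not care which module xp is."""
    df = xp.DataFrame(data)
    return df.groupby("key")["value"].sum().sort_index()


if __name__ == "__main__":
    xp = load_dataframe_api()
    print(xp.__name__)
    print(total_by_key(xp, {"key": ["b", "a", "b"], "value": [1, 2, 3]}))
```

Only the module that provides DataFrame changes. The groupby/sum/sort_index calls stay identical, which is the point the reply makes.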

by User16826994223 • Honored Contributor III

06-25-2021 3:42:38 AM

1354 Views
1 replies
0 kudos

How is Koalas different from Dask?

Latest Reply

User16826994223
Honored Contributor III

06-25-2021 3:42:53 AM

0 kudos

Different projects have different focuses. Spark is already deployed in virtually every organization, and often is the primary interface to the massive amount of data stored in data lakes. Koalas was inspired by Dask, and aims to make the transition ...

by User16783853906 • Contributor III

06-08-2021 2:44:50 PM

1990 Views
3 replies
0 kudos

Resolved! How to reuse Pandas code in PySpark?

I have single-threaded Pandas code that is neither supported by Koalas yet nor easy to reimplement in PySpark. I would like to distribute this workload using Spark without rewriting all my Pandas code. Is this possible?

Latest Reply

User16783853906
Contributor III

06-23-2021 2:28:25 PM

0 kudos

This is for a specific scenario where the code is not yet supported by Koalas. One approach to consider is using a Pandas UDF, and splitting up the work in a way that allows your processing to move forward. This notebook is a great example of taking ...
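One way to apply the Pandas UDF idea from the reply is a grouped map. Spark hands each group to unchanged pandas code as an ordinary pandas DataFrame. This minimal sketch assumes Spark 3.0 or later, where GroupedData.applyInPandas is available, and a Spark DataFrame with hypothetical columns store and sales. legacy_pandas_logic stands in for your existing single-threaded code.

```python
import pandas as pd

# Output schema as a Spark DDL string (hypothetical columns); applyInPandas
# accepts either a DDL string or a StructType.
OUTPUT_SCHEMA = "store string, sales double, sales_share double"


def legacy_pandas_logic(pdf: pd.DataFrame) -> pd.DataFrame:
    """Stand-in for existing pandas code; receives all rows of one store."""
    out = pdf[["store", "sales"]].copy()
    out["sales_share"] = out["sales"] / out["sales"].sum()
    return out


def run_distributed(sdf):
    """Run legacy_pandas_logic once per store, in parallel on the executors.

    sdf is assumed to be a pyspark.sql.DataFrame with a string "store"
    column and a double "sales" column.
    """
    return sdf.groupBy("store").applyInPandas(legacy_pandas_logic, schema=OUTPUT_SCHEMA)
```

Because legacy_pandas_logic is plain pandas, you can test it locally on a small pandas DataFrame before distributing it. Each group is loaded into memory as a whole, so every group must fit on a single executor.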

by User16826994223 • Honored Contributor III

06-22-2021 11:18:39 PM

2271 Views
2 replies
0 kudos

Requirement to Run Koalas

Hi, I am planning to run Koalas in a Databricks environment. What are the requirements for running Koalas there?

Latest Reply

tj-cycyota
Databricks Employee

06-23-2021 8:05:03 AM

0 kudos

Koalas is great! This really helps ease the transition from Pandas to Spark, because you can just use the same Pandas functions/classes through the Koalas API but everything runs in the background in Spark.

by User16830818524 • New Contributor II

06-18-2021 11:25:20 AM

1259 Views
1 replies
0 kudos

Is it possible to read a Delta table directly using Koalas?

Can I read a Delta table directly using Koalas or do I need to read using Spark and then convert the Spark dataframe to a Koalas dataframe?

Latest Reply

Ryan_Chynoweth
Esteemed Contributor

06-18-2021 2:02:26 PM

0 kudos

Yes, you can use the "read_delta" function (see the Koalas documentation).

by j_weaver • New Contributor III

06-10-2021 10:57:03 AM

1443 Views
1 replies
0 kudos

Resolved! When should I use pandas, PySpark, and Koalas?

Latest Reply

User16752246141
New Contributor III

06-10-2021 10:59:14 AM

0 kudos

Pandas works for single-machine computations, so any pandas code you write on Databricks will run on the driver of the cluster. PySpark and Koalas are both distributed frameworks for when you have large datasets. You can use PySpark and Koalas inte...
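Pandas code runs only on the driver, so mixing the three libraries usually means converting DataFrames at the boundaries. This minimal sketch uses a hypothetical helper, to_local_pandas. It collects a Koalas or pandas-on-Spark DataFrame (via to_pandas) or a PySpark DataFrame (via toPandas) into pandas. The result must fit in driver memory, so use it only on small or already-aggregated data.

```python
import pandas as pd


def to_local_pandas(df):
    """Collect a pandas, Koalas / pandas-on-Spark, or PySpark DataFrame into pandas.

    Everything returned ends up on the driver, so call this only on data
    small enough to fit there.
    """
    if isinstance(df, pd.DataFrame):
        return df  # already local, nothing to collect
    if hasattr(df, "to_pandas"):  # Koalas / pandas-on-Spark DataFrame
        return df.to_pandas()
    if hasattr(df, "toPandas"):  # PySpark DataFrame
        return df.toPandas()
    raise TypeError(f"unsupported DataFrame type: {type(df).__name__}")
```

In the other direction, Koalas adds to_koalas() to Spark DataFrames once databricks.koalas is imported, and Koalas DataFrames provide to_spark(). Switching between those two keeps the data distributed instead of routing it through the driver.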

Databricks Community