- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-08-2021 02:44 PM
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-23-2021 02:28 PM
This is for a specific scenario where the code is not yet supported by Koalas. One approach to consider is using a Pandas UDF, and splitting up the work in a way that allows your processing to move forward. This notebook is a great example of taking single node processing and parallelizing it using a Pandas UDF, although it may not be a perfect fit for your challenge - https://pages.databricks.com/rs/094-YMS-629/images/Fine-Grained-Time-Series-Forecasting.html
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-17-2021 04:25 PM
That is exactly what koalas is for! it's a reimplementation of most of the pandas API on top of Spark. You should be able to run your pandas code as-is, or with little modification, using koalas, and let it distribute on Spark. https://docs.databricks.com/languages/koalas.html
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-23-2021 07:27 AM
It has become so simple once Koalas came , in place of importing Import Pandas as pd you just have to do
import databricks.koalas as pd
I kept as pd intentionally so that you do not need to change the other code , run the code , there may be some issue you can face that can be answered with koalas documentation , so its easy
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-23-2021 02:28 PM
This is for a specific scenario where the code is not yet supported by Koalas. One approach to consider is using a Pandas UDF, and splitting up the work in a way that allows your processing to move forward. This notebook is a great example of taking single node processing and parallelizing it using a Pandas UDF, although it may not be a perfect fit for your challenge - https://pages.databricks.com/rs/094-YMS-629/images/Fine-Grained-Time-Series-Forecasting.html

