06-07-2023 05:12 PM
Hi, I've had this question for a few weeks and haven't found any information on the topic. Specifically, my doubt is: what is the 'lifecycle' (or the sequence of steps) for being able to use a new Python library in Databricks, in terms of compatibility? For example, if I wanted to use Numba or Cython in Databricks, is this possible? And what about libraries that provide parallelism, like running 'Dask' on top of a cluster framework that already allows distributed computation: is this possible, and if so, how does it work?
I'm not sure if I'm making myself understood 😅
If someone knows, could you share any resources to delve into the topic? Thank you very much, friends!
- Labels:
  - Azure Databricks
  - Libraries
  - Pyspark
  - Python
  - Spark
Accepted Solutions
06-12-2023 12:35 AM
Basically, it is possible.
In essence, Databricks delivers virtual machines with a Linux base and Spark installed.
If you want to run other software on that hardware, it is probably possible.
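For instance, a cluster-scoped init script can install extra libraries on every node when the cluster starts. This is only a minimal sketch: the interpreter path and the package names below are assumptions to verify against your Databricks runtime's documentation.

```shell
#!/bin/bash
# Hypothetical cluster init script: installs extra Python libraries on every
# node (driver and workers) at cluster start-up, so the same packages are
# available wherever Spark schedules your code.
# The pip path is the conventional Databricks location; confirm for your runtime.
/databricks/python/bin/pip install numba "dask[distributed]"
```

If you only need a library inside one notebook, a notebook-scoped install with `%pip install numba` in a notebook cell is the lighter-weight alternative.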
Here, for example, someone installed Dask on Databricks. And here, Databricks with Ray.
But as you can see: Databricks is not a general distributed compute platform. It is very Spark-oriented. So your attempts might be successful, or not; it depends on the library.
Cython, for example, will probably work, but my guess is that it will only run on the driver.
So it might be easier to set up a Cython VM yourself.
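One quick way to see whether a library is actually usable where your code runs is an import check like the one below. It is plain Python with no Databricks-specific API; the commented `sc.parallelize` line at the end is a hypothetical sketch of how you could run the same check on the workers instead of the driver.

```python
import importlib.util

def libs_available(names):
    """Return a dict mapping each library name to whether it can be
    imported in the *current* Python environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

# Run on the driver, this reports the driver's environment:
print(libs_available(["numpy", "numba", "dask"]))

# Hypothetical sketch: wrapped in a Spark task, the same function would
# report the workers' environments instead, e.g.:
# sc.parallelize(range(sc.defaultParallelism)) \
#   .map(lambda _: libs_available(["numba"])).collect()
```

A library that imports fine on the driver but not on the workers is exactly the Cython-style situation described above.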
06-13-2023 12:18 AM
Hi @Carlos Caravantes
Thank you for posting your question in our community! We are happy to assist you.
To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?
This will also help other community members who may have similar questions in the future. Thank you for your participation, and let us know if you need any further assistance!

